In the last decades, the web and online services have revolutionized the modern world. However, by increasing our dependence on online services, as a result, online security threats are also increasing rapidly. One of the most common online security threats is a so-called Phishing attack, the purpose of which is to mimic a legitimate website such as online banking, e-commerce or social networking website in order to obtain sensitive data such as user-names, passwords, financial and health-related information from potential victims. The problem of detecting phishing websites has been addressed many times using various methodologies from conventional classifiers to more complex hybrid methods.
- having_IP_Address
- URL_Length
- Shortining_Service
- having_At_Symbol
- double_slash_redirecting
- Prefix_Suffix
- having_Sub_Domain
- SSLfinal_State
- Domain_registeration_length
- Favicon
- port
- HTTPS_token
- Request_URL
- URL_of_Anchor
- Links_in_tags
- SFH
- Submitting_to_email
- Abnormal_URL
- Redirect
- on_mouseover
- RightClick
- popUpWidnow
- Iframe
- age_of_domain
- DNSRecord
- web_traffic
- Page_Rank
- Google_Index
- Links_pointing_to_page
- Statistical_report
- Result
UCI: https://archive.ics.uci.edu/ml/datasets/phishing+websites.
But I would reccomend using csv file available in this repository for ease of use. Download a copy of the repository and you will be good to go.
a. Studying the data in tabular form
b. Creating Basic plots and infering relations between data features visually
a. Logistic Regression
b. K-Nearest Neighbours
c. Decision Trees and Random Forest classifier
d. Support Vector Machine