An Approach to the Identification and Prediction of Phishing Websites Using Classification Algorithms Based on Web Pages Characteristics

Document Type : Research Paper

Abstract

Today, the most important risk and challenge facing e-commerce and e-banking is online fraud, particularly phishing attacks. Phishing has been one of the tools most widely used by attackers to steal the passwords and electronic codes of users in electronic exchanges in cyberspace. In this type of fraud, the fraudster, or phisher, sends the victim emails making various claims and uses several different techniques to direct the victim to fake pages, where he attempts to steal sensitive user information such as passwords. Phishing web pages, emails, and addresses have features that can be used to detect these attacks. This paper presents an approach for identifying and predicting phishing websites using classification algorithms based on web page features. The approach has a lower error rate than other techniques for dealing with phishing attacks, especially similar techniques based on data mining algorithms. In the proposed approach, the features usable for detecting phishing pages are weighted according to their effect on detection, and a pattern for identifying these attacks is then elicited by applying classification algorithms to the relevant datasets. Our approach can detect journal phishing attacks and has a lower error rate than previous approaches.
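The weighting-then-classification idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the weighting measure or the classifier, so information gain is assumed as the feature weight and a simple weighted vote is assumed as the classifier. The feature names (ip_in_url, long_url, https) and the six-row dataset are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting on one feature."""
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def train(rows, labels):
    """Weight each feature (assumed: information gain) and record which
    class each feature value points to in the training data."""
    features = list(rows[0].keys())
    weights = {f: information_gain(rows, labels, f) for f in features}
    votes = {}
    for f in features:
        for value in set(row[f] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[f] == value]
            phishing_share = sum(subset) / len(subset)
            # +1 votes "phishing", -1 votes "legitimate"
            votes[(f, value)] = 1 if phishing_share > 0.5 else -1
    return weights, votes

def predict(model, row):
    """Weighted vote: 1 = phishing, 0 = legitimate."""
    weights, votes = model
    score = sum(weights[f] * votes.get((f, v), 0) for f, v in row.items())
    return 1 if score > 0 else 0

# Hypothetical toy dataset: label 1 = phishing, 0 = legitimate.
rows = [
    {"ip_in_url": 1, "long_url": 1, "https": 0},
    {"ip_in_url": 1, "long_url": 0, "https": 0},
    {"ip_in_url": 0, "long_url": 1, "https": 0},
    {"ip_in_url": 0, "long_url": 0, "https": 1},
    {"ip_in_url": 0, "long_url": 1, "https": 1},
    {"ip_in_url": 0, "long_url": 0, "https": 1},
]
labels = [1, 1, 1, 0, 0, 0]

model = train(rows, labels)
weights, _ = model
```

In this toy data the https feature separates the classes perfectly and so receives the largest weight. A page with ip_in_url = 1, long_url = 1, and https = 1 is therefore classified as legitimate, whereas an unweighted majority vote over the three features would label it phishing.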


 