Phishing Email and URL Detection using Machine learning and Deep learning

M, Somesha

Please use this identifier to cite or link to this item: https://idr.l3.nitk.ac.in/jspui/handle/123456789/17720

Title:	Phishing Email and URL Detection using Machine learning and Deep learning
Authors:	M, Somesha
Supervisors:	Pais, Alwyn Roshan
Issue Date:	2023
Publisher:	National Institute Of Technology Karnataka Surathkal
Abstract:	The research thesis attempts to address the issue of email phishing, which poses a serious risk to businesses and corporations. Through the use of social engineering strategies, email phishing attacks persuade users to divulge personal data that can be exploited to access their digital assets. Despite the presence of several defenses, the Anti-Phishing Working Group survey reveals that the present approaches to phishing attack detection are still insufficient and ineffective. This underlines the requirement for a more effective system to identify phishing emails and offer greater protection against such attacks to the end user. Many machine learning based techniques exist to detect phishing emails, but they use a large number of heuristics to classify an email. To overcome the disadvantages of existing schemes, we have presented an efficient word embedding cum machine learning framework to classify emails. The presented technique uses only four email header based heuristics (i.e., From, Return-path, Subject, and Message-ID). The model achieved an accuracy of 99.50% using the FastText-CBOW algorithm in combination with the Random Forest classifier. Although machine learning based techniques achieved high accuracy, it is advisable to use deep learning models whenever sufficient data is available. We have presented an efficient deep learning model called "DeepEPhishNet" for the classification of emails. The presented model, based on FastText-SkipGram with a Deep Neural Network (DNN), achieved an accuracy of 99.52%, TPR of 99.38%, TNR of 99.92%, F-Score of 99.68%, Precision of 99.97%, and MCC of 98.71%. The above methods make use of only four email header based heuristics for the classification. To study the contribution of the email body in the detection of phishing emails, we have presented an efficient model using transformers. The presented model achieved an accuracy of 99.51% using open source datasets.
The body of the email might contain phishing URLs, which may lead to a phishing attack. To address this, we have presented an efficient deep learning based model for phishing URL detection. The accuracies achieved by the DNN, LSTM, and CNN models are 99.52%, 99.57%, and 99.43%, respectively. Overall, this research thesis presents efficient techniques for detecting phishing emails and URLs using word embedding, deep learning, and machine learning classifiers.
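
The header-only approach described in the abstract relies on four fields: From, Return-path, Subject, and Message-ID. The following is a minimal sketch of that extraction step only, assuming Python's standard `email` module. It is not the thesis's implementation. The names `HEADER_FIELDS`, `extract_header_features`, `to_token_text`, and the sample message are hypothetical and chosen for illustration; the embedding (FastText) and classifier (Random Forest or DNN) stages are not shown.

```python
from email import message_from_string

# The four header fields named in the abstract. Header lookup in the
# email module is case-insensitive, so "Return-Path" also matches
# "Return-path".
HEADER_FIELDS = ("From", "Return-Path", "Subject", "Message-ID")


def extract_header_features(raw_email: str) -> dict:
    """Return the four header values, using "" for any missing header."""
    msg = message_from_string(raw_email)
    return {name: (msg.get(name) or "").strip() for name in HEADER_FIELDS}


def to_token_text(features: dict) -> str:
    """Join the header values into one lowercase string.

    Assumption: a downstream word-embedding model would consume text
    like this; the thesis's actual preprocessing may differ.
    """
    return " ".join(features[name].lower() for name in HEADER_FIELDS)


# Hypothetical sample message used only to exercise the functions.
SAMPLE = (
    "From: Support <support@example.com>\n"
    "Return-Path: <bounce@mailer.example.net>\n"
    "Subject: Verify your account\n"
    "Message-ID: <12345@mailer.example.net>\n"
    "\n"
    "Body text is ignored by this sketch.\n"
)

features = extract_header_features(SAMPLE)
token_text = to_token_text(features)
```

In this sketch the message body is deliberately discarded, mirroring the abstract's point that the first two models classify on header information alone.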
URI:	http://idr.nitk.ac.in/jspui/handle/123456789/17720
Appears in Collections:	1. Ph.D Theses

Files in This Item:

File	Description	Size	Format
187105-CO004-Somesha M.pdf		8.95 MB	Adobe PDF
