
Feature Extraction aligned Email Classification based on Imperative Sentence Selection through Deep Learning

Nashit Ali1, Anum Fatima1, Hureeza Shahzadi2, Aman Ullah3, Kemal Polat4, *

Corresponding Author:

Kemal Polat

Affiliation(s):

1. Department of Computer Science, COMSATS University Islamabad, Vehari Campus, Vehari 61100, Pakistan

2. Department of Computational Science & Engineering, National University of Sciences and Technology, Islamabad.

3. School of Computer Science and Engineering, Central South University, Changsha, 410083, China

4. Department of Electrical and Electronics Engineering, Bolu Abant Izzet Baysal University, Bolu 14280, Turkey

*Corresponding Author: Email: [email protected] (Kemal Polat)


Abstract:

Email is the most commonly used channel of communication. When people are busy with their routines and work, it is difficult to check every message in a large volume of incoming email. Previous research on email categorization has mostly focused on spam filtering. The problem with spam filtering is that a person sometimes mistakenly marks an important email from a higher authority as spam; under the approaches used in previous research, such an email will then be filtered as spam, which can threaten an employee's job. In this research, we introduce a methodology that classifies email text into three categories, i.e., order, request, and general, on the basis of imperative sentences. We use Word2Vec to convert words into vectors and two deep learning approaches, a convolutional neural network (CNN) and a recurrent neural network (RNN), for email classification. We conduct experiments on a dataset of 1000 emails collected from a personal Gmail account and Enron. The experimental results show that the RNN gives better accuracy than the CNN. We also compare our methods with a previously used method, Fuzzy ANN, and both the CNN and the RNN give better results than Fuzzy ANN. This research also includes experiments in which the CNN and RNN are applied to different ratios of training and testing data. These experiments show that increasing the proportion of training data increases the accuracy of the algorithms.
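The pipeline outlined in the abstract (word vectors fed to a recurrent network that outputs one of three classes) can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding table is a random stand-in for a trained Word2Vec model, the weights are untrained, and the names `EMBEDDINGS`, `classify`, `EMBED_DIM` and `HIDDEN_DIM`, along with their values, are assumptions made only for illustration. It shows the data flow of a plain (Elman-style) RNN forward pass, not a model that would reach the reported accuracy.

```python
import math
import random

LABELS = ["order", "request", "general"]  # the three classes named in the abstract
EMBED_DIM = 8    # assumed; the abstract does not state the Word2Vec dimension
HIDDEN_DIM = 6   # assumed hidden-state size

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical stand-in for a trained Word2Vec table: random vector per word.
VOCAB = ["please", "send", "the", "report", "today", "thanks"]
EMBEDDINGS = {w: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}
UNKNOWN = [0.0] * EMBED_DIM  # out-of-vocabulary words map to a zero vector

W_xh = rand_matrix(HIDDEN_DIM, EMBED_DIM)    # input -> hidden
W_hh = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)   # hidden -> hidden (recurrence)
W_hy = rand_matrix(len(LABELS), HIDDEN_DIM)  # hidden -> class scores


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text):
    """Run the RNN over the tokens and return a probability per label."""
    h = [0.0] * HIDDEN_DIM
    for token in text.lower().split():
        x = EMBEDDINGS.get(token, UNKNOWN)
        pre = [a + b for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
        h = [math.tanh(p) for p in pre]  # biases omitted for brevity
    return dict(zip(LABELS, softmax(matvec(W_hy, h))))


scores = classify("Please send the report today")
print(max(scores, key=scores.get), scores)
```

With trained embeddings and weights, the label with the highest probability would be taken as the predicted category; here the prediction is arbitrary because nothing has been trained.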

Keywords:

Convolutional neural network, Email Categorization, Imperative Sentences, Recurrent neural network, Spam filtration

Cite This Paper:

Nashit Ali, Anum Fatima, Hureeza Shahzadi, Aman Ullah, Kemal Polat (2021). Feature Extraction aligned Email Classification based on Imperative Sentence Selection through Deep Learning. Journal of Artificial Intelligence and Systems, 3, 93–114. https://doi.org/10.33969/AIS.2021.31007.
