Exploring and Comparing Various Machine and Deep Learning Technique Algorithms to Detect Domain Generation Algorithms of Malicious Variants

Preetham Aravamudu

Abstract


A Domain Generation Algorithm (DGA) is the core routine in several malware families: it generates the domain names that the malware later uses to reach its command and control (C&C) servers. Security measures usually identify the malware itself, but DGAs keep changing in order to evade older, less efficient detection methods, which use neither machine learning nor deep learning to detect DGAs. Because a DGA can create a huge number of domains to avoid being blocked, the older methods on their own struggle to block attackers and zombie systems. This paper therefore discusses the impact of incorporating machine learning and deep learning techniques into DGA detection. The main purpose of this research is to implement, compare, and analyse various machine learning algorithms and to identify those that suit the dataset and yield better results. The obtained dataset is pre-processed and then classified with Random Forest, Support Vector Machine (SVM), Naive Bayes, H2O AutoML, Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) models. LSTM gives the best classification efficiency, 98%, while H2O AutoML gives the lowest, 75%.
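To make the classification step concrete, the following is a minimal sketch of one of the listed model families, a Naive Bayes classifier, applied to domain names. It is not the paper's implementation: the abstract does not describe the features or the dataset, so the character-bigram features, the `BigramNaiveBayes` class, and the ten training domains below are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bigrams(domain):
    """Return the character bigrams of the leftmost label, lowercased."""
    label = domain.lower().split(".")[0]
    return [label[i:i + 2] for i in range(len(label) - 1)]


class BigramNaiveBayes:
    """Multinomial Naive Bayes over character bigrams, add-one smoothing.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def fit(self, domains, labels):
        self.counts = {}   # class -> Counter of bigram frequencies
        self.totals = {}   # class -> total number of bigrams seen
        self.priors = {}   # class -> log prior probability
        self.vocab = set()
        for domain, label in zip(domains, labels):
            grams = bigrams(domain)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        for label, size in Counter(labels).items():
            self.totals[label] = sum(self.counts[label].values())
            self.priors[label] = math.log(size / len(labels))
        return self

    def predict(self, domain):
        # One extra vocabulary slot reserves probability for unseen bigrams.
        v = len(self.vocab) + 1
        scores = {}
        for label, prior in self.priors.items():
            score = prior
            for gram in bigrams(domain):
                count = self.counts[label][gram]
                score += math.log((count + 1) / (self.totals[label] + v))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data (assumed, far too small for real use).
train_domains = [
    "google.com", "facebook.com", "wikipedia.org", "amazon.com", "youtube.com",
    "xkqjzvbw.com", "qzxvjkwp.net", "zqxjvkqw.info", "vbqxzjkw.biz", "jxqzkvwq.org",
]
train_labels = ["benign"] * 5 + ["dga"] * 5

model = BigramNaiveBayes().fit(train_domains, train_labels)
print(model.predict("googlemail.com"))
print(model.predict("qxzjvkbw.net"))
```

Character bigrams are used here only because algorithmically generated names tend to contain letter pairs that are rare in human-chosen names; a real evaluation would need a labelled dataset of realistic size and a held-out test split.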





DOI: https://doi.org/10.11591/csit.v3i1.p%25p



Computer Science and Information Technologies
ISSN: 2722-323X, e-ISSN: 2722-3221


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.