Clustering Techniques for Anomaly Detection in Cybersecurity Logs: A Survey

Authors

  • R V. V. Naga Sri Pavani, Undergraduate Student, Department of Computer Science and Engineering, Pragati Engineering College, Surampalem, Andhra Pradesh, India
  • S. K. Sankar, Assistant Professor, Department of Computer Science and Engineering, Pragati Engineering College, Surampalem, Andhra Pradesh, India

Keywords:

Clustering algorithms, Clustering-based anomaly detection, Cybersecurity, Hierarchical clustering, Threats

Abstract

Cybersecurity is a pivotal concern in today's digitally interconnected world, because networks and systems are continuously threatened by increasingly advanced attacks. Examining security logs is one of the main ways of detecting such threats, since logs record the activity occurring on a system and can reveal unusual behaviour. Their sheer volume, high dimensionality, and unstructured format make manual analysis impractical, and rule-based approaches are usually ineffective at identifying new or latent threats. This review paper examines the detection of anomalies in security log data using clustering algorithms. The problem addressed is that anomalies, which may be indicators of cyber-attacks, are very hard to locate accurately in large, unlabelled datasets, and that conventional detection methods neither scale nor generalize well. Unlike supervised learning approaches, which require labelled data, clustering-based anomaly detection is unsupervised and is therefore applicable to real-world security settings, where novel attack patterns frequently emerge. The survey analyzes and contrasts a range of clustering methods, from classic ones such as K-Means, DBSCAN, and hierarchical clustering to more advanced ones such as OPTICS, fuzzy C-means, and deep clustering. These techniques group data by similarity, so that outliers, which may signal malicious intent, can be identified. The algorithms are reviewed against criteria such as detection accuracy, computational efficiency, scalability to large inputs, and robustness to noisy or incomplete data, a common problem in log analysis. One finding the study emphasizes is the trade-off between detection sensitivity and false positives: a detector that is too sensitive can flood security analysts with alerts, while one that is not sensitive enough may miss important threats. The survey gives an account of how clustering algorithms are used as anomaly detection models for cybersecurity log data. It sets out the strengths and weaknesses of existing methods and highlights the need for approaches that are more adaptive, explainable, and efficient. By bridging the gap between data science methods and practical cybersecurity requirements, clustering-based anomaly detection is a promising direction for research and development of proactive, intelligent defense mechanisms.
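
As a rough illustration of the clustering-based approach and the sensitivity trade-off summarized above, the following Python sketch implements a minimal density-based clustering in the style of DBSCAN and flags every point left unclustered as a candidate anomaly. It is not taken from the surveyed papers. The two features (requests per minute, failed logins per minute), the sample values, and the names `dbscan_labels`, `flag_anomalies`, and `events` are hypothetical and chosen only for illustration; real log features would normally be scaled before clustering.

```python
from math import dist


def dbscan_labels(points, eps, min_pts):
    """Minimal DBSCAN-style clustering: one cluster id per point, -1 for noise."""
    n = len(points)
    # A point's neighborhood includes the point itself.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


def flag_anomalies(points, eps, min_pts):
    """Return the points that belong to no cluster (label -1)."""
    labels = dbscan_labels(points, eps, min_pts)
    return [p for p, label in zip(points, labels) if label == -1]


# Hypothetical per-host features: (requests per minute, failed logins per minute).
events = [
    (20, 1), (21, 0), (19, 1), (22, 1), (20, 2),  # routine traffic
    (60, 2), (61, 3), (59, 2), (62, 2),           # busier but still routine
    (24, 3),                                      # slightly unusual
    (25, 40),                                     # burst of failed logins
]

print(flag_anomalies(events, eps=2, min_pts=3))  # stricter neighborhood radius
print(flag_anomalies(events, eps=3, min_pts=3))  # looser neighborhood radius
```

With this toy data, the stricter radius (`eps=2`) flags both `(24, 3)` and `(25, 40)`, while the looser radius (`eps=3`) absorbs `(24, 3)` into the routine cluster and flags only `(25, 40)`. A smaller radius raises sensitivity at the cost of more alerts on borderline events, which is the trade-off described in the abstract.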

Published

2025-11-21

Section

Articles