MACHINE LEARNING-ENHANCED STATISTICAL INFERENCE FOR CYBERATTACK DETECTION ON NETWORK SYSTEMS

Md.Kamrul Khan; Md Omar Faruq

doi:10.63125/sw7jzx60

Authors

Md.Kamrul Khan, M.Sc in Mathematics, Jagannath University, Dhaka, Bangladesh
Md Omar Faruq, Master of Science in Cybersecurity Operations, Webster University, Missouri, USA

Keywords:

Machine learning, statistical inference, cybersecurity, anomaly detection, network systems

Abstract

This study presents a comprehensive systematic review of 126 peer-reviewed publications on the integration of machine learning and statistical inference for cyberattack detection in network systems. The primary objective is to critically evaluate how adaptive computational models, when combined with probabilistic reasoning frameworks, enhance detection accuracy, interpretability, and operational efficiency in dynamic and evolving cyber threat landscapes. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, the research process ensured transparency, methodological rigor, and replicability during literature identification, screening, and synthesis. The reviewed studies encompass a broad spectrum of machine learning paradigms—supervised, unsupervised, hybrid, and deep learning architectures—integrated with statistical inference methods such as Bayesian updating, likelihood estimation, hypothesis testing, probabilistic calibration, and statistical drift detection. Evidence consistently demonstrates that these integrated frameworks achieve superior true positive rates, reduced false positives, and greater resilience against zero-day and polymorphic attacks compared to traditional rule-based or signature-based systems. Notably, the studies highlight the pivotal role of dataset quality, diversity, and timeliness, with optimal results achieved when recent, representative data are combined with statistical preprocessing, dimensionality reduction, and adaptive feature selection techniques. Operational challenges in real-time deployment—such as minimizing latency, optimizing computational resources, and sustaining adaptability—are effectively addressed through innovations like lightweight statistical screening layers, adaptive thresholding, and distributed processing. 
Comparative experimental results further validate that integrated approaches deliver measurable improvements not only in technical detection metrics but also in scalability, cross-domain applicability, and human interpretability. This review concludes that the convergence of machine learning and statistical inference constitutes a mature, high-impact methodology for modern cybersecurity defense. However, it also identifies critical research gaps, including the absence of standardized performance benchmarks and limited validation in large-scale, real-world network environments. Addressing these gaps will be essential to ensuring the scalability, robustness, and long-term operational relevance of such integrated detection systems.
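
As an illustration of the "lightweight statistical screening layers" and "adaptive thresholding" mentioned above, the following is a minimal sketch and not a method taken from any of the reviewed studies. It assumes a single numeric traffic feature (for example, packets per second) and flags observations that deviate from an exponentially weighted running baseline by more than k standard deviations. The class name EwmaScreen, the helper screen_stream, and the parameter values alpha, k, and warmup are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class EwmaScreen:
    """Hypothetical first-stage screen based on an EWMA baseline."""
    alpha: float = 0.1   # smoothing factor for running mean/variance (assumed)
    k: float = 3.0       # flag when |x - mean| > k * std (assumed)
    warmup: int = 5      # observations to absorb before any flagging (assumed)
    mean: float = 0.0
    var: float = 0.0
    n: int = 0

    def update(self, x: float) -> bool:
        """Return True if x is flagged; adapt the baseline otherwise."""
        self.n += 1
        if self.n == 1:
            # The first observation only initialises the baseline.
            self.mean = x
            return False
        deviation = x - self.mean
        std = self.var ** 0.5
        flagged = self.n > self.warmup and abs(deviation) > self.k * std
        if not flagged:
            # Adapt only on unflagged traffic, so a burst of anomalous
            # values does not immediately widen the threshold.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return flagged


def screen_stream(values):
    """Run a fresh screen over a sequence and return one flag per value."""
    screen = EwmaScreen()
    return [screen.update(v) for v in values]


if __name__ == "__main__":
    rates = [100, 101, 99, 100, 101, 99, 100, 101, 500, 100]
    for rate, flag in zip(rates, screen_stream(rates)):
        print(rate, "FLAGGED" if flag else "ok")
```

In a layered design of the kind the abstract describes, only flagged observations would be forwarded to a heavier machine learning classifier, which is one way such a screen could reduce latency and computational load. The update rule here is a standard exponentially weighted mean and variance; a real deployment would need thresholds tuned on representative traffic.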