Two-Stage Ensemble Machine Learning for Network Intrusion Detection

Jimson A. Olaybar; Patrick D. Cerna

DOI: https://doi.org/10.14445/22315381/IJETT-V74I5P130

Research Article | Open Access

Volume 74 | Issue 5 | Year 2026 | Article Id. IJETT-V74I5P130

Received: 17 Dec 2025 | Revised: 13 Jan 2026 | Accepted: 11 Mar 2026 | Published: 30 May 2026

Citation:

Jimson A. Olaybar, Patrick D. Cerna, "Two-Stage Ensemble Machine Learning for Network Intrusion Detection," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 5, pp. 484-494, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I5P130

Abstract

This paper introduces a two-stage ensemble machine learning architecture for network Intrusion Detection Systems (IDS), designed to improve detection accuracy and classification reliability in an increasingly complex cyber environment. The proposed model operates in two stages. Stage A performs binary classification, separating benign traffic from malicious activity with a calibrated stacking ensemble of Random Forest, Gradient Boosting, and XGBoost classifiers. Stage B performs multi-class attack categorization with a Random Forest classifier trained only on attack samples. The system was evaluated on the CIC-IDS2017 dataset, which contains more than 2.8 million network traffic records covering varied attack scenarios. Preprocessing consisted of feature normalization, missing-value imputation, and selection of 78 flow-based numerical features. In the experiments, the two-stage ensemble achieved 99.92% accuracy in binary classification and 99.83% accuracy in multi-class classification across 14 attack types. The model reached a near-optimal ROC-AUC (0.99987) and mitigated class-imbalance bias through probability estimation and threshold optimization. In comparison, the proposed system outperformed existing ensemble and deep learning methods in both accuracy and computational efficiency. The results indicate the potential of multi-level ensemble learning to improve IDS performance in modern network infrastructures. Future work will focus on adaptive learning for unknown threats and integration with real-time network defenses.
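
The two-stage control flow summarized in the abstract can be illustrated with a short sketch: a Stage A scorer yields an attack probability, a threshold chosen on validation data turns it into a benign/attack decision, and only flows flagged as attacks reach the Stage B multi-class classifier. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the F1 criterion for threshold selection, and the toy scorers are all hypothetical; in practice the trained classifiers (the calibrated stacking ensemble for Stage A, the Random Forest for Stage B) would be passed in as the two callables.

```python
from typing import Callable, List, Sequence

Flow = Sequence[float]  # one flow's numeric feature vector


def best_f1_threshold(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the decision threshold that maximizes F1 on validation data.

    ASSUMPTION: the abstract says only "threshold optimization"; F1 is an
    illustrative criterion, not necessarily the one the authors used.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(probs)):  # candidate thresholds = observed scores
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:  # ties keep the lowest threshold
            best_t, best_f1 = t, f1
    return best_t


def two_stage_predict(
    flows: Sequence[Flow],
    attack_prob: Callable[[Flow], float],  # Stage A: P(attack) for one flow
    attack_type: Callable[[Flow], str],    # Stage B: attack category
    threshold: float,
) -> List[str]:
    """Label each flow BENIGN, or route it to Stage B for an attack category."""
    out = []
    for x in flows:
        if attack_prob(x) >= threshold:
            out.append(attack_type(x))  # Stage B sees only flagged flows
        else:
            out.append("BENIGN")
    return out


# Toy stand-ins (hypothetical) for the trained Stage A and Stage B models.
def toy_attack_prob(x: Flow) -> float:
    return min(1.0, x[0] / 10.0)


def toy_attack_type(x: Flow) -> str:
    return "DoS" if x[1] > 0.5 else "PortScan"


# Hypothetical validation scores and ground truth (1 = attack).
val_probs = [0.05, 0.10, 0.40, 0.70, 0.90]
val_labels = [0, 0, 0, 1, 1]
threshold = best_f1_threshold(val_probs, val_labels)

predictions = two_stage_predict(
    [[1.0, 0.9], [8.0, 0.9], [9.0, 0.1]],
    toy_attack_prob,
    toy_attack_type,
    threshold,
)
print(threshold, predictions)
```

Keeping the routing logic separate from the models means the threshold can be re-tuned on new validation data without retraining either stage, and Stage B never has to learn a benign class.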

Keywords

Ensemble Learning, Intrusion Detection, Machine Learning, Network Security, Random Forest.
