APPLYING MACHINE LEARNING METHODS TO DETECT ATTACKS IN A CORPORATE NETWORK BASED ON FLOW FEATURES
DOI: https://doi.org/10.28925/2663-4023.2026.33.1167
Keywords: attack detection, flow features, Decision Tree, Random Forest, DDoS, botnet, web attacks, error matrix, precision.
Abstract
Detecting malicious network activity in corporate information resources from statistical traffic flow characteristics is a task of practical importance. Detection effectiveness depends not only on overall accuracy but also on the balance between false alarms and missed attacks, which directly affects the workload of security operators and the organization's residual risk. This paper presents an approach to attack detection in network connection flows that applies tree-based machine learning methods to flow features, and analyzes how these methods behave across different threat classes within a single reproducible experimental protocol. The experimental study uses the CSE-CIC-IDS2018 dataset with features extracted by CICFlowMeter and formulates a binary classification problem (benign versus attack) for three malicious activity scenarios: botnet activity, volumetric DDoS attacks (HOIC, LOIC-UDP), and web attacks (Brute Force-Web, Brute Force-XSS, SQL Injection). Decision Tree and Random Forest models are compared with class balancing and fixed train-test split parameters to ensure consistent evaluation across attack types. Performance is assessed using the confusion matrix and derived metrics for the attack class, including precision and recall, together with the absolute numbers of false positives (FP) and false negatives (FN), which are the most informative measures when attacks are rare. The results show almost complete separation of the benign and attack classes for the botnet and DDoS scenarios, which is consistent with pronounced traffic patterns and high class separability in the feature space. For web attacks, the error profile is fundamentally different: the Decision Tree detects more attacks (higher recall) at the cost of more false alarms and lower alert precision, whereas the Random Forest produces substantially more precise alerts but misses more attacks.
The study shows that the choice of detection method should account for the attack type, class imbalance, and the acceptable trade-off between false alarms and missed detections, and that results should be interpreted using metrics that reflect the operational consequences for corporate network monitoring systems.
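The evaluation logic described in the abstract can be illustrated with a minimal sketch. It computes confusion-matrix counts for the attack class and derives precision, recall, and accuracy from them. Everything in it is an assumption made for illustration: the helper names (`ConfusionCounts`, `confusion_counts`, `precision`, `recall`, `accuracy`), the label encoding (1 for attack, 0 for benign), and the two hypothetical detectors with made-up predictions. None of the numbers come from the paper or from CSE-CIC-IDS2018.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # attacks flagged as attacks
    fp: int  # benign flows flagged as attacks (false alarms)
    fn: int  # attacks labelled benign (missed attacks)
    tn: int  # benign flows labelled benign


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the attack label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def precision(c):
    """Share of raised alerts that are real attacks."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c):
    """Share of real attacks that were detected."""
    attacks = c.tp + c.fn
    return c.tp / attacks if attacks else 0.0


def accuracy(c):
    """Share of all flows that were labelled correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical sample: 1000 flows, of which only 10 are attacks.
y_true = [1] * 10 + [0] * 990
# Hypothetical detector A: finds 9 of 10 attacks, raises 30 false alarms.
pred_a = [1] * 9 + [0] * 1 + [1] * 30 + [0] * 960
# Hypothetical detector B: finds 6 of 10 attacks, raises 2 false alarms.
pred_b = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 988

for name, y_pred in (("detector A", pred_a), ("detector B", pred_b)):
    c = confusion_counts(y_true, y_pred)
    print(
        f"{name}: FP={c.fp} FN={c.fn} "
        f"precision={precision(c):.3f} recall={recall(c):.3f} "
        f"accuracy={accuracy(c):.3f}"
    )
```

In this made-up sample both detectors exceed 96% accuracy, yet detector A raises 30 false alarms and misses 1 attack, while detector B raises 2 false alarms and misses 4. This is why absolute FP and FN counts and attack-class precision and recall say more than accuracy when attacks are rare.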
References
Haidur, H. I., Hakhov, S. O., Dmitriiev, V. Y., & Bondarenko, N. V. (2021). Detection of traffic anomalies in organizational information systems using machine learning methods based on categorical field prediction algorithms. Telecommunications and Information Technologies. https://tit.dut.edu.ua/index.php/telecommunication/article/view/2402
Savchenko, T. V., Lutska, N. M., Vlasenko, L. O., & Tomenko, N. D. (2025). Analysis of the effectiveness of network traffic anomaly detection based on machine learning models. Cybersecurity: Education, Science, Technique. https://csecurity.kubg.edu.ua/index.php/journal/article/view/898
Sarhan, M., Layeghy, S., & Portmann, M. (2022). Evaluating standard feature sets towards increased generalisability and explainability of ML-based network intrusion detection. Big Data Research. https://www.sciencedirect.com/science/article/abs/pii/S2214579622000533
Sarhan, M., Layeghy, S., & Portmann, M. (2021). Evaluating standard feature sets towards increased generalisability and explainability of ML-based network intrusion detection. arXiv. https://arxiv.org/abs/2104.07183
Canadian Institute for Cybersecurity. (2018). CSE-CIC-IDS2018 dataset. https://www.unb.ca/cic/datasets/ids-2018.html
Amazon Web Services. (n.d.). A realistic cyber defense dataset (CSE-CIC-IDS2018). Registry of Open Data on AWS. https://registry.opendata.aws/cse-cic-ids2018/
Sharafaldin, I., Lashkari, A. H., & Ghorbani, A. A. (2018). Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the International Conference on Information Systems Security and Privacy (ICISSP 2018). https://www.scitepress.org/papers/2018/66398/66398.pdf
Breiman, L. (2001). Random forests. Machine Learning. https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann. https://dl.acm.org/doi/abs/10.5555/152181
Lashkari, A. H. (n.d.). CICFlowMeter [Computer software]. GitHub. https://github.com/ahlashkari/CICFlowMeter
scikit-learn developers. (n.d.). DecisionTreeClassifier [Documentation]. scikit-learn.
scikit-learn developers. (n.d.). RandomForestClassifier [Documentation]. scikit-learn.
License
Copyright (c) 2026 Дар'я Шулімова

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.