APPLYING MACHINE LEARNING METHODS TO DETECT ATTACKS IN A CORPORATE NETWORK BASED ON FLOW FEATURES

Daria Shulimova

doi:10.28925/2663-4023.2026.33.1167

Authors

Daria Shulimova, State University of Information and Communication Technologies, https://orcid.org/0009-0002-9557-990X


Keywords:

attack detection, flow features, Decision Tree, Random Forest, DDoS, botnet, web attacks, confusion matrix, precision.

Abstract

Detecting malicious network activity in corporate information resources from statistical traffic flow characteristics is a task of considerable practical importance: detection effectiveness depends not only on overall accuracy but also on the balance between false alarms and missed attacks, which directly determines the workload of security operators and the residual risk for the organization. This paper presents an approach to detecting attacks in network connection flows from flow features using tree-based machine learning methods, and analyzes how these methods behave across different threat classes under a single reproducible experimental protocol. The experiments use the CSE-CIC-IDS2018 dataset with features extracted by CICFlowMeter and pose a binary benign-versus-attack classification problem for three malicious activity scenarios: botnet activity (Bot), volumetric DDoS attacks (HOIC, LOIC-UDP), and web attacks (Brute Force-Web, Brute Force-XSS, SQL Injection). Decision Tree and Random Forest models are compared with class balancing and fixed train/test split parameters, so that the evaluation is consistent across attack types. Performance is assessed with the confusion matrix and the metrics derived from it for the attack class, including precision and recall, together with the absolute numbers of false positives (FP) and false negatives (FN), which are the most informative quantities when attacks are rare. The results show near-complete separation of benign and attack traffic in the Bot and DDoS scenarios, consistent with pronounced traffic patterns and high class separability in the feature space. Web attacks show a fundamentally different error profile: the Decision Tree achieves higher recall (detection completeness) at the cost of more false alarms and lower alert precision, whereas the Random Forest produces substantially more precise alerts but misses more attacks.
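The evaluation protocol described above can be sketched with scikit-learn, whose `DecisionTreeClassifier` and `RandomForestClassifier` appear in the reference list. This is a minimal sketch under stated assumptions, not the author's code. The label spellings in `SCENARIOS` are assumed values of the CSE-CIC-IDS2018 `Label` column and should be checked against the actual CSV files; `to_binary` and `evaluate` are hypothetical helper names; class balancing is approximated with `class_weight="balanced"` (the paper may have used resampling instead); and the 70/30 stratified split, `random_state=42`, and `n_estimators=100` are illustrative choices. The demo runs on synthetic imbalanced data from `make_classification`, not on the dataset itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: spellings of the CSE-CIC-IDS2018 "Label" values for each scenario;
# verify them against the CSV files before use.
SCENARIOS = {
    "bot": {"Bot"},
    "ddos": {"DDOS attack-HOIC", "DDOS attack-LOIC-UDP"},
    "web": {"Brute Force -Web", "Brute Force -XSS", "SQL Injection"},
}


def to_binary(labels, scenario):
    """Hypothetical helper: keep Benign flows plus the scenario's attack flows.

    Returns a boolean row mask and binary targets (0 = benign, 1 = attack)
    for the kept rows only.
    """
    labels = np.asarray(labels)
    is_attack = np.isin(labels, list(SCENARIOS[scenario]))
    mask = is_attack | (labels == "Benign")
    return mask, is_attack[mask].astype(int)


def evaluate(X, y, test_size=0.3, seed=42):
    """Hypothetical helper: fit DT and RF on one fixed split, report attack-class metrics."""
    # Stratify so the rare attack class keeps its proportion in both parts;
    # a fixed random_state makes the split identical for every model.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    models = {
        # class_weight="balanced" reweights classes inversely to their frequency
        # (ASSUMED stand-in for the paper's class balancing).
        "DecisionTree": DecisionTreeClassifier(class_weight="balanced", random_state=seed),
        "RandomForest": RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=seed, n_jobs=-1
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        y_pred = model.predict(X_te)
        # With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]].
        tn, fp, fn, tp = confusion_matrix(y_te, y_pred, labels=[0, 1]).ravel()
        results[name] = {
            "TP": int(tp), "FP": int(fp), "FN": int(fn), "TN": int(tn),
            "precision": precision_score(y_te, y_pred, pos_label=1, zero_division=0),
            "recall": recall_score(y_te, y_pred, pos_label=1, zero_division=0),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in for flow features: roughly 3% of rows are "attacks".
    X, y = make_classification(
        n_samples=5000, n_features=20, n_informative=5, weights=[0.97], random_state=0
    )
    for name, m in evaluate(X, y).items():
        print(f"{name}: TP={m['TP']} FP={m['FP']} FN={m['FN']} TN={m['TN']} "
              f"precision={m['precision']:.3f} recall={m['recall']:.3f}")
```

On real data the intended flow would be, for example, `mask, y = to_binary(labels, "web")` followed by `evaluate(X[mask], y)` once per scenario; reporting raw FP and FN counts alongside precision and recall keeps the operational cost of each model visible.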
The study shows that the choice of detection method should take into account the attack type, the degree of class imbalance, and the acceptable trade-off between false alarms and missed detections, and that results should be interpreted with metrics that reflect the operational consequences for corporate network monitoring systems.
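One way to make the trade-off between false alarms and missed detections explicit is to attach an operational cost to each error type. The sketch below assumes a simple linear cost model, `cost_fp * FP + cost_fn * FN`, with per-error weights chosen by the organization; this cost model is an illustrative assumption, not the paper's method. The names `Confusion`, `operational_cost`, and `pick_model` are hypothetical, and the counts in the usage example are toy numbers, not results from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts with the attack class as the positive class."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        # Share of raised alerts that are real attacks.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Share of real attacks that raised an alert.
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0


def operational_cost(cm: Confusion, cost_fp: float, cost_fn: float) -> float:
    """ASSUMED linear cost: every false alarm and every missed attack has a fixed price."""
    return cost_fp * cm.fp + cost_fn * cm.fn


def pick_model(candidates: dict[str, Confusion], cost_fp: float, cost_fn: float) -> str:
    """Return the name of the candidate with the lowest operational cost."""
    return min(candidates, key=lambda name: operational_cost(candidates[name], cost_fp, cost_fn))


if __name__ == "__main__":
    # Toy counts (not from the paper): one high-recall and one high-precision detector.
    candidates = {
        "high_recall": Confusion(tp=9, fp=6, fn=1, tn=984),
        "high_precision": Confusion(tp=7, fp=1, fn=3, tn=989),
    }
    for name, cm in candidates.items():
        print(f"{name}: precision={cm.precision:.3f} recall={cm.recall:.3f}")
    print(pick_model(candidates, cost_fp=1, cost_fn=10))  # → high_recall (16 vs 31)
    print(pick_model(candidates, cost_fp=1, cost_fn=1))   # → high_precision (7 vs 4)
```

With a tenfold penalty on missed attacks the high-recall detector has the lower cost; with equal weights the high-precision detector does, so neither profile is preferable without fixing the cost of each error type.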


References

Haidur, H. I., Hakhov, S. O., Dmitriiev, V. Y., & Bondarenko, N. V. (2021). Detection of traffic anomalies in organizational information systems using machine learning methods based on categorical field prediction algorithms. Telecommunications and Information Technologies. https://tit.dut.edu.ua/index.php/telecommunication/article/view/2402

Savchenko, T. V., Lutska, N. M., Vlasenko, L. O., & Tomenko, N. D. (2025). Analysis of the effectiveness of network traffic anomaly detection based on machine learning models. Cybersecurity: Education, Science, Technique. https://csecurity.kubg.edu.ua/index.php/journal/article/view/898

Sarhan, M., Layeghy, S., & Portmann, M. (2022). Evaluating standard feature sets towards increased generalisability and explainability of ML-based network intrusion detection. Array. https://www.sciencedirect.com/science/article/abs/pii/S2214579622000533

Sarhan, M., Layeghy, S., & Portmann, M. (2021). Evaluating standard feature sets towards increased generalisability and explainability of ML-based network intrusion detection. arXiv. https://arxiv.org/abs/2104.07183

Canadian Institute for Cybersecurity. (2018). CSE-CIC-IDS2018 dataset. https://www.unb.ca/cic/datasets/ids-2018.html

Amazon Web Services. (n.d.). A realistic cyber defense dataset (CSE-CIC-IDS2018). Registry of Open Data on AWS. https://registry.opendata.aws/cse-cic-ids2018/

Sharafaldin, I., Lashkari, A. H., & Ghorbani, A. A. (2018). Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the International Conference on Information Systems Security and Privacy (ICISSP 2018). https://www.scitepress.org/papers/2018/66398/66398.pdf

Breiman, L. (2001). Random forests. Machine Learning. https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf

Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann. https://dl.acm.org/doi/abs/10.5555/152181

Lashkari, A. H. (n.d.). CICFlowMeter [Computer software]. GitHub. https://github.com/ahlashkari/CICFlowMeter

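scikit-learn developers. (n.d.). sklearn.tree.DecisionTreeClassifier [Documentation]. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html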

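scikit-learn developers. (n.d.). sklearn.ensemble.RandomForestClassifier [Documentation]. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html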

