METHOD FOR DETECTING ATTACKS ON CORPORATE WEB APPLICATIONS BASED ON GRADIENT BOOSTING
DOI:
https://doi.org/10.28925/2663-4023.2025.31.1062
Keywords:
web attacks, attack detection, network traffic, HTTP, machine learning, gradient boosting, LightGBM, XGBoost
Abstract
The paper addresses the detection of web attacks in the network traffic of corporate web applications when most connections are encrypted, so that inspection of request contents is limited and flow-level and behavioral characteristics play a key role. The proposed approach uses gradient boosting decision tree ensembles to classify network flows as benign or malicious. The experimental evaluation is conducted on the CSE-CIC-IDS2018 dataset; the pipeline includes data cleaning and normalization, as well as class balancing through controlled reduction (undersampling) of benign flows to mitigate class imbalance. LightGBM and XGBoost models with standard configurations for binary classification on tabular features are used for attack detection; evaluation is performed on a held-out test set using common performance metrics and confusion-matrix analysis. The results indicate very strong class separability and a low number of classification errors, with XGBoost providing a slightly better trade-off between attack detection completeness (recall) and the false alarm rate than LightGBM. Feature contributions to model decisions are analyzed; the most informative predictors are packet-length statistics, inter-arrival time measures, traffic directionality ratios, and transport-layer attributes reflecting the structure and dynamics of network interactions. At the same time, the near-ceiling metric values observed under controlled conditions may be partially influenced by dataset construction specifics and by features associated with traffic-generation scenarios; the findings are therefore interpreted with explicit validity considerations. The practical value of the work lies in a reproducible flow-based web attack detection pipeline, a comparison of two gradient boosting implementations, and recommendations for further generalization checks using alternative validation schemes and additional datasets.
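Two of the pipeline steps named in the abstract, controlled reduction of benign flows and confusion-matrix analysis, can be illustrated with a minimal standard-library sketch. It is not the paper's implementation: the function names, the `ratio` and `seed` parameters, and the label encoding (1 = attack, 0 = benign) are assumptions made for illustration, and the abstract does not state the balancing ratio actually used.

```python
import random


def undersample_benign(rows, labels, ratio=1.0, seed=0):
    """Keep every attack flow and a random subset of benign flows.

    Assumed encoding: label 1 = attack, label 0 = benign.
    `ratio` is a hypothetical knob: benign flows kept per attack flow.
    """
    rng = random.Random(seed)
    attack_idx = [i for i, y in enumerate(labels) if y == 1]
    benign_idx = [i for i, y in enumerate(labels) if y == 0]
    # Never request more benign flows than are available.
    n_benign = min(len(benign_idx), int(ratio * len(attack_idx)))
    kept = attack_idx + rng.sample(benign_idx, n_benign)
    rng.shuffle(kept)  # avoid an attack-first ordering in the output
    return [rows[i] for i in kept], [labels[i] for i in kept]


def confusion_metrics(y_true, y_pred):
    """Confusion-matrix counts plus recall and false alarm rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Recall: share of attack flows that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # False alarm rate: share of benign flows flagged as attacks.
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "recall": recall,
        "false_alarm_rate": false_alarm_rate,
    }
```

In a sketch like this, undersampling would be applied to the training portion only, so that the held-out test set keeps its original class proportions, and `confusion_metrics` would be called on the test-set predictions of each model to compare recall against the false alarm rate.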
License
Copyright (c) 2025 Анна Бойко

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.