METHOD FOR DETECTING ATTACKS ON CORPORATE WEB APPLICATIONS BASED ON GRADIENT BOOSTING

Authors

Anna Boiko, State University of Information and Communication Technologies, https://orcid.org/0009-0001-3709-6283

DOI:

https://doi.org/10.28925/2663-4023.2025.31.1062

Keywords:

web attacks, attack detection, network traffic, HTTP, machine learning, gradient boosting, LightGBM, XGBoost

Abstract

The paper addresses the detection of web attacks in the network traffic of corporate web applications when most connections are encrypted, so that inspection of request contents is limited and flow-level and behavioral characteristics play a key role. The proposed approach uses gradient-boosted decision tree ensembles to classify network flows as benign or malicious. The experimental evaluation is conducted on the CSE-CIC-IDS2018 dataset; the pipeline includes data cleaning and normalization, as well as class balancing through controlled reduction of the number of benign flows to mitigate class imbalance. LightGBM and XGBoost models with standard configurations for binary classification on tabular features are used for attack detection; evaluation is performed on a held-out test set using common performance metrics and confusion-matrix analysis. The results indicate very strong class separability and a low number of classification errors, with XGBoost providing a slightly better trade-off than LightGBM between attack detection completeness (recall) and the false alarm rate. Feature contributions to model decisions are analyzed; the most informative predictors are packet-length statistics, inter-arrival time measures, traffic directionality ratios, and transport-layer attributes reflecting the structure and dynamics of network interactions. However, the near-ceiling metric values observed under controlled conditions may be partly influenced by the specifics of dataset construction and by features tied to traffic-generation scenarios; the findings are therefore interpreted with explicit validity considerations. The practical value of the work lies in a reproducible flow-based pipeline for web attack detection, a comparison of two gradient boosting implementations, and recommendations for further generalization checks using alternative validation schemes and additional datasets.
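
Two steps named in the abstract, class balancing by reducing the number of benign flows and confusion-matrix analysis of recall against the false alarm rate, can be illustrated with a minimal sketch. This is not the author's code: the function names `downsample_benign` and `detection_metrics`, the label encoding (0 = benign, 1 = malicious), the `ratio` parameter, and the fixed seed are all assumptions made for illustration, and model training with LightGBM or XGBoost is left out.

```python
import random
from collections import Counter


def downsample_benign(flows, labels, ratio=1.0, seed=42):
    """Keep every malicious flow and at most ratio * n_malicious benign flows.

    Assumed encoding: 0 = benign, 1 = malicious. The ratio and seed are
    illustrative choices, not values reported in the paper.
    """
    rng = random.Random(seed)
    benign_idx = [i for i, y in enumerate(labels) if y == 0]
    attack_idx = [i for i, y in enumerate(labels) if y == 1]
    n_keep = min(len(benign_idx), int(ratio * len(attack_idx)))
    # Sort the kept indices so the original flow order is preserved.
    kept = sorted(attack_idx + rng.sample(benign_idx, n_keep))
    return [flows[i] for i in kept], [labels[i] for i in kept]


def detection_metrics(y_true, y_pred):
    """Derive confusion-matrix counts and the metrics discussed in the abstract."""
    counts = Counter(zip(y_true, y_pred))
    tp, fn = counts[(1, 1)], counts[(1, 0)]
    fp, tn = counts[(0, 1)], counts[(0, 0)]
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        # Recall: share of attacks that were detected (detection completeness).
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # False alarm rate: share of benign flows flagged as attacks.
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }
```

In such a setup, balancing would normally be applied to the training portion only, so that the held-out test set keeps a realistic class distribution; whether the paper does this is not stated in the abstract.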
