METHODOLOGY FOR COMPLEX FEATURE OPTIMIZATION IN CYBERATTACK DETECTION SYSTEMS
DOI:
https://doi.org/10.28925/2663-4023.2026.32.1204
Keywords:
cybersecurity, Intrusion Detection Systems (IDS), feature optimization, feature selection, feature extraction, NSL-KDD dataset, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), machine learning, data preprocessing, dimensionality reduction
Abstract
This article addresses a pressing challenge in modern cybersecurity: improving the efficiency of intelligent Intrusion Detection Systems (IDS) amid rapid digitalization and an increasingly complex threat landscape. The authors argue that traditional signature-based methods are becoming insufficient against AI-driven attacks, which necessitates a transition to machine learning techniques. However, the high dimensionality of network traffic and the presence of numerous redundant, correlated, or noisy features produce a "curse of dimensionality" effect. This leads to increased computational overhead, delayed real-time system response, and reduced classification accuracy due to model overfitting. The relevance of the work is underscored by the need for systematic data preprocessing approaches, which are demonstrated here on the benchmark NSL-KDD dataset.
The object of the study is the process of optimizing input data for cyberattack classifiers. The authors propose and detail a four-stage methodology for comprehensive feature optimization. The methodology is a hybrid of several approaches: preprocessing (cleaning, normalization, and standardization); feature selection (filter methods such as Pearson correlation and Mutual Information (MI), embedded methods, and wrapper methods); and feature extraction (dimensionality reduction techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), including its uncorrelated variant, ULDA), which transforms the original feature space into a smaller set of uncorrelated components.
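The preprocessing, filter, and extraction stages named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.9 correlation threshold, and the synthetic data (used in place of NSL-KDD) are assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

def standardize(X):
    """Z-score each column; constant columns are left at zero."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - mean) / std

def drop_correlated(X, threshold=0.9):
    """Greedy Pearson filter: keep a column only if its |r| with every
    already-kept column is at or below the (assumed) threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

def pca(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return Xc @ vt[:n_components].T, explained[:n_components]

# Synthetic stand-in for traffic features: four independent columns
# plus a fifth that is a noisy rescaled copy of column 0.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 4))
redundant = base[:, 0] * 2.0 + rng.normal(scale=0.01, size=200)
X = np.column_stack([base, redundant])

Xs = standardize(X)                          # preprocessing
kept = drop_correlated(Xs, threshold=0.9)    # filter-based selection
Z, ratio = pca(Xs[:, kept], n_components=2)  # feature extraction

print("kept columns:", kept)
print("explained variance ratio:", ratio)
```

The filter removes the redundant fifth column before PCA runs, so the extraction step works on a smaller input, and the resulting components are mutually uncorrelated by construction. A real pipeline would fit the scaling and projection on training data only and reuse them on test traffic.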
The scientific novelty of the work lies in the systematic integration of statistical filters with ensemble learning methods to tune models to the specific characteristics of network traffic. The article provides a mathematical justification for each method, specifically through Shannon entropy and the Gini index. It is demonstrated that, for the NSL-KDD dataset, using only the 12–15 most relevant features maintains classification accuracy at 98–99% while significantly outperforming models trained on the full set of 41 features in training and inference speed. Special attention is given to the advantages of the ULDA method in addressing multicollinearity. The authors conclude that the proposed methodology serves as a universal tool for IDS optimization, balancing accuracy, speed, and system robustness. Three directions for future research are identified: adapting models to imbalanced data, using non-linear autoencoders based on deep learning, and investigating the resilience of the selected features against adversarial attacks.
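For reference, the two impurity measures mentioned above have standard textbook definitions, sketched here in plain Python. The abstract does not give the article's own formulas, so the function names and the toy labels are illustrative assumptions. For discrete variables, the information gain computed below equals the mutual information between a feature and the class label.

```python
from collections import Counter
from math import log2

def shannon_entropy(labels):
    """H(Y) = -sum_c p_c * log2(p_c) over the observed class frequencies."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini_index(labels):
    """G(Y) = 1 - sum_c p_c^2 over the observed class frequencies."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(Y) - H(Y | X) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - conditional

# Toy example: two normal and two attack records (hypothetical values).
labels = ["normal", "normal", "attack", "attack"]
separating = ["a", "a", "b", "b"]         # splits the classes perfectly
uninformative = ["tcp", "udp", "tcp", "udp"]  # same class mix in each group

print(shannon_entropy(labels))                  # balanced binary labels: 1 bit
print(gini_index(labels))                       # balanced binary labels: 0.5
print(information_gain(separating, labels))     # all uncertainty removed
print(information_gain(uninformative, labels))  # no uncertainty removed
```

A feature that scores near zero on such a measure is a candidate for removal in the selection stage, while tree-based ensembles use the same quantities internally to rank features.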
License
Copyright (c) 2026 Сергій Толюпа, Андрій Кулько

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.