ENSEMBLE OF CLUSTERIZATION ALGORITHMS WITH VARIOUS METRICS FOR ANALYSIS OF BANK NETWORK TRAFFIC
DOI:
https://doi.org/10.28925/2663-4023.2024.26.790
Keywords:
cybersecurity; cluster analysis; ensemble algorithm; distance metrics; coassociative matrix; banking networks; network traffic analysis; probabilistic models
Abstract
The study is motivated by the need to analyze bank network traffic more accurately. Such traffic is represented by heterogeneous data, including server logs, network connections, user behavioral data, and traffic telemetry. As data volumes grow and the corporate networks of Ukrainian banks become structurally more complex, traditional analysis methods lose effectiveness, which makes machine learning methods, including ensemble clustering algorithms, relevant. Current research in this area focuses on adaptive segmentation methods that use probabilistic models and dynamic weighting of algorithms to improve the accuracy of data partitioning. The paper proposes an ensemble cluster analysis method that combines variants of the K-means algorithm with different distance metrics. The method is based on a probabilistic model that takes into account latent classes and dynamically adjustable algorithm weights. To make the partitions consistent, it uses a co-association matrix that records how often each pair of objects falls into the same cluster. The novelty of the research lies in a mechanism for adaptively weighting the ensemble's algorithms based on clustering quality metrics, the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), which makes it possible to increase the accuracy of clustering bank network traffic. The proposed method compensates for the shortcomings of individual algorithms and takes into account the features of different types of data (packets, connections, users, devices). In practice, the method can be applied to analyze banking network traffic, detect anomalies, optimize routing, and strengthen the cybersecurity of the banking network. The developed approach accounts for the complex structure of the data and adapts to dynamic changes in the bank's network environment.
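The pipeline described in the abstract (K-means variants with different distance metrics, quality-based weights, and a weighted co-association matrix) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state which metrics are used, how the weights are computed, or how the final partition is extracted, so the following are assumptions made only for illustration: Euclidean, Manhattan, and Chebyshev distances; weights proportional to each partition's mean ARI against the other ensemble members (NMI is omitted for brevity); centroids updated as plain means for every metric; and a consensus partition taken as the connected components of the thresholded matrix. The data are synthetic and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Assumed metric set; the abstract does not list the metrics actually used.
METRICS = {
    "euclidean": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "chebyshev": lambda a, b: max(abs(x - y) for x, y in zip(a, b)),
}

def kmeans(points, k, dist, iters=20, seed=0):
    """Lloyd-style K-means with a pluggable distance; centroids are plain means."""
    centroids = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def adjusted_rand_index(u, v):
    """ARI between two labelings of the same objects (pair-counting form)."""
    comb2 = lambda m: m * (m - 1) / 2
    sum_ij = sum(comb2(c) for c in Counter(zip(u, v)).values())
    sum_a = sum(comb2(c) for c in Counter(u).values())
    sum_b = sum(comb2(c) for c in Counter(v).values())
    expected = sum_a * sum_b / comb2(len(u))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def agreement_weights(labelings):
    """Assumed rule: weight = mean ARI against the other members, clipped at 0."""
    raw = []
    for i, u in enumerate(labelings):
        others = [adjusted_rand_index(u, v) for j, v in enumerate(labelings) if j != i]
        raw.append(max(sum(others) / len(others), 0.0))
    total = sum(raw)
    return [r / total for r in raw] if total else [1 / len(raw)] * len(raw)

def co_association(labelings, weights):
    """Weighted frequency with which each pair of objects shares a cluster."""
    n = len(labelings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w
    return matrix

def consensus(matrix, threshold=0.5):
    """Connected components of the graph linking pairs above the threshold."""
    labels, current = [-1] * len(matrix), 0
    for start in range(len(matrix)):
        if labels[start] != -1:
            continue
        labels[start], stack = current, [start]
        while stack:
            i = stack.pop()
            for j in range(len(matrix)):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Synthetic stand-in for two traffic profiles (not real banking data).
rng = random.Random(42)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
points += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(20)]

labelings = [kmeans(points, 2, dist) for dist in METRICS.values()]
weights = agreement_weights(labelings)
matrix = co_association(labelings, weights)
final = consensus(matrix)
```

Because the weights are normalized to sum to 1, each matrix entry lies in [0, 1] and can be read as the weighted share of ensemble members that place the two objects in the same cluster. If labeled traffic were available, the weights could instead be derived from ARI or NMI against those labels.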
License
Copyright (c) 2024 Денис Редько, Альона Десятко

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.