ENSEMBLE CLUSTERING OF NETWORK TRAFFIC BASED ON A CONSENSUS APPROACH

Denys Redko; Alona Desiatko; Baituma Bissarinov

doi:10.28925/2663-4023.2026.33.1122

Authors

Denys Redko, State University of Trade and Economics, https://orcid.org/0009-0003-5827-264X
Alona Desiatko, State University of Trade and Economics, https://orcid.org/0000-0002-2284-3418
Baituma Bissarinov, Al-Farabi Azadli National University, Almaty University of Energy, https://orcid.org/0000-0002-2218-0749

DOI:

https://doi.org/10.28925/2663-4023.2026.33.1122

Keywords:

network traffic, big data, data analysis methods, clustering, collective decisions, ensemble models.

Abstract

As network traffic volumes grow rapidly and corporate information systems (IS) become more complex, effective methods for data analysis and clustering become increasingly important. Modern approaches to network traffic processing must account for many parameters and characteristics at once, which calls for improving existing cluster analysis methods. The paper proposes an improved approach to ensemble clustering of network traffic, together with a method for constructing a consensus similarity matrix that integrates the results of different clustering algorithms. The method uses an exponential dependence to amplify the differences between the algorithms' weights, which significantly increases the accuracy of the final clustering. The software system implemented in Python as part of the study combines several clustering methods; the consensus approach makes its results significantly more stable, as confirmed by large-scale computational experiments. The experiments showed that the key advantages of the developed approach over the traditional methods (KMeans, DBSCAN) used for cluster analysis of a company's network traffic are greater robustness to outliers and more balanced clustering of complex multidimensional data. An exponential formula for calculating the algorithm weights is also proposed, which supports a comprehensive approach to integrating the results of different clustering methods. The developed software solution significantly extends the range of network traffic analysis methods and provides an effective practical toolkit for improving the performance of corporate information systems. The proposed approach can be adapted to a wide range of data analysis problems that require processing large volumes of multidimensional information.
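
The abstract does not give the weighting formula or the way the final partition is derived, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each base algorithm has a quality score (for example, a silhouette coefficient), that the weights take the softmax-like form w_i = exp(alpha * s_i) / sum_j exp(alpha * s_j), and that the final clusters are obtained by thresholding the weighted co-association matrix. The function names, the `alpha` and `threshold` parameters, the sample labelings, and the scores are all hypothetical.

```python
import math


def exponential_weights(scores, alpha=5.0):
    """Assumed form: w_i = exp(alpha * s_i) / sum_j exp(alpha * s_j)."""
    top = max(scores)  # shift by the maximum for numerical stability
    raw = [math.exp(alpha * (s - top)) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


def consensus_matrix(labelings, weights):
    """Weighted co-association: entry (i, j) is the total weight of the
    algorithms that put points i and j in the same cluster.
    The label -1 (DBSCAN-style noise) never counts as agreement."""
    n = len(labelings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if i == j or (labels[i] == labels[j] and labels[i] != -1):
                    matrix[i][j] += w
    return matrix


def consensus_labels(matrix, threshold=0.5):
    """Group points whose similarity exceeds the threshold (connected
    components via union-find). A simple stand-in for the final step."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                parent[find(j)] = find(i)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical base results for six flows from three algorithms.
labelings = [
    [0, 0, 0, 1, 1, 1],    # e.g. KMeans
    [0, 0, -1, 1, 1, 1],   # e.g. DBSCAN, one flow marked as noise
    [0, 1, 0, 1, 0, 1],    # a poor partition
]
scores = [0.8, 0.7, 0.1]   # assumed quality scores, higher is better

weights = exponential_weights(scores, alpha=5.0)
matrix = consensus_matrix(labelings, weights)
final = consensus_labels(matrix, threshold=0.5)
print([round(w, 3) for w in weights])
print(final)
```

In this sketch a larger `alpha` widens the gap between the weights of well-scoring and poorly-scoring algorithms, so a weak partition contributes almost nothing to the similarity matrix. This is one plausible reading of "enhancing the differences in the weights".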

Author Biographies

Denys Redko, State University of Trade and Economics

Postgraduate student of the Department of Software Engineering and Cybersecurity

Alona Desiatko, State University of Trade and Economics

Doctor of Philosophy in Computer Science, Associate Professor of the Department of Software Engineering and Cybersecurity

Baituma Bissarinov, Al-Farabi Azadli National University, Almaty University of Energy

Doctor of Philosophy in Computer Science, Department of Information Systems
