ENSEMBLE CLUSTERING OF NETWORK TRAFFIC BASED ON A CONSENSUS APPROACH
DOI: https://doi.org/10.28925/2663-4023.2026.33.1122
Keywords: network traffic, big data, data analysis methods, clustering, collective decisions, ensemble models.
Abstract
Given the rapid growth of network traffic volumes and the increasing complexity of corporate information systems (IS), effective methods for data analysis and clustering are particularly relevant. Modern network traffic processing must account for many parameters and characteristics at once, which calls for improvements to existing cluster analysis methods. The paper proposes an improved approach to ensemble clustering of network traffic, together with a method for constructing a consistent similarity matrix that integrates the results of different clustering algorithms. In this method, each algorithm's weight is computed with an exponential function, which amplifies the differences between the weights of the algorithms and significantly increases the accuracy of the final clustering. The software system developed in Python as part of the study combines several clustering methods and, through a consensus approach, achieves considerably more stable results; its effectiveness has been confirmed by large-scale computational experiments. The experiments showed that the key advantage of the developed approach over traditional methods used for cluster analysis of corporate network traffic (KMeans, DBSCAN) is greater robustness to outliers, together with more balanced clustering of complex multidimensional data. The exponential weighting of algorithms provides a comprehensive way to integrate the results of different clustering methods. The developed software solution extends the available methods of network traffic analysis and provides an effective practical toolkit for improving the performance of corporate information systems.
The proposed approach can also be adapted to a wide range of data analysis problems that involve processing large volumes of multidimensional information.
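The abstract does not give the exact weighting formula or consensus function. The sketch below therefore assumes a softmax-style form, w_m = exp(beta * q_m) / sum_k exp(beta * q_k). Here q_m is a quality score for base algorithm m (for example, a silhouette value), and beta is a hypothetical sharpening parameter: larger values widen the gap between weights. The sketch builds a weighted co-association ("consistent similarity") matrix from label vectors produced by base algorithms such as KMeans or DBSCAN. It then extracts consensus clusters by linking pairs of flows whose weighted agreement exceeds 0.5. The function names, the scores, and the threshold-linking step are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def exponential_weights(scores: Sequence[float], beta: float = 5.0) -> List[float]:
    """Assumed weighting: w_m = exp(beta * q_m) / sum_k exp(beta * q_k).

    A larger beta amplifies differences between the quality scores q_m of the
    base algorithms; the weights always sum to 1.
    """
    top = max(scores)  # shift by the maximum for numerical stability
    exps = [math.exp(beta * (q - top)) for q in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coassociation_matrix(labelings: Sequence[Sequence[int]],
                         weights: Sequence[float]) -> List[List[float]]:
    """Weighted co-association matrix: S[i][j] = sum of w_m over algorithms m
    that put flows i and j in the same cluster.

    Label -1 marks noise (the scikit-learn DBSCAN convention); a noise point
    is treated as agreeing only with itself.
    """
    n = len(labelings[0])
    S = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if i == j or (labels[i] != -1 and labels[i] == labels[j]):
                    S[i][j] += w
    return S


def consensus_labels(S: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Link pairs whose weighted agreement exceeds `threshold` and return the
    connected components as consensus cluster labels 0..k-1."""
    n = len(S)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if S[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel components in order of first appearance
    mapping = {}
    result = []
    for i in range(n):
        root = find(i)
        if root not in mapping:
            mapping[root] = len(mapping)
        result.append(mapping[root])
    return result


if __name__ == "__main__":
    # Hypothetical label vectors for six flows from three base clusterers
    labelings = [
        [0, 0, 0, 1, 1, 1],   # e.g. KMeans
        [0, 0, 1, 1, 1, 1],   # e.g. a second partitioning algorithm
        [0, 0, 0, 1, 1, -1],  # e.g. DBSCAN, flow 5 marked as noise
    ]
    scores = [0.8, 0.5, 0.7]  # hypothetical per-algorithm quality scores
    weights = exponential_weights(scores, beta=5.0)
    S = coassociation_matrix(labelings, weights)
    print(consensus_labels(S))  # → [0, 0, 0, 1, 1, 1]
```

Because the weights sum to 1, a threshold of 0.5 acts as a weighted majority vote. When one algorithm's score is much higher than the others and beta is large, that algorithm's partition dominates the consensus. A common alternative consensus step is hierarchical clustering on the distance 1 - S; the abstract does not state which consensus function the authors use.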
License
Copyright (c) 2026 Денис Редько, Альона Десятко, Бісарінов Байтума

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.