ANALYSIS OF METHODS FOR PREDICTING INTERNAL THREATS BASED ON THE ANALYSIS OF DATA FROM THE SOCIAL NETWORK TWITTER

Tamara Radivilova; Ihor Dobrynin; Vadym Pantelieiev; Dmytro Fisenko; Artem Mazepa; Volodymyr Bilodid

doi:10.28925/2663-4023.2025.28.818

Authors

Tamara Radivilova Kharkiv National University of Radio Electronics https://orcid.org/0000-0001-5975-0269
Ihor Dobrynin Kharkiv National University of Radio Electronics https://orcid.org/0000-0001-8910-2609
Vadym Pantelieiev Kharkiv National University of Radio Electronics https://orcid.org/0009-0008-7824-8782
Dmytro Fisenko Kharkiv National University of Radio Electronics https://orcid.org/0009-0006-9966-5329
Artem Mazepa Kharkiv National University of Radio Electronics https://orcid.org/0009-0002-9977-5932
Volodymyr Bilodid Ivan Kozhedub Kharkiv National Air Force University https://orcid.org/0009-0002-8976-2310

DOI:

https://doi.org/10.28925/2663-4023.2025.28.818

Keywords:

Natural Language Processing, Long Short-Term Memory, Random Forest, internal threats, forecast quality assessment, forecasting

Abstract

Internal threats pose a significant problem for the effective functioning of organizations. Researchers usually identify internal threats by analyzing user behavior and traffic. Most existing works focus on detecting internal threats rather than predicting them. This work focuses on methods for predicting internal threats through the analysis of social network data. The article analyzes the effectiveness of the following prediction methods: dictionary-based sentiment analysis, machine learning (Random Forest), recurrent neural networks (LSTM), transformers (BERT), a hybrid approach (NLP CBOW + graph neural networks), and a hybrid approach (NLP CBOW + LPA). To ensure reliable results, the forecasting methods were evaluated using several key metrics: Precision, Recall, F1-score, ROC-AUC, training time, and inference time. Experiments were conducted on the Sentiment140 dataset, which contains 1.6 million tweets labeled as positive or negative. Python libraries were used to process the data and build the prediction models; 80% of the data was used to train the models, and 20% was used for prediction. The results showed that the fastest method for predicting incidents is dictionary-based sentiment analysis. It can be used to track changes in sentiment within an organization through online analysis of social networks, and each organization can set its own scale for responding to internal incidents. The analysis also found that hybrid approaches, which analyze both the text content and the structure of social connections, demonstrate the highest accuracy in predicting internal incidents. However, these approaches require significant computing resources, large amounts of training data, and more training time, and they have low interpretability.
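The evaluation setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny word lists, the sample tweets, the 0/1 label encoding, and all function names (`lexicon_score`, `predict`, `split_80_20`, `precision_recall_f1`, `roc_auc`) are assumptions made for illustration only. The sketch shows a dictionary-based sentiment scorer, an 80/20 split, and the Precision, Recall, F1-score, and ROC-AUC metrics computed with the Python standard library.

```python
import random
import time

# Hypothetical toy lexicon (an assumption); a real study would use an
# established sentiment dictionary instead of these few words.
POSITIVE = {"good", "great", "love", "happy", "thanks"}
NEGATIVE = {"bad", "hate", "angry", "terrible", "quit"}

def lexicon_score(text):
    """Positive dictionary hits minus negative hits in a lower-cased text."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def predict(text):
    """Return 1 (positive) if the score is above zero, otherwise 0 (negative)."""
    return 1 if lexicon_score(text) > 0 else 0

def split_80_20(rows, seed=0):
    """Shuffle reproducibly, then return (first 80%, remaining 20%)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]

def precision_recall_f1(y_true, y_pred):
    """Precision, Recall, and F1-score for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented sample data (an assumption): (text, label), 1 = positive, 0 = negative.
tweets = [
    ("I love my team", 1), ("great day at work", 1), ("thanks for the help", 1),
    ("happy with the release", 1), ("good meeting today", 1),
    ("I hate this job", 0), ("terrible manager", 0), ("so angry right now", 0),
    ("bad week again", 0), ("I want to quit", 0),
]

# The dictionary method needs no training, so only the 20% part is scored.
train, test = split_80_20(tweets)
y_true = [label for _, label in test]

start = time.perf_counter()
y_pred = [predict(text) for text, _ in test]
inference_seconds = time.perf_counter() - start

scores = [lexicon_score(text) for text, _ in test]
print(precision_recall_f1(y_true, y_pred), roc_auc(y_true, scores), inference_seconds)
```

Because the dictionary method only looks up words, its inference cost grows linearly with text length, which is consistent with the abstract's observation that it is the fastest of the compared methods. The trained models (Random Forest, LSTM, BERT, and the hybrid approaches) would additionally be fitted on the 80% part before scoring.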

