INFORMATION RETRIEVAL AND DEANONYMIZATION IN THE TASKS OF EARLY DETECTION OF POTENTIAL ATTACKS ON CRITICAL INFRASTRUCTURE

Authors

O. Chalyi, I. Stopochkina

DOI:

https://doi.org/10.28925/2663-4023.2024.26.694

Keywords:

information retrieval, cybersecurity, deanonymization, vector space model, critical infrastructure, cybercriminal, TF-IDF algorithm

Abstract

Information about cyberattacks that attackers plan against critical infrastructure facilities is partly circulated through malicious information channels, chats, and sites. Examining and analyzing these materials can reveal the stages of attack planning and help prevent the attacks. Part of this problem is providing information search and analysis tools that detect linguistic patterns and similarities in text data, which can help deanonymize cybercriminals and establish relationships between published data. This work proposes a new model and a corresponding system prototype based on the vector space model and the TF-IDF algorithm. The system is designed to analyze publicly available text data from both the internet and the darknet, and it is distinguished by a probabilistic approach to analyzing the identifiers of the information publisher. The proposed system also focuses on identifying latent connections between anonymous accounts by analyzing unique stylistic and linguistic traits, which it uses to trace patterns in communication and uncover hidden associations among cybercriminal entities. Experiments on real chats, including chats of cybercriminals, demonstrate the system's potential for detecting identifiers and determining stylistic features. Given a sufficiently complete data set and a list of target words, it is possible to analyze the stages of attack preparation and the malicious individuals or groups involved. The results underline the significance of integrating advanced linguistic analysis techniques with probabilistic models to enhance investigative capabilities against evolving cyber threats.
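
As a rough illustration of the vector space model with TF-IDF weighting mentioned in the abstract, the following minimal sketch compares short messages by cosine similarity. It is not the authors' prototype: the account names, the sample messages, the whitespace tokenizer, and the smoothed IDF variant are all assumptions chosen for illustration, and the probabilistic identifier analysis described in the abstract is not modeled here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Relative term frequency times a smoothed IDF (assumed variant).
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical messages from three anonymous accounts (invented data).
messages = {
    "account_a": "gonna drop the payload tmrw lol check the panel",
    "account_b": "lol gonna check the panel before the payload goes out tmrw",
    "account_c": "The quarterly maintenance report is attached for your review.",
}

names = list(messages)
vectors = tfidf_vectors([messages[n] for n in names])

# Pairwise similarity between every two accounts.
similarity = {
    (names[i], names[j]): cosine(vectors[i], vectors[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
}

for pair, score in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy data the two accounts that share slang and vocabulary score higher than either does against the formally worded message. A similarity score of this kind is only a signal for further investigation, not an identification.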

Published

2024-12-19

How to Cite

Chalyi, O., & Stopochkina, I. (2024). INFORMATION RETRIEVAL AND DEANONYMIZATION IN THE TASKS OF EARLY DETECTION OF POTENTIAL ATTACKS ON CRITICAL INFRASTRUCTURE. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 2(26), 305–322. https://doi.org/10.28925/2663-4023.2024.26.694