INFORMATION RETRIEVAL AND DEANONYMIZATION IN THE TASKS OF EARLY DETECTION OF POTENTIAL ATTACKS ON CRITICAL INFRASTRUCTURE
DOI:
https://doi.org/10.28925/2663-4023.2024.26.694
Keywords:
information retrieval, cybersecurity, deanonymization, vector space model, critical infrastructure, cybercriminal, TF-IDF algorithm
Abstract
Information about cyberattacks that attackers plan to carry out against critical infrastructure facilities is partly distributed through malicious information channels, chats, and sites. Investigating and analyzing these materials can provide an understanding of the stages of attack planning and support attack prevention. Part of this problem is providing information search and analysis tools that detect linguistic patterns and similarities in text data, which can help deanonymize cybercriminals and establish relationships between published data. This work proposes a new model and a corresponding system prototype based on the vector space model and the TF-IDF algorithm. The system is designed to analyze publicly available text data (from both the open internet and the darknet) and is distinguished by a probabilistic approach to analyzing the identifiers of the information publisher. The proposed system also focuses on identifying latent connections between anonymous accounts by analyzing unique stylistic and linguistic traits, which it uses to trace patterns in communication and uncover hidden associations among cybercriminal entities. Experiments on real chats, including chats of cybercriminals, demonstrate the potential of the system for detecting identifiers and determining stylistic features. Given a sufficiently complete data set and a list of target words, it is possible to analyze the stages of attack preparation and the malicious individuals or groups involved. The results underline the significance of integrating advanced linguistic analysis techniques with probabilistic models to enhance investigative capabilities against evolving cyber threats.
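To make the vector space model and TF-IDF idea in the abstract concrete, the following is a minimal sketch of how the texts of anonymous accounts could be compared. It is not the authors' prototype. The function names, the sample account names, and the sample messages are hypothetical and invented for illustration, and the smoothed IDF formula is an assumption, since the abstract does not specify the exact weighting. The probabilistic analysis of publisher identifiers mentioned in the abstract is not modeled here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: a smoothed IDF, ln((1 + N) / (1 + df)) + 1, is used so that
    terms occurring in every document still receive a non-zero weight.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }

    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in term_freq.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_account_pairs(messages_by_account):
    """Rank account pairs by the similarity of their concatenated messages."""
    accounts = sorted(messages_by_account)
    docs = [" ".join(messages_by_account[a]) for a in accounts]
    vectors = tfidf_vectors(docs)

    pairs = []
    for i in range(len(accounts)):
        for j in range(i + 1, len(accounts)):
            score = cosine(vectors[i], vectors[j])
            pairs.append((score, accounts[i], accounts[j]))
    return sorted(pairs, reverse=True)


# Hypothetical, invented messages used only to exercise the functions.
SAMPLE_CHATS = {
    "user_a": ["gonna drop the pkg tmrw, pls confirm asap",
               "pls confirm the pkg asap"],
    "user_b": ["pls confirm asap, pkg is ready tmrw"],
    "user_c": ["The quarterly report has been submitted to the committee."],
}

for score, first, second in rank_account_pairs(SAMPLE_CHATS):
    print(f"{first} ~ {second}: {score:.3f}")
```

A high score only indicates shared vocabulary and abbreviations between two accounts. In practice it would be one signal among several, and the amount of text available per account strongly affects how reliable it is.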
License
Copyright (c) 2024 Олексій Чалий, Ірина Стьопочкіна
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.