EXTRACTION OF CYBERSECURITY OBJECTS FROM ARRAYS OF ELECTRONIC TEXT DOCUMENTS ON THE INTERNET AND SOCIAL NETWORKS
DOI: https://doi.org/10.28925/2663-4023.2024.26.663
Keywords: cyberwar, cybersecurity, Internet, open electronic sources, social networks, text analysis, cybersecurity objects
Abstract
The modern world is characterized by the rapid development of information technology (IT) and global interaction in cyberspace. Despite its benefits, this progress has also given rise to new threats and challenges in the field of cybersecurity. Cyberwarfare, which has become a real problem for states, organizations and individual users, requires effective methods for detecting and analyzing cybersecurity objects. A key capability in countering cyber threats is extracting factual data about cybersecurity objects from large volumes of textual information. Traditional text analysis methods have limitations, especially when applied to large and complex text collections. Modern IT that can process and analyze textual information accurately and efficiently is therefore increasingly relevant. The article presents two methods: extracting cybersecurity objects from electronic text documents using regular expressions, and detecting cybersecurity objects by analyzing arrays of Cyrillic texts. The first method extracts factual data from text documents using regular expressions, which allows geographic names, company names and other important concepts to be identified accurately. The second method analyzes Cyrillic texts to recognize named cybersecurity entities, which simplifies the extraction procedure and increases the accuracy of the result. The two methods complement each other, forming an integrated system that extracts and analyzes cybersecurity objects more effectively than currently available solutions.
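The abstract does not give the regular expressions used by the authors. As a minimal sketch of the general idea, the Python code below applies a small set of illustrative patterns to a text and groups the matches by entity type. The patterns, the `PLACES` list and the `extract_facts` function are hypothetical and chosen only to show the technique. Geographic names are matched through an alternation built from a small gazetteer, since a bare regular expression cannot tell a place name from any other capitalized word.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer of place names (assumption); a real system would load a much larger list.
PLACES = ["Kyiv", "Kharkiv", "Lviv", "Київ", "Харків"]

# Illustrative patterns only; these are assumptions, not the authors' actual expressions.
PATTERNS = {
    # CVE identifiers, e.g. CVE-2024-3094
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    # Dotted-quad IPv4 addresses; octet values are not range-checked
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # One or more capitalized words followed by a corporate suffix, e.g. "Acme Corp"
    "company": re.compile(r"\b(?:[A-Z][\w&-]*\s)+(?:Inc|Corp|Ltd|LLC)\b"),
    # Gazetteer entries matched as whole words (\b is Unicode-aware for str patterns)
    "place": re.compile(r"\b(?:" + "|".join(map(re.escape, PLACES)) + r")\b"),
}


def extract_facts(text: str) -> dict[str, list[str]]:
    """Return every match of every pattern, grouped by entity type."""
    found = defaultdict(list)
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found[label].append(match.group(0))
    return dict(found)
```

For example, `extract_facts("Attackers exploited CVE-2024-3094 on 10.0.0.5 at Acme Corp in Kyiv.")` returns one match under each of the four labels. The same function finds nothing in the Ukrainian phrase "у Києві", because the inflected form differs from the gazetteer entry "Київ"; this illustrates a limitation of exact-match patterns on inflected Cyrillic text.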
The algorithms of the proposed methods are described. Their practical implementation makes it possible to process and analyze textual information with high accuracy and efficiency, which is an important step in developing information technology for computer intelligence from open electronic sources and social networks.
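The abstract does not describe how the second method recognizes named entities in Cyrillic texts. The sketch below is a hypothetical stand-in, not the authors' algorithm: a small stem dictionary (`STEMS`, an assumption) is matched against the beginning of each token, so that inflected Ukrainian forms such as "фішингу" (phishing, genitive) or "ботнетом" (botnet, instrumental) resolve to the same entity type as their base forms. The function name `recognize` and the entity labels are likewise illustrative.

```python
import re

# Hypothetical stem dictionary (assumption for illustration): lowercase Cyrillic or
# Latin stems mapped to entity types. Prefix matching lets inflected forms such as
# "фішингу" or "ботнетом" hit the same entry as the base form.
STEMS = {
    "фішинг": "ATTACK",        # phishing
    "ddos": "ATTACK",
    "ботнет": "MALWARE",       # botnet
    "вірус": "MALWARE",        # virus
    "cert-ua": "ORGANIZATION",
}

# Tokens are runs of Unicode word characters and hyphens, so "CERT-UA" stays whole.
TOKEN = re.compile(r"[\w-]+")


def recognize(text: str) -> list[tuple[str, str]]:
    """Return (token, entity_type) pairs for tokens that start with a known stem."""
    entities = []
    for match in TOKEN.finditer(text):
        token = match.group(0)
        lowered = token.lower()
        hits = [stem for stem in STEMS if lowered.startswith(stem)]
        if hits:
            # Prefer the longest matching stem so overlapping entries resolve deterministically.
            entities.append((token, STEMS[max(hits, key=len)]))
    return entities
```

Running `recognize("CERT-UA зафіксувала хвилю фішингу та DDoS-атаку, здійснену ботнетом.")` returns `[("CERT-UA", "ORGANIZATION"), ("фішингу", "ATTACK"), ("DDoS-атаку", "ATTACK"), ("ботнетом", "MALWARE")]`. Prefix matching is crude: unrelated words that share a stem produce false positives, and stem-internal vowel alternations such as "Київ"/"Києві" are missed, so a production system would more likely rely on morphological analysis or a trained model.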
Copyright (c) 2024 Олександр Пучков, Дмитро Ланде, Ігор Субач
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.