COMPARATIVE ANALYSIS OF METHODS, TECHNOLOGIES, SERVICES, AND PLATFORMS FOR SPEECH RECOGNITION IN INFORMATION SECURITY SYSTEMS

Authors

DOI:

https://doi.org/10.28925/2663-4023.2024.25.468486

Keywords:

Natural Language Processing; audio data; speech recognition; authentication; deep learning; machine learning; text processing; cybersecurity; information security.

Abstract

The article provides a comprehensive comparative analysis of methods, technologies, and modern approaches to applying speech recognition and natural language processing (NLP) technologies in the context of national security and information security. The key aspects considered include the use of these technologies for monitoring communications and detecting suspicious activity, their application in intelligence and counterintelligence, their role in ensuring cybersecurity, the possibilities of biometric voice identification, ethical and legal aspects, and technological challenges. The problem statement focuses on the challenges associated with the widespread adoption of speech recognition and NLP technologies, in particular the insufficient accuracy of algorithms, which creates risks to the reliability of security systems. The authors also emphasize the importance of addressing ethical and legal issues related to the privacy of citizens and the possible misuse of these technologies for mass surveillance. The paper provides examples of systems used for cybersecurity purposes, such as mass listening and analysis systems, targeted monitoring systems, social media analysis platforms, and biometric identification systems. The results section presents a high-level structure of threat protection systems that covers threat channels and levels of protection. The complexity of modern threats, which can span several channels simultaneously, in particular through voice information, is considered. The authors detail the place and role of voice information in the structure of threat protection, emphasizing the importance of integrating various systems and platforms to ensure comprehensive security. Two approaches to building a security system that works with voice information are considered: aggregating the maximum possible information from existing systems, and creating a dedicated system for each specific problem. A comparative analysis of these approaches is carried out, their advantages and disadvantages are identified, and the limitations and risks of using voice recognition methods are described, including the reliability and accuracy of the technologies, the availability of data for training models, the cost of implementation, confidentiality and privacy issues, data security, use in military and intelligence activities, ethical issues, and the risks of voice fraud and artificial voices.
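As an illustration of the first, aggregation-based approach described above, the minimal Python sketch below (not taken from the article; the class, source names, and watchlist terms are hypothetical) collects transcripts already produced by any ASR engine, such as the open-source toolkits cited in the references, from several existing channels and flags fragments that match a keyword watchlist for a downstream protection layer.

# Illustrative sketch (not from the article): flagging suspicious fragments in
# transcripts that an ASR engine (e.g., Kaldi, DeepSpeech, or a cloud service
# from the references) has already produced. All names and keywords are hypothetical.
from dataclasses import dataclass

@dataclass
class Transcript:
    source: str      # channel the audio came from (call, voice chat, broadcast)
    speaker_id: str  # identifier from a separate speaker-verification step
    text: str        # recognized text returned by the ASR engine

WATCHLIST = {"password", "credentials", "wire transfer"}  # hypothetical keywords

def flag_suspicious(transcripts):
    """Aggregate transcripts from several existing systems and flag those
    containing watchlist terms, mirroring the aggregation-based approach."""
    alerts = []
    for t in transcripts:
        hits = [w for w in WATCHLIST if w in t.text.lower()]
        if hits:
            alerts.append({"source": t.source, "speaker": t.speaker_id, "terms": hits})
    return alerts

if __name__ == "__main__":
    demo = [
        Transcript("voip-call", "spk-042", "Please send the wire transfer details today."),
        Transcript("voice-chat", "spk-017", "The weather is fine, nothing to report."),
    ]
    for alert in flag_suspicious(demo):
        print(alert)

In practice the purpose-built alternative would replace the generic watchlist with models trained for one specific problem, which is exactly the trade-off the article compares.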


References

Dasgupta, S., Piplai, A., Kotal, A., & Joshi, A. (2020). A Comparative Study of Deep Learning based Named Entity Recognition Algorithms for Cybersecurity. In 2020 IEEE International Conference on Big Data (pp. 2596–2604). https://doi.org/10.1109/BigData50022.2020.9378482

Romanovskyi, O., et al. (2021). Automated Pipeline for Training Dataset Creation from Unlabeled Audios for Automatic Speech Recognition. In Lecture Notes on Data Engineering and Communications Technologies (pp. 25–36). Springer International Publishing. https://doi.org/10.1007/978-3-030-80472-5_3

Tan, H., et al. (2022). Adversarial Attack and Defense Strategies of Speaker Recognition Systems: A Survey. Electronics, 11(14), 2183. https://doi.org/10.3390/electronics11142183

Iosifova, O., Iosifov, I., Rolik, O., & Sokolov, V. (2020). Techniques Comparison for Natural Language Processing. In Proceedings of the 2nd International Workshop on Modern Machine Learning Technologies and Data Science (Vol. 2631, pp. 57–67).

Iosifov, I., Iosifova, O., Sokolov, V., Skladannyi, P., & Sukaylo, I. (2021). Natural Language Technology to Ensure the Safety of Speech Information. In Proceedings of the Workshop on Cybersecurity Providing in Information and Telecommunication Systems II (Vol. 3187, no. 1, pp. 216–226).

Iosifov, I., Iosifova, O., & Sokolov, V. (2020). Sentence Segmentation from Unformatted Text using Language Modeling and Sequence Labeling Approaches. In 2020 IEEE International Conference on Problems of Infocommunications. Science and Technology (PICST) (Vol. 1, pp. 335–337). IEEE. https://doi.org/10.1109/picst51311.2020.9468084

Iosifova, O., Iosifov, I., Sokolov, V., Romanovskyi, O., & Sukaylo, I. (2021). Analysis of Automatic Speech Recognition Methods. In Proceedings of the Workshop on Cybersecurity Providing in Information and Telecommunication Systems (Vol. 2923, pp. 252–257).

Romanovskyi, O., et al. (2022). Prototyping Methodology of End-to-End Speech Analytics Software. In Proceedings of the 4th International Workshop on Modern Machine Learning Technologies and Data Science (Vol. 3312, pp. 76–86).

Mahdavifar, S., & Ghorbani, A. (2019). Application of Deep Learning to Cybersecurity: A Survey. Neurocomputing, 347, 149–176. https://doi.org/10.1016/j.neucom.2019.02.056

Sedkowski, W., & Bierczyński, K. (2022). Perceived Severity of Vulnerability in Cybersecurity: Cross Linguistic Variegation. In 2022 IEEE International Carnahan Conference on Security Technology (pp. 1–4). https://doi.org/10.1109/iccst52959.2022.9896488

Mounnan, O., Manad, O., Boubchir, L., Mouatasim, A., & Daachi, B. (2022). Deep Learning-Based Speech Recognition System using Blockchain for Biometric Access Control. In 2022 9th International Conference on Software Defined Systems (SDS) (pp. 1–2). https://doi.org/10.1109/SDS57574.2022.10062921

Chen, Y., et al. (2021). SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems. ACM Transactions on Privacy and Security, 25, 1–31. https://doi.org/10.1145/3510582

Poulter, C. (2020). Voice Recognition Software—Nuance Dragon Naturally Speaking. Occupational Medicine, 70(1), 75–76. https://doi.org/10.1093/occmed/kqz128

Wang, H. H. (2021). Speech Recorder and Translator using Google Cloud Speech-to-Text and Translation. Journal of IT in Asia, 9(1), 11–28. https://doi.org/10.33736/jita.2815.2021

The Cloud and Microsoft Azure Fundamentals. (2019). In Microsoft Azure Infrastructure Services for Architects (pp. 1–46). https://doi.org/10.1002/9781119596608.ch1

Chen, L., et al. (2018). IBM Watson: Cognitive Computing in Healthcare and Beyond. AI Magazine [dataset]. In CRAN: Contributed Packages. The R Foundation. https://doi.org/10.32614/cran.package.aws.transcribe

Pickering, J. (2024). Cosegmentation in the IBM Text-to-Speech System. Speech and Hearing. https://doi.org/10.25144/22372

Povey, D., et al. (2011). The Kaldi Speech Recognition Toolkit. In IEEE Workshop on Automatic Speech Recognition and Understanding.

Hannun, A., et al. (2014). Deep Speech: Scaling up end-to-end speech recognition (Version 2). arXiv. https://doi.org/10.48550/arXiv.1412.5567

Lee, A., & Kawahara, T. (2009). Recent Development of Open-Source Speech Recognition Engine Julius. In Asia-Pacific Signal and Information Processing Association, Annual Summit and Conference (pp. 131–137).

Huggins-Daines, D., et al. (2006). Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices. In 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings (Vol. 1, pp. I-185–I-188). IEEE. https://doi.org/10.1109/icassp.2006.1659988

Recognition of Citizens’ Voice with Social Media. (2019). https://doi.org/10.4135/9781526486882

Agnitio Launches Voice Authentication for Android. (2012). Biometric Technology Today, 2012(5), 12. https://doi.org/10.1016/s0969-4765(12)70094-2

Beyond the Standard Model of Verbal Probing. (2005). Cognitive Interviewing, 87–101. https://doi.org/10.4135/9781412983655.n6

Kulke, L., Feyerabend, D., & Schacht, A. (2020). A Comparison of the Affectiva iMotions Facial Expression Analysis Software with EMG for Identifying Facial Expressions of Emotion. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.00329

Vocapia Research SAS. (2024). VoxSigma Speech to Text Software Suite. https://www.vocapia.com/voxsigma-speech-totext.html

Ash, T., Francis, R., & Williams, W. (2018). The Speechmatics Parallel Corpus Filtering System for WMT18. In Proceedings of the 3rd Conference on Machine Translation: Shared Task Papers (pp. 853–859). https://doi.org/10.18653/v1/w18-6472

Iosifov, I., Iosifova, O., Romanovskyi, O., Sokolov, V., & Sukailo, I. (2022). Transferability Evaluation of Speech Emotion Recognition Between Different Languages. In Lecture Notes on Data Engineering and Communications Technologies (pp. 413–426). Springer International Publishing. https://doi.org/10.1007/978-3-031-04812-8_35


Published

2024-09-25

How to Cite

Ievgen, I., & Sokolov, V. (2024). COMPARATIVE ANALYSIS OF METHODS, TECHNOLOGIES, SERVICES, AND PLATFORMS FOR SPEECH RECOGNITION IN INFORMATION SECURITY SYSTEMS. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 1(25), 468–486. https://doi.org/10.28925/2663-4023.2024.25.468486
