COMPARISON OF DIGITAL SIGNAL PROCESSING METHODS AND DEEP LEARNING MODELS IN VOICE AUTHENTICATION

Authors

Ruda, K., Sabodashko, D., Mykytyn, H., Shved, M., Borduliak, S., & Korshun, N.

DOI:

https://doi.org/10.28925/2663-4023.2024.25.140160

Keywords:

biometric technologies; voice authentication; digital signal processing; mel-frequency cepstral coefficients; linear predictive coding; deep learning; neural networks.

Abstract

This paper addresses the shortcomings of traditional authentication methods, such as passwords, which often prove unreliable due to various vulnerabilities. The main drawbacks of these methods include the loss or theft of passwords, their weak resistance to various types of attacks, and the complexity of password management, especially in large systems. Biometric authentication methods, particularly those based on physical characteristics such as voice, present a promising alternative, offering a higher level of security and greater user convenience. Biometric authentication systems have an advantage over traditional methods because the voice is unique to each person, making it substantially harder to forge or steal. However, such systems face challenges of accuracy and reliability: voice biometric systems can encounter issues related to changes in voice caused by health, emotional state, or the surrounding environment. The primary objective of this paper is to compare contemporary deep learning models with traditional digital signal processing methods used for speaker recognition. For this study, text-dependent methods (Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC)) and text-independent methods (Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) and Residual Neural Network (ResNet)) were selected to compare their effectiveness in voice biometric authentication tasks. The experiment involved implementing a biometric authentication system based on each of the described methods and evaluating its performance on a specially collected dataset. Additionally, the paper examines in detail the audio signal preprocessing methods used in voice authentication systems to ensure optimal performance in speaker recognition tasks, including noise reduction using spectral subtraction, energy normalization, enhancement filtering, framing, and windowing.
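
As a concrete illustration of the preprocessing chain described above, the sketch below shows one possible way to combine spectral-subtraction noise reduction, pre-emphasis, energy normalization, framing, Hamming windowing, and MFCC extraction. It is not the authors' implementation: the sampling rate, frame and hop sizes, noise-estimation window, and the file name speaker_sample.wav are assumptions chosen for illustration, and the code relies on NumPy and librosa.

# Illustrative sketch (assumed parameters, not the authors' code): spectral-
# subtraction denoising, pre-emphasis, RMS energy normalization, and MFCC
# extraction over 25 ms Hamming-windowed frames with a 10 ms hop.
import numpy as np
import librosa

def spectral_subtraction(y, n_fft=512, hop=160, noise_frames=10):
    # Estimate the noise magnitude spectrum from the first frames (assumed
    # speech-free) and subtract it from every frame, keeping a small spectral floor.
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop, window="hamming")
    mag, phase = np.abs(S), np.angle(S)
    noise = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean = np.maximum(mag - noise, 0.05 * noise)
    return librosa.istft(clean * np.exp(1j * phase), hop_length=hop, window="hamming")

def preprocess_and_mfcc(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)                  # resample to 16 kHz
    y = spectral_subtraction(y)                        # noise reduction
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])         # pre-emphasis filter
    y = y / (np.sqrt(np.mean(y ** 2)) + 1e-8)          # RMS energy normalization
    # Framing and Hamming windowing happen inside the STFT used by the MFCC routine.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
                                window="hamming")

mfcc_features = preprocess_and_mfcc("speaker_sample.wav")  # hypothetical file name

For the text-independent models compared in the paper (ECAPA-TDNN, ResNet), verification is typically performed by mapping each utterance to a fixed-length speaker embedding and accepting a claimed identity when the cosine similarity between the enrollment and test embeddings exceeds a tuned threshold.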

References

Samuel, F. A., Titilayo, A. O., Abiodun, A. O., Modupe, A. O., Oyeladun, M. B., Mayowa, I. R., & Samuel, A. M. (2021). Voice recognition system for door access control using mobile phone. International Journal of Science and Engineering Applications, 10(9), 132–139. https://doi.org/10.7753/ijsea1009.1004

Amjad Hassan Khan, M. K., & Aithal, P. S. (2022). Voice Biometric Systems for User Identification and Authentication – A Literature Review. International Journal of Applied Engineering and Management Letters (IJAEML), 6(1), 198–209. https://doi.org/10.5281/zenodo.6471040

Abe, B. C., Araromi, H. O., Shokenu, E. S., Idowu, P. O., Babatunde, J. D., Adeagbo, M. A., & Oluwole, I. H. (2022). Biometric Access Control Using Voice and Fingerprint. Engineering And Technology Journal, 7(7), 1376–1382. https://doi.org/10.47191/etj/v7i7.08

Chen, X., Li, Z., Setlur, S., & Xu, W. (2022). Exploring racial and gender disparities in voice biometrics. Scientific Reports, 12(1). https://doi.org/10.1038/s41598-022-06673-y

Inamdar, F. M., Ambesange, S., Mane, R., Hussain, H., Wagh, S., & Lakhe, P. (2023). Voice Cloning Using Artificial Intelligence and Machine Learning: A review. Journal of Advanced Zoology, 44(S7), 419–427. https://doi.org/10.17762/jaz.v44is7.2721

Dalvi, J., et al. (2022). A survey on face recognition systems. arXiv preprint.

Win, K., Li, K., Chen, J., Viger, P. (2020). Fingerprint classification and identification algorithms for criminal investigation: A survey. Future Generation Computer Systems, 110, 758–771. https://doi.org/10.1016/j.future.2019.10.019

Daugman, J. (2002). How iris recognition works. Proceedings International Conference on Image Processing. https://doi.org/10.1109/ICIP.2002.1037952

Poddar, A., Sahidullah, Md., & Saha, G. (2017). Speaker verification with short utterances: A review of challenges, trends and opportunities. IET Biometrics, 7(2), 91–101. https://doi.org/10.1049/iet-bmt.2017.0065

Childers, D. G., Hand, M., & Larar, J. M. (1989). Silent and voiced/unvoiced/mixed excitation (four-way) classification of speech. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(11).

Upadhyay, N., & Karmakar, A. (2015). Speech Enhancement using Spectral Subtraction-type Algorithms: A Comparison and Simulation Study. Procedia Computer Science, 54, 574–584. https://doi.org/10.1016/j.procs.2015.06.066

Jakovljević, N., Janev, M., Pekar, D., & Mišković, D. (2008). Energy Normalization in Automatic Speech Recognition. In Lecture Notes in Computer Science, 341–347. https://doi.org/10.1007/978-3-540-87391-4_44

Hviyuzova, D., & Belitskiy, A. (2021). Development of a filter amplifier of the signal pre-processing device for the passive listening mode of the hydroacoustic complex (HAC). E3S Web of Conferences, 266, 04013. https://doi.org/10.1051/e3sconf/202126604013

Introduction to Speech Processing. (n. d.). https://speechprocessingbook.aalto.fi/Representations/Windowing.html

Junqua, J.-C., Mak, B., Reaves, B. (1994). A robust algorithm for word boundary detection in presence of noise. IEEE Trans. on Speech and Audio Processing, 2, 406–412.

Liu, Y., Qian, Y., Chen, N., Fu, T., Zhang, Y., & Yu, K. (2015). Deep feature for text-dependent speaker verification. Speech Communication, 73, 1–13. https://doi.org/10.1016/j.specom.2015.07.003

Heigold, G., Moreno, I., Bengio, S., & Shazeer, N. (2016). End-to-end text-dependent speaker verification. https://doi.org/10.1109/icassp.2016.7472652

Xu, M., Duan, L. Y., Cai, J., Chia, L. T., Xu, C., & Tian, Q. (2004). HMM-Based Audio Keyword Generation. In Lecture Notes in Computer Science, 566–574. https://doi.org/10.1007/978-3-540-30543-9_71

Wijoyo, S. (2011). Speech Recognition Using Linear Predictive Coding and Artificial Neural Network for Controlling Movement of Mobile Robot. http://fportfolio.petra.ac.id/user_files/97-031/E091%20full%20paper-Thiang%20-%20ICIEE%202011.pdf

Desplanques, B., Thienpondt, J., & Demuynck, K. (2020). ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification. https://doi.org/10.21437/interspeech.2020-2650

Jakubec, M., Lieskovska, E., & Jarina, R. (2021). Speaker Recognition with ResNet and VGG Networks, 31st International Conference Radioelektronika (RADIOELEKTRONIKA), 1–5. https://doi.org/10.1109/RADIOELEKTRONIKA52220.2021.9420202

Published

2024-09-25

How to Cite

Ruda, K., Sabodashko, D., Mykytyn, H., Shved, M., Borduliak, S., & Korshun, N. (2024). COMPARISON OF DIGITAL SIGNAL PROCESSING METHODS AND DEEP LEARNING MODELS IN VOICE AUTHENTICATION. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 1(25), 140–160. https://doi.org/10.28925/2663-4023.2024.25.140160
