EVALUATION OF THE SCALABILITY OF VOICE EMBEDDING MODELS IN BIOMETRIC SPEAKER VERIFICATION SYSTEMS

Authors

Ruda, K., Kos, I., & Akhmedova, A.

DOI:

https://doi.org/10.28925/2663-4023.2025.31.1042

Keywords:

voice biometrics; scalability; speaker verification; embeddings; authentication; ECAPA-TDNN; Pyannote; WavLM.

Abstract

The rapid expansion of digital platforms in the financial sector, public administration, e-commerce, and service systems has created growing demand for reliable and scalable user authentication technologies. Biometric methods, and voice-based authentication systems in particular, show significant potential here because interaction is natural, hardware requirements are minimal, and integration into voice-driven interfaces is straightforward. However, a growing number of users and more diverse usage scenarios pose new challenges for researchers and developers. Modern systems must deliver high accuracy in real time, maintain stable performance as data volumes grow, and withstand cyberattacks, including those that use synthetic or manipulated speech. A critical requirement is that models generate compact, invariant, and robust voice embeddings that can be compared and classified efficiently within large-scale databases. This paper presents a comparative analysis of the scalability of contemporary neural architectures for speaker verification, with emphasis on their performance, computational complexity, and behavior as the number of enrolled users increases. The study examines model optimization techniques, indexed embedding-based search methods, and the role of representative multilingual corpora in improving accuracy under acoustic and linguistic variability. Particular attention is given to protection against spoofing attacks and to specialized synthetic speech detection methods as an essential component of scalable voice biometric systems. The results indicate that modern voice authentication systems need a comprehensive design approach that combines architectural engineering decisions with requirements for information security, high performance, and adaptability to rapidly evolving digital services.
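To illustrate what comparing embeddings within a large database involves, the following is a minimal sketch, not the authors' implementation. It assumes embeddings are plain lists of floats and scores them with cosine similarity; the function names, the toy gallery, and the threshold of 0.7 are hypothetical choices made for illustration only. The brute-force 1:N search shown here costs time proportional to the number of enrolled users, which is the cost that indexed search methods aim to reduce.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def verify(probe, enrolled, threshold=0.7):
    """1:1 verification: accept if the score reaches the (assumed) threshold."""
    return cosine_similarity(probe, enrolled) >= threshold


def identify(probe, gallery, threshold=0.7):
    """Brute-force 1:N search over a dict {speaker_id: embedding}.

    Every enrolled embedding is scored, so the cost grows linearly
    with the number of enrolled users.
    """
    best_id, best_score = None, -1.0
    for speaker_id, emb in gallery.items():
        score = cosine_similarity(probe, emb)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score  # no enrolled speaker is close enough
    return best_id, best_score


# Toy 3-dimensional gallery; real embeddings have far more dimensions.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
match_id, match_score = identify([0.9, 0.1, 0.0], gallery)
```

In practice the threshold would be tuned on held-out trials, and for large galleries the linear scan would typically be replaced by an approximate nearest-neighbor index.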

Published

2025-12-16

How to Cite

Ruda, K., Kos, I., & Akhmedova, A. (2025). EVALUATION OF THE SCALABILITY OF VOICE EMBEDDING MODELS IN BIOMETRIC SPEAKER VERIFICATION SYSTEMS. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 3(31), 528–540. https://doi.org/10.28925/2663-4023.2025.31.1042