MULTIPLE EFFECTIVENESS CRITERIA OF FORMING DATABASES OF EMOTIONAL VOICE SIGNALS

Authors

Dychka, I., Tereikovskyi, I., Samofalov, A., Tereykovska, L., & Romankevich, V.

DOI:

https://doi.org/10.28925/2663-4023.2023.21.6574

Keywords:

database; emotion recognition; voice signal; efficiency criterion

Abstract

The large number of emotional speech databases created in different languages testifies to the research community's strong interest in the synthesis of emotional voice signals and the recognition of emotions in the human voice. Devices that use a voice interface for user interaction are now coming into widespread use, which is especially important in certain robotic systems.

Computer systems for recognizing emotions in a person's voice are usually built on neural networks, and training them requires sufficiently large databases of emotional voice signals. The main approach to creating such databases is to engage actors to reproduce a given range of emotions in their utterances and, accordingly, to use specialized equipment for recording and analyzing the resulting audio data. However, this approach demands significant time and resources, which makes it impossible to generate large volumes of emotional voice expressions in a reasonable period.

Therefore, to evaluate the effectiveness of forming databases of emotional voice signals, a list of criteria is given, against which the existing means of forming emotional databases were assessed. The results of the evaluation show that the known means of forming emotional databases of human voice signals have a number of shortcomings. To increase their efficiency, it is advisable to provide: the possibility of forming databases without involving professional actors; the presence of spontaneous expressions, not only predetermined ones; the presence of polyphonic expressions, namely dialogues; and the means to estimate the time and computing resources required to form database elements.


References

Ekman, P. (2005). Basic Emotions. In Handbook of Cognition and Emotion (p. 45–60). John Wiley & Sons, Ltd. https://doi.org/10.1002/0470013494.ch3

Bachorowski, J.-A., & Owren, M. J. (1995). Vocal Expression of Emotion: Acoustic Properties of Speech Are Associated With Emotional Intensity and Context. Psychological Science, 6(4), 219–224. https://doi.org/10.1111/j.1467-9280.1995.tb00596.x

Hirschberg, J. (2006). Pragmatics and Intonation. In The Handbook of Pragmatics (eds L.R. Horn and G. Ward). https://doi.org/10.1002/9780470756959.ch23

Tereykovska, L. (2023). Methodology of automated recognition of the emotional state of listeners of the distance learning system [Dissertation, Kyiv National University of Construction and Architecture]. Institutional repository of National transport university. http://www.ntu.edu.ua/nauka/oprilyudnennya-disertacij/

Kominek, J., & Black, A. (2004). The CMU Arctic speech databases. SSW5-2004. https://www.lti.cs.cmu.edu/sites/default/files/CMU-LTI-03-177-T.pdf (date of access: 01.06.2023)

Zhou, K., Sisman, B., Liu, R., & Li, H. (2022). Emotional voice conversion: Theory, databases and ESD. Speech Communication, 137, 1–18. https://doi.org/10.1016/j.specom.2021.11.006

Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., & Weiss, B. (2005). A database of German emotional speech. In Interspeech 2005. ISCA. https://doi.org/10.21437/interspeech.2005-446

Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLOS ONE, 13(5), Article e0196391. https://doi.org/10.1371/journal.pone.0196391

James, J., Tian, L., & Inez Watson, C. (2018). An Open Source Emotional Speech Corpus for Human Robot Interaction Applications. In Interspeech 2018. ISCA. https://doi.org/10.21437/interspeech.2018-1349

Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M. (2014). EMOVO Corpus: an Italian Emotional Speech Database. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 3501–3504, Reykjavik, Iceland. European Language Resources Association (ELRA).


Published

2023-09-28

How to Cite

Dychka, I., Tereikovskyi, I., Samofalov, A., Tereykovska, L., & Romankevich, V. (2023). MULTIPLE EFFECTIVENESS CRITERIA OF FORMING DATABASES OF EMOTIONAL VOICE SIGNALS. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 1(21), 65–74. https://doi.org/10.28925/2663-4023.2023.21.6574