The interaction of two modalities of an audiovisual information processing system is studied for the problem of evaluating the emotional state of users of dialogue information systems. To improve the accuracy of this estimation in real time, it is proposed to use the audio modality to detect speech segments of heightened emotionality. The intensity of the flow of vowel sounds in the user's speech signal at the input of the information system serves as an indicator of the degree of speech emotionality. A method has been developed for measuring this indicator from the empirical probability of the occurrence of vowel sounds in the user's speech signal. An example of a practical implementation of the method in soft real time is presented. A full-scale experiment using the authors' software was designed and carried out. The advantages of the proposed method are shown: high speed of operation and high sensitivity to changes in the level of users' speech emotionality. The results are intended for developers of advanced information systems with an audiovisual user interface.
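The core measurement described above, the intensity of the vowel-sound flow as the empirical probability of vowel occurrence, can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes a simple energy/zero-crossing-rate heuristic for vowel-frame detection (vowels are voiced and concentrate energy at low frequencies), and the frame length, quantile gate, and threshold values are illustrative assumptions.

```python
import numpy as np

def vowel_flow_intensity(signal, sample_rate, frame_ms=20.0,
                         energy_quantile=0.5, zcr_threshold=0.1):
    """Estimate vowel-flow intensity as the empirical probability
    (relative frequency) of vowel-like frames in the signal.

    Heuristic stand-in for a vowel detector: a frame is treated as
    vowel-like if its energy is above the energy_quantile level of all
    frames and its zero-crossing rate is low (voiced, low-frequency).
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return 0.0
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))

    # Per-frame short-time energy and zero-crossing rate.
    energy = np.mean(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

    energy_gate = energy > np.quantile(energy, energy_quantile)
    vowel_like = energy_gate & (zcr < zcr_threshold)

    # Empirical probability = fraction of vowel-like frames.
    return float(np.mean(vowel_like))
```

For example, for a signal consisting of one second of a 200 Hz tone (a crude vowel surrogate) followed by one second of silence at 8 kHz, the estimate is 0.5; for pure silence it is 0.0. In soft real time this estimate would be updated over a sliding window of recent frames rather than over the whole recording.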
Translated from Izmeritel’naya Tekhnika, No. 3, pp. 65–72, March, 2022.
Savchenko, A.V., Savchenko, V.V. Method for Measurement the Intensity of Speech Vowel Sounds Flow for Audiovisual Dialogue Information Systems. Meas Tech 65, 219–226 (2022). https://doi.org/10.1007/s11018-022-02072-x