The interaction of two modalities of an audiovisual information processing system is studied for the problem of evaluating the emotional state of users of dialogue information systems. To improve the accuracy of this estimation in real time, it is proposed to use the audio modality to detect speech segments of heightened emotionality. The intensity of the flow of vowel sounds in the user's speech signal at the input of the information system serves as an indicator of the degree of speech emotionality. A method has been developed for measuring this indicator from the empirical probability of the occurrence of vowel sounds in the user's speech signal. An example of a practical implementation of the method in soft real time is presented. A full-scale experiment using the authors' software was designed and carried out. The advantages of the proposed method are shown: high speed of operation and high sensitivity to changes in the level of users' speech emotionality. The results are intended for developers of advanced information systems with an audiovisual user interface.
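The core measurement described above, the intensity of the vowel-sound flow as the empirical probability of vowel occurrence, can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes a simple energy/zero-crossing-rate heuristic for vowel-frame detection (vowels are voiced and concentrate energy at low frequencies), and the frame length, quantile gate, and threshold values are illustrative assumptions.

```python
import numpy as np

def vowel_flow_intensity(signal, sample_rate, frame_ms=20.0,
                         energy_quantile=0.5, zcr_threshold=0.1):
    """Estimate vowel-flow intensity as the empirical probability
    (relative frequency) of vowel-like frames in the signal.

    Heuristic stand-in for a vowel detector: a frame is treated as
    vowel-like if its energy is above the energy_quantile level of all
    frames and its zero-crossing rate is low (voiced, low-frequency).
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return 0.0
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))

    # Per-frame short-time energy and zero-crossing rate.
    energy = np.mean(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

    energy_gate = energy > np.quantile(energy, energy_quantile)
    vowel_like = energy_gate & (zcr < zcr_threshold)

    # Empirical probability = fraction of vowel-like frames.
    return float(np.mean(vowel_like))
```

For example, for a signal consisting of one second of a 200 Hz tone (a crude vowel surrogate) followed by one second of silence at 8 kHz, the estimate is 0.5; for pure silence it is 0.0. In soft real time this estimate would be updated over a sliding window of recent frames rather than over the whole recording.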
Translated from Izmeritel’naya Tekhnika, No. 3, pp. 65–72, March, 2022.
Savchenko, A.V., Savchenko, V.V. Method for Measurement the Intensity of Speech Vowel Sounds Flow for Audiovisual Dialogue Information Systems. Meas Tech 65, 219–226 (2022). https://doi.org/10.1007/s11018-022-02072-x