End-to-end Modeling for Selection of Utterance Constructional Units via System Internal States

Increasing Naturalness and Flexibility in Spoken Dialogue Interaction

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 714)

Abstract

To make conversational agents and robots behave in a human-like manner, it is important to model the system's internal states. In this paper, we model the system's favorable impression of its dialogue partner. The favorable impression changes according to the user's dialogue behaviors and in turn affects the system's subsequent dialogue behaviors, specifically the selection of utterance constructional units. For this modeling, we propose a hierarchical structure of logistic regression models. First, from the user's dialogue behaviors, the model estimates the level of the user's favorable impression of the system and the level of the user's interest in the current topic. Then, based on these estimates, the model predicts the system's favorable impression of the user. Finally, the model selects the utterance constructional units for the next system turn. We train each logistic regression model individually with a small amount of data annotated with favorable impression. Afterward, the entire multi-layer network is fine-tuned with a larger amount of dialogue behavior data. Experimental results show that the proposed method selects utterance constructional units more accurately than methods that do not take the system internal states into account.
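The three-stage hierarchy described in the abstract can be sketched as a forward pass through stacked logistic regression layers. This is a minimal illustration and not the authors' implementation: the class names, the feature count, the number of units, the random initial weights, and the choice to treat each utterance constructional unit as an independent on/off output fed only by the system's favorable impression are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


class LogisticLayer:
    """A bank of logistic regression models: y = sigmoid(W x + b)."""

    def __init__(self, n_in, n_out, rng):
        # Small random weights stand in for trained parameters (assumption).
        self.W = rng.normal(scale=0.1, size=(n_out, n_in))
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return sigmoid(self.W @ x + self.b)


class HierarchicalModel:
    """Hypothetical three-stage stack following the abstract's description."""

    def __init__(self, n_behaviors, n_units, seed=0):
        rng = np.random.default_rng(seed)
        # Stage 1: user behaviors -> [user favorable impression, user interest]
        self.user_state = LogisticLayer(n_behaviors, 2, rng)
        # Stage 2: the two user estimates -> system favorable impression
        self.system_state = LogisticLayer(2, 1, rng)
        # Stage 3: system impression -> one probability per unit (assumption)
        self.unit_selector = LogisticLayer(1, n_units, rng)

    def forward(self, behaviors):
        user = self.user_state(behaviors)
        system = self.system_state(user)
        units = self.unit_selector(system)
        return user, system, units


# Illustrative input: five made-up user behavior features.
model = HierarchicalModel(n_behaviors=5, n_units=3)
behaviors = np.array([0.2, 1.0, 0.0, 0.5, 0.3])
user, system, units = model.forward(behaviors)
selected = units > 0.5  # include a unit when its probability exceeds 0.5
```

Because every stage is differentiable, the layers could first be fitted separately on the annotated impression labels and then updated jointly by backpropagating the unit-selection loss through the whole stack, which matches the two-step training the abstract outlines.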

Acknowledgements

This work was supported by JST ERATO Grant Number JPMJER1401, Japan. The authors would like to thank Professor Graham Wilcock for his insightful advice.

Author information

Corresponding author

Correspondence to Koji Inoue.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Tanaka, K., Inoue, K., Nakamura, S., Takanashi, K., Kawahara, T. (2021). End-to-end Modeling for Selection of Utterance Constructional Units via System Internal States. In: Marchi, E., Siniscalchi, S.M., Cumani, S., Salerno, V.M., Li, H. (eds) Increasing Naturalness and Flexibility in Spoken Dialogue Interaction. Lecture Notes in Electrical Engineering, vol 714. Springer, Singapore. https://doi.org/10.1007/978-981-15-9323-9_2

  • DOI: https://doi.org/10.1007/978-981-15-9323-9_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-9322-2

  • Online ISBN: 978-981-15-9323-9

  • eBook Packages: Engineering, Engineering (R0)
