
Sequential Decision Making in Spoken Dialog Management

  • Chapter

Part of the book: Building Dialogue POMDPs from Expert Dialogues

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter consists of two major sections. In Sect. 3.1, we introduce sequential decision making and the mathematical frameworks that support it: the Markov decision process (MDP) and the partially observable MDP (POMDP). We then present the well-known algorithms for solving each. In Sect. 3.2, we introduce spoken dialog systems (SDSs) and survey related work on sequential decision making in spoken dialog management, in particular research applying the POMDP framework to dialog management. Finally, we review the user-modeling techniques that have been used for dialog POMDPs.
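As a concrete illustration of the MDP framework treated in Sect. 3.1, the following sketch implements Bellman's value iteration on a toy two-state, slot-filling-style MDP. The problem instance, the function name `value_iteration`, and all parameters are illustrative assumptions, not code from the chapter.

```python
import numpy as np

def value_iteration(T, R, gamma, tol=1e-8):
    """Bellman value iteration for a finite MDP.

    T[a][s, s'] -- probability of moving from s to s' under action a
    R[a][s]     -- expected immediate reward for action a in state s
    Returns the optimal value function and a greedy policy.
    """
    n_states = T[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = R(a, s) + gamma * sum_{s'} T(s, a, s') V(s')
        Q = np.array([R[a] + gamma * T[a] @ V for a in range(len(T))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy dialog-like MDP: state 0 = "slot unfilled", state 1 = "slot filled".
# Action 0 (wait) keeps the state; action 1 (ask) fills the slot for reward 1.
T = [np.eye(2), np.array([[0.0, 1.0], [0.0, 1.0]])]
R = [np.zeros(2), np.array([1.0, 0.0])]
V, policy = value_iteration(T, R, gamma=0.9)
```

In this toy instance the optimal policy asks immediately in the unfilled state, so V(unfilled) = 1 and V(filled) = 0.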


Notes

  1. Note that here we assume that PBVI is performed on a fixed set of randomly sampled belief points, as in PERSEUS, the point-based value iteration algorithm proposed by Spaan and Vlassis (2005).
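A PERSEUS-style PBVI pass over a fixed, randomly processed belief set can be sketched as follows on a tiger-style toy POMDP. The problem instance, the function name `pbvi_perseus`, and all parameters are illustrative assumptions, not code from the book.

```python
import numpy as np

def pbvi_perseus(T, O, R, gamma, beliefs, iters=30, seed=0):
    """PERSEUS-style point-based value iteration on a fixed belief set.

    T[a][s, s'] -- transition probabilities; O[a][s', o] -- observation
    probabilities; R[a][s] -- immediate rewards. Returns alpha-vectors.
    """
    rng = np.random.default_rng(seed)
    n_actions, n_states, n_obs = len(T), T[0].shape[0], O[0].shape[1]
    # Initialise with a single pessimistic alpha-vector (a lower bound).
    V = [np.full(n_states, R.min() / (1.0 - gamma))]

    def backup(b, V):
        best_val, best_alpha = -np.inf, None
        for a in range(n_actions):
            g = R[a].astype(float).copy()
            for o in range(n_obs):
                # g_{a,o}(s) = sum_{s'} T(s,a,s') O(s',a,o) alpha(s')
                cands = [T[a] @ (O[a][:, o] * alpha) for alpha in V]
                g += gamma * max(cands, key=lambda v: v @ b)
            if g @ b > best_val:
                best_val, best_alpha = g @ b, g
        return best_alpha

    for _ in range(iters):
        values = np.array([max(a @ b for a in V) for b in beliefs])
        improvable, V_new = list(range(len(beliefs))), []
        while improvable:
            # Back up a randomly chosen not-yet-improved point.
            i = improvable[rng.integers(len(improvable))]
            alpha = backup(beliefs[i], V)
            if alpha @ beliefs[i] < values[i]:
                # Keep the best old vector for this point instead.
                alpha = max(V, key=lambda v: v @ beliefs[i])
            V_new.append(alpha)
            # Drop every point the new vector already improves.
            improvable = [j for j in improvable
                          if alpha @ beliefs[j] < values[j]]
        V = V_new
    return V

# Tiger-style toy POMDP: state 0 = "tiger left", state 1 = "tiger right".
T = [np.eye(2),                      # listen: state unchanged
     np.full((2, 2), 0.5),           # open-left: problem resets
     np.full((2, 2), 0.5)]           # open-right: problem resets
O = [np.array([[0.85, 0.15],         # listen: 85% accurate hint
               [0.15, 0.85]]),
     np.full((2, 2), 0.5),           # opening yields no information
     np.full((2, 2), 0.5)]
R = np.array([[-1.0, -1.0],          # listen
              [-100.0, 10.0],        # open-left (bad if tiger left)
              [10.0, -100.0]])       # open-right (bad if tiger right)
p = np.linspace(0.0, 1.0, 11)
beliefs = np.stack([p, 1.0 - p], axis=1)
V = pbvi_perseus(T, O, R, gamma=0.95, beliefs=beliefs)
```

Because the inner loop only backs up points whose value has not yet improved, each PERSEUS stage typically adds far fewer alpha-vectors than there are belief points, which is the source of the algorithm's speed.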

References

  • Ai, H., & Litman, D. J. (2007). Knowledge consistent user simulations for dialog systems. In Proceedings of the 8th Annual Conference of the International Speech Communication Association (INTERSPEECH’07), Antwerp.


  • Atrash, A., & Pineau, J. (2010). A Bayesian method for learning POMDP observation parameters for robot interaction management systems. In The POMDP Practitioners Workshop.


  • Bellman, R. (1957a). Dynamic programming. Princeton: Princeton University Press.


  • Bellman, R. (1957b). A Markovian decision process. Journal of Mathematics and Mechanics, 6(6), 679–684.

  • Bonet, B., & Geffner, H. (2003). Faster heuristic search algorithms for planning with uncertainty and full feedback. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI’03), Acapulco, Mexico.


  • Cassandra, A., Kaelbling, L., & Littman, M. (1995). Acting optimally in partially observable stochastic domains. In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI’95), Seattle, Washington.


  • Chandramohan, S., Geist, M., Lefevre, F., & Pietquin, O. (2011). User simulation in dialogue systems using inverse reinforcement learning. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH’11), Florence.


  • Clark, H., & Brennan, S. (1991). Grounding in communication. Perspectives on Socially Shared Cognition, 13(1991), 127–149.


  • Cuayáhuitl, H., Renals, S., Lemon, O., & Shimodaira, H. (2005). Human-computer dialogue simulation using hidden Markov models. In Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’05), San Juan, PR.


  • Dai, P., & Goldsmith, J. (2007). Topological value iteration algorithm for Markov decision processes. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07), Hyderabad.

  • Dibangoye, J. S., Shani, G., Chaib-draa, B., & Mouaddib, A. (2009). Topological order planner for POMDPs. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI’09), Pasadena, CA.

  • Doshi, F., & Roy, N. (2007). Efficient model learning for dialog management. In Proceedings of the 2nd ACM SIGCHI/SIGART Conference on Human-Robot Interaction (HRI’07), Arlington, VA.


  • Doshi, F., & Roy, N. (2008). Spoken language interaction with model uncertainty: An adaptive human-robot interaction system. Connection Science, 20(4), 299–318.


  • Doshi-Velez, F., Pineau, J., & Roy, N. (2012). Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs. Artificial Intelligence, 187, 115–132.

  • Eckert, W., Levin, E., & Pieraccini, R. (1997). User modeling for spoken dialogue system evaluation. In Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’97), Santa Barbara, CA (pp. 80–87).


  • Frampton, M., & Lemon, O. (2009). Recent research advances in reinforcement learning in spoken dialogue systems. Knowledge Engineering Review, 24(4), 375–408.


  • Gašić, M., Keizer, S., Mairesse, F., Schatzmann, J., Thomson, B., Yu, K., et al. (2008). Training and evaluation of the HIS POMDP dialogue system in noise. In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue (SIGdial’08), Columbus, OH.


  • Georgila, K., Henderson, J., & Lemon, O. (2005). Learning user simulations for information state update dialogue systems. In Proceedings of the 6th Annual Conference of the International Speech Communication Association (INTERSPEECH’05), Lisbon.


  • Georgila, K., Henderson, J., & Lemon, O. (2006). User simulation for spoken dialogue systems: Learning and evaluation. In Proceedings of the 7th Annual Conference of the International Speech Communication Association (INTERSPEECH’06), Pittsburgh, PA.


  • Hauskrecht, M. (2000). Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13, 33–94.


  • Kaelbling, L., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2), 99–134.


  • Keizer, S., Gašić, M., Jurčíček, F., Mairesse, F., Thomson, B., Yu, K., et al. (2010). Parameter estimation for agenda-based user simulation. In Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 116–123). Tokyo, Japan: Association for Computational Linguistics.


  • Kim, D., Kim, J., & Kim, K. (2011). Robust performance evaluation of POMDP-based dialogue systems. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 1029–1040.


  • Kim, D., Sim, H. S., Kim, K.-E., Kim, J. H., Kim, H., & Sung, J. W. (2008). Effects of user modeling on POMDP-based dialogue systems. In Proceedings of the 9th Annual Conference of the International Speech Communication Association (INTERSPEECH’08), Brisbane.


  • Lee, D., & Seung, H. (2001). Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 13, 556–562.


  • Levin, E., & Pieraccini, R. (1997). A stochastic model of computer-human interaction for learning dialogue strategies. In Proceedings of 5th European Conference on Speech Communication and Technology (Eurospeech’97), Rhodes.


  • Li, X., Cheung, W., Liu, J., & Wu, Z. (2007). A novel orthogonal NMF-based belief compression for POMDPs. In Proceedings of the 24th International Conference on Machine Learning (ICML’07), Corvallis, OR.

  • Lison, P. (2013). Model-based Bayesian reinforcement learning for dialogue management. In Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH’13), Lyon.

  • Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). Nonapproximability results for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 14, 83–103.


  • Madani, O., Hanks, S., & Condon, A. (1999). On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proceedings of the 16th National Conference on Artificial Intelligence (AAAI’99) and the 11th Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence, Orlando, FL.


  • Monahan, G. (1982). A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science, 28, 1–16.


  • Papadimitriou, C., & Tsitsiklis, J. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441–450.

  • Paquet, S. (2006). Distributed decision-making and task coordination in dynamic, uncertain and real-time multiagent environments. Ph.D. thesis, Université Laval.


  • Paquet, S., Tobin, L., & Chaib-draa, B. (2005). An online POMDP algorithm for complex multiagent environments. In Proceedings of the 4th International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS’05), Utrecht.


  • Pieraccini, R., Levin, E., & Eckert, W. (1997). Learning dialogue strategies within Markov decision process framework. In Proceedings of IEEE Workshop Automatic Speech Recognition and Understanding (ASRU’97), Rhodes.


  • Pietquin, O. (2004). A framework for unsupervised learning of dialogue strategies. Ph.D. thesis, Faculté Polytechnique de Mons.


  • Pietquin, O. (2006). Consistent goal-directed user model for realistic man-machine task-oriented spoken dialogue simulation. In Proceedings of IEEE International Conference on Multimedia and Expo (ICME’06), Toronto, ON (pp. 425–428).


  • Pietquin, O., & Dutoit, T. (2006). A probabilistic framework for dialog simulation and optimal strategy learning. IEEE Transactions on Audio, Speech, and Language Processing, 14(2), 589–599.


  • Pineau, J. (2004). Tractable planning under uncertainty: Exploiting structure. Ph.D. thesis, Rutgers University.


  • Pineau, J., Gordon, G., & Thrun, S. (2003). Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence (IJCAI’03), Acapulco.


  • Png, S., & Pineau, J. (2011). Bayesian reinforcement learning for POMDP-based dialogue systems. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’11), Prague.


  • Png, S., Pineau, J., & Chaib-draa, B. (2012). Building adaptive dialogue systems via Bayes-adaptive POMDPs. IEEE Journal of Selected Topics in Signal Processing, 6(8), 917–927.

  • Poupart, P., & Boutilier, C. (2002). Value-directed compression of POMDPs. In Advances in Neural Information Processing Systems 14 (NIPS’02), Vancouver, BC.


  • Rieser, V., & Lemon, O. (2006). Cluster-based user simulations for learning dialogue strategies. In Proceedings of the 7th Annual Conference of the International Speech Communication Association (INTERSPEECH’06), Pittsburgh, PA.


  • Rieser, V., & Lemon, O. (2011). Reinforcement learning for adaptive dialogue systems: A data-driven methodology for dialogue management and natural language generation. Springer Science & Business Media.

  • Ross, S., Pineau, J., Chaib-draa, B., & Kreitmann, P. (2011). A Bayesian approach for learning and planning in partially observable Markov decision processes. Journal of Machine Learning Research, 12, 1729–1770.


  • Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32(1), 663–704.

  • Roy, N., Gordon, G., & Thrun, S. (2005). Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23, 1–40.

  • Roy, N., Pineau, J., & Thrun, S. (2000). Spoken dialogue management using probabilistic reasoning. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL’00), Hong Kong.


  • Schatzmann, J., Thomson, B., Weilhammer, K., Ye, H., & Young, S. (2007). Agenda-based user simulation for bootstrapping a POMDP dialogue system. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers (pp. 149–152). Association for Computational Linguistics.


  • Schatzmann, J., Weilhammer, K., Stuttle, M., & Young, S. (2006). A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies. Knowledge Engineering Review, 21(2), 97–126.


  • Schatzmann, J., & Young, S. (2009). The hidden agenda user simulation model. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 733–747.


  • Scheffler, K., & Young, S. (2000). Probabilistic simulation of human-machine dialogues. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’00) (Vol. 2, pp. 1217–1220).


  • Smallwood, R., & Sondik, E. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21, 1071–1088.


  • Smith, T., & Simmons, R. (2004). Heuristic search value iteration for POMDPs. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI’04), Banff, AB.

  • Sondik, E. (1971). The optimal control of partially observable Markov processes. Ph.D. thesis, Stanford University.


  • Spaan, M., & Vlassis, N. (2004). A point-based POMDP algorithm for robot planning. In Proceedings of IEEE International Conference on Robotics and Automation (ICRA’04), New Orleans, LA.

  • Spaan, M., & Vlassis, N. (2005). Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24(1), 195–220.


  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.


  • Thomson, B. (2009). Statistical methods for spoken dialogue management. Ph.D. thesis, Department of Engineering, University of Cambridge.


  • Thomson, B., & Young, S. (2010). Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech and Language, 24(4), 562–588.


  • Traum, D. (1994). A computational theory of grounding in natural language conversation. Ph.D. thesis, University of Rochester.


  • Watkins, C. J. C. H., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279–292.

  • Wierstra, D., & Wiering, M. (2004). Utile distinction hidden Markov models. In Proceedings of the Twenty-First International Conference on Machine Learning (p. 108). New York: ACM.


  • Williams, J. D. (2006). Partially observable Markov decision processes for spoken dialogue management. Ph.D. thesis, Department of Engineering, University of Cambridge.


  • Williams, J. D., & Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21, 393–422.


  • Young, S., Gasic, M., Thomson, B., & Williams, J. D. (2013). POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5), 1160–1179.

  • Zhang, B., Cai, Q., Mao, J., & Guo, B. (2001b). Planning and acting under uncertainty: A new model for spoken dialogue system. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (UAI’01), Seattle, Washington.




Copyright information

© 2016 The Authors

Cite this chapter

Chinaei, H., Chaib-draa, B. (2016). Sequential Decision Making in Spoken Dialog Management. In: Building Dialogue POMDPs from Expert Dialogues. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-26200-0_3


  • DOI: https://doi.org/10.1007/978-3-319-26200-0_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26198-0

  • Online ISBN: 978-3-319-26200-0

  • eBook Packages: Engineering; Engineering (R0)
