
Sequential Decision Making in Spoken Dialog Management

  • Chapter

Part of the book: Building Dialogue POMDPs from Expert Dialogues

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter consists of two major sections. In Sect. 3.1, we introduce sequential decision making and the mathematical frameworks that support it: the Markov decision process (MDP) and the partially observable MDP (POMDP). We then present the well-known algorithms for solving each. In Sect. 3.2, we introduce spoken dialog systems (SDSs) and survey related work on sequential decision making in spoken dialog management, in particular research applying the POMDP framework to dialog management. Finally, we review the user-modeling techniques that have been used for dialog POMDPs.
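As a concrete illustration of the MDP framework treated in Sect. 3.1, the following sketch implements Bellman's value iteration on a toy two-state, slot-filling-style MDP. The problem instance, the function name `value_iteration`, and all parameters are illustrative assumptions, not code from the chapter.

```python
import numpy as np

def value_iteration(T, R, gamma, tol=1e-8):
    """Bellman value iteration for a finite MDP.

    T[a][s, s'] -- probability of moving from s to s' under action a
    R[a][s]     -- expected immediate reward for action a in state s
    Returns the optimal value function and a greedy policy.
    """
    n_states = T[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = R(a, s) + gamma * sum_{s'} T(s, a, s') V(s')
        Q = np.array([R[a] + gamma * T[a] @ V for a in range(len(T))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy dialog-like MDP: state 0 = "slot unfilled", state 1 = "slot filled".
# Action 0 (wait) keeps the state; action 1 (ask) fills the slot for reward 1.
T = [np.eye(2), np.array([[0.0, 1.0], [0.0, 1.0]])]
R = [np.zeros(2), np.array([1.0, 0.0])]
V, policy = value_iteration(T, R, gamma=0.9)
```

In this toy instance the optimal policy asks immediately in the unfilled state, so V(unfilled) = 1 and V(filled) = 0.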


Notes

  1. Note that here we assume that PBVI is performed on a fixed set of randomly sampled belief points, as in PERSEUS, the point-based value iteration algorithm proposed by Spaan and Vlassis (2005).
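A PERSEUS-style PBVI pass over a fixed, randomly processed belief set can be sketched as follows on a tiger-style toy POMDP. The problem instance, the function name `pbvi_perseus`, and all parameters are illustrative assumptions, not code from the book.

```python
import numpy as np

def pbvi_perseus(T, O, R, gamma, beliefs, iters=30, seed=0):
    """PERSEUS-style point-based value iteration on a fixed belief set.

    T[a][s, s'] -- transition probabilities; O[a][s', o] -- observation
    probabilities; R[a][s] -- immediate rewards. Returns alpha-vectors.
    """
    rng = np.random.default_rng(seed)
    n_actions, n_states, n_obs = len(T), T[0].shape[0], O[0].shape[1]
    # Initialise with a single pessimistic alpha-vector (a lower bound).
    V = [np.full(n_states, R.min() / (1.0 - gamma))]

    def backup(b, V):
        best_val, best_alpha = -np.inf, None
        for a in range(n_actions):
            g = R[a].astype(float).copy()
            for o in range(n_obs):
                # g_{a,o}(s) = sum_{s'} T(s,a,s') O(s',a,o) alpha(s')
                cands = [T[a] @ (O[a][:, o] * alpha) for alpha in V]
                g += gamma * max(cands, key=lambda v: v @ b)
            if g @ b > best_val:
                best_val, best_alpha = g @ b, g
        return best_alpha

    for _ in range(iters):
        values = np.array([max(a @ b for a in V) for b in beliefs])
        improvable, V_new = list(range(len(beliefs))), []
        while improvable:
            # Back up a randomly chosen not-yet-improved point.
            i = improvable[rng.integers(len(improvable))]
            alpha = backup(beliefs[i], V)
            if alpha @ beliefs[i] < values[i]:
                # Keep the best old vector for this point instead.
                alpha = max(V, key=lambda v: v @ beliefs[i])
            V_new.append(alpha)
            # Drop every point the new vector already improves.
            improvable = [j for j in improvable
                          if alpha @ beliefs[j] < values[j]]
        V = V_new
    return V

# Tiger-style toy POMDP: state 0 = "tiger left", state 1 = "tiger right".
T = [np.eye(2),                      # listen: state unchanged
     np.full((2, 2), 0.5),           # open-left: problem resets
     np.full((2, 2), 0.5)]           # open-right: problem resets
O = [np.array([[0.85, 0.15],         # listen: 85% accurate hint
               [0.15, 0.85]]),
     np.full((2, 2), 0.5),           # opening yields no information
     np.full((2, 2), 0.5)]
R = np.array([[-1.0, -1.0],          # listen
              [-100.0, 10.0],        # open-left (bad if tiger left)
              [10.0, -100.0]])       # open-right (bad if tiger right)
p = np.linspace(0.0, 1.0, 11)
beliefs = np.stack([p, 1.0 - p], axis=1)
V = pbvi_perseus(T, O, R, gamma=0.95, beliefs=beliefs)
```

Because the inner loop only backs up points whose value has not yet improved, each PERSEUS stage typically adds far fewer alpha-vectors than there are belief points, which is the source of the algorithm's speed.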

References

  • Ai, H., & Litman, D. J. (2007). Knowledge consistent user simulations for dialog systems. In Proceedings of the 8th Annual Conference of the International Speech Communication Association (INTERSPEECH’07), Antwerp.


  • Atrash, A., & Pineau, J. (2010). A Bayesian method for learning POMDP observation parameters for robot interaction management systems. In The POMDP Practitioners Workshop.


  • Bellman, R. (1957a). Dynamic programming. Princeton: Princeton University Press.


  • Bellman, R. (1957b). A Markovian decision process. Journal of Mathematics and Mechanics, 6(6), 679–684.

  • Bonet, B., & Geffner, H. (2003). Faster heuristic search algorithms for planning with uncertainty and full feedback. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI’03), Acapulco, Mexico.


  • Cassandra, A., Kaelbling, L., & Littman, M. (1995). Acting optimally in partially observable stochastic domains. In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI’95), Seattle, Washington.


  • Chandramohan, S., Geist, M., Lefevre, F., & Pietquin, O. (2011). User simulation in dialogue systems using inverse reinforcement learning. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH’11), Florence.


  • Clark, H., & Brennan, S. (1991). Grounding in communication. Perspectives on Socially Shared Cognition, 13(1991), 127–149.


  • Cuayáhuitl, H., Renals, S., Lemon, O., & Shimodaira, H. (2005). Human-computer dialogue simulation using hidden Markov models. In Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’05), San Juan, PR.


  • Dai, P., & Goldsmith, J. (2007). Topological value iteration algorithm for Markov decision processes. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07), Hyderabad.

  • Dibangoye, J. S., Shani, G., Chaib-draa, B., & Mouaddib, A. (2009). Topological order planner for POMDPs. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI’09), Pasadena, CA.

  • Doshi, F., & Roy, N. (2007). Efficient model learning for dialog management. In Proceedings of the 2nd ACM SIGCHI/SIGART Conference on Human-Robot Interaction (HRI’07), Arlington, VA.


  • Doshi, F., & Roy, N. (2008). Spoken language interaction with model uncertainty: An adaptive human-robot interaction system. Connection Science, 20(4), 299–318.


  • Doshi-Velez, F., Pineau, J., & Roy, N. (2012). Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs. Artificial Intelligence, 187, 115–132.

  • Eckert, W., Levin, E., & Pieraccini, R. (1997). User modeling for spoken dialogue system evaluation. In Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’97), Santa Barbara, CA (pp. 80–87).


  • Frampton, M., & Lemon, O. (2009). Recent research advances in reinforcement learning in spoken dialogue systems. Knowledge Engineering Review, 24(4), 375–408.


  • Gašić, M., Keizer, S., Mairesse, F., Schatzmann, J., Thomson, B., Yu, K., et al. (2008). Training and evaluation of the HIS POMDP dialogue system in noise. In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue (SIGdial’08), Columbus, OH.


  • Georgila, K., Henderson, J., & Lemon, O. (2005). Learning user simulations for information state update dialogue systems. In Proceedings of the 6th Annual Conference of the International Speech Communication Association (INTERSPEECH’05), Lisbon.


  • Georgila, K., Henderson, J., & Lemon, O. (2006). User simulation for spoken dialogue systems: Learning and evaluation. In Proceedings of the 7th Annual Conference of the International Speech Communication Association (INTERSPEECH’06), Pittsburgh, PA.


  • Hauskrecht, M. (2000). Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13, 33–94.


  • Kaelbling, L., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2), 99–134.


  • Keizer, S., Gašić, M., Jurčíček, F., Mairesse, F., Thomson, B., Yu, K., et al. (2010). Parameter estimation for agenda-based user simulation. In Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 116–123). Tokyo, Japan: Association for Computational Linguistics.


  • Kim, D., Kim, J., & Kim, K. (2011). Robust performance evaluation of POMDP-based dialogue systems. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 1029–1040.


  • Kim, D., Sim, H. S., Kim, K.-E., Kim, J. H., Kim, H., & Sung, J. W. (2008). Effects of user modeling on POMDP-based dialogue systems. In Proceedings of the 9th Annual Conference of the International Speech Communication Association (INTERSPEECH’08), Brisbane.


  • Lee, D., & Seung, H. (2001). Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 13, 556–562.


  • Levin, E., & Pieraccini, R. (1997). A stochastic model of computer-human interaction for learning dialogue strategies. In Proceedings of 5th European Conference on Speech Communication and Technology (Eurospeech’97), Rhodes.


  • Li, X., Cheung, W., Liu, J., & Wu, Z. (2007). A novel orthogonal NMF-based belief compression for POMDPs. In Proceedings of the 24th International Conference on Machine Learning (ICML’07), Corvallis, OR.

  • Lison, P. (2013). Model-based Bayesian reinforcement learning for dialogue management. In Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH’13), Lyon.

  • Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). Nonapproximability results for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 14, 83–103.


  • Madani, O., Hanks, S., & Condon, A. (1999). On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proceedings of the 16th National Conference on Artificial Intelligence (AAAI’99) and the 11th Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence, Orlando, FL.


  • Monahan, G. (1982). A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science, 28, 1–16.


  • Papadimitriou, C., & Tsitsiklis, J. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441–450.

  • Paquet, S. (2006). Distributed decision-making and task coordination in dynamic, uncertain and real-time multiagent environments. Ph.D. thesis, Université Laval.


  • Paquet, S., Tobin, L., & Chaib-draa, B. (2005). An online POMDP algorithm for complex multiagent environments. In Proceedings of the 4th International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS’05), Utrecht.


  • Pieraccini, R., Levin, E., & Eckert, W. (1997). Learning dialogue strategies within Markov decision process framework. In Proceedings of IEEE Workshop Automatic Speech Recognition and Understanding (ASRU’97), Rhodes.


  • Pietquin, O. (2004). A framework for unsupervised learning of dialogue strategies. Ph.D. thesis, Faculté Polytechnique de Mons.


  • Pietquin, O. (2006). Consistent goal-directed user model for realistic man-machine task-oriented spoken dialogue simulation. In Proceedings of IEEE International Conference on Multimedia and Expo (ICME’06), Toronto, ON (pp. 425–428).


  • Pietquin, O., & Dutoit, T. (2006). A probabilistic framework for dialog simulation and optimal strategy learning. IEEE Transactions on Audio, Speech, and Language Processing, 14(2), 589–599.


  • Pineau, J. (2004). Tractable planning under uncertainty: Exploiting structure. Ph.D. thesis, Rutgers University.


  • Pineau, J., Gordon, G., & Thrun, S. (2003). Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence (IJCAI’03), Acapulco.


  • Png, S., & Pineau, J. (2011). Bayesian reinforcement learning for POMDP-based dialogue systems. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’11), Prague.


  • Png, S., Pineau, J., & Chaib-draa, B. (2012). Building adaptive dialogue systems via Bayes-adaptive POMDPs. IEEE Journal of Selected Topics in Signal Processing, 6(8), 917–927.

  • Poupart, P., & Boutilier, C. (2002). Value-directed compression of POMDPs. In Advances in Neural Information Processing Systems 14 (NIPS’02), Vancouver, BC.


  • Rieser, V., & Lemon, O. (2006). Cluster-based user simulations for learning dialogue strategies. In Proceedings of the 7th Annual Conference of the International Speech Communication Association (INTERSPEECH’06), Pittsburgh, PA.


  • Rieser, V., & Lemon, O. (2011). Reinforcement learning for adaptive dialogue systems: A data-driven methodology for dialogue management and natural language generation. Springer Science & Business Media.

  • Ross, S., Pineau, J., Chaib-draa, B., & Kreitmann, P. (2011). A Bayesian approach for learning and planning in partially observable Markov decision processes. Journal of Machine Learning Research, 12, 1729–1770.


  • Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32(1), 663–704.

  • Roy, N., Gordon, G., & Thrun, S. (2005). Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23, 1–40.

  • Roy, N., Pineau, J., & Thrun, S. (2000). Spoken dialogue management using probabilistic reasoning. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL’00), Hong Kong.


  • Schatzmann, J., Thomson, B., Weilhammer, K., Ye, H., & Young, S. (2007). Agenda-based user simulation for bootstrapping a POMDP dialogue system. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers (pp. 149–152). Association for Computational Linguistics.


  • Schatzmann, J., Weilhammer, K., Stuttle, M., & Young, S. (2006). A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies. Knowledge Engineering Review, 21(2), 97–126.


  • Schatzmann, J., & Young, S. (2009). The hidden agenda user simulation model. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 733–747.


  • Scheffler, K., & Young, S. (2000). Probabilistic simulation of human-machine dialogues. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’00) (Vol. 2, pp. 1217–1220).


  • Smallwood, R., & Sondik, E. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21, 1071–1088.


  • Smith, T., & Simmons, R. (2004). Heuristic search value iteration for POMDPs. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI’04), Banff, AB.

  • Sondik, E. (1971). The optimal control of partially observable Markov processes. Ph.D. thesis, Stanford University.


  • Spaan, M., & Vlassis, N. (2004). A point-based POMDP algorithm for robot planning. In Proceedings of IEEE International Conference on Robotics and Automation (ICRA’04), New Orleans, LA.

  • Spaan, M., & Vlassis, N. (2005). Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24(1), 195–220.


  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.


  • Thomson, B. (2009). Statistical methods for spoken dialogue management. Ph.D. thesis, Department of Engineering, University of Cambridge.


  • Thomson, B., & Young, S. (2010). Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech and Language, 24(4), 562–588.


  • Traum, D. (1994). A computational theory of grounding in natural language conversation. Ph.D. thesis, University of Rochester.


  • Watkins, C. J. C. H., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279–292.

  • Wierstra, D., & Wiering, M. (2004). Utile distinction hidden Markov models. In Proceedings of the Twenty-First International Conference on Machine Learning (p. 108). New York: ACM.


  • Williams, J. D. (2006). Partially observable Markov decision processes for spoken dialogue management. Ph.D. thesis, Department of Engineering, University of Cambridge.


  • Williams, J. D., & Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21, 393–422.


  • Young, S., Gasic, M., Thomson, B., & Williams, J. D. (2013). POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5), 1160–1179.

  • Zhang, B., Cai, Q., Mao, J., & Guo, B. (2001b). Planning and acting under uncertainty: A new model for spoken dialogue system. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (UAI’01), Seattle, Washington.




Copyright information

© 2016 The Authors

Cite this chapter

Chinaei, H., Chaib-draa, B. (2016). Sequential Decision Making in Spoken Dialog Management. In: Building Dialogue POMDPs from Expert Dialogues. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-26200-0_3


  • DOI: https://doi.org/10.1007/978-3-319-26200-0_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26198-0

  • Online ISBN: 978-3-319-26200-0

  • eBook Packages: Engineering; Engineering (R0)
