Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks

  • Conference paper
AI 2009: Advances in Artificial Intelligence (AI 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5866)

Abstract

Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages of applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which offer a closer match to the user's preferences than can be achieved by methods restricted to deterministic policies.
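
The full text details the two proposed derivation methods; purely as an illustration of the general mixture-policy idea (not the authors' algorithms), the Python sketch below chooses mixture weights over a small set of deterministic base policies so that the weighted average of their return vectors approximates a target point in objective space, then samples one base policy per episode according to those weights. The environment interface (env.reset(), env.step() returning an observation, a vector-valued reward and a done flag) and all function names here are assumptions made for the sketch.

import numpy as np

def mixture_weights(base_returns, target, n_candidates=2000, seed=0):
    # base_returns: shape (k, n_objectives), the average return vector of each
    # deterministic base policy (e.g. found via scalarised Q-learning).
    # target: the user's preferred point in objective space.
    rng = np.random.default_rng(seed)
    base_returns = np.asarray(base_returns, dtype=float)
    k = base_returns.shape[0]
    best_w, best_dist = None, np.inf
    for _ in range(n_candidates):
        w = rng.dirichlet(np.ones(k))            # random point on the k-simplex
        expected = w @ base_returns              # expected return of the mixture
        dist = np.linalg.norm(expected - np.asarray(target, dtype=float))
        if dist < best_dist:
            best_w, best_dist = w, dist
    return best_w

def run_mixture_episode(base_policies, weights, env, seed=None):
    # Sample one base policy for the whole episode and follow it to termination.
    rng = np.random.default_rng(seed)
    policy = base_policies[rng.choice(len(base_policies), p=weights)]
    obs, done, total = env.reset(), False, None
    while not done:
        obs, reward, done = env.step(policy(obs))        # reward is a vector
        total = np.asarray(reward, float) if total is None else total + reward
    return total

# Example: three base policies with known two-objective returns
# w = mixture_weights([[10.0, 0.0], [0.0, 10.0], [6.0, 6.0]], target=[7.0, 3.0])

Because each episode is generated entirely by a single base policy, the mixture's expected per-episode return is simply the convex combination of the base policies' return vectors, which is what lets a mixture reach preference points lying between the deterministic solutions.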

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vamplew, P., Dazeley, R., Barker, E., Kelarev, A. (2009). Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks. In: Nicholson, A., Li, X. (eds) AI 2009: Advances in Artificial Intelligence. AI 2009. Lecture Notes in Computer Science (LNAI), vol. 5866. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10439-8_35

  • DOI: https://doi.org/10.1007/978-3-642-10439-8_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10438-1

  • Online ISBN: 978-3-642-10439-8

  • eBook Packages: Computer Science; Computer Science (R0)
