Defensive Universal Learning with Experts

Poland, Jan; Hutter, Marcus

doi:10.1007/11564089_28

Jan Poland²¹ &
Marcus Hutter²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3734))

Included in the following conference series:

International Conference on Algorithmic Learning Theory

2108 Accesses
7 Citations

Abstract

This paper shows how universal learning can be achieved with expert advice. To this aim, we specify an experts algorithm with the following characteristics: (a) it uses only feedback from the actions actually chosen (bandit setup), (b) it can be applied with countably infinite expert classes, and (c) it copes with losses that may grow in time appropriately slowly. We prove loss bounds against an adaptive adversary. From this, we obtain a master algorithm for “reactive” experts problems, which means that the master’s actions may influence the behavior of the adversary. Our algorithm can significantly outperform standard experts algorithms on such problems. Finally, we combine it with a universal expert class. The resulting universal learner performs – in a certain sense – almost as well as any computable strategy, for any online decision problem. We also specify the (worst-case) convergence speed, which is very slow.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

de Farias, D.P., Megiddo, N.: How to combine expert (and novice) advice when actions impact the environment? In: Advances in Neural Information Processing Systems (NIPS) 16. MIT Press, Cambridge (2004)
Google Scholar
Hannan, J.: Approximation to Bayes risk in repeated plays. In: Dresher, M., Tucker, A.W., Wolfe, P. (eds.) Contributions to the Theory of Games 3, pp. 97–139. Princeton University Press, Princeton (1957)
Google Scholar
Kalai, A., Vempala, S.: Efficient algorithms for online decision. In: Proc. 16th Annual Conference on Learning Theory (COLT), pp. 506–521. Springer, Heidelberg (2003)
Google Scholar
McMahan, H.B., Blum, A.: Online geometric optimization in the bandit setting against an adaptive adversary. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS (LNAI), vol. 3120, pp. 109–123. Springer, Heidelberg (2004)
Chapter Google Scholar
Hutter, M., Poland, J.: Prediction with expert advice by following the perturbed leader for general weights. In: International Conference on Algorithmic Learning Theory (ALT), pp. 279–293 (2004)
Google Scholar
Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin (2004)
Google Scholar
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proc. 36th Annual Symposium on Foundations of Computer Science (FOCS), pp. 322–331. IEEE, Los Alamitos (1995)
Google Scholar
Poland, J.: FPL analysis for adaptive bandits. In: 3rd Symposium on Stochastic Algorithms, Foundations and Applications, SAGA (2005) (to appear)
Google Scholar
Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)
MATH Google Scholar
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32, 48–77 (2002)
Article MATH MathSciNet Google Scholar
Hutter, M., Poland, J.: Adaptive online prediction by following the perturbed leader. Journal of Machine Learning Research 6, 639–660 (2005)
MathSciNet Google Scholar
Solomonoff, R.J.: Complexity-based induction systems: comparisons and convergence theorems. IEEE Trans. Inform. Theory 24, 422–432 (1978)
Article MATH MathSciNet Google Scholar
Hutter, M.: Towards a universal theory of artificial intelligence based on algorithmic probability and sequential decisions. In: Proc. 12^th European Conference on Machine Learning (ECML-2001), pp. 226–238 (2001)
Google Scholar
de Farias, D.P., Megiddo, N.: Exploration-exploitation tradeoffs for experts algorithms in reactive environments. In: Advances in Neural Information Processing Systems 17 (2005)
Google Scholar
Cesa-Bianchi, N., Lugosi, G., Stoltz, G.: Minimizing regret with label efficient prediction. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS (LNAI), vol. 3120, pp. 77–92. Springer, Heidelberg (2004)
Chapter Google Scholar
Cesa-Bianchi, N., Lugosi, G., Stoltz, G.: Regret minimization under partial monitoring. Technical report (2004)
Google Scholar

Download references

Author information

Authors and Affiliations

Grad. School of Inf. Sci. and Tech., Hokkaido University, Japan
Jan Poland
IDSIA, Galleria 2, CH-6928, Manno, (TI), Switzerland
Marcus Hutter

Authors

Jan Poland
View author publications
You can also search for this author in PubMed Google Scholar
Marcus Hutter
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Computing, National University of Singapore, 117590, Singapore
Sanjay Jain
Ruhr-Universität Bochum, Germany
Hans Ulrich Simon
Department of Information and Communication Engineering, Faculty of Electro-Communications, The University of Electro-Communications, Chofugaoka 1–5–1, Chofu, 182-8585, Tokyo, Japan
Etsuji Tomita

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Poland, J., Hutter, M. (2005). Defensive Universal Learning with Experts. In: Jain, S., Simon, H.U., Tomita, E. (eds) Algorithmic Learning Theory. ALT 2005. Lecture Notes in Computer Science(), vol 3734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564089_28

Download citation

DOI: https://doi.org/10.1007/11564089_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29242-5
Online ISBN: 978-3-540-31696-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics