Exploration from Generalization Mediated by Multiple Controllers

A chapter in: Intrinsically Motivated Learning in Natural and Artificial Systems

Abstract

Intrinsic motivation involves internally governed drives for exploration, curiosity, and play. Over the course of development and beyond, these drives shape subjects to explore in order to learn, to expand the repertoire of actions they can perform, and to acquire skills that may be useful in future domains. We adopt a utilitarian view of this learning process, treating it in terms of exploration bonuses that arise from distributions over the structure of the world, distributions that imply potential benefits from generalizing knowledge and skills to subsequent environments. We discuss how functionally and architecturally distinct controllers may realize these bonuses in different ways.
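
To make the abstract's notion of an exploration bonus concrete, here is a minimal, illustrative Python sketch (not from the chapter itself): a count-based, UCB-style bonus is added to each action's estimated value, so that under-sampled actions look optimistically attractive until uncertainty about them resolves. The names `BonusBandit` and `ucb_bonus` are hypothetical, chosen for illustration.

```python
import math

def ucb_bonus(total_steps, action_count, c=2.0):
    """Optimism bonus that shrinks as an action is tried more often."""
    return c * math.sqrt(math.log(total_steps + 1) / (action_count + 1))

class BonusBandit:
    """Tabular bandit whose choices add an exploration bonus to value estimates."""

    def __init__(self, n_actions):
        self.values = [0.0] * n_actions   # running mean reward per action
        self.counts = [0] * n_actions     # how often each action has been tried
        self.t = 0                        # total number of choices made so far

    def choose(self):
        self.t += 1
        # Score = estimated value + bonus; rarely tried actions score high.
        scores = [v + ucb_bonus(self.t, n)
                  for v, n in zip(self.values, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, action, reward):
        self.counts[action] += 1
        # Incremental update of the running mean reward for this action.
        self.values[action] += (reward - self.values[action]) / self.counts[action]
```

In this sketch the bonus plays the role of the "potential benefit" the abstract describes: it quantifies how much an under-explored option might still be worth, and it vanishes as experience accumulates.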



Acknowledgements

I am very grateful to Andrew Barto, the editors, and two anonymous reviewers for their comments on this chapter. My work is funded by the Gatsby Charitable Foundation.

Author information

Correspondence to Peter Dayan.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dayan, P. (2013). Exploration from Generalization Mediated by Multiple Controllers. In: Baldassarre, G., Mirolli, M. (eds) Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32375-1_4

  • DOI: https://doi.org/10.1007/978-3-642-32375-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32374-4

  • Online ISBN: 978-3-642-32375-1

  • eBook Packages: Computer Science (R0)
