Exploration from Generalization Mediated by Multiple Controllers

A chapter in: Intrinsically Motivated Learning in Natural and Artificial Systems

Abstract

Intrinsic motivation involves internally governed drives for exploration, curiosity, and play. Over the course of development and beyond, these drives shape subjects to explore in order to learn, to expand the repertoire of actions they can perform, and to acquire skills that may be useful in future domains. We adopt a utilitarian view of this learning process, treating it in terms of exploration bonuses that arise from distributions over the structure of the world, distributions that imply potential benefits from generalizing knowledge and skills to subsequent environments. We discuss how functionally and architecturally distinct controllers may realize these bonuses in different ways.
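
To make the abstract's notion of an exploration bonus concrete, here is a minimal, illustrative Python sketch (not from the chapter itself): a count-based, UCB-style bonus is added to each action's estimated value, so that under-sampled actions look optimistically attractive until uncertainty about them resolves. The names `BonusBandit` and `ucb_bonus` are hypothetical, chosen for illustration.

```python
import math

def ucb_bonus(total_steps, action_count, c=2.0):
    """Optimism bonus that shrinks as an action is tried more often."""
    return c * math.sqrt(math.log(total_steps + 1) / (action_count + 1))

class BonusBandit:
    """Tabular bandit whose choices add an exploration bonus to value estimates."""

    def __init__(self, n_actions):
        self.values = [0.0] * n_actions   # running mean reward per action
        self.counts = [0] * n_actions     # how often each action has been tried
        self.t = 0                        # total number of choices made so far

    def choose(self):
        self.t += 1
        # Score = estimated value + bonus; rarely tried actions score high.
        scores = [v + ucb_bonus(self.t, n)
                  for v, n in zip(self.values, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, action, reward):
        self.counts[action] += 1
        # Incremental update of the running mean reward for this action.
        self.values[action] += (reward - self.values[action]) / self.counts[action]
```

In this sketch the bonus plays the role of the "potential benefit" the abstract describes: it quantifies how much an under-explored option might still be worth, and it vanishes as experience accumulates.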



Acknowledgements

I am very grateful to Andrew Barto, the editors, and two anonymous reviewers for their comments on this chapter. My work is funded by the Gatsby Charitable Foundation.

Author information

Correspondence to Peter Dayan.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dayan, P. (2013). Exploration from Generalization Mediated by Multiple Controllers. In: Baldassarre, G., Mirolli, M. (eds) Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32375-1_4

  • DOI: https://doi.org/10.1007/978-3-642-32375-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32374-4

  • Online ISBN: 978-3-642-32375-1

  • eBook Packages: Computer Science (R0)
