
Which Temporal Difference Learning Algorithm Best Reproduces Dopamine Activity in a Multi-choice Task?

Conference paper, From Animals to Animats 12 (SAB 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7426)

Abstract

The activity of dopaminergic (DA) neurons has been hypothesized to encode a reward prediction error (RPE) corresponding to the error signal in Temporal Difference (TD) learning algorithms. This hypothesis has been reinforced by numerous studies showing the relevance of TD learning algorithms for describing the role of the basal ganglia in classical conditioning. However, recent recordings of DA neurons during multi-choice tasks have raised contradictory interpretations of whether the DA RPE signal is action-dependent. Thus, the precise TD algorithm (i.e. Actor-Critic, Q-learning or SARSA) that best describes DA signals remains unknown. Here we simulate and precisely analyze these TD algorithms on a multi-choice task performed by rats. We find that the DA activity previously reported in this task is best fitted by a TD error that has not yet fully converged, and which converges faster than the observed behavioral adaptation.
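
The distinction at stake is whether the TD error term refers to the action actually taken in the next state (SARSA), to the best available next action (Q-learning), or to state values only (Actor-Critic). As a rough illustration only (not the authors' code; the discrete state/action encoding and the discount factor gamma = 0.95 are arbitrary assumptions), the three error signals can be sketched as follows:

```python
import numpy as np

# Minimal sketch of the three TD error variants discussed in the abstract.
# Q is a (n_states, n_actions) table of action values; V is a (n_states,) table
# of state values. All numerical settings here are illustrative assumptions.

def sarsa_td_error(Q, s, a, r, s_next, a_next, gamma=0.95):
    # SARSA: the error depends on the action actually chosen in the next state.
    return r + gamma * Q[s_next, a_next] - Q[s, a]

def q_learning_td_error(Q, s, a, r, s_next, gamma=0.95):
    # Q-learning: the error uses the best available next action, regardless of choice.
    return r + gamma * np.max(Q[s_next]) - Q[s, a]

def actor_critic_td_error(V, s, r, s_next, gamma=0.95):
    # Actor-Critic: the critic's error depends only on state values, not on actions.
    return r + gamma * V[s_next] - V[s]

# Tiny usage example on a toy 2-state, 2-action problem (illustrative values only).
Q = np.zeros((2, 2))
V = np.zeros(2)
delta = q_learning_td_error(Q, s=0, a=1, r=1.0, s_next=1)
print(delta)  # 1.0 for the first rewarded transition, before any learning
```

Which of these error terms best matches the recorded DA responses is the question the paper addresses by fitting each variant to the rats' multi-choice data.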

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bellot, J., Sigaud, O., Khamassi, M. (2012). Which Temporal Difference Learning Algorithm Best Reproduces Dopamine Activity in a Multi-choice Task?. In: Ziemke, T., Balkenius, C., Hallam, J. (eds) From Animals to Animats 12. SAB 2012. Lecture Notes in Computer Science(), vol 7426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33093-3_29

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-33093-3_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33092-6

  • Online ISBN: 978-3-642-33093-3

  • eBook Packages: Computer Science, Computer Science (R0)
