Abstract
The activity of dopaminergic (DA) neurons has been hypothesized to encode a reward prediction error (RPE) corresponding to the error signal of Temporal Difference (TD) learning algorithms. This hypothesis has been reinforced by numerous studies showing that TD learning algorithms capture the role of the basal ganglia in classical conditioning. However, recent recordings of DA neurons during multi-choice tasks have led to contradictory interpretations of whether the DA RPE signal is action-dependent or not. Thus the precise TD algorithm (i.e., Actor-Critic, Q-learning, or SARSA) that best describes DA signals remains unknown. Here we simulate and precisely analyze these TD algorithms on a multi-choice task performed by rats. We find that DA activity previously reported in this task is best fitted by a TD error that has not fully converged, and that converged faster than the observed behavioral adaptation.
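The three TD algorithms named above differ only in how the prediction error bootstraps on the next step's value: the Actor-Critic error is action-independent (state values only), Q-learning bootstraps on the best available next action, and SARSA on the action actually chosen. A minimal sketch of these three error forms (function names and the toy values are illustrative, not taken from the paper's task or fits):

```python
# Hedged sketch of the three TD error forms compared in the paper:
# Actor-Critic, Q-learning, and SARSA. All share the shape
# delta = r + gamma * (bootstrap value) - (current value);
# only the bootstrap term differs.

def td_error_actor_critic(r, gamma, v_next, v_curr):
    # Critic's error: action-independent, uses state values V only.
    return r + gamma * v_next - v_curr

def td_error_q_learning(r, gamma, q_next_all, q_curr):
    # Off-policy: bootstraps on the value of the best next action.
    return r + gamma * max(q_next_all) - q_curr

def td_error_sarsa(r, gamma, q_next_chosen, q_curr):
    # On-policy: bootstraps on the next action actually chosen,
    # making the error signal depend on the upcoming choice.
    return r + gamma * q_next_chosen - q_curr
```

With identical rewards and values, the Q-learning and SARSA errors diverge exactly when the chosen next action is not the best one — the distinction the multi-choice recordings are meant to probe.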
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Bellot, J., Sigaud, O., Khamassi, M. (2012). Which Temporal Difference Learning Algorithm Best Reproduces Dopamine Activity in a Multi-choice Task? In: Ziemke, T., Balkenius, C., Hallam, J. (eds) From Animals to Animats 12. SAB 2012. Lecture Notes in Computer Science, vol 7426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33093-3_29
Print ISBN: 978-3-642-33092-6
Online ISBN: 978-3-642-33093-3