Counter Example for Q-Bucket-Brigade Under Prediction Problem

Conference paper
In: Learning Classifier Systems (IWLCS 2003, IWLCS 2004, IWLCS 2005)

Abstract

Aiming to clarify the convergence and divergence conditions of Learning Classifier Systems (LCS), this paper explores: (1) an extreme condition under which the reinforcement process of LCS diverges; and (2) methods to avoid such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of LCS and Reinforcement Learning (RL) with function approximation (FA), we present a counter example for LCS with the Q-bucket-brigade, based on the 11-state star problem originally proposed to show the divergence of Q-learning with linear FA. Empirical results obtained by applying this counter example to LCS confirmed the theoretical predictions: (1) LCS with the Q-bucket-brigade diverged under prediction problems, where the action-selection policy is fixed; and (2) this divergence was avoided either by using the implicit-bucket-brigade or by applying the residual gradient algorithm to the Q-bucket-brigade.
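
To make the two findings concrete, here is a minimal sketch in Python of the kind of experiment the abstract describes. It assumes the usual Baird-style construction for an 11-state star problem (10 spoke states plus one hub, all rewards zero, every transition leading to the hub, values linear in the weights); the step size, discount factor, and initial weights are illustrative choices, not the paper's, and the direct semi-gradient update stands in for the Q-bucket-brigade under a fixed policy via the equivalence the paper builds on.

```python
import numpy as np

# Sketch of a Baird-style "star" prediction problem.  The abstract names an
# 11-state star problem; we assume the standard construction (Baird 1995):
# 10 spoke states plus 1 hub, every transition leads to the hub, all rewards
# are zero, and state values are linear in the weights.  Step size, discount,
# and initial weights are illustrative assumptions, not the paper's values.

N_SPOKES, GAMMA, ALPHA, STEPS = 10, 0.99, 0.01, 500

# Features: spoke i is valued w_0 + 2*w_{i+1}; the hub is valued 2*w_0 + w_11.
phi = np.zeros((N_SPOKES + 1, N_SPOKES + 2))
for i in range(N_SPOKES):
    phi[i, 0], phi[i, i + 1] = 1.0, 2.0
phi[N_SPOKES, 0], phi[N_SPOKES, N_SPOKES + 1] = 2.0, 1.0   # hub row
phi_hub = phi[N_SPOKES]

def run(residual_gradient):
    """Synchronous expected updates over all states, uniformly weighted."""
    w = np.ones(N_SPOKES + 2)
    w[-1] = 10.0                              # skewed start, as is customary
    for _ in range(STEPS):
        v_hub = phi_hub @ w
        delta = GAMMA * v_hub - phi @ w       # TD errors (all rewards zero)
        if residual_gradient:
            # Residual gradient (Baird 1995): the successor's features enter
            # with a minus sign, giving true gradient descent on the Bellman
            # residual, which is stable for a small enough step size.
            w += ALPHA * delta @ (phi - GAMMA * phi_hub)
        else:
            # "Direct" semi-gradient update: under a fixed policy, this is
            # the update the paper identifies with the Q-bucket-brigade.
            w += ALPHA * delta @ phi
    return np.linalg.norm(w), np.linalg.norm(phi @ w)

for flag, name in [(False, "direct (Q-bucket-brigade)"),
                   (True, "residual gradient")]:
    w_norm, v_norm = run(flag)
    print(f"{name:28s} |w| = {w_norm:.3g}   |V| = {v_norm:.3g}")
```

Under these assumptions the direct run's weight norm grows without bound, while the residual-gradient run stays bounded, mirroring findings (1) and (2) above; note that residual gradient, though stable here, is known to converge only slowly on star-type problems.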





Editor information

Tim Kovacs, Xavier Llorà, Keiki Takadama, Pier Luca Lanzi, Wolfgang Stolzmann, Stewart W. Wilson


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Wada, A., Takadama, K., Shimohara, K. (2007). Counter Example for Q-Bucket-Brigade Under Prediction Problem. In: Kovacs, T., Llorà, X., Takadama, K., Lanzi, P.L., Stolzmann, W., Wilson, S.W. (eds) Learning Classifier Systems. IWLCS 2003, IWLCS 2004, IWLCS 2005. Lecture Notes in Computer Science, vol 4399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71231-2_10

  • DOI: https://doi.org/10.1007/978-3-540-71231-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71230-5

  • Online ISBN: 978-3-540-71231-2
