Abstract
In this paper, we develop a new algorithm called Fuzzy Q-Learning (or FQ-Learning), which extends Watkins' Q-Learning method. It can be used for decision processes in which the goals and/or the constraints, but not necessarily the system under control, are fuzzy in nature. An example of a fuzzy constraint is: “the weight of object A must not be substantially heavier than w”, where w is a specified weight. Similarly, an example of a fuzzy goal is: “the robot must be in the vicinity of door k”. We show that FQ-Learning provides an alternative solution to this problem that is simpler than Bellman and Zadeh's fuzzy dynamic programming approach. We apply the algorithm to a multistage decision-making problem.
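To make the two examples in the abstract concrete, the following is a minimal sketch of how a fuzzy constraint and a fuzzy goal might be expressed as membership functions, and how they can be combined in the manner of Bellman and Zadeh's fuzzy decision, which takes the minimum of the goal and constraint degrees. It does not reproduce the FQ-Learning update itself, which the abstract does not specify. The function names, the linear membership shapes, and the `tolerance` and `radius` parameters are illustrative assumptions, not taken from the paper.

```python
def not_substantially_heavier(weight, w, tolerance):
    """Degree to which `weight` is "not substantially heavier than w".

    Assumed shape: fully satisfied up to w, then falling linearly
    to 0 at w + tolerance. Both the shape and `tolerance` are hypothetical.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    if weight <= w:
        return 1.0
    if weight >= w + tolerance:
        return 0.0
    return (w + tolerance - weight) / tolerance


def in_vicinity_of_door(position, door_position, radius):
    """Degree to which a 1-D `position` is "in the vicinity of" a door.

    Assumed shape: triangular, 1 at the door and 0 at distance `radius`.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    distance = abs(position - door_position)
    return max(0.0, 1.0 - distance / radius)


def fuzzy_decision(goal_degree, constraint_degree):
    """Combine a goal and a constraint by intersection (minimum)."""
    return min(goal_degree, constraint_degree)


if __name__ == "__main__":
    constraint = not_substantially_heavier(weight=12.0, w=10.0, tolerance=4.0)
    goal = in_vicinity_of_door(position=3.0, door_position=5.0, radius=4.0)
    print(constraint, goal, fuzzy_decision(goal, constraint))
```

Under these assumptions, a decision is only as good as its least-satisfied requirement, so goals and constraints are treated symmetrically rather than as a reward plus hard limits.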
References
A. G. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Submitted to AI Journal special issue on Computational Theories of Interaction and Agency, 1993.
A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13:834–846, 1983.
R. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
R.E. Bellman and L.A. Zadeh. Decision-making in a fuzzy environment. Management Science, 17(4):B-141–B-164, 1970.
H.R. Berenji and P. Khedkar. Learning and tuning fuzzy logic controllers through reinforcements. IEEE Transactions on Neural Networks, 3(5), 1992.
H.R. Berenji, Y. Jani, R.N. Lea, P. Khedkar, A. Malkani, and J. Hoblit. Space shuttle attitude control by fuzzy logic and reinforcement learning. In Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, March 1993.
L.J. Lin. Programming robots using reinforcement learning and teaching. In Proceedings of the Ninth National Conference on Artificial Intelligence, 1991.
A. Moore and C. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, to appear.
R.S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.
R.S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, 1990.
G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257–277, 1992.
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215–219, 1994.
C. Watkins and P. Dayan. Q-learning. Machine Learning, 8:279–292, 1992.
C.J.C.H. Watkins. Learning with Delayed Rewards. PhD thesis, Cambridge University, Psychology Department, 1989.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berenji, H.R. (1994). Fuzzy reinforcement Learning and dynamic programming. In: Ralescu, A.L. (eds) Fuzzy Logic in Artificial Intelligence. FLAI 1993. Lecture Notes in Computer Science, vol 847. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58409-9_1
DOI: https://doi.org/10.1007/3-540-58409-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58409-4
Online ISBN: 978-3-540-48780-7
eBook Packages: Springer Book Archive