
An Investigation of Reinforcement Learning for Reactive Search Optimization

Chapter in: Autonomous Search

Abstract

Reactive Search Optimization advocates the adoption of learning mechanisms as an integral part of a heuristic optimization scheme. This work studies reinforcement learning methods for the online tuning of parameters in stochastic local search algorithms. In particular, the reactive tuning is obtained by learning a (near-)optimal policy in a Markov decision process whose states summarize relevant information about the recent history of the search. The learning process is performed by the Least Squares Policy Iteration (LSPI) method. The proposed framework is applied to tuning the prohibition value in Reactive Tabu Search, the noise parameter in Adaptive WalkSAT, and the smoothing probability in the Reactive Scaling and Probabilistic Smoothing (RSAPS) algorithm. The novel approach is experimentally compared with the original ad hoc reactive schemes.
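The LSPI-based tuning loop described in the abstract can be sketched as follows. The state discretization (four history states), the action set (decrease/keep/increase the tuned parameter), the one-hot features, and the toy reward are illustrative placeholders chosen here for brevity; they are not the chapter's actual design. The core of the sketch is standard LSPI: LSTD-Q fits a linear Q-function from a batch of (state, action, reward, next-state) samples, and policy iteration greedily improves the policy until it stabilizes.

```python
import numpy as np

# Hypothetical discretization: 4 search-history states, 3 actions
# (0 = decrease, 1 = keep, 2 = increase the tuned parameter).
N_STATES, N_ACTIONS, GAMMA = 4, 3, 0.95
N_FEAT = N_STATES * N_ACTIONS

def phi(s, a):
    """One-hot feature vector for a (state, action) pair."""
    f = np.zeros(N_FEAT)
    f[s * N_ACTIONS + a] = 1.0
    return f

def lstdq(samples, policy):
    """LSTD-Q: fit the weights of a linear Q-function for a fixed policy."""
    A = np.eye(N_FEAT) * 1e-3          # small ridge term for invertibility
    b = np.zeros(N_FEAT)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        A += np.outer(f, f - GAMMA * phi(s_next, policy[s_next]))
        b += f * r
    return np.linalg.solve(A, b)

def lspi(samples, n_iter=20):
    """Policy iteration: greedy improvement over the fitted Q-function."""
    policy = np.zeros(N_STATES, dtype=int)
    for _ in range(n_iter):
        w = lstdq(samples, policy)
        q = w.reshape(N_STATES, N_ACTIONS)   # Q-table implied by the weights
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy

# Toy batch of (state, action, reward, next_state) tuples: "increase" is
# rewarded in state 0 and "decrease" in state 1, mimicking a search whose
# history dictates the direction of the parameter update.
rng = np.random.default_rng(0)
samples = []
for _ in range(500):
    s = rng.integers(N_STATES)
    a = rng.integers(N_ACTIONS)
    r = 1.0 if (s == 0 and a == 2) or (s == 1 and a == 0) else 0.0
    samples.append((s, a, r, rng.integers(N_STATES)))

print(lspi(samples))  # greedy parameter-update rule per history state
```

In an actual reactive solver the batch would be collected while the search runs, and the learned policy would replace the hand-crafted rule that adjusts the prohibition value, noise, or smoothing probability.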




Author information

Correspondence to Roberto Battiti.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Battiti, R., Campigotto, P. (2011). An Investigation of Reinforcement Learning for Reactive Search Optimization. In: Hamadi, Y., Monfroy, E., Saubion, F. (eds) Autonomous Search. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21434-9_6

  • DOI: https://doi.org/10.1007/978-3-642-21434-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21433-2

  • Online ISBN: 978-3-642-21434-9

  • eBook Packages: Computer Science (R0)
