
An Investigation of Reinforcement Learning for Reactive Search Optimization

Chapter in: Autonomous Search

Abstract

Reactive Search Optimization advocates the adoption of learning mechanisms as an integral part of a heuristic optimization scheme. This work studies reinforcement learning methods for the online tuning of parameters in stochastic local search algorithms. In particular, the reactive tuning is obtained by learning a (near-)optimal policy in a Markov decision process whose states summarize relevant information about the recent history of the search. The learning process is performed by the Least Squares Policy Iteration (LSPI) method. The proposed framework is applied to tuning the prohibition value in Reactive Tabu Search, the noise parameter in Adaptive WalkSAT, and the smoothing probability in the Reactive Scaling and Probabilistic Smoothing (RSAPS) algorithm. The novel approach is experimentally compared with the original ad hoc reactive schemes.
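The LSPI-based tuning loop described in the abstract can be sketched as follows. The state discretization (four history states), the action set (decrease/keep/increase the tuned parameter), the one-hot features, and the toy reward are illustrative placeholders chosen here for brevity; they are not the chapter's actual design. The core of the sketch is standard LSPI: LSTD-Q fits a linear Q-function from a batch of (state, action, reward, next-state) samples, and policy iteration greedily improves the policy until it stabilizes.

```python
import numpy as np

# Hypothetical discretization: 4 search-history states, 3 actions
# (0 = decrease, 1 = keep, 2 = increase the tuned parameter).
N_STATES, N_ACTIONS, GAMMA = 4, 3, 0.95
N_FEAT = N_STATES * N_ACTIONS

def phi(s, a):
    """One-hot feature vector for a (state, action) pair."""
    f = np.zeros(N_FEAT)
    f[s * N_ACTIONS + a] = 1.0
    return f

def lstdq(samples, policy):
    """LSTD-Q: fit the weights of a linear Q-function for a fixed policy."""
    A = np.eye(N_FEAT) * 1e-3          # small ridge term for invertibility
    b = np.zeros(N_FEAT)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        A += np.outer(f, f - GAMMA * phi(s_next, policy[s_next]))
        b += f * r
    return np.linalg.solve(A, b)

def lspi(samples, n_iter=20):
    """Policy iteration: greedy improvement over the fitted Q-function."""
    policy = np.zeros(N_STATES, dtype=int)
    for _ in range(n_iter):
        w = lstdq(samples, policy)
        q = w.reshape(N_STATES, N_ACTIONS)   # Q-table implied by the weights
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy

# Toy batch of (state, action, reward, next_state) tuples: "increase" is
# rewarded in state 0 and "decrease" in state 1, mimicking a search whose
# history dictates the direction of the parameter update.
rng = np.random.default_rng(0)
samples = []
for _ in range(500):
    s = rng.integers(N_STATES)
    a = rng.integers(N_ACTIONS)
    r = 1.0 if (s == 0 and a == 2) or (s == 1 and a == 0) else 0.0
    samples.append((s, a, r, rng.integers(N_STATES)))

print(lspi(samples))  # greedy parameter-update rule per history state
```

In an actual reactive solver the batch would be collected while the search runs, and the learned policy would replace the hand-crafted rule that adjusts the prohibition value, noise, or smoothing probability.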




Author information

Correspondence to Roberto Battiti.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Battiti, R., Campigotto, P. (2011). An Investigation of Reinforcement Learning for Reactive Search Optimization. In: Hamadi, Y., Monfroy, E., Saubion, F. (eds) Autonomous Search. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21434-9_6

  • DOI: https://doi.org/10.1007/978-3-642-21434-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21433-2

  • Online ISBN: 978-3-642-21434-9

  • eBook Packages: Computer Science (R0)
