Abstract
In this paper, we investigate an alternative knowledge representation and learning strategy for the automated machine learning (AutoML) task. Our approach combines a symbolic planner with reinforcement learning to evolve programs that process data and train machine learning classifiers. The planner, which generates all feasible plans from the initial state to the goal state, first prefers the shortest programs and later those that maximize reward. The results demonstrate the efficacy of the approach for finding good machine learning pipelines and show that the representation can be used to infer new knowledge relevant to the problem instances being solved. These insights can inform the representation and learning strategies of other automatic programming approaches, such as genetic programming (GP) and Bayesian optimization pipeline learning.
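The search strategy summarized above can be illustrated with a minimal sketch. It is not the chapter's implementation: the action names, the state labels, the `toy_evaluate` scores, and the running-average update (a simple stand-in for the reinforcement learning component) are all assumptions made for illustration. The sketch enumerates every feasible plan from an initial state to a goal state, tries the shortest plans first, and then favors the plan with the highest estimated reward.

```python
# Hypothetical action model: (action name, required state, resulting state).
ACTIONS = [
    ("tokenize", "raw", "tokens"),
    ("tfidf", "tokens", "features"),
    ("counts", "tokens", "features"),
    ("select_features", "features", "selected"),
    ("train_on_all", "features", "trained"),
    ("train_on_selected", "selected", "trained"),
]


def feasible_plans(state, goal, visited=frozenset()):
    """Yield every action sequence leading from state to goal without revisiting a state."""
    if state == goal:
        yield ()
        return
    for name, pre, post in ACTIONS:
        if pre == state and post not in visited:
            for rest in feasible_plans(post, goal, visited | {state}):
                yield (name,) + rest


def toy_evaluate(plan):
    """Made-up reward standing in for the validation accuracy of a trained pipeline."""
    reward = 0.8 if "tfidf" in plan else 0.7
    if "select_features" in plan:
        reward += 0.05
    return reward


def search(evaluate, episodes, step=0.5):
    """Try untested plans in shortest-first order, then exploit the best estimate."""
    plans = sorted(feasible_plans("raw", "trained"), key=len)
    value = {}  # plan -> running estimate of its reward
    for _ in range(episodes):
        untried = [p for p in plans if p not in value]
        plan = untried[0] if untried else max(plans, key=value.get)
        reward = evaluate(plan)
        old = value.get(plan, reward)
        value[plan] = old + step * (reward - old)
    return max(value, key=value.get), value


if __name__ == "__main__":
    best, estimates = search(toy_evaluate, episodes=6)
    print(best, estimates[best])
```

With a small episode budget only the shortest plans are evaluated; with a larger budget the longer plans are reached and the estimates decide which pipeline is kept.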
Notes
- 2. Tildes are used to distinguish actions in MDP space from actions in symbolic planning space.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Yang, F., Gustafson, S., Elkholy, A., Lyu, D., Liu, B. (2019). Program Search for Machine Learning Pipelines Leveraging Symbolic Planning and Reinforcement Learning. In: Banzhaf, W., Spector, L., Sheneman, L. (eds) Genetic Programming Theory and Practice XVI. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-04735-1_11
DOI: https://doi.org/10.1007/978-3-030-04735-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04734-4
Online ISBN: 978-3-030-04735-1
eBook Packages: Computer Science, Computer Science (R0)