Abstract
Genetic programming has found recent success as a tool for learning sets of features for regression and classification. Multidimensional genetic programming is a useful variant of genetic programming for this task because it represents candidate solutions as sets of programs. These sets of programs expose additional information that can be exploited for building block identification. In this work, we discuss this architecture and others in terms of their propensity for allowing heuristic search to utilize information during the evolutionary process. We investigate methods for biasing the components of programs that are promoted in order to guide search towards useful and complementary feature spaces. We study two main approaches: (1) the introduction of new objectives and (2) the use of specialized semantic variation operators. We find that a semantic crossover operator based on stagewise regression leads to significant improvements on a set of regression problems. The inclusion of semantic crossover produces state-of-the-art results in a large benchmark study of open-source regression problems in comparison to several state-of-the-art machine learning approaches and other genetic programming frameworks. Finally, we look at the collinearity and complexity of the data representations produced by different methods, in order to assess whether relevant, concise, and independent factors of variation can be produced in application.
References
I. Arnaldo, K. Krawiec, U.M. O’Reilly, Multiple regression genetic programming, in Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (ACM Press, 2014), pp. 879–886. https://doi.org/10.1145/2576768.2598291. http://dl.acm.org/citation.cfm?doid=2576768.2598291. Accessed 15 Oct 2019
I. Arnaldo, U.M. O’Reilly, K. Veeramachaneni, Building predictive models via feature synthesis, in GECCO (ACM Press, 2015), pp. 983–990. https://doi.org/10.1145/2739480.2754693. http://dl.acm.org/citation.cfm?doid=2739480.2754693. Accessed 15 Oct 2019
D.A. Belsley, A guide to using the collinearity diagnostics. Comput. Sci. Econ. Manag. 4(1), 33–50 (1991). https://doi.org/10.1007/BF00426854
Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
P.P. Brahma, D. Wu, Y. She, Why deep learning works: a manifold disentanglement perspective. IEEE Trans. Neural Netw. Learn. Syst. 27(10), 1997–2008 (2016)
M. Castelli, S. Silva, L. Vanneschi, A C++ framework for geometric semantic genetic programming. Genet. Program. Evol. Mach. 16(1), 73–81 (2015). https://doi.org/10.1007/s10710-014-9218-0
T. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16 (ACM, New York, NY, USA, 2016), pp. 785–794. https://doi.org/10.1145/2939672.2939785
W.S. Cleveland, Visualizing Data (Hobart Press, New Jersey, 1993)
A. Cline, C. Moler, G. Stewart, J. Wilkinson, An estimate for the condition number of a matrix. SIAM J. Numer. Anal. 16(2), 368–375 (1979). https://doi.org/10.1137/0716029
E. Conti, V. Madhavan, F.P. Such, J. Lehman, K.O. Stanley, J. Clune, Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents. arXiv:1712.06560 [cs] (2017)
C. Cortes, X. Gonzalvo, V. Kuznetsov, M. Mohri, S. Yang, Adanet: adaptive structural learning of artificial neural networks. arXiv preprint arXiv:1607.01097 (2016)
V.V. De Melo, Kaizen Programming, in GECCO (ACM Press, New York, 2014), pp. 895–902. https://doi.org/10.1145/2576768.2598264. http://dl.acm.org/citation.cfm?doid=2576768.2598264
K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, in Parallel Problem Solving from Nature PPSN VI, vol. 1917, ed. by M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J.J. Merelo, H.P. Schwefel (Springer, Berlin, 2000), pp. 849–858. http://repository.ias.ac.in/83498/. Accessed 15 Oct 2019
J. Demšar, Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7(Jan), 1–30 (2006)
C. Eastwood, C.K.I. Williams, A framework for the quantitative evaluation of disentangled representations, in ICLR (2018). https://openreview.net/forum?id=By-7dz-AZ. Accessed 15 Oct 2019
C. Fernando, D. Banarse, M. Reynolds, F. Besse, D. Pfau, M. Jaderberg, M. Lanctot, D. Wierstra, Convolution by evolution: differentiable pattern producing networks. arXiv:1606.02580 [cs] (2016)
R. Ffrancon, M. Schoenauer, Memetic Semantic Genetic Programming (ACM Press, 2015), pp. 1023–1030. https://doi.org/10.1145/2739480.2754697. http://dl.acm.org/citation.cfm?doid=2739480.2754697
S.B. Fine, E. Hemberg, K. Krawiec, U.M. O’Reilly, Exploiting subprograms in genetic programming, in Genetic Programming Theory and Practice XV, Genetic and Evolutionary Computation, ed. by W. Banzhaf, R.S. Olson, W. Tozier, R. Riolo (Springer, Berlin, 2018), pp. 1–16
D. Floreano, P. Dürr, C. Mattiussi, Neuroevolution: from architectures to learning. Evol. Intell. 1(1), 47–62 (2008). https://doi.org/10.1007/s12065-007-0002-4
Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, in Computational Learning Theory, ed. by P. Vitanyi (Springer, Berlin, 1995), pp. 23–37. https://doi.org/10.1007/3-540-59119-2_166
J. Friedman, T. Hastie, R. Tibshirani, The elements of statistical learning. Springer series in statistics, vol. 1 (Springer, Berlin, 2001). http://statweb.stanford.edu/tibs/book/preface.ps. Accessed 15 Oct 2019
A.H. Gandomi, A.H. Alavi, A new multi-gene genetic programming approach to nonlinear system modeling. Part I: materials and structural engineering problems. Neural Comput. Appl. 21(1), 171–187 (2012). https://doi.org/10.1007/s00521-011-0734-z
F. Gomez, J. Schmidhuber, R. Miikkulainen, Efficient non-linear control through neuroevolution, in ECML, vol. 4212 (Springer, 2006), pp. 654–662. http://link.springer.com/content/pdf/10.1007/11871842.pdf#page=676
A. Gonzalez-Garcia, J. van de Weijer, Y. Bengio, Image-to-image translation for cross-domain disentanglement. arXiv preprint arXiv:1805.09730 (2018)
I. Goodfellow, H. Lee, Q.V. Le, A. Saxe, A.Y. Ng, Measuring invariances in deep networks, in Advances in Neural Information Processing Systems (2009), pp. 646–654
M. Graff, E.S. Tellez, E. Villaseñor, S. Miranda, Semantic genetic programming operators based on projections in the phenotype space. Res. Comput. Sci. 94, 73–85 (2015)
N. Hadad, L. Wolf, M. Shahar, A two-step disentanglement method, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 772–780 (2018)
I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, A. Lerchner, \(\beta\)-VAE: Learning basic visual concepts with a constrained variational framework, in ICLR (2017)
A.E. Hoerl, R.W. Kennard, Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
C. Igel, Neuroevolution for reinforcement learning using evolution strategies, in The 2003 Congress on Evolutionary Computation, 2003. CEC’03, vol. 4 (IEEE, 2003), pp. 2588–2595. http://ieeexplore.ieee.org/abstract/document/1299414/. Accessed 15 Oct 2019
V. Ingalalli, S. Silva, M. Castelli, L. Vanneschi, A multi-dimensional genetic programming approach for multi-class classification problems, in Genetic Programming, ed. by M. Nicolau (Springer, Berlin, 2014), pp. 48–60. https://doi.org/10.1007/978-3-662-44303-3_5
G. James, D. Witten, T. Hastie, R. Tibshirani, An introduction to statistical learning, in Springer Texts in Statistics, vol. 103, ed. by N.H. Timm (Springer, New York, 2013). https://doi.org/10.1007/978-1-4614-7138-7
D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. arXiv:1412.6980 [cs] (2014).
S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
M. Kommenda, G. Kronberger, M. Affenzeller, S.M. Winkler, B. Burlacu, Evolving simple symbolic regression models by multi-objective genetic programming, in Genetic Programming Theory and Practice, vol. XIV. Genetic and Evolutionary Computation (Springer, Ann Arbor, MI, 2015)
K. Krawiec, Genetic programming-based construction of features for machine learning and knowledge discovery tasks. Genet. Program. Evol. Mach. 3(4), 329–343 (2002). https://doi.org/10.1023/A:1020984725014
K. Krawiec, On relationships between semantic diversity, complexity and modularity of programming tasks, in Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference (ACM, 2012), pp. 783–790. http://dl.acm.org/citation.cfm?id=2330272. Accessed 15 Oct 2019
K. Krawiec, Behavioral Program Synthesis with Genetic Programming, vol. 618 (Springer, Berlin, 2016)
K. Krawiec, U.M. O’Reilly, Behavioral programming: a broader and more detailed take on semantic GP, in Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (ACM Press, 2014), pp. 935–942. https://doi.org/10.1145/2576768.2598288. http://dl.acm.org/citation.cfm?doid=2576768.2598288. Accessed 15 Oct 2019
A. Kumar, P. Sattigeri, A. Balakrishnan, Variational inference of disentangled latent concepts from unlabeled observations, in ICLR (2018). https://openreview.net/forum?id=H1kG7GZAW. Accessed 15 Oct 2019
W. La Cava, T. Helmuth, L. Spector, J.H. Moore, A probabilistic and multi-objective analysis of lexicase selection and \(\varepsilon\)-lexicase selection. Evolut. Comput. (2018). https://doi.org/10.1162/evco_a_00224
W. La Cava, J. Moore, A general feature engineering wrapper for machine learning using \(\epsilon\)-lexicase survival, in Genetic Programming (Springer, Cham, 2017), pp. 80–95. https://doi.org/10.1007/978-3-319-55696-3_6
W. La Cava, J.H. Moore, Ensemble representation learning: an analysis of fitness and survival for wrapper-based genetic programming methods, in GECCO ’17: Proceedings of the 2017 Genetic and Evolutionary Computation Conference (ACM, Berlin, Germany), pp. 961–968 (2017). https://doi.org/10.1145/3071178.3071215. arXiv:1703.06934
W. La Cava, J.H. Moore, Semantic variation operators for multidimensional genetic programming, in Proceedings of the 2019 Genetic and Evolutionary Computation Conference, GECCO ’19 (ACM, Prague, Czech Republic, 2019). https://doi.org/10.1145/3321707.3321776. arXiv:1904.08577
W. La Cava, S. Silva, K. Danai, L. Spector, L. Vanneschi, J.H. Moore, Multidimensional genetic programming for multiclass classification. Swarm Evolut. Comput. (2018). https://doi.org/10.1016/j.swevo.2018.03.015
W. La Cava, T.R. Singh, J. Taggart, S. Suri, J.H. Moore, Learning concise representations for regression by evolving networks of trees, in International Conference on Learning Representations, ICLR (2019). arXiv:1807.00981 (in press)
W. La Cava, L. Spector, K. Danai, Epsilon-lexicase selection for regression, in Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO ’16 (ACM, New York, NY, USA, 2016), pp. 741–748. https://doi.org/10.1145/2908812.2908898
Q. Le, B. Zoph, Using machine learning to explore neural network architecture (2017). https://ai.googleblog.com/2017/05/using-machine-learning-to-explore.html. Accessed 15 Oct 2019
C. Liu, B. Zoph, J. Shlens, W. Hua, L.J. Li, L. Fei-Fei, A. Yuille, J. Huang, K. Murphy, Progressive neural architecture search. arXiv preprint arXiv:1712.00559 (2017)
T. McConaghy, FFX: Fast, scalable, deterministic symbolic regression technology, in Genetic Programming Theory and Practice IX, ed. by R. Riolo, E. Vladislavleva, J.H. Moore (Springer, Berlin, 2011), pp. 235–260. https://doi.org/10.1007/978-1-4614-1770-5_13
D. Medernach, J. Fitzgerald, R.M.A. Azad, C. Ryan, A new wave: a dynamic approach to genetic programming, in Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO ’16 (ACM, New York, NY, USA, 2016), pp. 757–764. https://doi.org/10.1145/2908812.2908857
V.V. de Melo, W. Banzhaf, Automatic feature engineering for regression models with machine learning: an evolutionary computation and statistics hybrid. Inf. Sci. (2017). https://doi.org/10.1016/j.ins.2017.11.041
G. Montavon, K.R. Müller, Better representations: invariant, disentangled and reusable, in Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, ed. by G. Montavon, K.R. Müller (Springer, Berlin, 2012), pp. 559–560
A. Moraglio, K. Krawiec, C.G. Johnson, Geometric semantic genetic programming, in Parallel Problem Solving from Nature-PPSN XII (Springer, 2012), pp. 21–31. http://link.springer.com/chapter/10.1007/978-3-642-32937-1_3. Accessed 15 Oct 2019
M. Muharram, G.D. Smith, Evolutionary constructive induction. IEEE Trans. Knowl. Data Eng. 17(11), 1518–1528 (2005)
L. Muñoz, S. Silva, L. Trujillo, M3GP: multiclass classification with GP, in Genetic Programming (Springer, 2015), pp. 78–91. http://link.springer.com/chapter/10.1007/978-3-319-16501-1_7. Accessed 15 Oct 2019
L. Muñoz, L. Trujillo, S. Silva, M. Castelli, L. Vanneschi, Evolving multidimensional transformations for symbolic regression with M3GP. Memet. Comput. (2018). https://doi.org/10.1007/s12293-018-0274-5
K. Neshatian, M. Zhang, P. Andreae, A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming. IEEE Trans. Evolut. Comput. 16(5), 645–661 (2012)
R.M. O’Brien, A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 41(5), 673–690 (2007). https://doi.org/10.1007/s11135-006-9018-6
R.S. Olson, W. La Cava, P. Orzechowski, R.J. Urbanowicz, J.H. Moore, PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Min. (2017). arXiv preprint arXiv:1703.00512
P. Orzechowski, W. La Cava, J.H. Moore, Where are we now? A large benchmark study of recent symbolic regression methods. arXiv:1804.09331 [cs] (2018). https://doi.org/10.1145/3205455.3205539.
T.P. Pawlak, B. Wieloch, K. Krawiec, Semantic backpropagation for designing search operators in genetic programming. IEEE Trans. Evol. Comput. 19(3), 326–340 (2015)
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)
H. Pham, M.Y. Guan, B. Zoph, Q.V. Le, J. Dean, Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268 (2018)
E. Real, Using evolutionary AutoML to discover neural network architectures (2018). https://ai.googleblog.com/2018/03/using-evolutionary-automl-to-discover.html. Accessed 15 Oct 2019
E. Real, S. Moore, A. Selle, S. Saxena, Y.L. Suematsu, J. Tan, Q. Le, A. Kurakin, Large-scale evolution of image classifiers. arXiv:1703.01041 [cs] (2017)
M. Schmidt, H. Lipson, Age-fitness pareto optimization, in Genetic Programming Theory and Practice VIII (Springer, 2011), pp. 129–146. http://link.springer.com/chapter/10.1007/978-1-4419-7747-2_8. Accessed 15 Oct 2019
D. Searson, M. Willis, G. Montague, Co-evolution of non-linear PLS model components. J. Chemom. 21(12), 592–603 (2007). https://doi.org/10.1002/cem.1084
D.P. Searson, D.E. Leahy, M.J. Willis, GPTIPS: an open source genetic programming toolbox for multigene symbolic regression, in Proceedings of the International Multiconference of Engineers and Computer Scientists, vol. 1 (IMECS, Hong Kong, 2010), pp. 77–80
S. Silva, L. Munoz, L. Trujillo, V. Ingalalli, M. Castelli, L. Vanneschi, Multiclass classification through multidimensional clustering, in Genetic Programming Theory and Practice XIII, vol. 13 (Springer, Ann Arbor, MI, 2015)
L. Spector, Assessment of problem modality by differential performance of lexicase selection in genetic programming: a preliminary report, in Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference companion (2012), pp. 401–408. http://dl.acm.org/citation.cfm?id=2330846. Accessed 15 Oct 2019
K.O. Stanley, Compositional pattern producing networks: a novel abstraction of development. Genet. Program. Evolvable Mach. 8(2), 131–162 (2007). https://doi.org/10.1007/s10710-007-9028-8
K.O. Stanley, J. Clune, J. Lehman, R. Miikkulainen, Designing neural networks through neuroevolution. Nat. Mach. Intell. 1(1), 24 (2019). https://doi.org/10.1038/s42256-018-0006-z
K.O. Stanley, D.B. D’Ambrosio, J. Gauci, A hypercube-based encoding for evolving large-scale neural networks. Artif. Life 15(2), 185–212 (2009). https://doi.org/10.1162/artl.2009.15.2.15202
K.O. Stanley, R. Miikkulainen, Evolving neural networks through augmenting topologies. Evolut. Comput. 10(2), 99–127 (2002). https://doi.org/10.1162/106365602320169811
R. Tibshirani, Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288 (1996)
R. Tibshirani, T. Hastie, B. Narasimhan, G. Chu, Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. 99(10), 6567–6572 (2002). https://doi.org/10.1073/pnas.082099299
L. Vanneschi, M. Castelli, L. Manzoni, K. Krawiec, A. Moraglio, S. Silva, I. Gonçalves, PSXO: population-wide semantic crossover, in Proceedings of the Genetic and Evolutionary Computation Conference Companion (ACM, 2017), pp. 257–258
E. Vladislavleva, G. Smits, D. den Hertog, Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. IEEE Trans. Evol. Comput. 13(2), 333–349 (2009). https://doi.org/10.1109/TEVC.2008.926486
W. Whitney, Disentangled representations in neural models. arXiv:1602.02383 [cs] (2016).
B. Zoph, Q.V. Le, Neural architecture search with reinforcement learning (2016). https://arxiv.org/abs/1611.01578
Acknowledgements
This work was supported by NIH Grants K99LM012926-01A1, AI116794 and LM012601, as well as the PA CURE Grant from the Pennsylvania Department of Health. Special thanks to Tilak Raj Singh and other members of the Computational Genetics Lab at the University of Pennsylvania.
Appendix
1.1 Additional experiment information
Table 6 details the hyperparameters for each method used in the experimental results described in Sects. 4 and 5.
1.2 Comparison of selection algorithms
Our initial analysis sought to determine how different selection and survival (SO) approaches performed within this framework. We tested five methods: (1) NSGA2, (2) Lex, (3) LexNSGA2, (4) simulated annealing, and (5) random search. The simulated annealing and random search approaches are described below.
Simulated annealing Simulated annealing (SimAnn) is a non-evolutionary technique that instead models optimization on the metallurgical process of annealing. In our implementation, offspring compete with their parents; when an offspring has multiple parents, it competes with the parent with which it shares more nodes. The probability of an offspring replacing its parent in the population is given by the equation \(P(\text {replace}) = \min \left( 1,\ \exp \left( (F(\text {parent}) - F(\text {offspring}))/t\right) \right)\) (7).
The probability of an offspring replacing its parent is thus a function of its fitness, F, in our case the mean squared loss of the candidate model. In Eq. 7, t is a scheduling parameter that controls the rate of “cooling”, i.e. the rate at which worse steps in the search space are tolerated by the update rule. In accordance with [34], we use an exponential schedule for t, defined as \(t_{g} = (0.9)^g t_0\), where g is the current generation and \(t_0\) is the starting temperature; \(t_0\) is set to 10 in our experiments.
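The acceptance rule and cooling schedule described above can be sketched as follows (a minimal illustration, not FEAT's actual implementation; the function names are ours):

```python
import math
import random

def anneal_accept(f_parent: float, f_offspring: float, t: float) -> bool:
    """Metropolis-style acceptance: always keep an offspring that does not
    increase the loss; accept a worse offspring with probability
    exp(-(F(offspring) - F(parent)) / t)."""
    if f_offspring <= f_parent:
        return True
    return random.random() < math.exp(-(f_offspring - f_parent) / t)

def temperature(g: int, t0: float = 10.0) -> float:
    """Exponential cooling schedule t_g = (0.9)^g * t0, with t0 = 10."""
    return (0.9 ** g) * t0
```

At generation 0 the temperature is 10, so even large fitness regressions are frequently accepted; by generation 50 the temperature has dropped to roughly 0.05, and the search behaves almost greedily.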
Random search We compare the selection and survival methods to random search, in which no assumptions are made about the structure of the search space. To conduct random search, we randomly sample \({\mathbb {S}}\) using the initialization procedure. Since FEAT begins with a linear model of the process, random search will produce a representation at least as good as this initial model on the internal validation set.
A note on archiving When FEAT is used without a complexity-aware survival method (i.e., with Lex, SimAnn, Random), a separate population is maintained that acts as an archive. The archive maintains a Pareto front according to minimum loss and complexity (Eq. 3). At the end of optimization, the archive is tested on a small hold-out validation set. The individual with the lowest validation loss is the final selected model. Maintaining this archive helps protect against overfitting resulting from overly complex/high capacity representations, and also can be interpreted directly to help understand the process being modelled.
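The archiving and final selection steps above can be sketched as a toy example (the data layout and the names `pareto_archive` and `select_final` are illustrative assumptions, not FEAT's API):

```python
def pareto_archive(candidates):
    """Keep candidates not dominated on (loss, complexity): an entry is
    dropped when another entry has loss and complexity both no worse,
    with at least one strictly better. `candidates` is a list of
    (loss, complexity, model) tuples."""
    front = []
    for loss, comp, model in candidates:
        dominated = any(
            l2 <= loss and c2 <= comp and (l2 < loss or c2 < comp)
            for l2, c2, _ in candidates
        )
        if not dominated:
            front.append((loss, comp, model))
    return front

def select_final(front, validation_loss):
    """Pick the archived model with the lowest hold-out validation loss."""
    return min(front, key=lambda entry: validation_loss(entry[2]))
```

Because the front retains the simplest model at every attained loss level, the cheap validation pass at the end trades a small amount of training fit for robustness to overfitting.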
We benchmarked these approaches in a separate experiment on 88 datasets from PMLB [60]. The results are shown in Figs. 13, 14, 15 and 16. Considering Figs. 13 and 14, we see that LexNSGA2 achieves the best average \(R^2\) value while producing small solutions in comparison to Lex. NSGA2, SimAnneal, and random search all produce less accurate models. The runtime comparisons in Fig. 15 show that the methods are mostly within an order of magnitude of each other, with NSGA2 the fastest (due to its maintenance of small representations) and random search the slowest, which suggests that, absent selection pressure, the variation operators tend to increase the average size of solutions over many iterations.
1.3 Illustrative example
We show an illustrative example of the final archive and model selection process from applying FEAT to a galaxy visualization dataset [8] in Fig. 17. The red and blue points correspond to training and validation scores for each archived representation with a square denoting the final model selection. Five of the representations are printed in plain text, with each feature separated by brackets. The vertical lines in the left figure denote the test scores for FEAT, RF and ElasticNet. It is interesting to note that ElasticNet performance roughly matches the performance of a linear representation, and the RF test performance corresponds to the representation \([\tanh (x_0)][\tanh (x_1)]\) that is suggestive of axis-aligned splits for \(x_0\) and \(x_1\). The selected model is shown on the right, with the features sorted according to the magnitudes of \(\beta\) in the linear model. The final representation combines tanh, polynomial, linear and interacting features. This representation is a clear extension of simpler ones in the archive, and the archive thereby serves to characterize the improvement in predictive accuracy brought about by increasing complexity. Although a mechanistic interpretation requires domain expertise, the final representation is certainly concise and amenable to interpretation.
1.4 Statistical comparisons
We perform pairwise comparisons of methods according to the procedure recommended by Demšar [14] for comparing multiple estimators (Table 7). In Table 8, the CV \(R^2\) rankings are compared. In Table 9, the best model size rankings are compared. Note that KernelRidge is omitted from the size comparisons since we don’t have a comparable way of measuring the model size.
La Cava, W., Moore, J.H. Learning feature spaces for regression with genetic programming. Genet Program Evolvable Mach 21, 433–467 (2020). https://doi.org/10.1007/s10710-020-09383-4