Efficient genomic selection using ensemble learning and ensemble feature reduction

  • Original Research
Journal of Crop Science and Biotechnology

Abstract

Genomic selection (GS) is a popular breeding method that uses genome-wide markers to predict plant phenotypes. Empirical studies and simulations have shown that GS can greatly accelerate the breeding cycle, beyond what is possible with traditional quantitative trait locus (QTL) approaches. GS is a regression problem in which single-nucleotide polymorphisms (SNPs) are typically used to predict phenotypes. Because SNP data are extremely high-dimensional, on the order of 100,000 dimensions, accurate phenotypic prediction is difficult, and finding the optimal prediction model is computationally very costly. Of the thousands of SNPs, usually only a few influence a particular phenotypic trait. We first show that ensemble-based regression techniques give better prediction accuracy than the traditional regression methods used in existing papers. We then further improve prediction accuracy by using an ensemble of feature selection and feature extraction techniques, which also reduces the time needed to compute the regression model parameters. We predict three traits: grain yield, time to 50% flowering and plant height, for which existing methods give accuracies of 0.304, 0.627 and 0.341, respectively. Our proposed regression model gives accuracies of 0.330, 0.674 and 0.458 for these traits. We also propose a computationally efficient regression model that reduces computation time by as much as 90% and gives accuracies of 0.342, 0.580 and 0.411, respectively.
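
The feature reduction idea in the abstract can be illustrated with a small sketch. The abstract does not say which selection techniques are combined or how, so everything below is an assumption made for illustration: the two filters (absolute Pearson correlation and the share of trait variance explained by genotype-class means), the rank-sum aggregation, the function names, and the toy data are all hypothetical and are not the authors' method. The sketch covers only the feature selection half, not feature extraction or the regression model.

```python
import random
import statistics


def abs_pearson(col, y):
    """Filter 1: absolute Pearson correlation between one SNP column and the trait."""
    mx, my = statistics.fmean(col), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(col, y))
    sxx = sum((a - mx) ** 2 for a in col)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP or constant trait
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def eta_squared(col, y):
    """Filter 2: share of trait variance explained by genotype-class means."""
    grand = statistics.fmean(y)
    total = sum((b - grand) ** 2 for b in y)
    if total == 0:
        return 0.0
    groups = {}
    for g, b in zip(col, y):
        groups.setdefault(g, []).append(b)
    between = sum(len(v) * (statistics.fmean(v) - grand) ** 2 for v in groups.values())
    return between / total


def to_ranks(scores):
    """Convert scores to rank positions, 0 being the highest score."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    ranks = [0] * len(scores)
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def ensemble_select(X, y, k, filters=(abs_pearson, eta_squared)):
    """Return the indices of the k SNPs with the lowest summed rank across filters."""
    columns = list(zip(*X))
    summed = [0] * len(columns)
    for score in filters:
        for j, r in enumerate(to_ranks([score(c, y) for c in columns])):
            summed[j] += r
    best = sorted(range(len(columns)), key=lambda j: summed[j])[:k]
    return sorted(best)


# Hypothetical toy data: 200 lines, 50 SNPs coded 0/1/2; only SNPs 3 and 17 are causal.
rng = random.Random(0)
X = [[rng.choice((0, 1, 2)) for _ in range(50)] for _ in range(200)]
y = [2.0 * row[3] - 1.5 * row[17] + rng.gauss(0.0, 0.3) for row in X]

selected = ensemble_select(X, y, k=5)
X_reduced = [[row[j] for j in selected] for row in X]  # smaller matrix for the regressor
print(selected)
```

In this toy setting the two causal SNPs rank at the top of both filters, so they survive the reduction from 50 columns to 5. A reduced matrix of this kind would then be passed to an ensemble regressor, and fitting on fewer columns is the kind of step that could account for the shorter computation time the abstract reports.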

Funding

This research did not receive any specific funding.

Author information

Corresponding author

Correspondence to Rohan Banerjee.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

About this article

Cite this article

Banerjee, R., Marathi, B. & Singh, M. Efficient genomic selection using ensemble learning and ensemble feature reduction. J. Crop Sci. Biotechnol. 23, 311–323 (2020). https://doi.org/10.1007/s12892-020-00039-4

  • DOI: https://doi.org/10.1007/s12892-020-00039-4
