Efficient genomic selection using ensemble learning and ensemble feature reduction

Banerjee, Rohan; Marathi, Balram; Singh, Manish

doi:10.1007/s12892-020-00039-4

Original Research
Published: 29 April 2020

Volume 23, pages 311–323, (2020)

Journal of Crop Science and Biotechnology

Abstract

Genomic selection (GS) is a popular breeding method that uses genome-wide markers to predict plant phenotypes. Empirical studies and simulations have shown that GS can greatly accelerate the breeding cycle, beyond what is possible with traditional quantitative trait locus (QTL) approaches. GS is a regression problem in which single-nucleotide polymorphisms (SNPs) are typically used to predict phenotypes. Because SNP data are extremely high-dimensional, on the order of 100,000 dimensions, accurate phenotypic prediction is difficult, and finding the optimal prediction model is computationally very costly. Moreover, out of thousands of SNPs, usually only a few influence a particular phenotypic trait. We first show that ensemble-based regression techniques give better prediction accuracy than the traditional regression methods used in existing papers. We then further improve prediction accuracy by using an ensemble of feature selection and feature extraction techniques, which also reduces the time needed to compute the regression model parameters. We predict three traits: grain yield, time to 50% flowering and plant height, for which existing methods give accuracies of 0.304, 0.627 and 0.341, respectively. Our proposed regression model gives accuracies of 0.330, 0.674 and 0.458 for these traits. We also propose a computationally efficient regression model that reduces computation time by as much as 90% and gives accuracies of 0.342, 0.580 and 0.411, respectively.
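
The general idea in the abstract, reducing a very wide SNP matrix and then fitting an ensemble regressor, can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the specific algorithms, so the correlation filter, the PCA step, and the bagged ridge regression below are stand-ins chosen for brevity. The data are synthetic, all function names are hypothetical, and the use of Pearson correlation as the "accuracy" measure is an assumption (it is a common convention in GS, but the abstract does not state the metric).

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between observed and predicted values
    (assumed here to be the accuracy measure)."""
    return float(np.corrcoef(a, b)[0, 1])


def reduce_features(X_train, y_train, X_test, k=50, n_pc=10):
    """Combine feature selection and feature extraction (illustrative only).

    Selection: keep the k markers most correlated with the phenotype.
    Extraction: add the first n_pc principal components of all markers.
    Both steps are fitted on the training set only.
    """
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    yc = y_train - y_train.mean()
    corr = np.abs(Xc.T @ yc) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    )
    keep = np.argsort(corr)[-k:]

    # PCA via SVD of the centred training matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pc].T

    Z_train = np.hstack([X_train[:, keep], Xc @ P])
    Z_test = np.hstack([X_test[:, keep], (X_test - mu) @ P])
    return Z_train, Z_test


def fit_ridge(X, y, lam):
    """Closed-form ridge regression; the intercept is handled by centring."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu_y))
    return w, mu_y - mu_x @ w


def bagged_ridge_predict(X_train, y_train, X_test, n_models=25, lam=10.0, seed=0):
    """Bagging ensemble: average ridge models fitted on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    preds = np.zeros(X_test.shape[0])
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        w, b = fit_ridge(X_train[idx], y_train[idx], lam)
        preds += X_test @ w + b
    return preds / n_models


# Synthetic stand-in for SNP data: markers coded 0/1/2, few causal markers.
rng = np.random.default_rng(42)
n, p, n_causal = 300, 2000, 10
X = rng.integers(0, 3, size=(n, p)).astype(float)
causal = rng.choice(p, size=n_causal, replace=False)
effects = np.where(np.arange(n_causal) % 2 == 0, 1.0, -1.0)
y = X[:, causal] @ effects + rng.normal(scale=1.0, size=n)

train, test = np.arange(0, 240), np.arange(240, 300)
Z_train, Z_test = reduce_features(X[train], y[train], X[test])
y_hat = bagged_ridge_predict(Z_train, y[train], Z_test)
accuracy = pearson(y[test], y_hat)
print(f"reduced to {Z_train.shape[1]} features, accuracy = {accuracy:.3f}")
```

Fitting the reduction on the training rows only keeps test phenotypes from leaking into marker selection. The reduced matrix has 60 columns instead of 2,000, which is where the saving in model-fitting time comes from in a sketch like this one.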

Funding

This research did not receive any specific funding.

Author information

Authors and Affiliations

Indian Institute of Technology Hyderabad, Hyderabad, Telangana, India
Rohan Banerjee & Manish Singh
Institute of Biotechnology, Professor Jayashankar Telangana State Agricultural University, Hyderabad, Telangana, India
Balram Marathi

Corresponding author

Correspondence to Rohan Banerjee.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

About this article

Cite this article

Banerjee, R., Marathi, B. & Singh, M. Efficient genomic selection using ensemble learning and ensemble feature reduction. J. Crop Sci. Biotechnol. 23, 311–323 (2020). https://doi.org/10.1007/s12892-020-00039-4

Accepted: 13 April 2020
Published: 29 April 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s12892-020-00039-4
