Supervised learning algorithms in the classification of plant populations with different degrees of kinship

  • Genetics & Evolutionary Biology - Short Communication
Brazilian Journal of Botany

Abstract

Population discrimination and the classification of individuals are of great importance for genetic improvement in population studies and for genetic diversity conservation. Multivariate approaches are often used for these purposes, especially the Fisher and Anderson discriminant functions. Newer methodologies based on machine learning (ML) have proven promising for such procedures, but they still need further evaluation and comparison. The present study therefore evaluates the efficacy of supervised ML algorithms in classifying populations with different degrees of similarity, comparing them with the discriminant analysis techniques proposed by Anderson and by Fisher. The supervised ML methods tested were Naive Bayes, Decision Tree, k-Nearest Neighbors (kNN), Random Forest, Support Vector Machine (SVM) and Multi-layer Perceptron Neural Networks (MLP/ANN). To compare the classification methods, we used phenotypic data from populations with different degrees of genetic similarity. The data were derived from simulated genotypic information for different populations subjected to a backcrossing scheme. Mean accuracies over 30 repetitions of each classification method were compared by the Friedman and Nemenyi tests at a 95% confidence level. Classification methods based on machine learning algorithms outperformed the Fisher and Anderson discriminant functions, achieving high accuracy even where the similarity between populations was higher. The kNN, Random Forest, SVM and Naive Bayes algorithms presented the highest accuracy, surpassing the Decision Tree algorithm and even MLP/ANN (which lost accuracy at 96.88% similarity between populations). Thus, the present work confirms that ML techniques achieve greater accuracy in the discrimination and classification of populations, without the limitations of the statistical techniques.
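The comparison of accuracies across repetitions described in the abstract can be illustrated with a minimal sketch of the Friedman test statistic. This is not the authors' code (the reference list suggests the analysis was run in R); the function names `average_ranks` and `friedman_statistic` and the small accuracy table are hypothetical, and the usual correction of the statistic for ties is omitted for brevity. Each row is one repetition, each column one classification method, and rank 1 is given to the most accurate method within a repetition.

```python
from typing import List, Sequence, Tuple


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank the accuracies of one repetition (rank 1 = highest accuracy).

    Tied accuracies share the mean of the ranks they occupy.
    """
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(acc: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Return the Friedman chi-square statistic and the mean rank per method.

    acc[i][j] is the accuracy of method j in repetition i.
    No tie correction is applied to the statistic (a simplifying assumption).
    """
    n = len(acc)      # number of repetitions (blocks)
    k = len(acc[0])   # number of methods (treatments)
    rank_sums = [0.0] * k
    for row in acc:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, [r / n for r in rank_sums]


# Hypothetical accuracies: 4 repetitions (rows) x 3 methods (columns).
example = [
    [0.95, 0.90, 0.80],
    [0.96, 0.91, 0.82],
    [0.94, 0.89, 0.79],
    [0.97, 0.92, 0.81],
]
stat, mean_ranks = friedman_statistic(example)
print(stat, mean_ranks)
```

In a setting like the one in the abstract, the table would have 30 rows and one column per classifier, and the statistic would be referred to a chi-square distribution with k − 1 degrees of freedom. A post hoc procedure such as the Nemenyi test would then compare the mean ranks pairwise.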

Acknowledgements

We thank CNPq (National Council for Scientific and Technological Development) and CAPES (Coordination for the Improvement of Higher Education Personnel) for the research fellowships (Process PNPD/CAPES 88882.315120/2019-01) and financial support (Process CNPq 301840/2016-4).

Author information

Contributions

LS was involved in conceptualization, methodology, investigation, formal analysis, data curation and writing—original draft. PMM was involved in methodology and investigation. MLTM was involved in methodology and investigation. WNG was involved in methodology and investigation. MC was involved in methodology and investigation. CSC was involved in methodology and investigation. WSF was involved in investigation and writing—review & editing. RBC was involved in funding acquisition, supervision, investigation and writing—review & editing.

Corresponding author

Correspondence to Leandro Skowronski.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Data archiving statement

The data used in the current study are available in the Harvard Dataverse repository: Skowronski, Leandro, 2020, "Replication Data for: Machine learning in plant breeding," https://doi.org/10.7910/DVN/BCVCZVV, Harvard Dataverse, DRAFT VERSION

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Skowronski, L., de Moraes, P.M., de Moraes, M.L.T. et al. Supervised learning algorithms in the classification of plant populations with different degrees of kinship. Braz. J. Bot 44, 371–379 (2021). https://doi.org/10.1007/s40415-021-00703-1

  • DOI: https://doi.org/10.1007/s40415-021-00703-1
