New clustering methods for population comparison on paternal lineages

  • Methodology Article
  • Molecular Genetics and Genomics

Abstract

This study presents two new clustering and visualisation techniques. They were developed to find the most typical clusters among the 18-dimensional Y-chromosomal haplogroup frequency distributions of 90 Western Eurasian populations. The first technique, the "self-organizing cloud" (SOC), is a vector-based self-learning method derived from the Self-Organising Map (SOM) and non-metric multidimensional scaling (MDS) algorithms. The second is a new probabilistic method, the "maximal relation probability" (MRP) algorithm. It is based on a probability function whose local maxima lie exactly at the condensation centres of the input data. This function is calculated directly from the distance matrix of the data. It can be interpreted as the probability that a given element of the database has a real genetic relation with at least one of the remaining elements. We tested the two new methods by comparing their results with each other and with those of the k-medoids algorithm. Using these new algorithms, we determined 10 clusters of populations based on the similarity of their haplogroup composition. The results give a genetically, geographically and historically well-interpretable picture of these 10 clusters, mirroring the early spread of populations from the Fertile Crescent to the Caucasus, Central Asia, Arabia and Southeast Europe. They show that parallel clustering of populations with the SOC and MRP methods can be an efficient tool for studying the demographic history of populations sharing common genetic footprints.
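The abstract describes the MRP function only informally. It is a per-element probability, computed from the distance matrix, of being genetically related to at least one other element, and its local maxima mark the condensation centres. The sketch below is a hypothetical illustration of that idea, not the authors' formula. It makes three assumptions:

1. The pairwise relation probability is a Gaussian kernel, p_ij = exp(-d_ij² / 2σ²).
2. Pairs are independent, so P_i = 1 − ∏_{j≠i} (1 − p_ij).
3. A fixed neighbourhood radius decides which elements count as local maxima.

The names `relation_probabilities`, `condensation_centres`, `sigma` and `radius` are illustrative. The six three-component frequency vectors are toy data, not haplogroup frequencies from the study.

```python
import numpy as np


def relation_probabilities(dist: np.ndarray, sigma: float) -> np.ndarray:
    """Hypothetical MRP-style score for each element (row) of a distance matrix.

    Assumptions (not taken from the paper): the pairwise relation probability is
    p_ij = exp(-d_ij**2 / (2 * sigma**2)), and pairs are independent, so
    P_i = 1 - prod_{j != i} (1 - p_ij) is the probability that element i is
    related to at least one other element.
    """
    p = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(p, 0.0)  # an element is not counted as related to itself
    return 1.0 - np.prod(1.0 - p, axis=1)


def condensation_centres(dist: np.ndarray, scores: np.ndarray, radius: float) -> list[int]:
    """Indices whose score is the maximum within their own radius-neighbourhood."""
    centres = []
    for i in range(len(scores)):
        neighbours = dist[i] <= radius  # includes i itself, since dist[i, i] == 0
        if scores[i] >= scores[neighbours].max():
            centres.append(i)
    return centres


# Toy data: two groups of three-component frequency vectors (not real haplogroup data).
x = np.array([
    [0.80, 0.10, 0.10], [0.75, 0.15, 0.10], [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80], [0.10, 0.15, 0.75], [0.10, 0.20, 0.70],
])
# Euclidean distance matrix (an assumption; a genetic distance could be used instead).
dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

scores = relation_probabilities(dist, sigma=0.1)
centres = condensation_centres(dist, scores, radius=0.3)
# Assign each element to its nearest centre (label = position in `centres`).
labels = np.argmin(dist[:, centres], axis=1)
print(centres, labels.tolist())  # [1, 4] [0, 0, 0, 1, 1, 1]
```

In this sketch, the middle vector of each toy group has the highest score and becomes a centre. Every other element then joins its nearest centre. The number of clusters follows from the number of local maxima rather than being fixed in advance, as k must be in k-medoids. With real data, `sigma` and `radius` would need tuning.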


Figs. 1–10 (images not included)

Acknowledgments

This work was supported by the Hungarian National Research Foundation (Grant No. K81954). We especially thank Dr. Eva Susa (General Director of the Network of Forensic Science Institutes) for her financial support. We are also grateful to Kinga Rudolf for the birdsong field recordings. We thank two anonymous reviewers for their constructive comments and suggestions, and Ati Rosselet and István Borsos for the English editing.

Author information

Corresponding author

Correspondence to H. Pamjav.

Additional information

Communicated by S. Xu.

About this article

Cite this article

Juhász, Z., Fehér, T., Bárány, G. et al. New clustering methods for population comparison on paternal lineages. Mol Genet Genomics 290, 767–784 (2015). https://doi.org/10.1007/s00438-014-0949-7

  • DOI: https://doi.org/10.1007/s00438-014-0949-7
