Improved detection algorithm for copy number variations based on hidden Markov model

Multimedia Tools and Applications

Abstract

To address the problems of parameter optimization and insufficient use of split reads in copy number variation (CNV) detection, this paper proposes a new definition of relative read depth (RRD) and a randomized sampling strategy (RGN). Compared with raw read depth, the RRD parameter is only weakly correlated with GC content, mappability, and the width of the analysis windows tiled along the genome. The RGN strategy builds on weighted sampling, which speeds up the analysis of read count data. We then propose an improved CNV detection algorithm based on a hidden Markov model (CNV-HMM). The HMM detects abnormal signals in the read count data and outputs candidate CNVs. As a final step, the algorithm filters the candidate CNVs using split reads to improve the performance of CNV-HMM. Experimental results show that CNV-HMM achieves higher sensitivity and accuracy in CNV detection than most current detection algorithms and is applicable to both diploid animals and plants.
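To illustrate the general idea of using an HMM to flag abnormal read-depth signal, the following is a minimal Python sketch, not the authors' CNV-HMM. The abstract does not give the model details, so everything here is an assumption made for illustration: the three states, the expected depth ratios (0.5, 1.0, 1.5 for one, two, and three copies in a diploid genome), the Gaussian emission width `SIGMA`, the self-transition probability `STAY`, and the initial probabilities. The input is a generic per-window depth ratio normalized so that the diploid baseline is 1.0; it is not the paper's RRD definition.

```python
import math

# All values below are illustrative assumptions, not parameters from the paper.
STATES = ("deletion", "normal", "duplication")
MEANS = (0.5, 1.0, 1.5)      # assumed depth ratio for 1, 2, 3 copies (diploid)
SIGMA = 0.15                 # assumed Gaussian emission standard deviation
STAY = 0.98                  # assumed probability of remaining in the same state
INIT = (0.01, 0.98, 0.01)    # assumed initial state probabilities


def log_gauss(x, mean, sigma):
    """Log density of a normal distribution at x."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def viterbi(ratios):
    """Return the most likely state label for each window (log-space Viterbi)."""
    if not ratios:
        return []
    n = len(STATES)
    switch = (1.0 - STAY) / (n - 1)
    log_trans = [[math.log(STAY if i == j else switch) for j in range(n)]
                 for i in range(n)]
    score = [math.log(INIT[s]) + log_gauss(ratios[0], MEANS[s], SIGMA)
             for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at window t + 1
    for x in ratios[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j]
                         + log_gauss(x, MEANS[j], SIGMA))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


def candidate_segments(path):
    """Merge consecutive non-normal windows into (start, end, state) tuples."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            if path[start] != "normal":
                segments.append((start, i - 1, path[start]))
            start = i
    return segments


if __name__ == "__main__":
    ratios = [1.0, 0.98, 1.03, 0.52, 0.47, 0.55, 0.5,
              1.01, 0.97, 1.49, 1.55, 1.52, 1.0, 1.02]
    print(candidate_segments(viterbi(ratios)))
```

The segments returned by `candidate_segments` play the role of candidate CNVs. In a pipeline like the one the abstract describes, a later step would keep only candidates supported by split reads near their boundaries; that filtering step is not sketched here because the abstract does not specify it.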

Author information

Corresponding author

Correspondence to Hai Yang.

About this article

Cite this article

Yang, H., Zhu, D. Improved detection algorithm for copy number variations based on hidden Markov model. Multimed Tools Appl 79, 9237–9253 (2020). https://doi.org/10.1007/s11042-019-7368-z

  • DOI: https://doi.org/10.1007/s11042-019-7368-z