Improved detection algorithm for copy number variations based on hidden Markov model

Yang, Hai; Zhu, Daming

doi:10.1007/s11042-019-7368-z

Published: 02 March 2019

Volume 79, pages 9237–9253, (2020)
Multimedia Tools and Applications

Abstract

To address parameter optimization and the underuse of split reads in copy number variation (CNV) detection, this paper proposes a new definition of relative read depth (RRD) and a randomized sampling strategy (RGN). Compared with raw read depth, RRD is only weakly correlated with GC content, mappability, and the width of the analysis windows tiled along the genome. RGN builds on weighted sampling to speed up the analysis of read count data. We then propose an improved CNV detection algorithm based on a hidden Markov model (CNV-HMM). The HMM detects abnormal signals in the read count data and outputs candidate CNVs. As a final step, the candidate CNVs are filtered using split reads to improve the performance of CNV-HMM. Experimental results show that CNV-HMM has higher sensitivity and accuracy for CNV detection than most current detection algorithms and is applicable to both diploid animals and plants.
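To make the HMM step concrete, the following is a minimal sketch of how a hidden Markov model can label genomic windows from a depth signal. It is a generic illustration, not the CNV-HMM algorithm itself: the abstract does not give the RRD formula, the RGN sampling weights, or the model parameters, so the three states, the Gaussian emissions, and every numeric value below are assumptions chosen for the example. The input is assumed to be a per-window depth ratio already normalized so that a diploid region is close to 1.0.

```python
import math

# Hypothetical three-state model. The states, means, and probabilities are
# illustrative assumptions, not values taken from the paper.
STATES = ("deletion", "normal", "duplication")
MEANS = (0.5, 1.0, 1.5)    # assumed depth ratio for 1, 2, and 3 copies
SIGMA = 0.15               # assumed emission standard deviation
STAY = 0.98                # assumed probability of staying in the same state
START = (0.01, 0.98, 0.01) # assumed initial state distribution


def log_gauss(x, mean, sigma):
    """Log density of a normal distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


def viterbi(ratios):
    """Return the most likely state label for each window's depth ratio."""
    if not ratios:
        return []
    n = len(STATES)
    switch = (1.0 - STAY) / (n - 1)
    log_trans = [[math.log(STAY if i == j else switch) for j in range(n)]
                 for i in range(n)]
    # Initialize with the first observation.
    score = [math.log(START[s]) + log_gauss(ratios[0], MEANS[s], SIGMA)
             for s in range(n)]
    back = []
    for x in ratios[1:]:
        prev, pointers, score = score, [], []
        for j in range(n):
            # Best predecessor state for landing in state j at this window.
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j]
                         + log_gauss(x, MEANS[j], SIGMA))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


def segments(labels):
    """Collapse per-window labels into (first, last, state) runs, skipping normal."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != "normal":
                runs.append((start, i - 1, labels[start]))
            start = i
    return runs
```

Calling `segments(viterbi(ratios))` on a list of window ratios yields candidate intervals as window indices. In a pipeline of the kind the abstract describes, such candidates would then be checked against split-read evidence before being reported; that filtering step is not sketched here because the abstract does not specify it.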

Author information

Authors and Affiliations

School of Computer Science and Technology, Shandong University, Qingdao, 266237, Shandong, China
Hai Yang & Daming Zhu

Corresponding author

Correspondence to Hai Yang.

About this article

Cite this article

Yang, H., Zhu, D. Improved detection algorithm for copy number variations based on hidden Markov model. Multimed Tools Appl 79, 9237–9253 (2020). https://doi.org/10.1007/s11042-019-7368-z

Received: 11 December 2018
Revised: 03 February 2019
Accepted: 11 February 2019
Published: 02 March 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s11042-019-7368-z
