Abstract
Next-generation sequencing techniques have rapidly reduced the cost of sequencing a genome, but they come with a relatively high error rate. Error correction of this data is therefore a necessary step before assembly can take place. Since the input data is huge and error correction is compute-intensive, parallelizing this work on a modern shared-memory system can help keep the runtime feasible. In this work we present PAGANtec, a tool for error correction of next-generation sequencing data based on the novel PAGAN graph structure. We parallelized PAGANtec with OpenMP and carried out a performance analysis and tuning. The analysis showed that OpenMP tasks are a more suitable paradigm for this workload than traditional work-sharing.
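To make the contrast between the two OpenMP paradigms concrete, the following is a minimal sketch of how per-read correction could be expressed with work-sharing and with tasks. It assumes that each read can be corrected independently. The function `correct_read` is a hypothetical placeholder that merely replaces ambiguous `N` base calls with `A` so that the result is observable; it is not the PAGAN graph-based correction described in the paper, and `correct_worksharing` and `correct_tasks` are likewise illustrative names.

```c
#include <string.h>

/* Hypothetical placeholder for per-read error correction: it only
 * replaces ambiguous 'N' base calls with 'A' and returns how many it
 * changed. It is NOT the PAGAN-based algorithm from the paper. */
static int correct_read(char *read)
{
    int fixed = 0;
    for (size_t i = 0; read[i] != '\0'; ++i) {
        if (read[i] == 'N') {
            read[i] = 'A';
            ++fixed;
        }
    }
    return fixed;
}

/* Work-sharing: the iteration space (all reads) is known up front and
 * split among the threads; schedule(dynamic) hands out iterations on
 * demand, which helps when the cost per read varies. */
static int correct_worksharing(char **reads, int n)
{
    int total = 0;
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += correct_read(reads[i]);
    return total;
}

/* Tasking: a single thread creates one task per read, and all threads
 * in the team execute tasks as they become available. */
static int correct_tasks(char **reads, int n)
{
    int total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; ++i) {
            #pragma omp task firstprivate(i) shared(total)
            {
                int fixed = correct_read(reads[i]);
                #pragma omp atomic
                total += fixed;
            }
        }
    } /* the implicit barrier after 'single' waits for all tasks */
    return total;
}
```

Compile with an OpenMP-enabled compiler (for example `gcc -fopenmp`); without that flag the pragmas are ignored and both functions simply run serially. One general reason tasks can fit irregular workloads better is that work does not have to be laid out as a loop in advance: new tasks can be spawned as work is discovered (for example while traversing a graph), and the runtime balances them across threads. The abstract does not state which property drove the authors' conclusion, so this is only an illustration of the two paradigms.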
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Joppich, M., Schmidl, D., Bolger, A.M., Kuhlen, T., Usadel, B. (2015). PAGANtec: OpenMP Parallel Error Correction for Next-Generation Sequencing Data. In: Terboven, C., de Supinski, B., Reble, P., Chapman, B., Müller, M. (eds) OpenMP: Heterogenous Execution and Data Movements. IWOMP 2015. Lecture Notes in Computer Science, vol 9342. Springer, Cham. https://doi.org/10.1007/978-3-319-24595-9_1
DOI: https://doi.org/10.1007/978-3-319-24595-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24594-2
Online ISBN: 978-3-319-24595-9
eBook Packages: Computer Science, Computer Science (R0)