Abstract
With the advent of next-generation DNA sequencing technology, bioinformatics and computational biology are becoming increasingly complex and computationally intensive. The bioinformatics community faces the challenge of finding suitable methods for its growing computational problems, for instance, the processing of massive volumes of DNA sequences. Such methods can be found in the field of high-performance computing, through parallel processing. In this paper we propose a parallel approach built on top of a modified vector space model (VSM). The proposed method distributes the computation across the available processing cores in order to minimize computation time and to support the analysis of a large number of DNA sequences.
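The abstract does not spell out the modified VSM, so the following is only a minimal sketch of the general idea under stated assumptions: each sequence is represented as a vector of overlapping k-mer counts, similarity to a query is the cosine between vectors (a plain, unmodified VSM), and the per-sequence scoring is spread over worker processes. The function names, the choice of k-mers as terms, and the `chunksize` value are illustrative assumptions, not the chapter's method.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from math import sqrt


def kmer_vector(sequence, k=3):
    """Represent a DNA sequence as a sparse vector of overlapping k-mer counts.

    Assumption: k-mers play the role of VSM "terms"; the chapter may differ.
    """
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(a, b):
    """Cosine between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[kmer] for kmer, count in a.items() if kmer in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_sequence(query_vector, k, sequence):
    """Score one database sequence against the query; runs inside a worker."""
    return cosine_similarity(query_vector, kmer_vector(sequence, k))


def rank_sequences(query, database, k=3, workers=None):
    """Return (index, score) pairs for `database`, best match first.

    workers=1 scores serially; any other value fans the work out over a
    process pool (workers=None lets the pool use every available core).
    """
    score = partial(score_sequence, kmer_vector(query, k), k)
    if workers == 1:
        scores = [score(seq) for seq in database]
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # chunksize batches sequences to reduce inter-process overhead.
            scores = list(pool.map(score, database, chunksize=64))
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
```

Each database sequence is scored independently of the others, which is why the work divides cleanly across cores. When using the process pool from a script, call `rank_sequences` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.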
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Majid, A., Khan, M., Khan, M., Ahmad, J., Li, M., Paracha, R.Z. (2019). Parallel Computation on Large-Scale DNA Sequences. In: Khan, F., Jan, M., Alam, M. (eds) Applications of Intelligent Technologies in Healthcare. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-96139-2_6
DOI: https://doi.org/10.1007/978-3-319-96139-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96138-5
Online ISBN: 978-3-319-96139-2
eBook Packages: Engineering, Engineering (R0)