GenPress: A Novel Dictionary Based Method to Compress DNA Data of Various Species

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11432)

Abstract

A data boom may arrive in the near future, because cheaper sequencing methods make it possible for everyone to keep their own DNA on their own device or in a central medical cloud. As sequencing methods develop, we are able to obtain the sequences of more and more species. However, the human genome takes about 3 GB per person, and the genomes of other species can be larger.

The need for efficient compression of these data is growing, and general-purpose compressors cannot reach satisfying results because they are not aware of the special structure of these data. Several existing algorithms have tried to reach smaller and smaller compression rates. In this paper, we present our new method for this task.

Dr. Kiss was also with J. Selye University, Komárno, Slovakia.

Acknowledgment

The project was supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.3-VEKOP-16-2017-00002).

Author information

Corresponding author

Correspondence to Péter Lehotay-Kéry.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lehotay-Kéry, P., Kiss, A. (2019). GenPress: A Novel Dictionary Based Method to Compress DNA Data of Various Species. In: Nguyen, N., Gaol, F., Hong, TP., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2019. Lecture Notes in Computer Science, vol 11432. Springer, Cham. https://doi.org/10.1007/978-3-030-14802-7_33

  • DOI: https://doi.org/10.1007/978-3-030-14802-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14801-0

  • Online ISBN: 978-3-030-14802-7

  • eBook Packages: Computer Science, Computer Science (R0)
