A Deep Learning Network for Exploiting Positional Information in Nucleosome Related Sequences

Di Gangi, Mattia Antonino; Gaglio, Salvatore; La Bua, Claudio; Lo Bosco, Giosué; Rizzo, Riccardo

doi:10.1007/978-3-319-56154-7_47

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10209)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering

Abstract

A nucleosome is a DNA-histone complex that wraps about 150 base pairs of double-stranded DNA. Nucleosomes pack the DNA into the nucleus of eukaryotic cells to form chromatin. Genome-wide nucleosome positioning plays an important role in the regulation of cell type-specific gene activities. Several biological studies have shown that nucleosome presence is sequence-specific, as evidenced by the organization of particular nucleotide substrings. Building on these findings, nucleosomes have been successfully identified on a genomic scale by combining DNA sequence feature representations with classical supervised classification methods such as Support Vector Machines and logistic regression. The goal of this work is to propose a classification method for nucleosome positioning that, unlike the methods proposed so far, does not use a sequence feature extraction step. Deep neural networks (DNNs), or deep learning models, have been shown to extract useful features from input patterns automatically. Within this framework, Long Short-Term Memory (LSTM) is a recurrent unit that reads a sequence one step at a time and can exploit long-range relations. In this work, we propose a DNN model for nucleosome identification on sequences from three different species. Our experiments show that it outperforms classical methods on two of the three data sets and gives promising results on the third.
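
To illustrate what "reading a sequence one step at a time" without a feature extraction step can look like, the following is a minimal sketch of a generic LSTM cell applied to a one-hot encoded DNA string. It is not the architecture evaluated in the paper, which the abstract does not describe. The one-hot encoding over A, C, G, T, the hidden size of 8, the random untrained weights, and every function name here are assumptions made for illustration only.

```python
import math
import random

ALPHABET = "ACGT"  # assumed input alphabet; the abstract does not specify an encoding


def one_hot(seq):
    """Map each nucleotide to a 4-dimensional indicator vector.

    Symbols outside ALPHABET (for example N) become an all-zero vector.
    """
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, seed=0):
    """Random, untrained weights: one (W, b) pair per gate (hypothetical sizes)."""
    rng = random.Random(seed)
    width = input_size + hidden_size
    return {
        gate: (
            [[rng.uniform(-0.1, 0.1) for _ in range(width)]
             for _ in range(hidden_size)],
            [0.0] * hidden_size,
        )
        for gate in ("input", "forget", "output", "candidate")
    }


def affine(weights, bias, vec):
    """Compute W @ vec + b with plain lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(params, x, h, c):
    """One standard LSTM update; all gates are computed from [x; h]."""
    z = x + h  # list concatenation of the input and the previous hidden state
    i = [sigmoid(a) for a in affine(*params["input"], z)]
    f = [sigmoid(a) for a in affine(*params["forget"], z)]
    o = [sigmoid(a) for a in affine(*params["output"], z)]
    g = [math.tanh(a) for a in affine(*params["candidate"], z)]
    # The cell state keeps part of the old memory and adds gated new content.
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode_sequence(params, seq, hidden_size):
    """Read the sequence left to right and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in one_hot(seq):
        h, c = lstm_step(params, x, h, c)
    return h


HIDDEN = 8  # hypothetical hidden size
params = init_lstm(input_size=len(ALPHABET), hidden_size=HIDDEN)
final_state = encode_sequence(params, "ACGTTGCA", HIDDEN)
print(len(final_state))
```

In a classifier, the final hidden state would typically be passed to an output layer (for instance a softmax over the two classes, nucleosome and linker) and the weights would be learned from labeled sequences; both steps are omitted from this sketch.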

Author information

Authors and Affiliations

Fondazione Bruno Kessler, Trento, Italy
Mattia Antonino Di Gangi
ICT International Doctoral School, University of Trento, Trento, Italy
Mattia Antonino Di Gangi
Dipartimento dell’Innovazione Industriale e Digitale, Università degli studi di Palermo, Palermo, Italy
Salvatore Gaglio & Claudio La Bua
Dipartimento di Matematica e Informatica, Università degli studi di Palermo, Palermo, Italy
Giosué Lo Bosco
Dipartimento di Scienze per l’Innovazione e le Tecnologie Abilitanti, Istituto Euro Mediterraneo di Scienza e Tecnologia, Palermo, Italy
Giosué Lo Bosco
ICAR-CNR - National Research Council of Italy, Palermo, Italy
Riccardo Rizzo

Corresponding author

Correspondence to Giosué Lo Bosco.

Editor information

Editors and Affiliations

Universidad de Granada, Granada, Spain
Ignacio Rojas
Universidad de Granada, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Di Gangi, M.A., Gaglio, S., La Bua, C., Lo Bosco, G., Rizzo, R. (2017). A Deep Learning Network for Exploiting Positional Information in Nucleosome Related Sequences. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_47

DOI: https://doi.org/10.1007/978-3-319-56154-7_47
Published: 01 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science, Computer Science (R0)
