Discriminative Motif Elicitation via Maximization of Statistical Overpresentation

Li, Ning

doi:10.1007/978-3-319-63309-1_45

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10361)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

The Fisher exact test score (FETS) and its variants are based on the hypergeometric distribution, which makes them a natural way to describe the enrichment level of transcription factor binding sites (TFBSs). Several widely used discriminative motif discovery methods, for example HOMER and DREME, have chosen them as objective functions. Although FETS is highly efficient and universal, it is a non-smooth, non-differentiable function, so it cannot be optimized numerically. To work around this problem, current methods that optimize FETS either restrict the search to a discrete domain or introduce external variables. Both choices hurt precision and keep the methods from using the full potential of the input sequences when generating motifs. In this paper, we propose an approach that learns the motif parameters directly in continuous space, using FETS as the objective function. We find that when the loss function is optimized coordinate-wise, the cost function becomes piecewise constant in each resulting sub-problem, so the optimal value can be found exactly and efficiently. Furthermore, one key step in every iteration of optimizing FETS requires finding the most statistically significant score among tens of thousands of Fisher's exact test scores, which is solved efficiently by a 'lookahead' technique. Experiments on ENCODE ChIP-seq data demonstrate the performance of the proposed method.
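
The two ideas in the abstract, scoring enrichment with a hypergeometric tail and exploiting a piecewise-constant objective, can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes that each sequence has already been reduced to a single best motif score, and it treats the score threshold as the only free coordinate. The function names `fisher_enrichment_pvalue` and `best_threshold` are hypothetical. Because the hit counts change only when the threshold crosses an observed score, the p-value is piecewise constant in the threshold, and enumerating the observed scores finds the exact optimum. The sketch uses brute-force enumeration and does not implement the 'lookahead' technique.

```python
from math import comb


def fisher_enrichment_pvalue(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher exact test for enrichment in the positive set.

    Returns P(X >= pos_hits), where X is hypergeometric: the number of
    positive sequences among all sequences that contain a motif hit.
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    denom = comb(total, hits)
    upper = min(hits, pos_total)
    tail = sum(
        comb(pos_total, x) * comb(neg_total, hits - x)
        for x in range(pos_hits, upper + 1)
    )
    return tail / denom


def best_threshold(pos_scores, neg_scores):
    """Exact search over one coordinate (assumed here: a score threshold).

    A sequence counts as a hit when its score is >= the threshold, so the
    p-value only changes at observed score values. Checking each distinct
    score is therefore enough to find the minimum p-value exactly.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best_p, best_t = 1.0, None
    for t in candidates:
        k = sum(s >= t for s in pos_scores)  # hits in the positive set
        m = sum(s >= t for s in neg_scores)  # hits in the negative set
        p = fisher_enrichment_pvalue(k, len(pos_scores), m, len(neg_scores))
        if p < best_p:
            best_p, best_t = p, t
    return best_p, best_t


if __name__ == "__main__":
    # Toy data (made up for illustration): positives score higher.
    pos = [5.0, 4.0, 3.5]
    neg = [1.0, 2.0, 0.5]
    print(best_threshold(pos, neg))
```

On the toy data, the threshold that separates all three positive sequences from all three negative ones gives the smallest p-value. A real motif model has many more coordinates (for example, the entries of a position weight matrix), and this sketch only shows why each one-dimensional sub-problem admits an exact search.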

Acknowledgments

This work was supported by grants from the National Science Foundation of China (Nos. 61532008, 61672203, 61402334, 61472282, 61520106006, 31571364, U1611265, 61472280, 61472173, 61572447, 61373098 and 61672382) and by the China Postdoctoral Science Foundation (Grant No. 2016M601646).

Author information

Authors and Affiliations

Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai, 201804, China
Ning Li

Corresponding author

Correspondence to Ning Li.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
Politecnico di Bari, Bari, Italy
Vitoantonio Bevilacqua
University of Wollongong, North Wollongong, New South Wales, Australia
Prashan Premaratne
Indian Institute of Technology, Kanpur, India
Phalguni Gupta

About this paper

Cite this paper

Li, N. (2017). Discriminative Motif Elicitation via Maximization of Statistical Overpresentation. In: Huang, DS., Bevilacqua, V., Premaratne, P., Gupta, P. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10361. Springer, Cham. https://doi.org/10.1007/978-3-319-63309-1_45

DOI: https://doi.org/10.1007/978-3-319-63309-1_45
Published: 20 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63308-4
Online ISBN: 978-3-319-63309-1
eBook Packages: Computer Science, Computer Science (R0)
