Discriminative Motif Elicitation via Maximization of Statistical Overpresentation

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10361)

Abstract

The Fisher Exact Test score (FETS) and its variants are based on the hypergeometric distribution, which makes them a natural measure of the enrichment level of transcription factor binding sites (TFBSs). Several widely used discriminative motif discovery methods, for example HOMER and DREME, have chosen them as objective functions. Although this criterion is efficient and universally applicable, FETS is a non-smooth and non-differentiable function, so it cannot be optimized with gradient-based numerical methods. To work around this problem, current methods that optimize FETS either restrict the search space to a discrete domain or introduce auxiliary variables; both approaches inevitably hurt precision, and neither uses the full potential of the input sequences for generating motifs. In this paper, we propose an approach that learns motif parameters directly in continuous space, using FETS as the objective function. We find that when the objective is optimized coordinate-wise, it becomes a piecewise-constant function in each resulting sub-problem, so its optimal value can be found exactly and efficiently. Furthermore, a key step in every iteration of FETS optimization requires finding the most statistically significant score among tens of thousands of Fisher's exact test scores; this is solved efficiently by a 'lookahead' technique. Experiments on ENCODE ChIP-seq data demonstrate the performance of the proposed method.
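The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions. A motif is taken to be an additive position weight matrix (hypothetical layout `pwm[j][k]`, the weight of base `BASES[k]` at offset `j`), a sequence's score is its best-scoring window, a sequence is a "hit" when that score reaches a threshold `t`, and FETS is assumed to be the negative natural log of the one-sided Fisher exact (hypergeometric) p-value for enrichment of hits in the positive set. All function names are illustrative. `best_threshold` is a naive sweep over candidate thresholds, not the paper's 'lookahead' technique, and `coordinate_update` holds `t` fixed to show why a one-coordinate sub-problem is piecewise constant: each sequence score has the form max(A, B + x), so its hit status flips at a single breakpoint x = t - B, and the exact optimum can be found by enumerating the intervals between breakpoints.

```python
import math
from typing import List, Sequence, Tuple

BASES = "ACGT"
PWM = List[List[float]]  # hypothetical layout: pwm[j][k] = weight of BASES[k] at offset j


def fet_log_pvalue(a: int, b: int, n_pos: int, n_neg: int) -> float:
    """ln P(X >= a), X ~ Hypergeometric: n_pos draws from n_pos + n_neg
    sequences, a + b of which are hits (one-sided Fisher exact test)."""
    total, hits = n_pos + n_neg, a + b
    tail = sum(math.comb(hits, k) * math.comb(total - hits, n_pos - k)
               for k in range(a, min(n_pos, hits) + 1))
    return math.log(tail) - math.log(math.comb(total, n_pos))


def window_scores(pwm: PWM, seq: str) -> List[float]:
    """Additive PWM score of every length-L window (assumes len(seq) >= L)."""
    L = len(pwm)
    return [sum(pwm[j][BASES.index(seq[i + j])] for j in range(L))
            for i in range(len(seq) - L + 1)]


def fets(pwm: PWM, pos: Sequence[str], neg: Sequence[str], t: float) -> float:
    """Assumed FETS: -ln p-value, where a hit means best window score >= t."""
    a = sum(max(window_scores(pwm, s)) >= t for s in pos)
    b = sum(max(window_scores(pwm, s)) >= t for s in neg)
    return -fet_log_pvalue(a, b, len(pos), len(neg))


def best_threshold(pwm: PWM, pos: Sequence[str], neg: Sequence[str]) -> Tuple[float, float]:
    """Naive sweep over every distinct sequence score as a candidate threshold
    (NOT the paper's lookahead technique). Returns (best FETS, threshold)."""
    sp = [max(window_scores(pwm, s)) for s in pos]
    sn = [max(window_scores(pwm, s)) for s in neg]
    best = (-math.inf, math.nan)
    for t in sorted(set(sp + sn)):
        a = sum(x >= t for x in sp)
        b = sum(x >= t for x in sn)
        val = -fet_log_pvalue(a, b, len(pos), len(neg))
        if val > best[0]:
            best = (val, t)
    return best


def coordinate_update(pwm: PWM, pos: Sequence[str], neg: Sequence[str],
                      j: int, c: str, t: float) -> Tuple[float, float]:
    """Exactly maximise FETS over the single weight x = pwm[j][c] at fixed t.
    Returns (new value for x, FETS attained)."""
    ci = BASES.index(c)
    a0 = b0 = 0   # sequences that are hits for every value of x
    events = []   # (breakpoint x, 1 if positive else 0)
    for seqs, lab in ((pos, 1), (neg, 0)):
        for s in seqs:
            A = B = -math.inf
            for i, w in enumerate(window_scores(pwm, s)):
                if s[i + j] == c:
                    B = max(B, w - pwm[j][ci])  # window score without the free weight
                else:
                    A = max(A, w)               # window score independent of x
            if A >= t:
                a0, b0 = a0 + lab, b0 + 1 - lab
            elif B > -math.inf:
                events.append((t - B, lab))     # becomes a hit once x >= t - B
    events.sort()
    n_pos, n_neg = len(pos), len(neg)
    # Interval to the left of every breakpoint.
    best_val = -fet_log_pvalue(a0, b0, n_pos, n_neg)
    best_x = events[0][0] - 1.0 if events else pwm[j][ci]
    a, b = a0, b0
    for k, (x, lab) in enumerate(events):
        a, b = a + lab, b + 1 - lab
        nxt = events[k + 1][0] if k + 1 < len(events) else math.inf
        if nxt == x:
            continue  # absorb every sequence sharing this breakpoint first
        val = -fet_log_pvalue(a, b, n_pos, n_neg)
        if val > best_val:
            # FETS is constant on [x, nxt); return an interior point of it.
            best_val = val
            best_x = x + 1.0 if nxt == math.inf else (x + nxt) / 2
    return best_x, best_val
```

For example, with `pwm = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]` (which rewards `ACG`), positives `ACGTAC`, `TTACGT`, `ACGAAA` and negatives `TTTTTT`, `GGGGGG`, `CCACCC`, `best_threshold` selects `t = 3` with FETS = ln 20 (p = 0.05), and `coordinate_update(pwm, pos, neg, 2, "G", 3)` returns `x = 2.0`, an interior point of the optimal interval [1, 3). Each coordinate update costs one sort of the breakpoints plus one p-value per distinct breakpoint, which is why the exact one-dimensional search stays cheap.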

Acknowledgments

This work was supported by grants from the National Science Foundation of China (Nos. 61532008, 61672203, 61402334, 61472282, 61520106006, 31571364, U1611265, 61472280, 61472173, 61572447, 61373098 and 61672382) and by the China Postdoctoral Science Foundation (Grant No. 2016M601646).

Author information

Corresponding author

Correspondence to Ning Li.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Li, N. (2017). Discriminative Motif Elicitation via Maximization of Statistical Overpresentation. In: Huang, DS., Bevilacqua, V., Premaratne, P., Gupta, P. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10361. Springer, Cham. https://doi.org/10.1007/978-3-319-63309-1_45

  • DOI: https://doi.org/10.1007/978-3-319-63309-1_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63308-4

  • Online ISBN: 978-3-319-63309-1

  • eBook Packages: Computer Science, Computer Science (R0)