Abstract
Identification of enhancer regions is important for understanding how gene expression is regulated. Recent studies show that enhancers can be predicted using discriminative classifiers with generic sequence features such as k-mers (words). The accuracy of such discriminative prediction largely depends on the ability of the models to capture not only the presence of predictive k-mers, but also spatial constraints on clusters of such k-mers. In this paper, we propose a method that first selects the most important word features and then uses combinations of these words that satisfy certain spatial constraints as additional features. Experiments with real data sets show that the proposed method compares favorably with a state-of-the-art enhancer prediction method in terms of prediction accuracy.
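The abstract does not specify the k-mer length, how words are ranked, or the exact form of the spatial constraint. The Python sketch below only illustrates the general two-stage idea under assumed choices: words are overlapping k-mers (reverse complements are not merged); the selection score (absolute difference in the fraction of positive and negative sequences containing a word), the pair feature (1 if the two words occur within `max_dist` bases of each other), the parameter `max_dist`, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping k-mer (word) in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def select_words(pos_seqs, neg_seqs, k, top_n):
    """Keep the top_n k-mers ranked by an ASSUMED score: the absolute
    difference between the fraction of positive and negative sequences
    that contain the word. Ties are broken alphabetically."""
    def presence(seqs):
        hits = Counter()
        for s in seqs:
            hits.update(set(kmer_counts(s, k)))  # count each word once per sequence
        return {w: c / len(seqs) for w, c in hits.items()}

    p, n = presence(pos_seqs), presence(neg_seqs)
    vocab = set(p) | set(n)
    score = {w: abs(p.get(w, 0.0) - n.get(w, 0.0)) for w in vocab}
    return sorted(vocab, key=lambda w: (-score[w], w))[:top_n]


def word_positions(seq, word):
    """Start positions of every (possibly overlapping) occurrence of word."""
    k = len(word)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == word]


def pair_features(seq, words, max_dist):
    """For each unordered pair of selected words, return 1 if some occurrence
    of one starts within max_dist bases of some occurrence of the other
    (a HYPOTHETICAL spatial constraint), else 0."""
    pos = {w: word_positions(seq, w) for w in words}
    feats = {}
    for a, b in combinations(sorted(words), 2):
        feats[(a, b)] = int(any(abs(i - j) <= max_dist
                                for i in pos[a] for j in pos[b]))
    return feats


def feature_vector(seq, words, k, max_dist):
    """Single-word counts followed by the binary word-pair features."""
    counts = kmer_counts(seq, k)
    single = [counts[w] for w in words]
    pairs = pair_features(seq, words, max_dist)
    return single + [pairs[key] for key in sorted(pairs)]


if __name__ == "__main__":
    words = select_words(["ACGTACGT", "ACGTTTTT"], ["GGGGGGGG", "GGGGCCCC"],
                         k=4, top_n=2)
    print(words)                                                      # ['ACGT', 'GGGG']
    print(feature_vector("ACGTACGT", words, k=4, max_dist=10))        # [2, 0, 0]
```

Vectors built this way could then be passed to any discriminative classifier; which classifier and which constraint the authors actually use is not stated in this abstract. Restricting pairs to a small set of pre-selected words keeps the number of combination features quadratic in `top_n` rather than in the full k-mer vocabulary.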
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hung, P.V., Phuong, T.M. (2015). Discriminative Prediction of Enhancers with Word Combinations as Features. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_4
DOI: https://doi.org/10.1007/978-3-319-11680-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11679-2
Online ISBN: 978-3-319-11680-8
eBook Packages: Engineering, Engineering (R0)