Parsing regulatory DNA: General tasks, techniques, and the PhyloGibbs approach

Journal of Biosciences

Abstract

In this review, we discuss the general problem of understanding transcriptional regulation from DNA sequence and prior information. The main tasks we discuss are predicting cis-regulatory modules (CRMs), which are local regions of DNA that contain binding sites for transcription factors (TFs), and predicting individual binding sites. We review various existing methods, and then describe the approach taken by PhyloGibbs, a recent motif-finding algorithm that we developed to predict TF binding sites, and by PhyloGibbs-MP, an extension to PhyloGibbs that tackles other tasks in regulatory genomics, particularly the prediction of CRMs.

Abbreviations

CRMs: cis-regulatory modules

MCMC: Markov chain Monte Carlo

PWMs: position weight matrices

TFs: transcription factors

Author information

Corresponding author

Correspondence to Rahul Siddharthan.

About this article

Cite this article

Siddharthan, R. Parsing regulatory DNA: General tasks, techniques, and the PhyloGibbs approach. J Biosci 32 (Suppl 1), 863–870 (2007). https://doi.org/10.1007/s12038-007-0086-0

  • DOI: https://doi.org/10.1007/s12038-007-0086-0
