
Hidden Markov Models for Human Genes

Periodic Patterns in Exon Sequences

  • Chapter
Theoretical and Computational Methods in Genome Research

Abstract

We analyse the sequential structure of human genomic DNA with hidden Markov models. We apply models of widely different design: conventional left-right constructs and models with a built-in periodic architecture. The models are trained on segments of DNA sequence extracted so that they cover either complete internal exons flanked by introns, or splice sites flanked by coding and non-coding sequence. Together, the models of donor site regions, acceptor site regions and flanked internal exons show that exons, besides the reading frame, hold a specific periodic pattern. The pattern has the consensus non-T(A/T)G and a minimal periodicity of roughly 10 nucleotides.
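
As a rough illustration of what a pattern with consensus non-T(A/T)G and a period near 10 nucleotides would look like in raw sequence, the sketch below scans a DNA string for the motif and tabulates the distances between motif starts. This is not the hidden Markov model analysis described in the abstract; it is a much simpler counting heuristic, and the function names `motif_positions` and `distance_histogram` as well as the `max_lag` parameter are hypothetical names introduced here for illustration only.

```python
import re
from collections import Counter

# Consensus taken from the abstract: non-T, then A or T, then G.
# The lookahead makes the match zero-width so overlapping hits are all reported.
MOTIF = re.compile(r"(?=[ACG][AT]G)")


def motif_positions(seq):
    """Return the 0-based start position of every motif occurrence in seq."""
    return [m.start() for m in MOTIF.finditer(seq.upper())]


def distance_histogram(positions, max_lag=30):
    """Count distances between pairs of motif starts, up to max_lag (an assumed cutoff)."""
    hist = Counter()
    for i, p in enumerate(positions):
        for q in positions[i + 1:]:
            lag = q - p
            if lag > max_lag:
                break  # positions are sorted, so later ones are even further away
            hist[lag] += 1
    return hist


if __name__ == "__main__":
    # Synthetic toy sequence with the motif planted every 10 nucleotides.
    toy = "CAGCCCCCCC" * 3
    hits = motif_positions(toy)
    print(hits)
    print(sorted(distance_histogram(hits).items()))
```

On real exon sequences a periodicity of about 10 would appear, if at all, as an excess of counts at lags near 10 and its multiples relative to neighbouring lags; judging whether such an excess is meaningful would need a background model, which this sketch does not provide.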

Jet Propulsion Laboratory, Caltech.

Department of Psychology, Stanford University.

Copyright information

© 1997 Springer Science+Business Media New York

About this chapter

Cite this chapter

Baldi, P., Brunak, S., Chauvin, Y., Krogh, A. (1997). Hidden Markov Models for Human Genes. In: Suhai, S. (ed.) Theoretical and Computational Methods in Genome Research. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5903-0_2

  • DOI: https://doi.org/10.1007/978-1-4615-5903-0_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7708-5

  • Online ISBN: 978-1-4615-5903-0

  • eBook Packages: Springer Book Archive
