
Normalized Maximum Likelihood Models for Boolean Regression with Application to Prediction and Classification in Genomics

  • Chapter
Computational and Statistical Approaches to Genomics

Conclusions

Classes of Boolean regression models are powerful modeling tools. Their associated NML models can be computed easily and used in MDL inference, in particular for factor selection.

Comparing the MDL methods based on two-part codes with those based on the NML models, we note that the former are faster to evaluate, but the latter provide a significantly shorter codelength and hence a better description of the data. When analyzing gene expression data, speed may be a major concern, since one has to test $\binom{n}{k}$ possible groupings of k genes, with n on the order of thousands and k usually less than 10. The two-part codes may then be used to pre-screen the gene groupings and remove the obviously poor performers, after which the NML model can be applied to obtain the final selection from a smaller pool of candidates. The running time for all our experiments reported here is on the order of tens of minutes.
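To make the size of the search and the two-stage idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: `count_groupings`, `two_stage_selection`, `cheap_score`, `costly_score` and `keep` are hypothetical names, and the two scoring functions are placeholders standing in for a two-part codelength and an NML codelength, neither of which is computed here.

```python
import heapq
from itertools import combinations
from math import comb


def count_groupings(n, k):
    """Number of distinct groupings of k genes chosen from n genes."""
    return comb(n, k)


def two_stage_selection(n_genes, k, cheap_score, costly_score, keep):
    """Pre-screen all k-subsets with a cheap score, then refine with a costly one.

    cheap_score and costly_score are hypothetical callables mapping a tuple of
    gene indices to a codelength-like number (smaller is better). They stand in
    for the two-part code and the NML model, respectively.
    """
    candidates = combinations(range(n_genes), k)
    # Stage 1: keep only the `keep` best subsets under the fast criterion.
    shortlist = heapq.nsmallest(keep, candidates, key=cheap_score)
    # Stage 2: apply the slower criterion to the short list only.
    return min(shortlist, key=costly_score)


if __name__ == "__main__":
    # Even for small k the number of groupings grows quickly with n.
    print(count_groupings(1000, 3))  # 166167000
```

The costly score is evaluated only `keep` times instead of $\binom{n}{k}$ times. The price is that a grouping rejected by the cheap score is never reconsidered, so `keep` should be generous enough that plausible candidates survive the first stage.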

The use of the MDL principle with the class of Boolean models provides an effective classification method, as demonstrated by the important cancer classification example based on gene expression data. The NML model for the class M(θ, k, f) was used to select informative feature genes. Using the sets of feature genes selected by the NML model, we achieved classification error rates significantly lower than those reported recently for the same data set.




Copyright information

© 2003 Kluwer Academic Publishers

About this chapter

Cite this chapter

Tabus, I., Rissanen, J., Astola, J. (2003). Normalized Maximum Likelihood Models for Boolean Regression with Application to Prediction and Classification in Genomics. In: Zhang, W., Shmulevich, I. (eds) Computational and Statistical Approaches to Genomics. Springer, Boston, MA. https://doi.org/10.1007/0-306-47825-0_10


  • DOI: https://doi.org/10.1007/0-306-47825-0_10

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4020-7023-5

  • Online ISBN: 978-0-306-47825-3

  • eBook Packages: Springer Book Archive
