Quality-Based Clustering of Functional Data: Applications to Time Course Microarray Data

  • Conference paper
Advances in Data Analysis, Data Handling and Business Intelligence

Abstract

Cluster methods are commonly applied to time course gene expression data to find co-regulated genes, which can in turn help to reveal pathways and interactions between genes. Clustering is carried out either on the raw data or on functional data. In functional data analysis, a curve is fitted to each observation to account for time dependency. Because gene expression over time is biologically a continuous process, it can be represented by a continuous function. The different curve shapes found in a dataset can have important interpretations, and characteristic patterns can be found by clustering the estimated regression coefficients.
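The step from raw time courses to regression coefficients can be illustrated with a minimal sketch. It assumes a quadratic polynomial basis fitted by ordinary least squares; the abstract does not state which basis the authors use, and the gene names, time points, and the helper `fit_polynomial` are hypothetical. Python is used here only for illustration.

```python
def fit_polynomial(times, values, degree=2):
    """Return least-squares polynomial coefficients, lowest order first."""
    n = degree + 1
    # Normal equations (X^T X) beta = X^T y, where X[i][j] = times[i] ** j.
    xtx = [[sum(t ** (r + c) for t in times) for c in range(n)] for r in range(n)]
    xty = [sum((t ** r) * y for t, y in zip(times, values)) for r in range(n)]
    # Solve the small linear system by Gaussian elimination with partial pivoting.
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (aug[r][n] - tail) / aug[r][r]
    return beta


# Hypothetical expression profiles measured at five time points.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
profiles = {
    "gene_a": [1.0 + 2.0 * t - 0.5 * t * t for t in times],  # rises, then falls
    "gene_b": [0.5 * t for t in times],                      # steady increase
}

# Each gene is now represented by its coefficient vector instead of raw values.
coefficients = {gene: fit_polynomial(times, y) for gene, y in profiles.items()}
```

The coefficient vectors in `coefficients` are what a cluster algorithm would receive in the functional-data setting, in place of the raw measurements in `profiles`.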

In this simulation study on artificial data, the well-known K-Means algorithm and the quality-based cluster algorithm QT-Clust are applied to both the raw data and the functional data. The performance of the methods is evaluated when different types of noise are added to the data. All cluster algorithms used are implemented in R.
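For readers unfamiliar with quality-based clustering, the following is a rough sketch of the general QT-Clust idea: grow a candidate cluster around every point while the cluster diameter stays within a quality threshold, keep the largest candidate, remove its members, and repeat. It is not the R implementation used in the study. The function names, the Euclidean distance, the `min_size` parameter, and the toy points are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def qt_clust(points, max_diameter, min_size=1):
    """Greedy quality-threshold clustering; returns (clusters, outliers) as index lists."""
    remaining = set(range(len(points)))
    clusters = []
    while remaining:
        best = []
        for seed in sorted(remaining):
            candidate = [seed]
            others = remaining - {seed}
            while others:
                # Largest distance from point j to the current candidate members.
                def diameter_with(j):
                    return max(euclidean(points[j], points[i]) for i in candidate)

                # Add the point that enlarges the candidate cluster the least.
                nxt = min(sorted(others), key=diameter_with)
                if diameter_with(nxt) > max_diameter:
                    break
                candidate.append(nxt)
                others.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break  # whatever is left is treated as unclustered
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters, sorted(remaining)


# Toy input: rows could be raw profiles or fitted curve coefficients.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (20.0, 20.0)]
clusters, outliers = qt_clust(points, max_diameter=1.0, min_size=2)
```

Unlike K-Means, this procedure does not require the number of clusters in advance and can leave points unassigned, which is one reason the two families of methods may respond differently to noise.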

Acknowledgements

This work was supported by the Austrian Kind/Knet Center of Biopharmaceutical Technology (ACBT).

Author information

Correspondence to Theresa Scharl.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scharl, T., Leisch, F. (2009). Quality-Based Clustering of Functional Data: Applications to Time Course Microarray Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_62
