Quality-Based Clustering of Functional Data: Applications to Time Course Microarray Data

Scharl, Theresa; Leisch, Friedrich

doi:10.1007/978-3-642-01044-6_62


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Abstract

Cluster methods are typically applied to time course gene expression data to find co-regulated genes, which can help to reveal pathways and interactions between genes. Clustering is carried out either on the raw data or on functional data. In functional data analysis, a curve is fitted to each observation to account for time dependency. Because gene expression over time is biologically a continuous process, it can be represented by a continuous function. The different curve shapes found in a dataset can have important interpretations, and characteristic patterns can be found by clustering the estimated regression coefficients.
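As a minimal sketch of the functional representation described above, the following Python snippet fits a curve to each expression profile and keeps the estimated regression coefficients as the features to be clustered. The quadratic polynomial basis, the function name `fit_polynomial`, and the toy profiles are assumptions made for illustration; the abstract does not state which basis the authors use, and their analysis is carried out in R.

```python
# Sketch: represent each time course by the coefficients of a fitted curve.
# Assumption: a quadratic polynomial basis (not specified in the abstract).

def fit_polynomial(times, values, degree=2):
    """Least-squares polynomial fit; returns [b0, b1, ..., b_degree]."""
    n = degree + 1
    # Normal equations (X^T X) b = X^T y with design matrix X[i][j] = t_i ** j.
    xtx = [[sum(t ** (i + j) for t in times) for j in range(n)] for i in range(n)]
    xty = [sum((t ** i) * y for t, y in zip(times, values)) for i in range(n)]
    # Solve by Gaussian elimination with partial pivoting.
    aug = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = s / aug[i][i]
    return coef

# Hypothetical toy profiles measured at five time points.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
profiles = {
    "gene_a": [1.0, 2.0, 5.0, 10.0, 17.0],  # follows 1 + t^2
    "gene_b": [4.0, 3.0, 2.0, 1.0, 0.0],    # follows 4 - t
}

# One coefficient vector per gene; these vectors, not the raw values,
# would be handed to the cluster algorithm.
coefficients = {gene: fit_polynomial(times, y) for gene, y in profiles.items()}
```

Clustering the coefficient vectors groups genes by curve shape rather than by individual measurements, which is the motivation given for the functional approach.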

In this simulation study on artificial data, the well-known K-Means algorithm and the quality-based cluster algorithm QT-Clust are applied to both the raw data and the functional data. The performance of the methods is evaluated under different types of added noise. All cluster algorithms used are implemented in R.
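To illustrate what a quality-based cluster algorithm does, here is a small Python sketch of the general quality-threshold idea: grow a candidate cluster around every point while its diameter stays below a threshold, keep the largest candidate, remove its members, and repeat. This is not the authors' R implementation; the function name `qt_clust`, the Euclidean distance, the `min_size` parameter, and the toy points are assumptions chosen for the example.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def qt_clust(points, max_diameter, min_size=1):
    """Quality-threshold clustering sketch.

    Returns (clusters, outliers): clusters is a list of sorted index lists,
    outliers holds indices not assigned to any cluster of at least min_size.
    """
    remaining = set(range(len(points)))
    clusters = []
    while remaining:
        best = []
        for seed in sorted(remaining):
            candidate = [seed]
            pool = remaining - {seed}
            # furthest[i]: largest distance from point i to any candidate member,
            # i.e. the diameter the candidate would have after adding i.
            furthest = {i: euclidean(points[i], points[seed]) for i in pool}
            while pool:
                nxt = min(sorted(pool), key=lambda i: furthest[i])
                if furthest[nxt] > max_diameter:
                    break  # any further point would violate the quality threshold
                candidate.append(nxt)
                pool.remove(nxt)
                for i in pool:
                    furthest[i] = max(furthest[i], euclidean(points[i], points[nxt]))
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break  # what is left is treated as outliers
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters, sorted(remaining)

# Hypothetical toy data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
clusters, outliers = qt_clust(points, max_diameter=0.5, min_size=2)
print(clusters, outliers)
```

Unlike K-Means, this scheme needs a diameter threshold instead of a number of clusters, and it can leave points unassigned. The `points` argument could hold either raw profiles or fitted coefficient vectors.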



Acknowledgements

This work was supported by the Austrian K_ind/K_net Center of Biopharmaceutical Technology (ACBT).

Author information

Authors and Affiliations

Institut für Statistik und Wahrscheinlichkeitstheorie, Technische Universität Wien, Wiedner Hauptstr. 8-10, 1040, Vienna, Austria
Theresa Scharl
Department für Biotechnologie, Universität für Bodenkultur Wien, Muthgasse 18, 1190, Vienna, Austria
Theresa Scharl

Authors

Theresa Scharl
Friedrich Leisch

Corresponding author

Correspondence to Theresa Scharl.

Editor information

Editors and Affiliations

Universität der Bundeswehr, Fak. Wirtschafts-/Sozialwissenschaften, Helmut-Schmidt-Universität, Holstenhofweg 85, Hamburg, 22043, Germany
Andreas Fink
Dept. Mathematical Sciences, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom
Berthold Lausen
Universität der Bundeswehr, Fak. Wirtschafts-/Sozialwissenschaften, Helmut-Schmidt-Universität, Holstenhofweg 85, Hamburg, 22043, Germany
Wilfried Seidel
FB 12 Mathematik und Informatik, Datenbionik AG, Universität Marburg, Hans-Meerwein-Straße, Marburg, 35032, Germany
Alfred Ultsch


About this paper

Cite this paper

Scharl, T., Leisch, F. (2009). Quality-Based Clustering of Functional Data: Applications to Time Course Microarray Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_62


DOI: https://doi.org/10.1007/978-3-642-01044-6_62
Published: 31 July 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
