Multi-objective Genetic Algorithm Based Clustering Approach and Its Application to Gene Expression Data

  • Conference paper
Advances in Information Systems (ADVIS 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3261)

Abstract

Gene clustering is a common methodology for grouping similar data based on expression trajectories. Clustering algorithms generally require the number of clusters a priori, and this number is hard to estimate, even for domain experts. In this paper, we use a Niched Pareto k-means Genetic Algorithm (GA) to cluster mRNA data. Running the multi-objective GA yields a Pareto-optimal front whose solution set offers alternatives for the optimal number of clusters. We analyze the clustering results under two cluster validity techniques commonly cited in the literature, the DB index and the SD index, which gives a ranking of the candidate numbers of clusters under each validity index. We tested the proposed clustering approach in experiments on three data sets: figure2data, cancer (NCI60), and Leukaemia. The results are promising; they demonstrate the applicability and effectiveness of the proposed approach.
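
As a rough illustration of the two ideas named in the abstract, the sketch below filters candidate clusterings down to a Pareto-optimal front and scores a partition with the Davies-Bouldin (DB) index. It is not the authors' implementation. The abstract does not name the GA's objectives, so the sketch assumes two objectives that are both minimized: the number of clusters k and a within-cluster variation score. The function names (`pareto_front`, `davies_bouldin`, `centroid`) and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def pareto_front(candidates):
    """Return the non-dominated (k, variation) pairs; both values are minimized.

    A candidate is dominated if another candidate is no worse in both
    objectives and differs from it (so it is strictly better in at least one).
    """
    front = []
    for a in candidates:
        dominated = any(
            b[0] <= a[0] and b[1] <= a[1] and b != a for b in candidates
        )
        if not dominated:
            front.append(a)
    return sorted(front)


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as a list of point lists.

    Lower values indicate compact, well-separated clusters.
    Requires at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = [
        sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical (k, variation) pairs standing in for a GA's final population.
candidates = [(2, 50.0), (3, 30.0), (4, 30.0), (5, 12.0), (3, 40.0)]
front = pareto_front(candidates)

# Toy partition: two tight clusters far apart.
toy_partition = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
db_value = davies_bouldin(toy_partition)
```

In a workflow like the one the abstract describes, each solution on the front would correspond to a full partition, and a validity index such as `davies_bouldin` would be evaluated on each to rank the candidate values of k. The SD index is omitted here.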

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Özyer, T., Liu, Y., Alhajj, R., Barker, K. (2004). Multi-objective Genetic Algorithm Based Clustering Approach and Its Application to Gene Expression Data. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_46

  • DOI: https://doi.org/10.1007/978-3-540-30198-1_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23478-4

  • Online ISBN: 978-3-540-30198-1

  • eBook Packages: Computer Science, Computer Science (R0)
