Abstract

This paper reviews promising applications of pyramidal classification to biological data. We show that its overlapping and ordering properties can give new insights that cannot be achieved using more classical methods. We illustrate this point with three applications: (i) a genome-scale sequence analysis, (ii) a new progressive multiple sequence alignment method, and (iii) a cluster analysis of transcriptomic data.

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Polaillon, G., Vescovo, L., Michaut, M., Aude, JC. (2007). Mining Biological Data Using Pyramids. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_37