A Parallel Hierarchical Agglomerative Clustering Technique for Bilingual Corpora Based on Reduced Terms with Automatic Weight Optimization

  • Conference paper
Advanced Data Mining and Applications (ADMA 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5678)

Abstract

Multilingual corpora are becoming an essential resource for multilingual natural language processing. This paper investigates the effects of applying a hierarchical agglomerative clustering technique to parallel bilingual texts, with particular attention to the differences between the cluster mappings and between the tree structures of the clusters obtained for each language. It also studies the effect of reducing the set of terms considered when clustering parallel corpora. A genetic algorithm is then applied to optimize the weights of the terms used in clustering, so that unseen documents can be classified. The aim of this work is to introduce the tools needed for this task and to present a set of experimental results, together with the issues that have become apparent.
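The abstract does not spell out the pipeline, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It assumes that term reduction keeps the k most document-frequent terms, that clustering is average-link HAC over cosine similarity, that the two language sides are compared with the Rand index (a swapped-in agreement measure), and that the genetic algorithm's fitness is leave-one-out 1-nearest-neighbour accuracy on labeled documents. All function names (`reduce_terms`, `vectorize`, `hac`, `rand_index`, `fitness`, `evolve_weights`) and the toy "parallel" corpus are hypothetical.

```python
import math
import random
from collections import Counter

def reduce_terms(docs, k):
    """Assumed reduction: keep the k terms with highest document frequency (ties alphabetical)."""
    df = Counter(t for d in docs for t in set(d.split()))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:k]]

def vectorize(docs, vocab):
    """Raw term-frequency vectors restricted to the reduced vocabulary."""
    counts = [Counter(d.split()) for d in docs]
    return [[c[t] for t in vocab] for c in counts]

def apply_weights(vectors, weights):
    """Scale each term dimension by its (optimized) weight."""
    return [[w * x for w, x in zip(weights, v)] for v in vectors]

def cosine(u, v):
    """Cosine similarity; zero vectors get similarity 0."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hac(vectors, n_clusters):
    """Average-link agglomerative clustering; returns clusters as lists of document indices."""
    clusters = [[i] for i in range(len(vectors))]
    def avg_sim(a, b):
        return sum(cosine(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the most similar pair
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def rand_index(clusters_a, clusters_b):
    """Fraction of document pairs both clusterings treat the same way (together or apart)."""
    lab_a = {d: k for k, c in enumerate(clusters_a) for d in c}
    lab_b = {d: k for k, c in enumerate(clusters_b) for d in c}
    docs = sorted(lab_a)
    pairs = [(i, j) for x, i in enumerate(docs) for j in docs[x + 1:]]
    agree = sum((lab_a[i] == lab_a[j]) == (lab_b[i] == lab_b[j]) for i, j in pairs)
    return agree / len(pairs)

def fitness(weights, vectors, labels):
    """Assumed fitness: leave-one-out 1-NN accuracy under the weighted cosine similarity."""
    wv = apply_weights(vectors, weights)
    correct = 0
    for i, v in enumerate(wv):
        nearest = max((j for j in range(len(wv)) if j != i), key=lambda j: cosine(v, wv[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(wv)

def evolve_weights(vectors, labels, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Simple GA: truncation selection, one-point crossover, clipped Gaussian mutation."""
    rng = random.Random(seed)
    n = len(vectors[0])
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, vectors, labels), reverse=True)
        parents = pop[: pop_size // 2]  # best half survives unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) if rng.random() < 0.2 else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, vectors, labels))

# Hypothetical toy parallel corpus: sentence i on each side is a translation of the other.
en = ["cat dog pet", "dog cat animal", "stock market price", "market price trade"]
l2 = ["l2_cat l2_dog l2_pet", "l2_dog l2_cat l2_animal",
      "l2_stock l2_market l2_price", "l2_market l2_price l2_trade"]
labels = ["pets", "pets", "finance", "finance"]

vec_en = vectorize(en, reduce_terms(en, 4))
vec_l2 = vectorize(l2, reduce_terms(l2, 4))
clusters_en, clusters_l2 = hac(vec_en, 2), hac(vec_l2, 2)
print(clusters_en, clusters_l2, rand_index(clusters_en, clusters_l2))
best = evolve_weights(vec_en, labels)
print(best, fitness(best, vec_en, labels))
```

Keeping the best half of each generation unchanged means the best fitness never decreases; on a real corpus the reduced vocabulary size, population size and fitness function would all be tuning choices rather than fixed values.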

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alfred, R. (2009). A Parallel Hierarchical Agglomerative Clustering Technique for Billingual Corpora Based on Reduced Terms with Automatic Weight Optimization. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_6

  • DOI: https://doi.org/10.1007/978-3-642-03348-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03347-6

  • Online ISBN: 978-3-642-03348-3

  • eBook Packages: Computer Science, Computer Science (R0)
