
Discovering Synonyms Based on Frequent Termsets

  • Conference paper
Rough Sets and Intelligent Systems Paradigms (RSEISP 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4585)

Abstract

Synonymy has been highly important in information retrieval and automatic indexing. Recently, the special needs of domain ontology building and maintenance have brought the problem back with higher demand. In this paper, we present a novel text mining approach to discovering synonyms or terms of close meaning. The proposed measures of closeness of terms (or of their contexts) are expressed by means of data mining notions, namely frequent termsets and association rules, and can therefore be calculated with data mining techniques such as the well-known Apriori algorithm. The approach is domain-independent and large-scale. It does, however, rely on the recognition of parts of speech; in that sense, the approach is language-dependent only to the extent that the part-of-speech tagging process is. Experimental results obtained with the approach are presented.
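The abstract names frequent termsets, association rules and the Apriori algorithm but does not spell out the closeness measures themselves. The sketch below is a minimal illustration under stated assumptions, not the paper's method: each document is assumed to be a set of terms already filtered by a part-of-speech tagger, frequent termsets are mined with a textbook level-wise Apriori, and association-rule confidence is computed from the resulting supports. The `context` and `closeness` functions (Jaccard similarity of term contexts) are hypothetical stand-ins chosen for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(termset): support_count} for every frequent termset.

    Textbook level-wise Apriori: candidate (k+1)-termsets are joined from
    frequent k-termsets and pruned if any k-subset is infrequent.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single terms.
    counts = {}
    for t in transactions:
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 1
    while current:
        # Join step: unite pairs of frequent k-termsets into (k+1)-termsets.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every k-subset of a candidate must be frequent.
                if len(union) == k + 1 and all(
                    frozenset(sub) in current for sub in combinations(union, k)
                ):
                    candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(antecedent, consequent, frequent):
    """Confidence of the rule X -> Y, i.e. sup(X u Y) / sup(X); both must be frequent."""
    x = frozenset(antecedent)
    return frequent[x | frozenset(consequent)] / frequent[x]


def context(term, frequent):
    """Hypothetical context of `term`: frequent X (term not in X) with X u {term} frequent."""
    return {s - {term} for s in frequent if term in s and len(s) > 1}


def closeness(a, b, frequent):
    """Hypothetical closeness: Jaccard similarity of the two terms' contexts."""
    ca, cb = context(a, frequent), context(b, frequent)
    if not ca and not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)


# Toy corpus: each document is a set of (assumed POS-filtered) terms.
docs = [
    {"car", "engine", "road"},
    {"automobile", "engine", "road"},
    {"car", "engine", "wheel"},
    {"automobile", "engine", "wheel"},
    {"banana", "fruit"},
    {"banana", "fruit", "road"},
]
frequent = apriori(docs, min_support=2)
print(closeness("car", "automobile", frequent))  # 1.0
print(closeness("car", "banana", frequent))      # 0.0
```

In this toy data, "car" and "automobile" never occur in the same document, yet their contexts coincide, while "car" and "banana" share no context at all. The Jaccard score is only one possible way of comparing contexts; other similarity functions over the same frequent termsets would fit the same pipeline.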

This work was performed within a project funded by France Telecom.




Editor information

Marzena Kryszkiewicz, James F. Peters, Henryk Rybinski, Andrzej Skowron


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rybinski, H., Kryszkiewicz, M., Protaziuk, G., Jakubowski, A., Delteil, A. (2007). Discovering Synonyms Based on Frequent Termsets. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds) Rough Sets and Intelligent Systems Paradigms. RSEISP 2007. Lecture Notes in Computer Science, vol 4585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73451-2_54


  • DOI: https://doi.org/10.1007/978-3-540-73451-2_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73450-5

  • Online ISBN: 978-3-540-73451-2

  • eBook Packages: Computer Science, Computer Science (R0)
