Discovering Synonyms Based on Frequent Termsets

Rybinski, Henryk; Kryszkiewicz, Marzena; Protaziuk, Grzegorz; Jakubowski, Adam; Delteil, Alexandre

doi:10.1007/978-3-540-73451-2_54


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4585)

Included in the following conference series:

International Conference on Rough Sets and Intelligent Systems Paradigms


Abstract

Synonymy has long been important in information retrieval and automatic indexing. Recently, the needs of domain ontology building and maintenance have renewed demand for solutions to the problem. In this paper, we present a novel text mining approach to discovering synonyms or terms of close meaning. The proposed measures of closeness of terms (or of their contexts) are expressed by means of data mining notions, namely frequent termsets and association rules. The measures can be calculated using data mining techniques such as the well-known Apriori algorithm. The approach is domain-independent and applicable at large scale. It does, however, depend on the recognition of parts of speech; in that sense, the approach is language-dependent to the extent that the part-of-speech tagging process is. The experimental results obtained with the approach are presented.
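
To make the notions named in the abstract concrete, the following is a minimal sketch of Apriori-style mining of frequent termsets, the confidence of an association rule, and one possible context-based closeness score. It is an illustration under stated assumptions, not the measures defined in the paper: the toy sentences, the threshold `min_support`, and the helper names `apriori`, `confidence`, `contexts` and `context_overlap` are all hypothetical, and the Jaccard overlap of contexts is a stand-in chosen for simplicity.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(termset): support count} for all frequent termsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single terms.
    counts = {}
    for t in transactions:
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = set()
        # Join step: unite pairs of frequent (k-1)-termsets into k-termsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent: sup(A and C) / sup(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    if a not in frequent or both not in frequent:
        return 0.0
    return frequent[both] / frequent[a]


def contexts(frequent, term):
    """Frequent termsets containing `term`, with `term` itself removed."""
    return {s - {term} for s in frequent if term in s and len(s) > 1}


def context_overlap(frequent, t1, t2):
    """Jaccard overlap of the two terms' contexts (illustrative only)."""
    c1 = {c for c in contexts(frequent, t1) if t2 not in c}
    c2 = {c for c in contexts(frequent, t2) if t1 not in c}
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


# Hypothetical toy corpus: each sentence reduced to a set of terms.
sentences = [
    {"car", "engine", "drive"},
    {"automobile", "engine", "drive"},
    {"car", "engine", "repair"},
    {"automobile", "engine", "repair"},
    {"car", "drive", "road"},
    {"automobile", "drive", "road"},
    {"banana", "fruit", "eat"},
    {"banana", "fruit", "eat"},
]
frequent = apriori(sentences, min_support=2)
print(context_overlap(frequent, "car", "automobile"))
print(context_overlap(frequent, "car", "banana"))
print(confidence(frequent, {"car"}, {"engine"}))
```

In this toy corpus, "car" and "automobile" never co-occur yet appear in the same frequent contexts, so they score higher than an unrelated pair such as "car" and "banana". In a setting like the one the abstract describes, the term sets would presumably be built only from words with selected part-of-speech tags, which is where the dependence on tagging would enter.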

This work was performed within a project funded by France Telecom.

Author information

Authors and Affiliations

ICS, Warsaw University of Technology,
Henryk Rybinski, Marzena Kryszkiewicz, Grzegorz Protaziuk & Adam Jakubowski
France Telecom R&D,
Alexandre Delteil

Editor information

Marzena Kryszkiewicz James F. Peters Henryk Rybinski Andrzej Skowron

About this paper

Cite this paper

Rybinski, H., Kryszkiewicz, M., Protaziuk, G., Jakubowski, A., Delteil, A. (2007). Discovering Synonyms Based on Frequent Termsets. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds) Rough Sets and Intelligent Systems Paradigms. RSEISP 2007. Lecture Notes in Computer Science, vol 4585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73451-2_54

DOI: https://doi.org/10.1007/978-3-540-73451-2_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73450-5
Online ISBN: 978-3-540-73451-2
eBook Packages: Computer Science, Computer Science (R0)
