Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 265)

Abstract

Since 80% of all information on the World Wide Web (WWW) is in textual form, most user search activity is based on groups of search words forming queries that represent the users' information needs. The quality of the returned results, usually evaluated using measures such as precision and recall, depends mostly on the quality of the chosen query terms. Therefore, the relatedness of these terms must be evaluated accordingly and matched against the documents to be found. To do so properly, this paper introduces the notion of n-term co-occurrences and distinguishes it from the related concepts of n-grams and higher-order co-occurrences. Finally, the applicability of n-term co-occurrences to search, clustering and data mining processes is considered.


Author information

Correspondence to Mario Kubek.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kubek, M., Unger, H. (2014). On N-term Co-occurrences. In: Boonkrong, S., Unger, H., Meesad, P. (eds) Recent Advances in Information and Communication Technology. Advances in Intelligent Systems and Computing, vol 265. Springer, Cham. https://doi.org/10.1007/978-3-319-06538-0_7

  • DOI: https://doi.org/10.1007/978-3-319-06538-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06537-3

  • Online ISBN: 978-3-319-06538-0

  • eBook Packages: Engineering, Engineering (R0)
