Some Advances in Data-Mining Techniques

  • Conference paper
Next Generation Information Technologies and Systems (NGITS 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1649)

Abstract

Research in the MIDAS project at Stanford explores new ideas in data mining. One early result was a new algorithm for Web search, which led to a search engine, Google, that was recently turned into a commercial product.

A second area of interest is generalizing techniques such as “a-priori,” which were developed by Rakesh Agrawal and his associates at IBM Research in Almaden to allow “market-basket analysis,” or “association-rule mining.” This problem deals with finding items that customers frequently buy together. We have developed a framework called “query flocks.” In this system, we can phrase highly complex data-mining queries, including many that are not handled well by commercial SQL systems. We then compile the “query flock” into a sequence of SQL queries that are simple enough to be optimized by commercial systems.
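As an illustration of the market-basket problem mentioned above, the following is a minimal Python sketch of the two-pass a-priori idea for frequent pairs: count single items first, then count only pairs whose members are both frequent. The function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are assumptions made for this example. It does not show the query-flocks framework, whose details the abstract does not give.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: count} for item pairs found in at least min_support baskets.

    Illustrative sketch only; names and parameters are hypothetical.
    """
    # Pass 1: count how many baskets contain each item.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, n in item_counts.items() if n >= min_support}

    # Pass 2: count only pairs whose members are both frequent.
    # A pair cannot be frequent unless each of its items is frequent.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical sample data.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
result = frequent_pairs(baskets, min_support=3)
```

With a support threshold of 3, the pairs (beer, bread) and (beer, milk) are counted in the second pass but rejected, since each appears in only two baskets.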

A third interesting challenge is summarizing the knowledge of the Web in a form that resembles conventional relational data. We describe some experiments that have been carried out to exploit the redundancy of the Web and discover the patterns in which facts of a certain kind tend to exist.

Finally, we shall talk about extending the techniques for association-rule mining to extract relationships that are not based on “high support,” i.e., on sets of items that appear very frequently in market baskets. Important examples include intelligence-gathering, where we want to find terms that are highly correlated in documents but that do not appear in very many documents. The MIDAS group has recently developed some techniques to process very large amounts of data and efficiently detect items that are highly correlated but not very frequent. We can even find implications, similar to causal relationships, without requiring high support for the associated items.
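To make the notion of "highly correlated but not very frequent" concrete, here is a small Python sketch that scores term pairs by Jaccard similarity of the document sets in which they occur. This is a brute-force illustration that compares every pair of terms; it is not the MIDAS group's method, which the abstract does not describe, and it would not scale to very large data. The function name `similar_terms`, the `min_jaccard` threshold, and the sample documents are assumptions for this example.

```python
from collections import defaultdict
from itertools import combinations


def similar_terms(docs, min_jaccard):
    """Return {(term_a, term_b): jaccard} for pairs at or above min_jaccard.

    Illustrative brute-force sketch; names and parameters are hypothetical.
    """
    # Map each term to the set of document ids that contain it.
    postings = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            postings[term].add(doc_id)

    # Jaccard similarity ignores how frequent a term is overall:
    # two rare terms that always co-occur score 1.0.
    result = {}
    for a, b in combinations(sorted(postings), 2):
        shared = len(postings[a] & postings[b])
        total = len(postings[a] | postings[b])
        score = shared / total
        if score >= min_jaccard:
            result[(a, b)] = score
    return result


# Hypothetical sample documents, each given as a list of terms.
docs = [
    ["the", "report", "plutonium", "centrifuge"],
    ["the", "report", "weather"],
    ["the", "plutonium", "centrifuge"],
    ["the", "weather", "report"],
    ["the", "budget"],
    ["the", "budget", "report"],
]
correlated = similar_terms(docs, min_jaccard=0.9)
```

In this sample, "plutonium" and "centrifuge" appear in only two of six documents, so a high-support filter would discard them, yet they always occur together and are the only pair that passes the similarity threshold.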

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ullman, J.D. (1999). Some Advances in Data-Mining Techniques. In: Pinter, R.Y., Tsur, S. (eds) Next Generation Information Technologies and Systems. NGITS 1999. Lecture Notes in Computer Science, vol 1649. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48521-X_1

  • DOI: https://doi.org/10.1007/3-540-48521-X_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66225-9

  • Online ISBN: 978-3-540-48521-6

  • eBook Packages: Springer Book Archive
