Some Advances in Data-Mining Techniques

Ullman, Jeffrey D.

doi:10.1007/3-540-48521-X_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1649)

Included in the following conference series:

International Workshop on Next Generation Information Technologies and Systems

Abstract

Research in the MIDAS project at Stanford explores new ideas in data mining. One early result was a new algorithm for Web search, which led to a search engine called Google that has recently become a commercial venture.
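
The abstract does not name the Web-search algorithm. The following minimal Python sketch assumes it is PageRank, the link-analysis method developed at Stanford that underlies Google, and computes ranks by power iteration on a small link graph. The function name, the damping parameter `beta`, the iteration count, and the handling of dead-end pages (their rank is spread evenly over all pages) are illustrative assumptions, not details taken from this paper.

```python
def pagerank(links, beta=0.85, iterations=50):
    """Power-iteration PageRank on a small graph (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    beta: probability of following a link; 1 - beta is the teleport probability.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives its share of the random teleport.
        new_rank = {p: (1.0 - beta) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Split this page's rank evenly among its out-links.
                share = beta * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # Dead end (assumed policy): spread its rank over all pages.
                for q in pages:
                    new_rank[q] += beta * rank[p] / n
        rank = new_rank
    return rank
```

On the graph a→{b, c}, b→{c}, c→{a}, page c ends up with the highest rank because both other pages link to it. The total rank stays 1 at every iteration.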

A second area of interest is generalizing techniques such as “a-priori,” which Rakesh Agrawal and his associates at IBM Research in Almaden developed for “market-basket analysis,” or “association-rule mining,” the problem of finding items that customers frequently buy together. We have developed a framework called “query flocks,” in which we can phrase highly complex data-mining queries, including many that commercial SQL systems do not handle well. We then compile a query flock into a sequence of SQL queries simple enough to be optimized by commercial systems.
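
The a-priori idea rests on monotonicity: a set of items can appear in at least s baskets only if every one of its subsets does. The minimal Python sketch below illustrates the idea in textbook style. It is not Agrawal et al.'s implementation and not the query-flock compiler. It finds all itemsets that meet a support threshold `min_support` and builds candidate k-sets only from frequent (k-1)-sets. The function name and the basket format are assumptions.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: count} for every itemset in >= min_support baskets."""
    baskets = [frozenset(b) for b in baskets]
    # Pass 1: count single items.
    counts = {}
    for b in baskets:
        for item in b:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidates: unions of two frequent (k-1)-sets that have size k and
        # whose (k-1)-subsets are all frequent (the monotonicity pruning).
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One counting pass over the baskets for the surviving candidates.
        counts = {c: 0 for c in candidates}
        for b in baskets:
            for c in candidates:
                if c <= b:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Take the baskets {bread, milk}, {bread, beer, eggs}, {milk, beer, cola}, {bread, milk, beer}, and {bread, milk, cola}, with `min_support=3`. The frequent single items are bread, milk, and beer. The only frequent pair is {bread, milk}, which appears in three baskets.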

A third interesting challenge is summarizing the knowledge on the Web in a form that resembles conventional relational data. We describe some experiments that exploit the redundancy of the Web to discover the patterns in which facts of a certain kind tend to appear.

Finally, we shall talk about extending association-rule mining techniques to extract relationships that are not based on “high support,” i.e., on sets of items that appear very frequently in market baskets. An important example is intelligence gathering, where we want to find terms that are highly correlated in documents but do not appear in very many documents. The MIDAS group has recently developed techniques that process very large amounts of data and efficiently detect items that are highly correlated but not very frequent. We can even find implications, similar to causal relationships, without requiring high support for the associated items.
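
Support-based pruning cannot find such pairs, because rare terms never pass the support threshold. One technique suited to this setting is min-hashing; the abstract does not state it, so it is an assumption here. Each term is treated as the set of documents containing it, and the Jaccard similarity of two terms is estimated from short signatures. For a random permutation of the documents, the probability that two sets have the same minimum equals their Jaccard similarity, however frequent or rare the terms are. The Python sketch below uses explicit random permutations for clarity. The function names and the `num_hashes` parameter are hypothetical.

```python
import random


def minhash_signatures(columns, num_hashes=200, seed=0):
    """columns: dict mapping each term to the non-empty set of document ids containing it."""
    rng = random.Random(seed)
    rows = sorted({r for doc_ids in columns.values() for r in doc_ids})
    signatures = {term: [] for term in columns}
    for _ in range(num_hashes):
        # One random permutation of the documents plays the role of one hash function.
        perm = rows[:]
        rng.shuffle(perm)
        position = {r: i for i, r in enumerate(perm)}
        for term, doc_ids in columns.items():
            # The min-hash value is the earliest position of any document containing the term.
            signatures[term].append(min(position[r] for r in doc_ids))
    return signatures


def estimated_similarity(sig_a, sig_b):
    # Fraction of permutations on which the minima agree; its expected
    # value equals the Jaccard similarity of the two document sets.
    agree = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return agree / len(sig_a)


def jaccard(a, b):
    # Exact Jaccard similarity, for comparison with the estimate.
    return len(a & b) / len(a | b)
```

Suppose two terms occur in only three and four documents, and the three are a subset of the four. They score an exact Jaccard similarity of 0.75 even though each term's support is tiny. Explicit permutations cost time proportional to the number of documents per hash. A practical version would use random hash functions instead and would apply locality-sensitive hashing to the signatures to avoid comparing every pair of terms.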

Author information

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
Jeffrey D. Ullman

Editor information

Editors and Affiliations

Matam - Advanced Technology Center, IBM Research Laboratory in Haifa, Haifa, 31905, Israel
Ron Y. Pinter
SurroMed, Inc., 1060 East Meadow Circle, Palo Alto, CA, 94303, USA
Shalom Tsur

About this paper

Cite this paper

Ullman, J.D. (1999). Some Advances in Data-Mining Techniques. In: Pinter, R.Y., Tsur, S. (eds) Next Generation Information Technologies and Systems. NGITS 1999. Lecture Notes in Computer Science, vol 1649. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48521-X_1

DOI: https://doi.org/10.1007/3-540-48521-X_1
Published: 18 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66225-9
Online ISBN: 978-3-540-48521-6
eBook Packages: Springer Book Archive