Generating seeded trees from data sets

  • Spatial Joins
  • Conference paper
Advances in Spatial Databases (SSD 1995)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 951)

Abstract

In this paper we study the problem of how to perform spatial joins between two data sets with no pre-computed spatial indices. No techniques appear to exist to date that specifically target this problem. Our solution is also useful in the context of query optimization for complex spatial queries. In addition, we demonstrate that simple sampling techniques can be effective in reducing spatial join costs.
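For orientation, the operation under study can be sketched as a brute-force baseline: with no index on either side, every pair of objects is tested for intersection. This is only an illustration of what a spatial join computes, not the method proposed in the paper. The rectangle layout `(xmin, ymin, xmax, ymax)` and the names `intersects` and `naive_spatial_join` are assumptions made for this sketch.

```python
from typing import List, Tuple

# Assumed representation: axis-aligned bounding rectangle (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """Return True if the two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def naive_spatial_join(r: List[Rect], s: List[Rect]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) such that r[i] intersects s[j].

    Hypothetical baseline: compares all |r| * |s| pairs, which is the
    cost that index-based or seeded-tree joins aim to avoid.
    """
    return [
        (i, j)
        for i, a in enumerate(r)
        for j, b in enumerate(s)
        if intersects(a, b)
    ]
```

The quadratic number of comparisons in this baseline is what motivates building an index, even a temporary one, before joining.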

We extend the work in [LR94, LR95] and introduce the bootstrap-seeding technique, which allows seeded trees to be constructed directly from input data sets. We can thus dynamically construct two seeded trees for two data sets and perform a spatial join between them. The task of bootstrap-seeding comprises the subtasks of determining the number and the contents of the slots, and constructing the tree. Simple sampling techniques are used to determine the slot contents efficiently.
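The abstract does not say which sampling method is used to pick slot contents. As one possibility, the reference list includes Vitter's reservoir sampling, which draws a uniform sample of fixed size in a single pass over a data set of unknown length. The sketch below uses the basic reservoir scheme (often called Algorithm R) rather than Vitter's faster variants. The function name `reservoir_sample`, the rectangle layout, and the idea of feeding the sample to slot selection are assumptions for illustration only.

```python
import random
from typing import Iterable, List, Optional, Tuple

# Assumed representation: axis-aligned bounding rectangle (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def reservoir_sample(
    rects: Iterable[Rect], k: int, rng: Optional[random.Random] = None
) -> List[Rect]:
    """Draw up to k rectangles uniformly at random in one pass.

    Hypothetical helper: the sampled rectangles could then be used to
    estimate where slots should be placed before the tree is built.
    """
    rng = rng or random.Random()
    sample: List[Rect] = []
    for n, rect in enumerate(rects):
        if n < k:
            # Fill the reservoir with the first k items.
            sample.append(rect)
        else:
            # Item n (0-based) replaces a reservoir entry with probability k / (n + 1).
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                sample[j] = rect
    return sample
```

A single pass matters here because the input data sets have no pre-computed index, so the sample must be taken while scanning the raw data.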

Our experiments show that spatial joins using our methods perform comparably to joins between the same data sets with pre-computed R-trees, which confirms the viability of our method. When joining two data sets of different sizes, our studies suggest that it is beneficial to bootstrap an initial seeded tree for the smaller data set, and then to construct a seeded tree for the larger data set using copy-seeding and the seed level filtering technique.

This work was supported in part by the Consortium for International Earth Science Information Networking.

References

  1. J. H. Ahrens. Sequential random sampling. ACM Transactions on Mathematical Software, 11(2):157–169, June 1985.

  2. Thomas Brinkhoff, Hans-Peter Kriegel, and Bernhard Seeger. Efficient processing of spatial joins using R-trees. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 237–246, May 1993.

  3. Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-tree: An efficient and robust access method for points and rectangles. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 322–332, May 1990.

  4. A. F. Cardenas. Analysis and performance of inverted data base structures. Communications of the ACM, 18(5):253–263, May 1975.

  5. Brian Everitt. Cluster Analysis. Edward Arnold, London, third edition, 1993.

  6. Christos Faloutsos, Timos Sellis, and Nick Roussopoulos. Analysis of object oriented spatial access methods. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 427–439, 1987.

  7. Antonin Guttman. R-trees: A dynamic index structure for spatial searching. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 47–57, Aug. 1984.

  8. Leonard Kaufman and Peter J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, Inc., New York, 1990.

  9. Wei Lu and Jiawei Han. Distance-associated join indices for spatial range search. In Proceedings of International Conference on Data Engineering, pages 284–292, 1992.

  10. Ming-Ling Lo and C. V. Ravishankar. Spatial joins using seeded trees. In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 209–220, Minneapolis, MN, May 1994.

  11. Ming-Ling Lo and C. V. Ravishankar. Seeded trees for spatial joins: Structure and implementation. Technical report, Department of EECS, University of Michigan, Ann Arbor, Michigan, 1995.

  12. J. A. Orenstein. Redundancy in spatial databases. In Proceedings of ACM SIGMOD International Conference on Management of Data, Portland, OR, 1989.

  13. Jack Orenstein. A comparison of spatial query processing techniques for native and parameter spaces. In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 343–352, 1990.

  14. Jack Orenstein. An algorithm for computing the overlay of k-dimensional spaces. In O. Gunther and H.-J. Schek, editors, Advances in Spatial Databases (SSD '91), pages 381–400, Zurich, Switzerland, August 28–30, 1991. Springer-Verlag.

  15. D. Rotem. Spatial join indices. In Proceedings of International Conference on Data Engineering, pages 500–509, Kobe, Japan, 1991.

  16. T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In Proceedings of Very Large Data Bases, pages 3–11, Brighton, England, 1987.

  17. P. Valduriez. Join indices. ACM Transactions on Database Systems, 12(2), 1987.

  18. Jeffrey Scott Vitter. Faster methods for random sampling. Communications of the ACM, 27(7):703–718, July 1984.

  19. J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11:37–57, March 1985.

  20. S. B. Yao. Approximating block access in database organizations. Communications of the ACM, 20:260–261, Apr. 1977.

Editor information

Max J. Egenhofer, John R. Herring

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lo, ML., Ravishankar, C.V. (1995). Generating seeded trees from data sets. In: Egenhofer, M.J., Herring, J.R. (eds) Advances in Spatial Databases. SSD 1995. Lecture Notes in Computer Science, vol 951. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60159-7_20

  • DOI: https://doi.org/10.1007/3-540-60159-7_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60159-3

  • Online ISBN: 978-3-540-49536-9

  • eBook Packages: Springer Book Archive