A Novel High-Dimensional Index Method Based on the Mathematical Features

Zhang, Yu; Li, Jiayu; Yuan, Ye

doi:10.1007/978-3-319-42553-5_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9784)

Included in the following conference series:

International Conference on Big Data Computing and Communications

Abstract

Nearest neighbor (NN) search in high-dimensional space is applied in many fields and has become a focus of information science. Usually, NN search is replaced by R-near neighbor (R-NN) search, which fixes a query range R. However, traditional R-NN methods cannot achieve satisfactory performance in high-dimensional space because of the curse of dimensionality. Moreover, some methods rely on probabilistic guarantees and therefore cannot guarantee 100% accuracy. To address these problems, this paper proposes a novel way to build the index structure, based on the mathematical features of the data points' coordinates. Specifically, we use the mean and the standard deviation of each point's coordinates to index that point. This method answers R-NN queries efficiently in high-dimensional space with a 100% accuracy guarantee. Extensive experimental results demonstrate the effectiveness of the proposed method.
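The abstract does not say how the two statistics are organized into an index, so what follows is only a minimal sketch under our own assumptions, not the authors' structure. It relies on one bound that makes exact pruning possible: for x, q in ℝ^d with coordinate means μ_x, μ_q and population standard deviations σ_x, σ_q, ‖x − q‖² ≥ d·[(μ_x − μ_q)² + (σ_x − σ_q)²]. This follows by writing each vector as its mean times the all-ones vector plus a zero-mean remainder (the two parts are orthogonal, and the remainder has norm √d·σ), then applying the reverse triangle inequality to the remainders. Because the bound never exceeds the true distance, discarding points whose bound is larger than R² never loses a true R-NN answer. The class `MeanStdIndex`, its method `range_query`, the sort-by-mean layout, and the absolute tolerance `TOL` (a simplification suited to moderately scaled data) are all hypothetical choices for illustration.

```python
import bisect
import math
from typing import List, Sequence, Tuple

# Hypothetical absolute tolerance so rounding does not prune a point lying
# exactly on the query boundary. A simplification for moderately scaled
# data, not a rigorous floating-point error bound.
TOL = 1e-9


def mean_std(v: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of a vector's coordinates."""
    d = len(v)
    mu = sum(v) / d
    sigma = math.sqrt(sum((x - mu) ** 2 for x in v) / d)
    return mu, sigma


class MeanStdIndex:
    """Hypothetical exact R-NN index keyed on the (mean, std) of each point.

    Illustrative sketch only; not the index structure from the paper.
    """

    def __init__(self, points: List[Sequence[float]]) -> None:
        self.points = points
        self.dim = len(points[0])
        # One (mean, std, id) entry per point, sorted by mean for binary search.
        self.entries = sorted((*mean_std(p), i) for i, p in enumerate(points))
        self.means = [e[0] for e in self.entries]

    def range_query(self, q: Sequence[float], r: float) -> List[int]:
        """Ids of every point within Euclidean distance r of q (no misses)."""
        d = self.dim
        mu_q, sigma_q = mean_std(q)
        # Step 1: ||x - q||^2 >= d * (mu_x - mu_q)^2, so every answer has
        # |mu_x - mu_q| <= r / sqrt(d); binary-search that window of means.
        w = r / math.sqrt(d) + TOL
        lo = bisect.bisect_left(self.means, mu_q - w)
        hi = bisect.bisect_right(self.means, mu_q + w)
        hits = []
        for mu_x, sigma_x, i in self.entries[lo:hi]:
            # Step 2: tighter lower bound that uses both statistics.
            bound = d * ((mu_x - mu_q) ** 2 + (sigma_x - sigma_q) ** 2)
            if bound > r * r + TOL:
                continue  # true distance must exceed r, so pruning is safe
            # Step 3: exact distance check on the surviving candidates.
            if math.dist(self.points[i], q) <= r:
                hits.append(i)
        return sorted(hits)
```

Sorting by mean turns the first filter into a binary-searched window of width 2R/√d; the standard-deviation term then tightens the bound for candidates inside that window, and only the survivors pay for a full d-dimensional distance computation.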

Author information

Authors and Affiliations

Northeastern University, Shenyang, China
Yu Zhang, Jiayu Li & Ye Yuan

Corresponding authors

Correspondence to Jiayu Li or Ye Yuan.

Editor information

Editors and Affiliations

Department of Computer Science, University of N. Carolina at Charlotte, Charlotte, North Carolina, USA
Yu Wang
Northeastern University, College of Information Science and Engineering, Shenyang, Liaoning, China
Ge Yu
Department of Electrical & Computer Engineering, Rutgers University, Piscataway, New Jersey, USA
Yanyong Zhang
Department of Electrical and Computer Engineering, University of Houston Department of Engineering, Houston, Texas, USA
Zhu Han
College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
Guoren Wang

About this paper

Cite this paper

Zhang, Y., Li, J., Yuan, Y. (2016). A Novel High-Dimensional Index Method Based on the Mathematical Features. In: Wang, Y., Yu, G., Zhang, Y., Han, Z., Wang, G. (eds) Big Data Computing and Communications. BigCom 2016. Lecture Notes in Computer Science, vol 9784. Springer, Cham. https://doi.org/10.1007/978-3-319-42553-5_22

DOI: https://doi.org/10.1007/978-3-319-42553-5_22
Published: 19 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42552-8
Online ISBN: 978-3-319-42553-5
eBook Packages: Computer Science (R0)
