Selecting Representative Objects from Large Database by Using K-Skyband and Top-k Dominating Queries in MapReduce Environment

  • Conference paper
Advanced Data Mining and Applications (ADMA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8933)

Abstract

We consider the problem of selecting representative, distinctive objects from a numerical database, an important task in the early stages of the knowledge discovery process. The skyline query and its variants are functions for finding such representative objects. A skyline query selects the objects that are not dominated by any other object in the dataset. Although the skyline query is a useful function, it cannot control the number of objects selected. To solve this problem, the “top-k dominating query” and the “K-skyband query” have been introduced. However, conventional algorithms for computing these functions are not well suited to parallel distributed environments. In this paper, we consider a method for computing both queries in MapReduce, a popular parallel distributed framework for handling “big data”.
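To make the three queries concrete, here is a minimal brute-force sketch in Python. It is not the method proposed in the paper, whose details are not given in the abstract. It assumes the commonly used definitions: an object dominates another if it is no worse in every dimension and strictly better in at least one (smaller values are taken as better here, which is an assumption), the K-skyband holds objects dominated by fewer than K others, and the top-k dominating query returns the k objects that dominate the most others. All function names are hypothetical. The last function illustrates a generic partition-then-merge pattern of the kind a MapReduce job could follow, simulated in a single process.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def k_skyband(points: Sequence[Point], k: int) -> List[Point]:
    """Objects dominated by fewer than k other objects (O(n^2) brute force)."""
    return [p for p in points if sum(dominates(q, p) for q in points) < k]


def skyline(points: Sequence[Point]) -> List[Point]:
    """The skyline is the special case of the K-skyband with K = 1."""
    return k_skyband(points, 1)


def top_k_dominating(points: Sequence[Point], k: int) -> List[Point]:
    """The k objects that dominate the largest number of other objects."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return [p for _, p in scored[:k]]


def k_skyband_partitioned(points: Sequence[Point], k: int, n_parts: int) -> List[Point]:
    """Simulated map/reduce: local K-skybands per partition, then one merge.

    An object in the global K-skyband has fewer than k dominators overall,
    hence fewer than k in its own partition, so the local step never
    discards a true answer; the merge step removes the false candidates.
    """
    parts = [points[i::n_parts] for i in range(n_parts)]             # "map" input splits
    candidates = [p for part in parts for p in k_skyband(part, k)]   # local filtering
    return k_skyband(candidates, k)                                  # "reduce" merge


points = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (6, 6)]
print(skyline(points))              # [(1, 5), (2, 2), (5, 1)]
print(k_skyband(points, 2))         # [(1, 5), (2, 2), (5, 1), (3, 3)]
print(top_k_dominating(points, 2))  # [(2, 2), (3, 3)]
```

The example shows the size-control issue raised in the abstract: the skyline returns however many non-dominated objects exist, whereas K and k let the caller widen or cap the result.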

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Siddique, M.A., Tian, H., Morimoto, Y. (2014). Selecting Representative Objects from Large Database by Using K-Skyband and Top-k Dominating Queries in MapReduce Environment. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_44

  • DOI: https://doi.org/10.1007/978-3-319-14717-8_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14716-1

  • Online ISBN: 978-3-319-14717-8

  • eBook Packages: Computer Science, Computer Science (R0)
