Data Similarity-Aware Computation Infrastructure for the Cloud

  • Chapter
Searchable Storage in Cloud Computing

Abstract

The cloud is emerging as a platform for scalable and efficient services. To handle massive data while reducing data migration, the computation infrastructure requires efficient data placement and proper management of cached data. We propose an efficient and cost-effective multilevel caching scheme, called MERCURY, as a computation infrastructure for the cloud. The idea behind MERCURY is to explore and exploit data similarity to support efficient data placement. To capture data similarity accurately and efficiently, we leverage low-complexity Locality-Sensitive Hashing (LSH). In our design, we identify that a conventional LSH scheme suffers not only from space inefficiency but also from homogeneous data placement. To address these two problems, we design a novel Multicore-enabled LSH (MC-LSH) that accurately captures the differentiated similarity across data. The similarity-aware MERCURY hence partitions data into the L1 cache, L2 cache, and main memory based on their distinct localities, which helps optimize cache utilization and minimize pollution in the last-level cache. Besides extensive evaluation through simulations, we also implemented MERCURY in a system. Experimental results based on real-world applications and datasets demonstrate the efficiency and efficacy of our proposed schemes (© 2014 IEEE. Reprinted, with permission, from Ref. [1]).
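
To make the role of LSH in the abstract more concrete, the following is a minimal sketch of a conventional p-stable (random-projection) LSH table, in which each hash function has the form h(v) = floor((a·v + b) / w). It is not the chapter's MC-LSH, whose details the abstract does not give. The class name `PStableLSH`, the parameters `k`, `w`, and `seed`, and the sample keys are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


class PStableLSH:
    """One LSH table using k hash functions h(v) = floor((a.v + b) / w).

    Illustrative only: a conventional LSH table, not MC-LSH.
    """

    def __init__(self, dim, k=4, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        # Each hash function pairs a Gaussian projection vector a
        # with a random offset b drawn from [0, w].
        self.funcs = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
            for _ in range(k)
        ]
        self.buckets = defaultdict(list)

    def signature(self, v):
        """Concatenate the k hash values into one bucket identifier."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs
        )

    def insert(self, key, v):
        """Store key in the bucket selected by the signature of v."""
        self.buckets[self.signature(v)].append(key)

    def candidates(self, v):
        """Return keys that share a bucket with v (likely similar items)."""
        return list(self.buckets.get(self.signature(v), []))


table = PStableLSH(dim=2)
table.insert("x", [1.0, 1.0])
table.insert("y", [1.1, 0.9])
table.insert("z", [50.0, -40.0])
# The result always contains "x"; nearby "y" is likely to appear, distant "z" is not.
print(table.candidates([1.0, 1.0]))
```

Nearby vectors collide in a bucket with high probability, while distant ones rarely do, so a bucket lookup yields a small candidate set of similar items without an exhaustive comparison. Practical LSH deployments typically use several such tables to raise recall, which is one source of the space overhead the abstract mentions.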

References

  1. Y. Hua, X. Liu, D. Feng, Data similarity-aware computation infrastructure for the cloud. IEEE Trans. Comput. (TC) 63(1), 3–16 (2014)

  2. IDC iView, Extracting Value from Chaos (2011)

  3. Science Staff, Dealing with data - challenges and opportunities. Science 331(6018), 692–693 (2011)

  4. M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)

  5. S. Bykov, A. Geller, G. Kliot, J. Larus, R. Pandya, J. Thelin, Orleans: cloud computing for everyone, in Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)

  6. S. Wu, F. Li, S. Mehrotra, B. Ooi, Query optimization for massively parallel data processing, in Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)

  7. L. Soares, D. Tam, M. Stumm, Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer, in Proceedings of the MICRO (2009), pp. 258–269

  8. S. Biswas, D. Franklin, A. Savage, R. Dixon, T. Sherwood, F. Chong, Multi-execution: multicore caching for data-similar executions, in Proceedings of the ISCA (2009)

  9. M. Chaudhuri, Pagenuca: selected policies for page-grain locality management in large shared chip-multiprocessor caches, in Proceedings of the HPCA (2009), pp. 227–238

  10. S. Srikantaiah, R. Das, A.K. Mishra, C.R. Das, M. Kandemir, A case for integrated processor-cache partitioning in chip multiprocessors, in Proceedings of the SC (2009)

  11. X. Ding, K. Wang, X. Zhang, SRM-buffer: an OS buffer management technique to prevent last level cache from thrashing in multicores, in Proceedings of the EuroSys (2011)

  12. Y. Chen, S. Byna, X. Sun, Data access history cache and associated data prefetching mechanisms, in Proceedings of the SC (2007)

  13. J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Enabling software management for multicore caches with a lightweight hardware support, in Proceedings of the SC (2009)

  14. D. Zhan, H. Jiang, S.C. Seth, STEM: spatiotemporal management of capacity for intra-core last level caches, in Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2010)

  15. D. Zhan, H. Jiang, S.C. Seth, Locality & utility co-optimization for practical capacity management of shared last level caches, in Proceedings of the ACM International Conference on Supercomputing (2012)

  16. J. Stuecheli, D. Kaseridis, D. Daly, H. Hunter, L. John, The virtual write queue: coordinating DRAM and last-level cache policies, in Proceedings of the ISCA (2010)

  17. Y. Hua, X. Liu, D. Feng, MERCURY: a scalable and similarity-aware scheme in multi-level cache hierarchy, in Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) (2012)

  18. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the STOC (1998)

  19. A. Forin, B. Neekzad, N. Lynch, Giano: the two-headed system simulator, Technical Report MSR-TR-2006-130 (Microsoft Research, Redmond, 2006)

  20. S. Biswas, D. Franklin, T. Sherwood, F. Chong, Conflict-avoidance in multicore caching for data-similar executions, in Proceedings of the ISPAN (2009)

  21. PostgreSQL, http://www.postgresql.org/

  22. R. Lee, X. Ding, F. Chen, Q. Lu, X. Zhang, MCC-DB: minimizing cache conflicts in multi-core processors for databases. Proc. VLDB 2(1), 373–384 (2009)

  23. B. Bershad, D. Lee, T. Romer, J.B. Chen, Avoiding conflict misses dynamically in large direct-mapped caches, in Proceedings of the ASPLOS (1994)

  24. Y. Yan, X. Zhang, Z. Zhang, Cacheminer: a runtime approach to exploit cache locality on smp. IEEE Trans. Parallel Distrib. Syst. 11(4), 357–374 (2000)

  25. K. Zhang, Z. Wang, Y. Chen, H. Zhu, X. Sun, Pac-plru: a cache replacement policy to salvage discarded predictions from hardware prefetchers, in Proceedings of the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) (2011), pp. 265–274

  26. G. Suh, S. Devadas, L. Rudolph, Analytical cache models with applications to cache partitioning, in Proceedings of the ACM ICS (2001)

  27. The Forest CoverType dataset, UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets/Covertype

  28. D. Ellard, J. Ledlie, P. Malkani, M. Seltzer, Passive NFS tracing of email and research workloads, in Proceedings of the FAST (2003)

  29. E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of the FAST (2002)

  30. SPEC2000, http://www.spec.org/cpu2000/

  31. S. Carr, K. Kennedy, Compiler blockability of numerical algorithms, in Proceedings of the Supercomputing Conference (1992)

  32. M.S. Lam, E.E. Rothberg, M.E. Wolf, The cache performance and optimizations of blocked algorithms, in Proceedings of the ASPLOS (1991)

  33. T.C. Mowry, M.S. Lam, A. Gupta, Design and evaluation of a compiler algorithm for prefetching, in Proceedings of the ASPLOS (1992)

  34. J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems, in Proceedings of the HPCA (2008)

  35. Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of the VLDB (2007), pp. 950–961

  36. R. Shinde, A. Goel, P. Gupta, D. Dutta, Similarity search and locality sensitive hashing using ternary content addressable memories, in Proceedings of the SIGMOD (2010), pp. 375–386

  37. A. Joly, O. Buisson, A posteriori multi-probe locality sensitive hashing, in Proceedings of the ACM International Conference on Multimedia (2008)

  38. G. Taylor, P. Davies, M. Farmwald, The TLB slice-a low-cost high-speed address translation mechanism, in Proceedings of the ISCA (1990)

  39. A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)

  40. L. Fan, P. Cao, J. Almeida, A. Broder, Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)

  41. B. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)

  42. Y. Tao, K. Yi, C. Sheng, P. Kalnis, Quality and efficiency in high-dimensional nearest neighbor search, in Proceedings of the SIGMOD (2009)

  43. Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of the ICPP (2008), pp. 644–651

  44. TPC, http://www.tpc.org/

  45. Y. Hua, B. Xiao, B. Veeravalli, D. Feng, Locality-sensitive bloom filter for approximate membership query. IEEE Trans. Comput. 61(6), 817–830 (2012)

  46. Z. Zhang, Z. Zhu, X. Zhang, Cached dram for ilp processor memory access latency reduction. IEEE Micro 21(4), 22–32 (2001)

  47. S. Byna, Y. Chen, X. Sun, R. Thakur, W. Gropp, Parallel I/O prefetching using MPI file caching and I/O signatures, in Proceedings of the SC (2008)

  48. Z. Zhang, Z. Zhu, X. Zhang, Design and optimization of large size and low overhead off-chip caches. IEEE Trans. Comput. 53(7), 843–855 (2004)

  49. N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki, Near-optimal cache block placement with reactive nonuniform cache architectures. IEEE Micro 30(1), 20–28 (2010)

  50. J. Torrellas, A. Tucker, A. Gupta, Benefits of cache-affinity scheduling in shared-memory multiprocessors: a summary, in Proceedings of the ACM SIGMETRICS (1993)

  51. H. Lee, S. Cho, B. Childers, Cloudcache: expanding and shrinking private caches, in Proceedings of the HPCA (2011), pp. 219–230

  52. X. Zhang, S. Dwarkadas, K. Shen, Hardware execution throttling for multi-core resource management, in Proceedings of the USENIX Annual Technical Conference (2009)

  53. A. Basu, N. Kirman, M. Kirman, M. Chaudhuri, J. Martinez, Scavenger: a new last level cache architecture with global block priority, in Proceedings of the MICRO (2007), pp. 421–432

  54. J. Chhugani, A. Nguyen, V. Lee, W. Macy, M. Hagog, Y. Chen, A. Baransi, S. Kumar, P. Dubey, Efficient implementation of sorting on multi-core SIMD CPU architecture, in Proceedings of the VLDB (2008)

  55. S. Park, T. Kim, J. Park, J. Kim, H. Im, Parallel skyline computation on multicore architectures, in Proceedings of the ICDE (2009)

  56. S. Das, S. Antony, D. Agrawal, A. El Abbadi, Thread cooperation in multicore architectures for frequency counting over multiple data streams, in Proceedings of the VLDB (2009)

  57. J. Cieslewicz, K. Ross, Adaptive aggregation on chip multiprocessors, in Proceedings of the VLDB (2007)

  58. L. Qiao, V. Raman, F. Reiss, P. Haas, G. Lohman, Main-memory scan sharing for multi-core CPUs, in Proceedings of the VLDB (2008)

  59. W. Han, J. Lee, Dependency-aware reordering for parallelizing query optimization in multi-core CPUs, in Proceedings of the SIGMOD (2009)

  60. S. Tatikonda, S. Parthasarathy, Mining tree-structured data on multicore systems, in Proceedings of the VLDB (2009)

  61. C. Kim, T. Kaldewey, V. Lee, E. Sedlar, A. Nguyen, N. Satish, J. Chhugani, A. Di Blas, P. Dubey, Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs, in Proceedings of the VLDB (2009)

  62. M. Kleanthous, Y. Sazeides, CATCH: a mechanism for dynamically detecting cache-content-duplication and its application to instruction caches, in Proceedings of the DATE (2008)

  63. A. Alameldeen, D. Wood, Adaptive cache compression for high-performance processors, in Proceedings of the ISCA (2004)

  64. J. Chang, G. Sohi, Cooperative caching for chip multiprocessors, in Proceedings of the ISCA (2006)

  65. C. Kim, D. Burger, S. Keckler, An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches, in Proceedings of the ASPLOS (2002)

  66. Z. Chishti, M. Powell, T. Vijaykumar, Distance associativity for high-performance energy-efficient non-uniform cache architectures, in Proceedings of the MICRO (2003)

  67. R. Manikantan, K. Rajan, R. Govindarajan, Nucache: an efficient multicore cache organization based on next-use distance, in Proceedings of the HPCA (2011), pp. 243–253

  68. N. Lakshminarayana, J. Lee, H. Kim, Age based scheduling for asymmetric multiprocessors, in Proceedings of the ACM/IEEE Supercomputing Conference (2009)

  69. J. Zhou, J. Cieslewicz, K. Ross, M. Shah, Improving database performance on simultaneous multithreading processors, in Proceedings of the VLDB (2005)

  70. S. Boyd-Wickizer, R. Morris, M.F. Kaashoek, Reinventing scheduling for multicore systems, in Proceedings of the HotOS (2009)

  71. L. Shalev, J. Satran, E. Borovik, M. Ben-Yehuda, IsoStack: highly efficient network processing on dedicated cores, in Proceedings of the USENIX Annual Technical Conference (2010)

Author information

Corresponding author

Correspondence to Yu Hua.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Hua, Y., Liu, X. (2019). Data Similarity-Aware Computation Infrastructure for the Cloud. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_7

  • DOI: https://doi.org/10.1007/978-981-13-2721-6_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2720-9

  • Online ISBN: 978-981-13-2721-6

  • eBook Packages: Computer Science, Computer Science (R0)
