Abstract
Cloud computing is emerging as a platform for scalable and efficient services. To handle massive data while reducing data migration, the computation infrastructure requires efficient data placement and proper management of cached data. We propose an efficient and cost-effective multilevel caching scheme, called MERCURY, as the computation infrastructure of the cloud. The idea behind MERCURY is to explore and exploit data similarity to support efficient data placement. To capture data similarity accurately and efficiently, we leverage low-complexity Locality-Sensitive Hashing (LSH). In our design, we identify that a conventional LSH scheme suffers not only from space inefficiency but also from homogeneous data placement. To address these two problems, we design a novel Multicore-enabled LSH (MC-LSH) that accurately captures the differentiated similarity across data. The similarity-aware MERCURY hence partitions data into the L1 cache, L2 cache, and main memory based on their distinct localities, which helps optimize cache utilization and minimize pollution in the last-level cache. Besides extensive evaluation through simulations, we also implemented MERCURY in a system. Experimental results based on real-world applications and datasets demonstrate the efficiency and efficacy of our proposed schemes (© 2014 IEEE. Reprinted, with permission, from Ref. [1]).
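To make the role of LSH in the abstract concrete, the following is a minimal sketch of a conventional random-projection (p-stable) LSH family, not the MC-LSH design proposed in the chapter. All names and parameters (`make_hash`, `collision_rate`, `DIM`, `WIDTH`, `NUM_HASHES`, the perturbation sizes) are assumptions chosen for illustration. It only shows the property the scheme relies on: items that are close to each other fall into the same hash bucket far more often than items that are far apart.

```python
import math
import random


def make_hash(dim, width, rng):
    """Build one LSH function h(v) = floor((a . v + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random Gaussian projection
    b = rng.uniform(0.0, width)                    # random offset in [0, width)

    def h(v):
        dot = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((dot + b) / width)

    return h


def collision_rate(u, v, hashes):
    """Fraction of hash functions that map u and v to the same bucket."""
    same = sum(1 for h in hashes if h(u) == h(v))
    return same / len(hashes)


rng = random.Random(42)  # fixed seed so the run is repeatable
DIM, WIDTH, NUM_HASHES = 8, 4.0, 2000  # illustrative values only
hashes = [make_hash(DIM, WIDTH, rng) for _ in range(NUM_HASHES)]

base = [rng.uniform(0.0, 10.0) for _ in range(DIM)]
near = [x + 0.05 for x in base]  # small perturbation: a "similar" item
far = [x + 5.0 for x in base]    # large perturbation: a "dissimilar" item

near_rate = collision_rate(base, near, hashes)
far_rate = collision_rate(base, far, hashes)
print("near collision rate:", near_rate)
print("far collision rate:", far_rate)
```

In a similarity-aware placement scheme, a bucket identifier of this kind could serve as a grouping key for correlated data. How MERCURY maps such groups onto the L1 cache, L2 cache, and main memory is described in the chapter itself and is not reproduced here.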
References
Y. Hua, X. Liu, D. Feng, Data similarity-aware computation infrastructure for the cloud. IEEE Trans. Comput. (TC) 63(1), 3–16 (2014)
IDC iView, Extracting Value from Chaos (2011)
Science Staff, Dealing with data - challenges and opportunities. Science 331(6018), 692–693 (2011)
M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)
S. Bykov, A. Geller, G. Kliot, J. Larus, R. Pandya, J. Thelin, Orleans: cloud computing for everyone, in Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)
S. Wu, F. Li, S. Mehrotra, B. Ooi, Query optimization for massively parallel data processing, in Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)
L. Soares, D. Tam, M. Stumm, Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer, in Proceedings of the MICRO (2009), pp. 258–269
S. Biswas, D. Franklin, A. Savage, R. Dixon, T. Sherwood, F. Chong, Multi-execution: multicore caching for data-similar executions, in Proceedings of the ISCA (2009)
M. Chaudhuri, PageNUCA: selected policies for page-grain locality management in large shared chip-multiprocessor caches, in Proceedings of the HPCA (2009), pp. 227–238
S. Srikantaiah, R. Das, A.K. Mishra, C.R. Das, M. Kandemir, A case for integrated processor-cache partitioning in chip multiprocessors, in Proceedings of the SC (2009)
X. Ding, K. Wang, X. Zhang, SRM-buffer: an OS buffer management technique to prevent last level cache from thrashing in multicores, in Proceedings of the EuroSys (2011)
Y. Chen, S. Byna, X. Sun, Data access history cache and associated data prefetching mechanisms, in Proceedings of the SC (2007)
J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Enabling software management for multicore caches with a lightweight hardware support, in Proceedings of the SC (2009)
D. Zhan, H. Jiang, S.C. Seth, STEM: spatiotemporal management of capacity for intra-core last level caches, in Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2010)
D. Zhan, H. Jiang, S.C. Seth, Locality & utility co-optimization for practical capacity management of shared last level caches, in Proceedings of the ACM International Conference on Supercomputing (2012)
J. Stuecheli, D. Kaseridis, D. Daly, H. Hunter, L. John, The virtual write queue: coordinating DRAM and last-level cache policies, in Proceedings of the ISCA (2010)
Y. Hua, X. Liu, D. Feng, MERCURY: a scalable and similarity-aware scheme in multi-level cache hierarchy, in Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) (2012)
P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the STOC (1998)
A. Forin, B. Neekzad, N. Lynch, Giano: the two-headed system simulator, Technical Report MSR-TR-2006-130 (Microsoft Research, Redmond, 2006)
S. Biswas, D. Franklin, T. Sherwood, F. Chong, Conflict-avoidance in multicore caching for data-similar executions, in Proceedings of the ISPAN (2009)
PostgreSQL, http://www.postgresql.org/
R. Lee, X. Ding, F. Chen, Q. Lu, X. Zhang, MCC-DB: minimizing cache conflicts in multi-core processors for databases. Proc. VLDB 2(1), 373–384 (2009)
B.N. Bershad, D. Lee, T.H. Romer, J.B. Chen, Avoiding conflict misses dynamically in large direct-mapped caches, in Proceedings of the ASPLOS (1994)
Y. Yan, X. Zhang, Z. Zhang, Cacheminer: a runtime approach to exploit cache locality on SMP. IEEE Trans. Parallel Distrib. Syst. 11(4), 357–374 (2000)
K. Zhang, Z. Wang, Y. Chen, H. Zhu, X. Sun, PAC-PLRU: a cache replacement policy to salvage discarded predictions from hardware prefetchers, in Proceedings of the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) (2011), pp. 265–274
G. Suh, S. Devadas, L. Rudolph, Analytical cache models with applications to cache partitioning, in Proceedings of the ACM ICS (2001)
The Forest CoverType dataset, UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets/Covertype
D. Ellard, J. Ledlie, P. Malkani, M. Seltzer, Passive NFS tracing of email and research workloads, in Proceedings of the FAST (2003)
E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of the FAST (2002)
SPEC2000, http://www.spec.org/cpu2000/
S. Carr, K. Kennedy, Compiler blockability of numerical algorithms, in Proceedings of the Supercomputing Conference (1992)
M.S. Lam, E.E. Rothberg, M.E. Wolf, The cache performance and optimizations of blocked algorithms, in Proceedings of the ASPLOS (1991)
T.C. Mowry, M.S. Lam, A. Gupta, Design and evaluation of a compiler algorithm for prefetching, in Proceedings of the ASPLOS (1992)
J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems, in Proceedings of the HPCA (2008)
Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of the VLDB (2007), pp. 950–961
R. Shinde, A. Goel, P. Gupta, D. Dutta, Similarity search and locality sensitive hashing using ternary content addressable memories, in Proceedings of the SIGMOD (2010), pp. 375–386
A. Joly, O. Buisson, A posteriori multi-probe locality sensitive hashing, in Proceedings of the ACM International Conference on Multimedia (2008)
G. Taylor, P. Davies, M. Farmwald, The TLB slice-a low-cost high-speed address translation mechanism, in Proceedings of the ISCA (1990)
A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
L. Fan, P. Cao, J. Almeida, A. Broder, Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)
B. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
Y. Tao, K. Yi, C. Sheng, P. Kalnis, Quality and efficiency in high-dimensional nearest neighbor search, in Proceedings of the SIGMOD (2009)
Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of the ICPP (2008), pp. 644–651
TPC, http://www.tpc.org/
Y. Hua, B. Xiao, B. Veeravalli, D. Feng, Locality-sensitive Bloom filter for approximate membership query. IEEE Trans. Comput. 61(6), 817–830 (2012)
Z. Zhang, Z. Zhu, X. Zhang, Cached DRAM for ILP processor memory access latency reduction. IEEE Micro 21(4), 22–32 (2001)
S. Byna, Y. Chen, X. Sun, R. Thakur, W. Gropp, Parallel I/O prefetching using MPI file caching and I/O signatures, in Proceedings of the SC (2008)
Z. Zhang, Z. Zhu, X. Zhang, Design and optimization of large size and low overhead off-chip caches. IEEE Trans. Comput. 53(7), 843–855 (2004)
N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki, Near-optimal cache block placement with reactive nonuniform cache architectures. IEEE Micro 30(1), 20–28 (2010)
J. Torrellas, A. Tucker, A. Gupta, Benefits of cache-affinity scheduling in shared-memory multiprocessors: a summary, in Proceedings of the ACM SIGMETRICS (1993)
H. Lee, S. Cho, B. Childers, CloudCache: expanding and shrinking private caches, in Proceedings of the HPCA (2011), pp. 219–230
X. Zhang, S. Dwarkadas, K. Shen, Hardware execution throttling for multi-core resource management, in Proceedings of the USENIX Annual Technical Conference (2009)
A. Basu, N. Kirman, M. Kirman, M. Chaudhuri, J. Martinez, Scavenger: a new last level cache architecture with global block priority, in Proceedings of the MICRO (2007), pp. 421–432
J. Chhugani, A. Nguyen, V. Lee, W. Macy, M. Hagog, Y. Chen, A. Baransi, S. Kumar, P. Dubey, Efficient implementation of sorting on multi-core SIMD CPU architecture, in Proceedings of the VLDB (2008)
S. Park, T. Kim, J. Park, J. Kim, H. Im, Parallel skyline computation on multicore architectures, in Proceedings of the ICDE (2009)
S. Das, S. Antony, D. Agrawal, A. El Abbadi, Thread cooperation in multicore architectures for frequency counting over multiple data streams, in Proceedings of the VLDB (2009)
J. Cieslewicz, K. Ross, Adaptive aggregation on chip multiprocessors, in Proceedings of the VLDB (2007)
L. Qiao, V. Raman, F. Reiss, P. Haas, G. Lohman, Main-memory scan sharing for multi-core CPUs, in Proceedings of the VLDB (2008)
W. Han, J. Lee, Dependency-aware reordering for parallelizing query optimization in multi-core CPUs, in Proceedings of the SIGMOD (2009)
S. Tatikonda, S. Parthasarathy, Mining tree-structured data on multicore systems, in Proceedings of the VLDB (2009)
C. Kim, T. Kaldewey, V. Lee, E. Sedlar, A. Nguyen, N. Satish, J. Chhugani, A. Di Blas, P. Dubey, Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs, in Proceedings of the VLDB (2009)
M. Kleanthous, Y. Sazeides, CATCH: a mechanism for dynamically detecting cache-content-duplication and its application to instruction caches, in Proceedings of the DATE (2008)
A. Alameldeen, D. Wood, Adaptive cache compression for high-performance processors, in Proceedings of the ISCA (2004)
J. Chang, G. Sohi, Cooperative caching for chip multiprocessors, in Proceedings of the ISCA (2006)
C. Kim, D. Burger, S. Keckler, An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches, in Proceedings of the ASPLOS (2002)
Z. Chishti, M. Powell, T. Vijaykumar, Distance associativity for high-performance energy-efficient non-uniform cache architectures, in Proceedings of the MICRO (2003)
R. Manikantan, K. Rajan, R. Govindarajan, NUcache: an efficient multicore cache organization based on next-use distance, in Proceedings of the HPCA (2011), pp. 243–253
N. Lakshminarayana, J. Lee, H. Kim, Age based scheduling for asymmetric multiprocessors, in Proceedings of the ACM/IEEE Supercomputing Conference (2009)
J. Zhou, J. Cieslewicz, K. Ross, M. Shah, Improving database performance on simultaneous multithreading processors, in Proceedings of the VLDB (2005)
S. Boyd-Wickizer, R. Morris, M.F. Kaashoek, Reinventing scheduling for multicore systems, in Proceedings of the HotOS (2009)
L. Shalev, J. Satran, E. Borovik, M. Ben-Yehuda, IsoStack: highly efficient network processing on dedicated cores, in Proceedings of the USENIX Annual Technical Conference (2010)
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Hua, Y., Liu, X. (2019). Data Similarity-Aware Computation Infrastructure for the Cloud. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_7
DOI: https://doi.org/10.1007/978-981-13-2721-6_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2720-9
Online ISBN: 978-981-13-2721-6
eBook Packages: Computer Science, Computer Science (R0)