Data Similarity-Aware Computation Infrastructure for the Cloud

Hua, Yu; Liu, Xue

doi:10.1007/978-981-13-2721-6_7

Abstract

Cloud platforms are increasingly expected to deliver scalable and efficient services. To handle massive data while reducing data migration, the computation infrastructure needs efficient data placement and proper management of cached data. We propose MERCURY, an efficient and cost-effective multilevel caching scheme, as the computation infrastructure of the cloud. The idea behind MERCURY is to explore and exploit data similarity to support efficient data placement. To capture data similarity accurately and efficiently, we leverage low-complexity Locality-Sensitive Hashing (LSH). We identify that a conventional LSH scheme suffers not only from space inefficiency but also from homogeneous data placement. To address these two problems, we design a novel Multicore-enabled LSH (MC-LSH) that accurately captures the differentiated similarity across data. The similarity-aware MERCURY then partitions data among the L1 cache, the L2 cache, and main memory based on their distinct localities, which helps optimize cache utilization and minimize pollution in the last-level cache. Besides extensive evaluation through simulations, we also implemented MERCURY in a system. Experimental results based on real-world applications and datasets demonstrate the efficiency and efficacy of the proposed schemes (© 2014 IEEE. Reprinted, with permission, from Ref. [1]).
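As a rough illustration of the LSH step mentioned in the abstract, the sketch below hashes vectors with a standard p-stable (Gaussian projection) LSH family, so that nearby vectors tend to share a bucket. It is not MC-LSH and not MERCURY's implementation. The function names `make_lsh` and `group_by_bucket`, the parameters `dim`, `num_hashes`, `bucket_width`, and `seed`, and the sample vectors are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def make_lsh(dim, num_hashes, bucket_width, seed=0):
    """Return a signature function g(v) = (h_1(v), ..., h_k(v)).

    Each h_i(v) = floor((a_i . v + b_i) / w), with a_i drawn from N(0, 1)
    per coordinate and b_i drawn uniformly from [0, w]. This is the generic
    p-stable LSH family, used here only as an illustrative stand-in.
    """
    rng = random.Random(seed)
    params = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)],
         rng.uniform(0.0, bucket_width))
        for _ in range(num_hashes)
    ]

    def signature(vec):
        if len(vec) != dim:
            raise ValueError("vector has the wrong dimensionality")
        return tuple(
            math.floor(
                (sum(a * x for a, x in zip(a_vec, vec)) + b) / bucket_width
            )
            for a_vec, b in params
        )

    return signature


def group_by_bucket(items, signature):
    """Group item names by their LSH signature (one bucket per signature)."""
    buckets = defaultdict(list)
    for name, vec in items.items():
        buckets[signature(vec)].append(name)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical feature vectors; identical vectors always collide,
    # nearby vectors collide with high probability, distant ones rarely.
    data = {
        "block_a": [1.0, 2.0, 3.0],
        "block_b": [1.0, 2.0, 3.0],
        "block_c": [50.0, -40.0, 9.0],
    }
    sig = make_lsh(dim=3, num_hashes=4, bucket_width=4.0, seed=1)
    for bucket, names in group_by_bucket(data, sig).items():
        print(bucket, names)
```

Items that land in the same bucket are candidates for being similar. How such groups would be mapped onto cache levels is specific to MERCURY and is not modeled here.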

References

Y. Hua, X. Liu, D. Feng, Data similarity-aware computation infrastructure for the cloud. IEEE Trans. Comput. (TC) 63(1), 3–16 (2014)
IDC iView, Extracting Value from Chaos (2011)
Science Staff, Dealing with data - challenges and opportunities. Science 331(6018), 692–693 (2011)
M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)
S. Bykov, A. Geller, G. Kliot, J. Larus, R. Pandya, J. Thelin, Orleans: cloud computing for everyone, in Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)
S. Wu, F. Li, S. Mehrotra, B. Ooi, Query optimization for massively parallel data processing, in Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)
L. Soares, D. Tam, M. Stumm, Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer, in Proceedings of the MICRO (2009), pp. 258–269
S. Biswas, D. Franklin, A. Savage, R. Dixon, T. Sherwood, F. Chong, Multi-execution: multicore caching for data-similar executions, in Proceedings of the ISCA (2009)
M. Chaudhuri, PageNUCA: selected policies for page-grain locality management in large shared chip-multiprocessor caches, in Proceedings of the HPCA (2009), pp. 227–238
S. Srikantaiah, R. Das, A.K. Mishra, C.R. Das, M. Kandemir, A case for integrated processor-cache partitioning in chip multiprocessors, in Proceedings of the SC (2009)
X. Ding, K. Wang, X. Zhang, SRM-buffer: an OS buffer management technique to prevent last level cache from thrashing in multicores, in Proceedings of the EuroSys (2011)
Y. Chen, S. Byna, X. Sun, Data access history cache and associated data prefetching mechanisms, in Proceedings of the SC (2007)
J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Enabling software management for multicore caches with a lightweight hardware support, in Proceedings of the SC (2009)
D. Zhan, H. Jiang, S.C. Seth, STEM: spatiotemporal management of capacity for intra-core last level caches, in Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2010)
D. Zhan, H. Jiang, S.C. Seth, Locality & utility co-optimization for practical capacity management of shared last level caches, in Proceedings of the ACM International Conference on Supercomputing (2012)
J. Stuecheli, D. Kaseridis, D. Daly, H. Hunter, L. John, The virtual write queue: coordinating DRAM and last-level cache policies, in Proceedings of the ISCA (2010)
Y. Hua, X. Liu, D. Feng, MERCURY: a scalable and similarity-aware scheme in multi-level cache hierarchy, in Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) (2012)
P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the STOC (1998)
A. Forin, B. Neekzad, N. Lynch, Giano: the two-headed system simulator, Technical Report MSR-TR-2006-130 (Microsoft Research, Redmond, 2006)
S. Biswas, D. Franklin, T. Sherwood, F. Chong, Conflict-avoidance in multicore caching for data-similar executions, in Proceedings of the ISPAN (2009)
PostgreSQL, http://www.postgresql.org/
R. Lee, X. Ding, F. Chen, Q. Lu, X. Zhang, MCC-DB: minimizing cache conflicts in multi-core processors for databases. Proc. VLDB 2(1), 373–384 (2009)
B. Bershad, D. Lee, T. Romer, B. Chen, Avoiding conflict misses dynamically in large direct-mapped caches, in Proceedings of the ASPLOS (1994)
Y. Yan, X. Zhang, Z. Zhang, Cacheminer: a runtime approach to exploit cache locality on SMP. IEEE Trans. Parallel Distrib. Syst. 11(4), 357–374 (2000)
K. Zhang, Z. Wang, Y. Chen, H. Zhu, X. Sun, PAC-PLRU: a cache replacement policy to salvage discarded predictions from hardware prefetchers, in Proceedings of the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) (2011), pp. 265–274
G. Suh, S. Devadas, L. Rudolph, Analytical cache models with applications to cache partitioning, in Proceedings of the ACM ICS (2001)
The Forest CoverType dataset, UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets/Covertype
D. Ellard, J. Ledlie, P. Malkani, M. Seltzer, Passive NFS tracing of email and research workloads, in Proceedings of the FAST (2003)
E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of the FAST (2002)
SPEC2000, http://www.spec.org/cpu2000/
S. Carr, K. Kennedy, Compiler blockability of numerical algorithms, in Proceedings of the Supercomputing Conference (1992)
M.S. Lam, E.E. Rothberg, M.E. Wolf, The cache performance and optimizations of blocked algorithms, in Proceedings of the ASPLOS (1991)
T.C. Mowry, M.S. Lam, A. Gupta, Design and evaluation of a compiler algorithm for prefetching, in Proceedings of the ASPLOS (1992)
J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems, in Proceedings of the HPCA (2008)
Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of the VLDB (2007), pp. 950–961
R. Shinde, A. Goel, P. Gupta, D. Dutta, Similarity search and locality sensitive hashing using ternary content addressable memories, in Proceedings of the SIGMOD (2010), pp. 375–386
A. Joly, O. Buisson, A posteriori multi-probe locality sensitive hashing, in Proceedings of the ACM International Conference on Multimedia (2008)
G. Taylor, P. Davies, M. Farmwald, The TLB slice-a low-cost high-speed address translation mechanism, in Proceedings of the ISCA (1990)
A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
L. Fan, P. Cao, J. Almeida, A. Broder, Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)
B. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
Y. Tao, K. Yi, C. Sheng, P. Kalnis, Quality and efficiency in high-dimensional nearest neighbor search, in Proceedings of the SIGMOD (2009)
Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of the ICPP (2008), pp. 644–651
TPC, http://www.tpc.org/
Y. Hua, B. Xiao, B. Veeravalli, D. Feng, Locality-sensitive bloom filter for approximate membership query. IEEE Trans. Comput. 61(6), 817–830 (2012)
Z. Zhang, Z. Zhu, X. Zhang, Cached DRAM for ILP processor memory access latency reduction. IEEE Micro 21(4), 22–32 (2001)
S. Byna, Y. Chen, X. Sun, R. Thakur, W. Gropp, Parallel I/O prefetching using MPI file caching and I/O signatures, in Proceedings of the SC (2008)
Z. Zhang, Z. Zhu, X. Zhang, Design and optimization of large size and low overhead off-chip caches. IEEE Trans. Comput. 53(7), 843–855 (2004)
N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki, Near-optimal cache block placement with reactive nonuniform cache architectures. IEEE Micro 30(1), 20–28 (2010)
J. Torrellas, A. Tucker, A. Gupta, Benefits of cache-affinity scheduling in shared-memory multiprocessors: a summary, in Proceedings of the ACM SIGMETRICS (1993)
H. Lee, S. Cho, B. Childers, Cloudcache: expanding and shrinking private caches, in Proceedings of the HPCA (2011), pp. 219–230
X. Zhang, S. Dwarkadas, K. Shen, Hardware execution throttling for multi-core resource management, in Proceedings of the USENIX Annual Technical Conference (2009)
A. Basu, N. Kirman, M. Kirman, M. Chaudhuri, J. Martinez, Scavenger: a new last level cache architecture with global block priority, in Proceedings of the MICRO (2007), pp. 421–432
J. Chhugani, A. Nguyen, V. Lee, W. Macy, M. Hagog, Y. Chen, A. Baransi, S. Kumar, P. Dubey, Efficient implementation of sorting on multi-core SIMD CPU architecture, in Proceedings of the VLDB (2008)
S. Park, T. Kim, J. Park, J. Kim, H. Im, Parallel skyline computation on multicore architectures, in Proceedings of the ICDE (2009)
S. Das, S. Antony, D. Agrawal, A. El Abbadi, Thread cooperation in multicore architectures for frequency counting over multiple data streams, in Proceedings of the VLDB (2009)
J. Cieslewicz, K. Ross, Adaptive aggregation on chip multiprocessors, in Proceedings of the VLDB (2007)
L. Qiao, V. Raman, F. Reiss, P. Haas, G. Lohman, Main-memory scan sharing for multi-core CPUs, in Proceedings of the VLDB (2008)
W. Han, J. Lee, Dependency-aware reordering for parallelizing query optimization in multi-core CPUs, in Proceedings of the SIGMOD (2009)
S. Tatikonda, S. Parthasarathy, Mining tree-structured data on multicore systems, in Proceedings of the VLDB (2009)
C. Kim, T. Kaldewey, V. Lee, E. Sedlar, A. Nguyen, N. Satish, J. Chhugani, A. Di Blas, P. Dubey, Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs, in Proceedings of the VLDB (2009)
M. Kleanthous, Y. Sazeides, CATCH: a mechanism for dynamically detecting cache-content-duplication and its application to instruction caches, in Proceedings of the DATE (2008)
A. Alameldeen, D. Wood, Adaptive cache compression for high-performance processors, in Proceedings of the ISCA (2004)
J. Chang, G. Sohi, Cooperative caching for chip multiprocessors, in Proceedings of the ISCA (2006)
C. Kim, D. Burger, S. Keckler, An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches, in Proceedings of the ASPLOS (2002)
Z. Chishti, M. Powell, T. Vijaykumar, Distance associativity for high-performance energy-efficient non-uniform cache architectures, in Proceedings of the MICRO (2003)
R. Manikantan, K. Rajan, R. Govindarajan, Nucache: an efficient multicore cache organization based on next-use distance, in Proceedings of the HPCA (2011), pp. 243–253
N. Lakshminarayana, J. Lee, H. Kim, Age based scheduling for asymmetric multiprocessors, in Proceedings of the ACM/IEEE Supercomputing Conference (2009)
J. Zhou, J. Cieslewicz, K. Ross, M. Shah, Improving database performance on simultaneous multithreading processors, in Proceedings of the VLDB (2005)
S. Boyd-Wickizer, R. Morris, M.F. Kaashoek, Reinventing scheduling for multicore systems, in Proceedings of the HotOS (2009)
L. Shalev, J. Satran, E. Borovik, M. Ben-Yehuda, IsoStack: highly efficient network processing on dedicated cores, in Proceedings of the USENIX Annual Technical Conference (2010)

Author information

Authors and Affiliations

Huazhong University of Science and Technology, Wuhan, Hubei, China
Yu Hua
McGill University, Montreal, QC, Canada
Xue Liu

Corresponding author

Correspondence to Yu Hua.

About this chapter

Cite this chapter

Hua, Y., Liu, X. (2019). Data Similarity-Aware Computation Infrastructure for the Cloud. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_7

DOI: https://doi.org/10.1007/978-981-13-2721-6_7
Published: 09 February 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2720-9
Online ISBN: 978-981-13-2721-6
eBook Packages: Computer Science, Computer Science (R0)
