A partitioning framework for Cassandra NoSQL database using Rendezvous hashing

The Journal of Supercomputing

Abstract

As the volume of data generated by social networks and cloud computing has grown, the term “Big data” has emerged, along with the challenge of storing immense datasets. Many tools and algorithms have been developed to address this challenge. NoSQL databases, such as Cassandra and MongoDB, are designed with a novel data management system that can handle and process huge volumes of data. Partitioning data in NoSQL databases is considered one of the critical challenges in database design. In this paper, a MapReduce Rendezvous Hashing-Based Virtual Hierarchies (MR-RHVH) framework is proposed for scalable partitioning of the Cassandra NoSQL database. The MapReduce framework is used to implement MR-RHVH on Cassandra to enhance its performance in highly distributed environments. MR-RHVH distributes the nodes to rendezvous regions based on a proposed Adopted Virtual Hierarchies strategy. Each region is responsible for a set of nodes. In addition, a proposed Bloom filter evaluator is used to ensure the accurate allocation of keys to nodes in each region. A number of experiments were performed to evaluate the performance of the MR-RHVH framework, using YCSB for database benchmarking. The results show that MR-RHVH achieves a high scalability rate and takes less time than several recent systems.
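For readers unfamiliar with the underlying technique, the following is a minimal sketch of plain rendezvous (highest random weight) hashing, the building block named in the title. It is not the MR-RHVH framework itself: the rendezvous regions, the virtual hierarchies strategy, and the Bloom filter evaluator are not reproduced here. The node names, the key format, the helper names `weight` and `pick_node`, and the use of SHA-256 as the hash function are all assumptions made for illustration.

```python
import hashlib


def weight(node: str, item_key: str) -> int:
    """Pseudo-random weight for a (node, key) pair.

    SHA-256 is an illustrative stand-in; any well-mixed hash would do.
    """
    digest = hashlib.sha256(f"{node}|{item_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(nodes, item_key):
    """Return the node with the highest weight for this key."""
    if not nodes:
        raise ValueError("at least one node is required")
    return max(nodes, key=lambda node: weight(node, item_key))


# Hypothetical cluster and keys, for illustration only.
nodes = ["node-a", "node-b", "node-c", "node-d"]
keys = [f"user{i}" for i in range(1000)]
before = {k: pick_node(nodes, k) for k in keys}

# Remove one node and recompute the placement of every key.
remaining = [n for n in nodes if n != "node-c"]
after = {k: pick_node(remaining, k) for k in keys}

# Keys whose owner changed after the removal.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

In this sketch, the only keys that change owner are those that were placed on the removed node, because every other key's highest-weight node is still present. That minimal-disruption property is what makes rendezvous hashing attractive for partitioning when nodes join or leave a cluster.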

Author information

Corresponding author

Correspondence to Sally M. Elghamrawy.

About this article

Cite this article

Elghamrawy, S.M., Hassanien, A.E. A partitioning framework for Cassandra NoSQL database using Rendezvous hashing. J Supercomput 73, 4444–4465 (2017). https://doi.org/10.1007/s11227-017-2027-5

  • DOI: https://doi.org/10.1007/s11227-017-2027-5
