Locality-Aware GC Optimisations for Big Data Workloads

  • Conference paper
On the Move to Meaningful Internet Systems. OTM 2017 Conferences (OTM 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10574)

Abstract

Many Big Data analytics and IoT scenarios rely on fast, non-relational (NoSQL) storage to help process massive amounts of data. In addition, managed runtimes (e.g. the JVM) are now widely used to support the execution of these NoSQL storage solutions, particularly for Big Data applications driven by key-value stores. The benefits of such runtimes can, however, be limited by automatic memory management, i.e., Garbage Collection (GC), which does not consider object locality, so objects that point to each other end up dispersed in memory. In the long run, this may break the service level of applications because of extra page faults and degraded locality in system-level memory caches. We propose LAG1 (short for Locality-Aware G1), an extension of modern heap layouts that promotes locality between groups of related objects. This is done with no prior application profiling and transparently to the programmer, without requiring changes to existing code. The heap layout and algorithmic extensions are implemented on top of the Garbage First (G1) garbage collector (the new default collector) of the HotSpot JVM. Using the YCSB benchmarking tool to benchmark HBase, a well-known and widely used Big Data application, we show negligible overhead in frequent operations such as the allocation of new objects, and significant improvements when accessing data, supported by higher hit rates in system-level memory structures.
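The effect of object placement on cache behaviour can be illustrated with a toy model. The sketch below is not LAG1 and uses no HotSpot API; the class name `LocalitySketch`, the fixed 16-byte object size, and the round-robin allocation pattern are all assumptions made for illustration. It counts how many 64-byte cache lines are touched when traversing one group of related objects, first when several groups allocate interleaved in a bump-pointer heap, then after a hypothetical collector has copied each group contiguously.

```java
import java.util.HashSet;
import java.util.Set;

public class LocalitySketch {
    static final int LINE_BYTES = 64;   // cache-line size (64 B, as in the paper's notes)
    static final int OBJECT_BYTES = 16; // hypothetical fixed object size (assumption)

    /** Counts the distinct cache lines covered by objects stored at the given heap slots. */
    static int linesTouched(int[] slots) {
        Set<Integer> lines = new HashSet<>();
        for (int slot : slots) {
            lines.add(slot * OBJECT_BYTES / LINE_BYTES);
        }
        return lines.size();
    }

    /** Slots of one group's objects when `groups` groups allocate in round-robin order. */
    static int[] interleavedSlots(int group, int groups, int objectsPerGroup) {
        int[] slots = new int[objectsPerGroup];
        for (int i = 0; i < objectsPerGroup; i++) {
            slots[i] = i * groups + group;
        }
        return slots;
    }

    /** Slots of one group's objects after each group has been copied contiguously. */
    static int[] groupedSlots(int group, int objectsPerGroup) {
        int[] slots = new int[objectsPerGroup];
        for (int i = 0; i < objectsPerGroup; i++) {
            slots[i] = group * objectsPerGroup + i;
        }
        return slots;
    }

    public static void main(String[] args) {
        int groups = 8;
        int objectsPerGroup = 32;
        System.out.println("interleaved: "
                + linesTouched(interleavedSlots(0, groups, objectsPerGroup)));
        System.out.println("grouped:     "
                + linesTouched(groupedSlots(0, objectsPerGroup)));
    }
}
```

Under these assumptions, traversing the 32 objects of one group touches 32 cache lines in the interleaved layout and 8 in the grouped layout, since four 16-byte objects share each 64-byte line once the group is contiguous.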


Notes

  1. The first level of the CPU data cache.

  2. The data Translation Lookaside Buffer (TLB).

  3. The mechanism used in HotSpot to create Stop-the-World pauses. Garbage collection cycles run inside a safepoint, during which all application threads are stopped.

  4. L1 is the 1st level of CPU cache: 32 KB in size and 64 B per line in modern models.

  5. L2 is the 2nd level of CPU cache: 256 KB in size and 64 B per line in modern models.

  6. http://linux.die.net/man/1/perf.

  7. A page walk consists of querying page-table entries to see whether the address the CPU is trying to load is present in physical memory.
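The cache sizes quoted in the notes translate directly into a number of cache lines. The short calculation below is only a worked arithmetic example; the class and method names are hypothetical.

```java
public class CacheGeometry {
    /** Number of cache lines in a cache of the given total size and line size. */
    static long lines(long cacheBytes, long lineBytes) {
        return cacheBytes / lineBytes;
    }

    public static void main(String[] args) {
        long lineBytes = 64;                 // 64 B per line (from the notes)
        long l1Bytes = 32L * 1024;           // L1: 32 KB (from the notes)
        long l2Bytes = 256L * 1024;          // L2: 256 KB (from the notes)
        System.out.println("L1 lines: " + lines(l1Bytes, lineBytes));
        System.out.println("L2 lines: " + lines(l2Bytes, lineBytes));
    }
}
```

With these figures, L1 holds 512 lines and L2 holds 4096 lines, which bounds how many distinct 64-byte blocks of the heap each level can keep resident at once.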

Acknowledgements

This work was supported by national funds through Fundação para a Ciência e a Tecnologia with reference PTDC/EEI-SCR/6945/2014, and by the ERDF through COMPETE 2020 Programme, within project POCI-01-0145-FEDER-016883. This work was partially supported by Instituto Superior de Engenharia de Lisboa and Instituto Politécnico de Lisboa. This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013.

Author information

Corresponding author

Correspondence to Luís Veiga.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Patrício, D., Bruno, R., Simão, J., Ferreira, P., Veiga, L. (2017). Locality-Aware GC Optimisations for Big Data Workloads. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10574. Springer, Cham. https://doi.org/10.1007/978-3-319-69459-7_4

  • DOI: https://doi.org/10.1007/978-3-319-69459-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69458-0

  • Online ISBN: 978-3-319-69459-7

  • eBook Packages: Computer Science, Computer Science (R0)
