Scalability of write-ahead logging on multicore and multisocket hardware

  • Special Issue Paper
The VLDB Journal

Abstract

The shift to multi-core and multi-socket hardware brings new challenges to database systems, as software parallelism now determines performance. Even though database systems traditionally accommodate simultaneous requests, a multitude of synchronization barriers serialize execution. Write-ahead logging is a fundamental, omnipresent component in ARIES-style concurrency and recovery, and one of the most important yet-to-be-addressed potential bottlenecks, especially in OLTP workloads that make frequent small changes to data. In this paper, we identify four logging-related impediments to database system scalability. Each issue challenges a different level of the software architecture: (a) the high volume of small-sized I/O requests may saturate the disk, (b) transactions hold locks while waiting for the log flush, (c) extensive context switching overwhelms the OS scheduler with threads executing log I/Os, and (d) contention appears as transactions serialize accesses to in-memory log data structures. We demonstrate these problems and address them with techniques that, when combined, comprise a holistic, scalable approach to logging. Our solution achieves a 20–69% speedup over a modern database system when running log-intensive workloads, such as the TPC-B and TATP benchmarks, on a single-socket multiprocessor server. Moreover, it achieves log insert throughput of over 2.2 GB/s for small log records on the single-socket server, roughly 20 times higher than the traditional way of accessing the log using a single mutex. Furthermore, we investigate techniques for scaling the performance of logging to multi-socket servers. We present a set of optimizations that partly ameliorate the latency penalty that comes with multi-socket hardware, and then we investigate the feasibility of applying a distributed log buffer design at the socket level.
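To make the baseline in the abstract concrete, the following is a minimal sketch of a log buffer guarded by a single mutex, where one flush hardens every record inserted so far. It is an illustration only and not the authors' implementation: the class `SingleMutexLog`, its fields, and the helper `run_demo` are hypothetical names, and the log device is simulated with an in-memory byte array.

```python
import threading


class SingleMutexLog:
    """Baseline log buffer: one mutex serializes every insert (hypothetical names)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._buffer = bytearray()  # in-memory log tail, not yet durable
        self._base_lsn = 0          # LSN of the first byte held in _buffer
        self.durable_lsn = 0        # every byte below this LSN counts as flushed
        self.flush_count = 0        # number of simulated I/O requests issued
        self.stable = bytearray()   # stands in for the log device (assumption)

    def insert(self, record: bytes) -> int:
        """Append a record and return the LSN just past its last byte."""
        with self._mutex:           # space allocation and copy both hold the lock
            self._buffer += record
            return self._base_lsn + len(self._buffer)

    def flush(self) -> int:
        """Harden all buffered records with a single simulated I/O."""
        with self._mutex:
            if self._buffer:
                self.stable += self._buffer
                self._base_lsn += len(self._buffer)
                self._buffer.clear()
                self.durable_lsn = self._base_lsn
                self.flush_count += 1
            return self.durable_lsn


def run_demo(threads: int = 4, records: int = 100, size: int = 8) -> SingleMutexLog:
    """Insert small records from several threads, then flush once."""
    log = SingleMutexLog()

    def worker():
        for _ in range(records):
            log.insert(b"x" * size)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    log.flush()
    return log
```

In this sketch every inserting thread contends on the same lock, which corresponds to issue (d), while batching many small records into one flush is one way to reduce the number of small I/O requests named in issue (a).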

Author information

Corresponding author

Correspondence to Ryan Johnson.

About this article

Cite this article

Johnson, R., Pandis, I., Stoica, R. et al. Scalability of write-ahead logging on multicore and multisocket hardware. The VLDB Journal 21, 239–263 (2012). https://doi.org/10.1007/s00778-011-0260-8

  • DOI: https://doi.org/10.1007/s00778-011-0260-8
