Colocation of Potential Parallelism in a Distributed Adaptive Run-Time System for Parallel Haskell

  • Conference paper

Trends in Functional Programming (TFP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11457)

Abstract

This paper presents a novel variant of work stealing for load balancing in a distributed graph reducer that executes a semi-explicit parallel dialect of Haskell. The key idea of this load balancer is to colocate related sparks (units of potential parallelism): spark selection decisions use maximum prefix matching on encodings of each spark's ancestry within the computation tree, reconstructed at run time. We evaluate spark colocation in terms of performance and scalability on a set of five benchmarks on a Beowulf-class cluster of multi-core machines using up to 256 cores. Compared to the baseline mechanism, we achieve speedup increases of up to 46% for three out of five applications, owing to improved locality and load balance throughout the execution, as demonstrated by profiling data. For one less scalable program and one program with excessive amounts of very fine-grained parallelism we observe speedup drops of 17% and 42%, respectively. Overall, spark colocation reduces the mean time to fetch the required data and yields a higher degree of parallelism of finer granularity, which is most beneficial at higher PE counts.
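The colocation idea can be pictured with a short sketch. The following minimal Haskell fragment illustrates spark selection by maximum prefix matching over ancestry encodings; the names (Path, Spark, prefixLen, selectSpark) and the choice of reference path are illustrative assumptions for exposition, not the run-time system's actual data structures.

    -- Minimal sketch: selecting a spark by maximum prefix matching on
    -- ancestry encodings. Types and names are hypothetical, chosen for
    -- exposition; the real run-time system uses its own internal encoding.
    import Data.List (maximumBy)
    import Data.Ord (comparing)

    -- Ancestry of a spark: child indices on the path from the root of the
    -- computation tree down to the node that created the spark.
    type Path = [Int]

    data Spark a = Spark { ancestry :: Path, thunk :: a }

    -- Length of the longest common prefix of two ancestry encodings.
    prefixLen :: Path -> Path -> Int
    prefixLen xs ys = length (takeWhile id (zipWith (==) xs ys))

    -- Choose the pooled spark whose ancestry shares the longest prefix with
    -- a reference path (e.g. the ancestry of work the requesting PE last
    -- ran), so that related sparks end up colocated on the same PE.
    selectSpark :: Path -> [Spark a] -> Maybe (Spark a)
    selectSpark _   []   = Nothing
    selectSpark ref pool =
      Just (maximumBy (comparing (prefixLen ref . ancestry)) pool)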


Notes

  1. Parallelism is exploited over pure functions; I/O is handled orthogonally by a separate thread.

  2. This is reasonable, as PE1 is the main PE and PE2 starts with no work.

  3. The median is used as it is more robust to outliers.

  4. http://mathworld.wolfram.com/TotientFunction.html (see the sumEuler-style sketch after these notes).

  5. http://mathworld.wolfram.com/WorpitzkysIdentity.html.

  6. For the other benchmarks, SC consistently leads to more and smaller threads.
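Notes 1 and 4 suggest a concrete illustration: below is a minimal sumEuler-style GpH benchmark sketch, assuming the standard Control.Parallel.Strategies API; the function names, interval, and chunk size are placeholders, not the paper's experimental configuration.

    -- Sketch of a sumEuler-style GpH benchmark: semi-explicit parallelism
    -- over pure functions via evaluation strategies. Parameters are
    -- illustrative, not the configuration used in the paper's experiments.
    import Control.Parallel.Strategies (parListChunk, rseq, using)

    -- Euler's totient: how many integers in [1..n] are coprime to n.
    totient :: Int -> Int
    totient n = length [ k | k <- [1 .. n], gcd n k == 1 ]

    -- Sum the totients over an interval; each chunk of the mapped list is
    -- turned into a spark, which the run-time system may distribute (and,
    -- with spark colocation, keep close to related sparks).
    sumTotient :: Int -> Int -> Int -> Int
    sumTotient lo hi chunk =
      sum (map totient [lo .. hi] `using` parListChunk chunk rseq)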


Acknowledgements

We are grateful to the anonymous reviewers for comments that have substantially improved the presentation of this paper.

Author information

Corresponding author

Correspondence to Evgenij Belikov.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Belikov, E., Loidl, H.-W., Michaelson, G. (2019). Colocation of Potential Parallelism in a Distributed Adaptive Run-Time System for Parallel Haskell. In: Pałka, M., Myreen, M. (eds) Trends in Functional Programming. TFP 2018. Lecture Notes in Computer Science, vol 11457. Springer, Cham. https://doi.org/10.1007/978-3-030-18506-0_1

  • DOI: https://doi.org/10.1007/978-3-030-18506-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18505-3

  • Online ISBN: 978-3-030-18506-0

  • eBook Packages: Computer Science (R0)
