Colocation of Potential Parallelism in a Distributed Adaptive Run-Time System for Parallel Haskell

  • Conference paper

Trends in Functional Programming (TFP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11457)

Abstract

This paper presents a novel variant of work stealing for load balancing in a distributed graph reducer that executes a semi-explicit parallel dialect of Haskell. The key idea of this load balancer is to colocate related sparks (units of potential parallelism): spark selection decisions use maximum prefix matching on encodings of each spark's ancestry within the computation tree, reconstructed at run time. We evaluate spark colocation in terms of performance and scalability on a set of five benchmarks on a Beowulf-class cluster of multi-core machines using up to 256 cores. Compared to the baseline mechanism, we achieve speedup increases of up to 46% for three out of five applications, owing to improved locality and load balance throughout the execution, as demonstrated by profiling data. For one less scalable program and one program with excessive amounts of very fine-grained parallelism we observe speedup drops of 17% and 42%, respectively. Overall, spark colocation reduces the mean time to fetch the required data and yields a higher degree of parallelism of finer granularity, which is most beneficial at higher PE counts.
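The colocation idea can be pictured with a short sketch. The following minimal Haskell fragment illustrates spark selection by maximum prefix matching over ancestry encodings; the names (Path, Spark, prefixLen, selectSpark) and the choice of reference path are illustrative assumptions for exposition, not the run-time system's actual data structures.

    -- Minimal sketch: selecting a spark by maximum prefix matching on
    -- ancestry encodings. Types and names are hypothetical, chosen for
    -- exposition; the real run-time system uses its own internal encoding.
    import Data.List (maximumBy)
    import Data.Ord (comparing)

    -- Ancestry of a spark: child indices on the path from the root of the
    -- computation tree down to the node that created the spark.
    type Path = [Int]

    data Spark a = Spark { ancestry :: Path, thunk :: a }

    -- Length of the longest common prefix of two ancestry encodings.
    prefixLen :: Path -> Path -> Int
    prefixLen xs ys = length (takeWhile id (zipWith (==) xs ys))

    -- Choose the pooled spark whose ancestry shares the longest prefix with
    -- a reference path (e.g. the ancestry of work the requesting PE last
    -- ran), so that related sparks end up colocated on the same PE.
    selectSpark :: Path -> [Spark a] -> Maybe (Spark a)
    selectSpark _   []   = Nothing
    selectSpark ref pool =
      Just (maximumBy (comparing (prefixLen ref . ancestry)) pool)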


Notes

  1. Parallelism is exploited over pure functions; I/O is handled orthogonally by a separate thread.

  2. This is reasonable, as PE1 is the main PE and PE2 starts with no work.

  3. The median is used as it is more robust to outliers.

  4. http://mathworld.wolfram.com/TotientFunction.html (see the sumEuler-style sketch after these notes).

  5. http://mathworld.wolfram.com/WorpitzkysIdentity.html.

  6. For the other benchmarks, SC consistently leads to more and smaller threads.
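Notes 1 and 4 suggest a concrete illustration: below is a minimal sumEuler-style GpH benchmark sketch, assuming the standard Control.Parallel.Strategies API; the function names, interval, and chunk size are placeholders, not the paper's experimental configuration.

    -- Sketch of a sumEuler-style GpH benchmark: semi-explicit parallelism
    -- over pure functions via evaluation strategies. Parameters are
    -- illustrative, not the configuration used in the paper's experiments.
    import Control.Parallel.Strategies (parListChunk, rseq, using)

    -- Euler's totient: how many integers in [1..n] are coprime to n.
    totient :: Int -> Int
    totient n = length [ k | k <- [1 .. n], gcd n k == 1 ]

    -- Sum the totients over an interval; each chunk of the mapped list is
    -- turned into a spark, which the run-time system may distribute (and,
    -- with spark colocation, keep close to related sparks).
    sumTotient :: Int -> Int -> Int -> Int
    sumTotient lo hi chunk =
      sum (map totient [lo .. hi] `using` parListChunk chunk rseq)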


Acknowledgements

We are grateful to the anonymous reviewers for comments that have substantially improved the presentation of this paper.

Author information

Corresponding author

Correspondence to Evgenij Belikov.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Belikov, E., Loidl, H.-W., Michaelson, G. (2019). Colocation of Potential Parallelism in a Distributed Adaptive Run-Time System for Parallel Haskell. In: Pałka, M., Myreen, M. (eds) Trends in Functional Programming. TFP 2018. Lecture Notes in Computer Science, vol 11457. Springer, Cham. https://doi.org/10.1007/978-3-030-18506-0_1

  • DOI: https://doi.org/10.1007/978-3-030-18506-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18505-3

  • Online ISBN: 978-3-030-18506-0

  • eBook Packages: Computer Science (R0)
