Abstract
We propose LightHouse, a GPU code generator for Green-Marl, a graph language for which a multicore CPU backend already exists. A user can thus seamlessly generate both multicore and GPU code from the same specification of a graph algorithm. Because LightHouse leaves the language unmodified, it must work with Green-Marl's existing abstract syntax tree, which is not tailored to GPUs, and this poses several challenges. LightHouse overcomes these challenges with optimizations such as reducing the number of atomic operations and collapsing loops. We illustrate its effectiveness by generating efficient CUDA code for four graph analytic algorithms and comparing its performance against the multicore OpenMP versions generated by Green-Marl. In particular, depending on the algorithm, our generated CUDA code performs comparably to OpenMP versions running with 4 to 64 threads.
Notes
1. LightHouse code is available at http://pace.cse.iitm.ac.in/tools.php.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Shashidhar, G., Nasre, R. (2017). LightHouse: An Automatic Code Generator for Graph Algorithms on GPUs. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science, vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_18
DOI: https://doi.org/10.1007/978-3-319-52709-3_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52708-6
Online ISBN: 978-3-319-52709-3
eBook Packages: Computer Science (R0)