LightHouse: An Automatic Code Generator for Graph Algorithms on GPUs

Shashidhar, G.; Nasre, Rupesh

doi:10.1007/978-3-319-52709-3_18

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10136)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

We propose LightHouse, a GPU code generator for Green-Marl, a graph language for which a multicore CPU backend already exists. This allows a user to seamlessly generate both multicore and GPU code from the same specification of a graph algorithm. Because we do not modify the language, we must work with its existing abstract syntax tree, which is not tailored to GPUs, and this poses several challenges. LightHouse overcomes these challenges with various optimizations, such as reducing the number of atomics and collapsing loops. We illustrate its effectiveness by generating efficient CUDA code for four graph analytic algorithms and comparing its performance against the multicore OpenMP versions generated by Green-Marl. In particular, our generated CUDA code performs comparably to 4- to 64-threaded OpenMP versions for different algorithms.
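The abstract names loop collapsing as one optimization but does not show it. As a generic illustration only, and not the code LightHouse generates, the sketch below shows one common way a nested node/neighbor loop over a graph can be flattened into a single loop over edges. The CSR layout, the example graph, and the names row_offsets, col_indices, nested_edges, and collapsed_edges are all assumptions made for this sketch.

```python
from bisect import bisect_right

# Hypothetical CSR graph (an assumption for illustration):
# the neighbors of node v are col_indices[row_offsets[v]:row_offsets[v + 1]].
row_offsets = [0, 2, 3, 3, 5]
col_indices = [1, 2, 2, 0, 1]


def nested_edges(row_offsets, col_indices):
    """Nested shape: outer loop over nodes, inner loop over each node's neighbors."""
    edges = []
    for v in range(len(row_offsets) - 1):
        for e in range(row_offsets[v], row_offsets[v + 1]):
            edges.append((v, col_indices[e]))
    return edges


def collapsed_edges(row_offsets, col_indices):
    """Collapsed shape: one flat loop over edge ids.

    On a GPU, each edge id could be handled by one thread, which evens out
    the work when node degrees vary widely.
    """
    edges = []
    for e in range(len(col_indices)):
        # Recover the source node: the last v with row_offsets[v] <= e.
        v = bisect_right(row_offsets, e) - 1
        edges.append((v, col_indices[e]))
    return edges


if __name__ == "__main__":
    print(nested_edges(row_offsets, col_indices))
    print(collapsed_edges(row_offsets, col_indices))
```

Both functions visit the same edges in the same order. The collapsed form trades the inner loop for a binary search per edge, which is one possible design choice among several; the paper itself should be consulted for what LightHouse actually does.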

Notes

1. LightHouse code is available at http://pace.cse.iitm.ac.in/tools.php.

Author information

Authors and Affiliations

IIT Madras, Chennai, India
G. Shashidhar & Rupesh Nasre

Corresponding authors

Correspondence to G. Shashidhar or Rupesh Nasre.

Editor information

Editors and Affiliations

University of Rochester, Rochester, New York, USA
Chen Ding
University of Rochester, Rochester, New York, USA
John Criswell
Huawei Inc., Santa Clara, California, USA
Peng Wu

About this paper

Cite this paper

Shashidhar, G., Nasre, R. (2017). LightHouse: An Automatic Code Generator for Graph Algorithms on GPUs. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science, vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_18

DOI: https://doi.org/10.1007/978-3-319-52709-3_18
Published: 24 January 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52708-6
Online ISBN: 978-3-319-52709-3
eBook Packages: Computer Science, Computer Science (R0)
