Study of parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications

Yamato, Yoji

doi:10.1007/s10844-019-00575-8

Published: 14 August 2019

Volume 54, pages 567–584 (2020)

Journal of Intelligent Information Systems

Abstract

To overcome the high cost of developing IoT (Internet of Things) services by vertically integrating devices and services, Open IoT has been developed so that various IoT services can be built by integrating horizontally separated devices and services. For Open IoT, we have proposed Tacit Computing technology, which discovers on demand the devices that can provide the data users need and uses them dynamically. We have also proposed an automatic GPU (graphics processing unit) offloading method as an elementary technology of Tacit Computing. However, that method improves the performance of only a limited number of applications because it optimizes only the extraction of parallelizable loop statements. Therefore, to improve the performance of more applications automatically, this paper proposes an improved GPU offloading method that reduces the number of data transfers between the CPU and GPU. We evaluate the proposed method by applying it to Darknet and a Fourier transform application, both large general-purpose applications written for CPUs, and find that it processes them 3 times and 5 times as quickly, respectively, as using CPUs alone, within a 10-hour tuning time.

Author information

Authors and Affiliations

NTT Network Service Systems Laboratories, NTT Corporation, 3-9-11 Midori-cho, Musashino-shi, Tokyo, 180-8585, Japan
Yoji Yamato

Corresponding author

Correspondence to Yoji Yamato.

About this article

Cite this article

Yamato, Y. Study of parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications. J Intell Inf Syst 54, 567–584 (2020). https://doi.org/10.1007/s10844-019-00575-8

Received: 21 May 2019
Revised: 06 August 2019
Accepted: 08 August 2019
Published: 14 August 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10844-019-00575-8
