Abstract
We introduce a decomposition of the Tikhonov Regularization (TR) functional which splits it into several TR functionals, suitably modified to enforce the matching of their solutions. As a consequence, instead of solving one problem we solve several smaller problems that together reproduce the initial one. This approach reduces the time complexity of the resulting algorithm. Since the subproblems are solved in parallel, the decomposition also reduces the overall execution time. The main outcome of the decomposition is that the parallel algorithm is oriented to exploit the highest performance of parallel architectures where concurrency is implemented at both the coarsest and the finest levels of granularity. Performance analysis is discussed in terms of algorithm and software scalability. Validation is performed on a reference parallel architecture made of a distributed-memory multiprocessor and a Graphics Processing Unit. Results are presented for the Data Assimilation problem in oceanographic models.
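As a hedged illustration of the idea summarized above, the following sketch uses generic notation and is not the formulation given in the article. The operator $A$, the data $b$, the regularization parameter $\lambda$, the penalty weight $\mu$, and the overlap set $\Omega_{12}$ are all assumptions introduced here. A global TR functional can be split over two overlapping subsets of unknowns by giving each local functional an extra term that penalizes the mismatch of the local solutions on the overlap:

```latex
% Global TR functional (generic form, assumed notation)
J(x) = \| A x - b \|_2^2 + \lambda \| x \|_2^2

% Local TR functionals on two overlapping subsets, i, j \in \{1, 2\}, i \neq j.
% The last term is the assumed modification that enforces matching on \Omega_{12}.
J_i(x_i) = \| A_i x_i - b_i \|_2^2 + \lambda \| x_i \|_2^2
         + \mu \, \bigl\| x_i|_{\Omega_{12}} - x_j|_{\Omega_{12}} \bigr\|_2^2
```

Under these assumptions each $J_i$ involves only the unknowns of one subset, so the two minimizations can proceed concurrently and exchange only their values on $\Omega_{12}$.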
Notes
A partitioned algorithm is a scalar (or point) algorithm in which the operations have been grouped and reordered into matrix operations. A block algorithm is a generalization of a scalar algorithm in which the basic scalar operations become matrix operations, and a matrix property based on the nonzero structure becomes the corresponding property blockwise. LAPACK contains only partitioned algorithms; that is, the main computations are block oriented and implemented using BLAS-3 [15].
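The notion of a partitioned algorithm can be illustrated with matrix multiplication. The sketch below is a minimal pure-Python example, not LAPACK code; the function names `matmul_scalar` and `matmul_partitioned` and the tile size `bs` are hypothetical names chosen for illustration. Both functions perform exactly the same scalar multiply-adds, but the second groups them into tile-by-tile updates, each of which a library would typically hand to a single BLAS-3 call.

```python
def matmul_scalar(A, B):
    """Point (scalar) algorithm: one multiply-add at a time."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_partitioned(A, B, bs=2):
    """Same operations, grouped and reordered into tile products.

    bs is an assumed tile size; edge tiles may be smaller than bs.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, p, bs):
            for k0 in range(0, m, bs):
                # One tile update: C_tile += A_tile * B_tile.
                # The three inner loops stand in for one matrix-matrix call.
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, p)):
                        for k in range(k0, min(k0 + bs, m)):
                            C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 2], [0, 1, 3], [4, 5, 6]]
C_scalar = matmul_scalar(A, B)
C_tiled = matmul_partitioned(A, B, bs=2)
```

For each entry of the result the partitioned version accumulates the products in the same order of `k` as the scalar version, so the two results coincide; only the grouping of the work changes.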
References
Antonelli, L., Carracciuolo, L., Ceccarelli, M., D’Amore, L., Murli, A.: Total variation regularization for edge preserving 3D SPECT imaging in high performance computing environments. Lecture Notes in Computer Science, vol. 2330, Part 2, pp. 171–180 (2002)
Arcucci, R., D’Amore, L., Carracciuolo, L.: On the problem-decomposition of scalable 4D-Var Data Assimilation models, International Conference on High Performance Computing and Simulation (HPCS), pp. 589–594. ISBN 978-1-4673-7812-3, (2015)
Campagna, R., D’Amore, L., Murli, A.: An efficient algorithm for regularization of Laplace transform inversion in real case. J. Comput. Appl. Math. 210(1–2), 84–98 (2007)
Carracciuolo, L., D’Amore, L., Murli, A.: Towards a parallel component for imaging in PETSc programming environment: a case study in 3-D echocardiography. Parallel Comput. 32(1), 67–83 (2006)
Chung, J., Nagy, J.G.: An efficient iterative approach for large scale separable nonlinear inverse problems. SIAM J. Sci. Comput. 31(6), 4654–4674 (2010)
D’Amore, L., Mele, V., Laccetti, G., Murli, A.: Mathematical Approach to the Performance Evaluation of Matrix Multiply Algorithm. Lecture Notes in Computer Science, Vol. 9574, pp. 25–34 (2016)
D’Amore, L., Arcucci, R., Carracciuolo, L., Murli, A.: A scalable approach to variational data assimilation. J. Sci. Comput. 2, 239–257 (2014)
D’Amore, L., Arcucci, R., Carracciuolo, L., Murli, A.: DD-oceanvar: a domain decomposition fully parallel data assimilation software in Mediterranean Sea. Procedia Comput. Sci. 18, 1235–1244 (2013)
D’Amore, L., Arcucci, R., Marcellino, L., Murli, A.: HPC computation issues of the incremental 3D variational data assimilation scheme in OceanVar software. J. Numer. Anal. Ind. Appl. Math. 7(3–4), 91–105 (2012)
D’Amore, L., Casaburi, D., Galletti, A., Marcellino, L., Murli, A.: Integration of emerging computer technologies for an efficient image sequences analysis. Integr. Comput. Aided Eng. 18(4), 365–378 (2011)
D’Amore, L., Laccetti, G., Romano, D., Scotti, G.: Towards a parallel component in a GPU-CUDA environment: a case study with the L-BFGS Harwell routine. J. Comput. Math. 93(1), 59–76 (2015)
D’Amore, L., Campagna, R., Galletti, A., Marcellino, L., Murli, A.: A smoothing spline that approximates Laplace transform functions only known on measurements on the real axis. Inverse Probl. 28(2), 37 (2012)
D’Amore, L., Murli, A.: Regularization of a Fourier series method for the Laplace transform inversion with real data. Inverse Probl. 18(4), 1185–1205 (2002)
D’Amore, L., Marcellino, L., Murli, A.: Image sequence inpainting: towards numerical software for detection and removal of local missing data via motion estimation. J. Comput. Appl. Math. 198(2), 396–413 (2007)
Demmel, J.W., Higham, N.J., Schreiber, R.: Block LU Factorization. RIACS Technical Report no. 92–03, (1992)
ETP4HPC Agenda, European Technology Platform for High Performance Computing. Strategic research agenda achieving HPC leadership in Europe, (2013)
Flatt, H.P., Kennedy, K.: Performance of parallel processors. Parallel Comput. 12, 1–20 (1989)
Freitag, M.A., Nichols, N.K., Budd, C.J.: L1-regularisation for ill-posed problems in variational data assimilation. PAMM Proc. Appl. Math. Mech. 10(1), 665–668 (2010)
Gallopoulos, E., Simoncini, V.: Iterative solution of multiple linear systems: theory, practice, parallelism, and applications. In: Topping, B.H.V., Papadrakaki, M. (eds.) Advances in Parallel and Vector Processing for Structural Mechanics, Proc. Second Intl. Conf. Computational Structures Technology, pp. 47–51. Civil-Comp Press, Edinburgh (1994)
Hansen, P.C.: Rank Deficient and Discrete Ill-posed Problems. SIAM, Philadelphia (1998)
Kalnay, E.: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, Cambridge (2003)
Laccetti, G., Lapegna, M., Mele, V., Romano, D., Murli, A.: A double adaptive algorithm for multidimensional integration on multicore based HPC systems. Int. J. Parallel Program. 40(4), 397–409 (2012)
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45, 503–528 (1989)
MAGMA Software www.icl.cs.utk.edu/plasma
Murli, A., D’Amore, L., Laccetti, G., Gregoretti, F., Oliva, G.: A multi-grained distributed implementation of the parallel block conjugate gradient algorithm. Concurrency Comput. Pract. Exp. 22(15), 2053–2072 (2010)
Murli, A., Boccia, V., Carracciuolo, L., D’Amore, L., Laccetti, G., Lapegna, M.: Monitoring and migration of a PETSc-based parallel application for medical imaging in a grid computing PSE. IFIP Int. Federation Inf. Process. 239, 421–432 (2007)
Nichols, N.K.: Mathematical concepts of data assimilation. In: Lahoz, W., Khattatov, B., Menard, R. (eds.) Data assimilation: Making Sense of Observations, pp. 13–40. Springer, Berlin (2010)
Nvidia, TESLA K20 GPU Active Accelerator (2012). Board spec. Available: http://www.nvidia.in/content/PDF/kepler/Tesla-K20-Active-BD-06499-001-v02
O’Leary, D.P., Simmons, J.A.: A bidiagonalization-regularization procedure for large scale discretizations of ill-posed problems. SIAM J. Sci. Stat. Comput. 2, 474–489 (1981)
Paige, C.C., Saunders, M.A.: LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw. 8, 43–71 (1982)
pARMS Software www-users.cs.umn.edu/saad/software/parms
PCIsig, technology specifications at http://pcisig.com/specifications/pciexpress/
PETSC Software www.mcs.anl.gov/petsc
PLASMA Software www.icl.cs.utk.edu/plasma
Reichel, L., Baglama, J.: Decomposition methods for large linear discrete ill-posed problems. J. Comput. Appl. Math. 198(2), 333–343 (2007)
Tikhonov, A.N.: Solution of incorrectly formulated problems and the regularization method. Dokl. Akad. Nauk SSSR 151, 501–504 (1963); Soviet Math. Dokl. 4, 1035–1038 (1963)
Acknowledgments
This work was developed within the research activity of the H2020-MSCA-RISE-2016 NASDAC Project N. 691184. This work has been realized thanks to the use of the S.Co.P.E. computing infrastructure at the University of Naples.
Cite this article
Arcucci, R., D’Amore, L., Carracciuolo, L. et al. A Decomposition of the Tikhonov Regularization Functional Oriented to Exploit Hybrid Multilevel Parallelism. Int J Parallel Prog 45, 1214–1235 (2017). https://doi.org/10.1007/s10766-016-0460-3
DOI: https://doi.org/10.1007/s10766-016-0460-3