1 Introduction

Multiscale methods in computational fluid dynamics, in particular coupled molecular-continuum simulations [6, 7, 12, 22], make it possible to go beyond the limitations imposed by the modelling accuracy or computational feasibility of any particular single-scale method. They are frequently applied, for instance, to nanostructure investigation in chemistry, especially in lithium-ion battery research [15].

In the molecular-continuum context, MD regions and continuum flow regions are coupled to extend the simulation capability over multiple temporal and spatial scales. This is useful for many challenging applications involving e.g. nanomembranes [20] or polymer physics [1, 18].

Due to applications that often require large domains, long time spans, or fine resolution, parallelism and scalability constitute important aspects of molecular-continuum methods and software. Several codes for massively parallel execution of molecular-continuum simulations on high performance computing (HPC) systems exist, for instance the CPL library [22], MaMiCo [14] or HACPar [19].

Many coupling schemes for molecular-continuum simulations have been investigated [2, 8, 12, 16, 24], based amongst others on the internal-flow multiscale method [2] or the heterogeneous multiscale method [8]. In the steady state case, time-averaging of hydrodynamic quantities sampled from MD is sufficient for the coupling to a continuum solver [7]. But a transient simulation with short coupling time intervals can easily become unstable due to fluctuating MD flow field quantities.

One approach to tackle this is multi-instance sampling [13], where averaged information comes from an ensemble of MD systems. This approach, however, is computationally very expensive.

It has been shown that noise removal techniques, i.e. filters, are another effective approach to reduce thermal fluctuations in the molecular-continuum setting. Grinberg applied proper orthogonal decomposition (POD) to MD flow field data and demonstrated the HPC applicability of the method by running a coupled atomistic-continuum simulation on up to 294,912 compute cores [11]. Zimoń et al. [26] investigated and compared different kinds of noise filters for several particle-based flow simulations, such as harmonically pulsating flow or water flow through a carbon nanotube. They pointed out that combining POD with one of various other noise filtering algorithms yields significant improvements in denoising quality.

One of the main challenges in the field of coupled multiscale flow simulation is software design and implementation, since goals such as flexibility, interchangeability of solvers and modularity on the algorithmic level often conflict with hardware and high-performance requirements. Some very generic solutions exist, such as MUI [23] or the MUSCLE 2 [3] software, which can be used for various multiscale settings. We recently presented the MaMiCo coupling framework [13, 14]. It is specific to molecular-continuum coupling, but independent of the actual MD and CFD solvers. MaMiCo provides interface definitions for arbitrary flow simulation software, hides the coupling algorithmics from the solvers and supports 2D as well as 3D simulations.

In this paper, we present extensions of the MaMiCo tool for massively parallel particle simulation data analytics and noise reduction. We introduce a new interface that is intended primarily for noise filtering of MD flow quantities, but is equally usable for any kind of MD data post-processing within the molecular-continuum coupling. We also present an implementation of an HPC-compatible and scalable POD noise filter for MaMiCo. Our interface is compatible with multi-instance MD computations in a natural way and thus enables a novel combination of noise filtering and multi-instance sampling: a small ensemble of separate MD simulations delivers information about the behaviour of the simulated system on the molecular level, while the filtering efficiently extracts a smooth signal for the continuum solver. The goal of this paper is to investigate this new combination and to discuss the related software design issues.

In Sect. 2 we introduce the relevant theoretical background on MD (Sect. 2.1), the considered molecular-continuum coupling (Sect. 2.2) and POD (Sect. 2.3). Section 3 focuses on the implementation of the MaMiCo data analytics and filtering extension. In Sect. 4, we quantify the denoising quality of the filtering: we analyse signal-to-noise ratios (SNR) for a three-dimensional oscillating Couette flow with synthetic noisy data (Sect. 4.1), validate the resulting transient coupled molecular-continuum simulation with a Couette flow start-up setup (Sect. 4.2), and conduct performance measurements and scaling tests (Sect. 4.3) of our new coupled flow simulation on the LRZ Linux Cluster platforms CoolMUC-2 and CoolMUC-3. Finally, we give a summary and provide an outlook to future work in Sect. 5.

2 Theoretical Background

2.1 Molecular Dynamics (MD)

For the sake of simplicity we restrict our considerations to a set of Lennard-Jones molecules here. This is without loss of generality, since the coupling and noise reduction methodology and software are equally compatible with other particle systems, such as dissipative particle dynamics [9] systems.

For a number N of molecules with positions \(\varvec{x}_i\), velocities \(\varvec{v}_i\), mass m and interacting forces \(\varvec{F}_i\), the behaviour of the system is determined by a Verlet-type numerical integration of Newton's equations of motion, using a particle system time step width \(dt_{\text {P}}\):

$$\begin{aligned} \frac{d}{dt} \varvec{v}_i = \frac{1}{m} \varvec{F}_i, \qquad \frac{d}{dt} \varvec{x}_i = \varvec{v}_i. \end{aligned}$$
(1)

\(\varvec{F}_i\) is defined by Lennard-Jones parameters \(\epsilon \), \(\sigma \) and a cut-off-radius \(r_c\), see [13] for details.
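As an illustration, a minimal velocity Verlet sketch for Eq. (1) is given below. It is not taken from SimpleMD; all identifiers are placeholders, and a naive \(\mathcal {O}(N^2)\) force loop without cut-off is used instead of the linked-cell approach described next.

```cpp
#include <cstddef>
#include <vector>

struct Particle { double x[3], v[3], f[3]; };

// Lennard-Jones force of particle j acting on particle i (no cut-off for brevity).
void addLJForce(Particle& pi, const Particle& pj, double eps, double sigma) {
  double r[3], r2 = 0.0;
  for (int d = 0; d < 3; ++d) { r[d] = pi.x[d] - pj.x[d]; r2 += r[d] * r[d]; }
  const double s2 = sigma * sigma / r2;
  const double s6 = s2 * s2 * s2;
  const double scale = 24.0 * eps / r2 * s6 * (2.0 * s6 - 1.0);  // |F|/r
  for (int d = 0; d < 3; ++d) pi.f[d] += scale * r[d];
}

// One velocity Verlet step of Eq. (1) with time step width dtP and mass m.
void verletStep(std::vector<Particle>& p, double dtP, double m,
                double eps, double sigma) {
  for (auto& pi : p)                                  // half kick + drift
    for (int d = 0; d < 3; ++d) {
      pi.v[d] += 0.5 * dtP * pi.f[d] / m;
      pi.x[d] += dtP * pi.v[d];
    }
  for (auto& pi : p) { pi.f[0] = pi.f[1] = pi.f[2] = 0.0; }
  for (std::size_t i = 0; i < p.size(); ++i)          // recompute forces
    for (std::size_t j = 0; j < p.size(); ++j)
      if (i != j) addLJForce(p[i], p[j], eps, sigma);
  for (auto& pi : p)                                  // second half kick
    for (int d = 0; d < 3; ++d) pi.v[d] += 0.5 * dtP * pi.f[d] / m;
}
```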

The method is implemented using linked cells [10]. The implementation is MPI-parallelized in the simulation software SimpleMD [14], which is part of MaMiCo; note that this solver choice is for simplicity only, as it has been shown that MaMiCo can interface various MD packages, such as LAMMPS, ls1 mardyn, and ESPResSo/ESPResSo++.

2.2 Coupling and Quantity Transfer

The method used here to couple the continuum and MD solvers is based on [13, 16].

The simulation setup and overlapping domain decomposition is shown in Fig. 1. Nested time stepping is used, i.e. the particle system is advanced over n time steps during every time step of the continuum solver, \(dt_{\text {C}} := n \cdot dt_{\text {P}}\).

Fig. 1.

Coupled Couette simulation transfer region setup. 2D slice through 3D domain, H is the wall distance. Here, only the four outer cell layers close to the MD boundary are shown. At the outer MD domain boundary, an additional boundary force \(F_b(r)\) is applied to the molecules. In the green outer MD cells, mass fluxes are transferred from continuum to MD. On the two blue cell layers, velocity values are imposed to MD. Finally, inside the red cells (arbitrarily large region), velocity and density values are sampled, post-processed, and sent back to the continuum solver. (Color figure online)

Particle \(\rightarrow \) continuum data transfer: Continuum quantities for the coupling, such as average velocity \(\varvec{u}\), are sampled cell-wise over a time interval \(dt_{\text {C}}\):

$$\begin{aligned} \varvec{u} = \frac{dt_{\text {P}}}{N dt_{\text {C}}} \sum _{k = 0}^{\frac{dt_{\text {C}}}{dt_{\text {P}}} - 1} \sum _{i = 1}^{N} \varvec{v}_i(t_0 + k dt_{\text {P}}) \end{aligned}$$
(2)

Afterwards, these quantities are optionally post-processed (separately for every MD instance), then accumulated over all instances, and represent the molecular flow field at continuum scale.
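A minimal sketch of the cell-wise sampling of Eq. (2), embedded in the nested time stepping introduced above, could look as follows. It reuses the Particle struct and verletStep function from the sketch in Sect. 2.1; the Cell structure and function names are illustrative placeholders, not MaMiCo's actual data structures.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Cell {                              // one macroscopic sampling cell
  std::vector<std::size_t> particleIds;    // particles inside this cell (assumed fixed here)
  std::array<double, 3> u{};               // accumulated / averaged velocity
};

// Advance the particle system over one continuum time step dtC = n * dtP and
// sample the cell-wise average velocity according to Eq. (2).
void sampleAverageVelocities(std::vector<Cell>& cells,
                             std::vector<Particle>& particles,
                             int n, double dtP, double m,
                             double eps, double sigma) {
  for (auto& c : cells) c.u = {0.0, 0.0, 0.0};
  for (int k = 0; k < n; ++k) {            // n MD time steps per coupling time step
    for (auto& c : cells)                  // accumulate v_i(t0 + k*dtP)
      for (std::size_t id : c.particleIds)
        for (int d = 0; d < 3; ++d) c.u[d] += particles[id].v[d];
    verletStep(particles, dtP, m, eps, sigma);
  }
  for (auto& c : cells) {                  // prefactor dtP / (N * dtC) = 1 / (N * n)
    const double N = static_cast<double>(c.particleIds.size());
    if (N > 0.0)
      for (int d = 0; d < 3; ++d) c.u[d] /= (N * n);
  }
}
```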

Continuum \(\rightarrow \) particle data transfer: Velocities from overlapping cells of the continuum flow solver are used to accelerate the molecules in the corresponding grid cells; they are applied via an additional forcing term with shifted time intervals, see [13] for details. Mass flux from the continuum into the particle simulation is realised by particle deletion and insertion with the USHER algorithm [5]. Reflecting boundaries are used as boundary conditions at the MD-continuum boundary. In addition, we apply the boundary force \(F_b(r)\) proposed by Zhou et al. [25]; it is given as an analytic fitting formula and estimates the missing intermolecular force contributions over a wide range of densities and temperatures.

2.3 Noise Reduction: POD

The proper orthogonal decomposition (POD), also known as principal component analysis, is a general statistical method for the analysis of high-dimensional data from dynamic processes and a standard technique for dimensionality reduction. It has already been used and investigated for the analysis of velocity fields from atomistic simulations [11, 26]. Here we employ POD as a fast standard method and basic example to demonstrate the capabilities of our data analytics and noise reduction interface. Note, however, that various algorithms are known to yield significantly stronger denoising, such as the POD+ methods proposed by Zimoń et al. [26], e.g. POD combined with wavelet transforms or Wiener filtering. Many multi-dimensional image processing filters, such as non-local means, anisotropic filtering or total variation denoising [4, 17], are also promising for CFD data. They can be applied through the noise reduction interface in the same way as POD.

Based on the method of snapshots [21] to analyse a space-time window of fluctuating data, consider a window of N discrete time snapshots \(t \in \{t_1,...,t_N\} \subset \mathbb {R}\). A finite set \(\varOmega \) of discrete sampling points \(\varvec{x} \in \varOmega \subset \mathbb {R}^3\) defines a set of signal sources \(u(\varvec{x}, t)\). POD describes the function \(u(\varvec{x}, t)\) as a finite sum:

$$\begin{aligned} u(\varvec{x}, t) \approx \sum _{k=1}^{k_{max}} \phi _k(\varvec{x}) a_k(t) \end{aligned}$$
(3)

where the orthonormal basis functions (modes) \(\phi _k(\varvec{x})\) and the coefficients \(a_k(t)\) represent the spatial and temporal components of the data, respectively, such that the first \(k_{max}\) modes are the best approximation of \(u(\varvec{x}, t)\). Choosing a sufficiently small \(k_{max}\) retains only the dominating components of the signal u and excludes noise. The temporal modes \(a_k(t)\) can be obtained by an eigenvalue decomposition of the temporal auto-correlation covariance matrix C,

$$\begin{aligned} C_{ij} = \sum _{\varvec{x} \in \varOmega } u(\varvec{x}, t_i) u(\varvec{x}, t_j) \quad i,j = 1,2,...,N. \end{aligned}$$
(4)

They in turn can be used to compute the spatial modes \(\phi _k(\varvec{x})\) via orthogonality relations.

In case of parallel execution, i.e. when subsets of \(\varOmega \) are distributed over several processors, Eq. (4) can be evaluated by combining the local contributions to C in a global reduction operation. Although this is computationally more expensive than purely local, independent POD executions, it helps to prevent inconsistencies between subdomains and enforces smoothness, especially in a coupled molecular-continuum setting. All other steps of the algorithm can be performed locally.
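The following sketch illustrates Eqs. (3) and (4), including the global reduction of C, using Eigen and MPI. It is an illustrative reimplementation under the stated assumptions, not MaMiCo's actual POD filter; the data layout (one row per snapshot, one column per local sampling point) is an assumption for this sketch.

```cpp
#include <Eigen/Dense>
#include <mpi.h>

// data: one row per time snapshot t_1..t_N, one column per sampling point x
// in the process-local part of Omega (e.g. x-velocity values).
// Returns the denoised data, reconstructed from the first kMax modes.
Eigen::MatrixXd podFilter(const Eigen::MatrixXd& data, int kMax, MPI_Comm comm) {
  const int N = static_cast<int>(data.rows());

  // Local contribution to the temporal auto-correlation matrix, Eq. (4).
  Eigen::MatrixXd C = data * data.transpose();                // N x N

  // Global reduction over all subdomains of Omega (once per MD instance).
  MPI_Allreduce(MPI_IN_PLACE, C.data(), N * N, MPI_DOUBLE, MPI_SUM, comm);

  // Eigenvalue decomposition of the symmetric matrix C; eigenvalues are
  // returned in increasing order, so the dominant modes are the last columns.
  Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> eig(C);
  Eigen::MatrixXd A = eig.eigenvectors().rightCols(kMax);     // temporal modes a_k

  // Spatial modes via orthogonality, then rank-kMax reconstruction, Eq. (3).
  Eigen::MatrixXd Phi = A.transpose() * data;                 // kMax x |Omega_local|
  return A * Phi;
}
```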

3 Implementation: MaMiCo Design and Extension

The MaMiCo tool [13, 14] is a C++ framework designed to couple arbitrary massively parallel (i.e., MPI-parallel) continuum and particle solvers in a modular and flexible way, employing a Cartesian grid of so-called macroscopic cells to enable the data exchange. It also provides interfaces to coupling algorithms and data exchange routines. In this paper we employ only the built-in MD simulation SimpleMD on the microscopic side. On the macroscopic side, a simple lattice Boltzmann (LB) implementation, the LBCouetteSolver, and an analytical CouetteSolver are used. The necessary communication between SimpleMD ranks and the ranks of the LBCouetteSolver on which the respective cells are located is performed by MaMiCo.

Fig. 2.

Extended MaMiCo system design. (Color figure online)

The extended system design of MaMiCo is shown in Fig. 2, where the latest developments are marked in the central red box: a newly introduced interface NoiseReduction (Listing 1) is part of the quantity transfer and coupling algorithmics bundle. It is primarily intended for noise filtering, but is able to manage arbitrary particle data analytics or post-processing tasks during the coupling, independently of the coupling scheme and of the actual particle and continuum solvers in use. It is designed to be compatible with multi-instance MD computations in a natural way, as separate noise filters are instantiated for each MD system automatically. This yields a strong subsystem separation rather than algorithmic interdependency. However, explicit cross-instance communication is still possible if necessary.

We provide a dummy implementation of the NoiseReduction interface, IdentityTransform, which performs no filtering at all, as well as our own massively parallel implementation of a particle data noise filter using POD. The POD implementation employs Eigen to perform linear algebra tasks such as the eigenvalue decomposition. It involves a single invocation of a global (per MD instance) MPI reduction operation to enable the detection of flow data correlations across processes and their separation from thermal noise. It is fully configurable via XML configuration files, in which \(k_{max}\) and N can be specified.

Listing 1. The NoiseReduction interface.
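As the listing itself is not reproduced here, the following is only a rough sketch of what such a per-cell filtering interface could look like; the method names and the Cell template parameter are assumptions for illustration and do not necessarily match MaMiCo's actual signatures.

```cpp
#include <vector>

// Hypothetical per-instance noise reduction interface (illustrative only).
template <class Cell>
class NoiseReduction {
public:
  virtual ~NoiseReduction() = default;
  // Collect sampled quantities of one coupling cycle for the local cells.
  virtual void addData(const std::vector<Cell>& sampledCells) = 0;
  // Run the filter and overwrite the cell quantities with denoised values.
  virtual void apply(std::vector<Cell>& cells) = 0;
};

// Dummy filter corresponding to IdentityTransform: leaves the data unchanged.
template <class Cell>
class IdentityTransformSketch : public NoiseReduction<Cell> {
public:
  void addData(const std::vector<Cell>&) override {}
  void apply(std::vector<Cell>&) override {}   // no filtering at all
};
```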

4 Analysis of Simulation Results

Our test scenario, the Couette flow, consists of flow between two infinite parallel plates. A cubic simulation domain with periodic boundaries in x- and y-direction is used. The upper wall at a distance of \(z = H\) is at rest, while the lower wall at \(z = 0\) moves in x-direction with constant speed \(u_{\text {wall}} = 0.5\). The analytical flow solution for the start-up from unit density and zero velocity everywhere can be derived from the Navier–Stokes equations and is given by:

$$\begin{aligned} u_x(z,t) = u_{\text {wall}} \left( 1 - \frac{z}{H}\right) - \frac{2 u_{\text {wall}}}{\pi } \sum _{k=1}^\infty \frac{1}{k} \sin \left( k \pi \frac{z}{H} \right) e ^ { - k^2 \pi ^2 \nu t / H^{2} } \end{aligned}$$
(5)

where \(\nu \) is the kinematic viscosity of the fluid and \(u_y = 0, u_z = 0\).
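A small helper evaluating the truncated series of Eq. (5), e.g. for comparison against sampled MD velocities, could look as follows (function and parameter names are placeholders):

```cpp
#include <cmath>

// x-velocity of the analytical Couette start-up solution, Eq. (5),
// with the infinite series truncated after kMax terms.
double couetteUx(double z, double t, double uWall, double H, double nu,
                 int kMax = 100) {
  const double pi = std::acos(-1.0);
  double u = uWall * (1.0 - z / H);
  for (int k = 1; k <= kMax; ++k)
    u -= 2.0 * uWall / (pi * k) * std::sin(k * pi * z / H)
         * std::exp(-k * k * pi * pi * nu * t / (H * H));
  return u;
}
```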

We refer to the smallest scenario that we frequently use as MD-30, because it has an MD domain size of 30 \(\times \) 30 \(\times \) 30, embedded in a continuum simulation domain with \(H = 50\). We always use a continuum (and MD) cell size of 2.5 and time steps of \(dt_{\text {C}} = 0.5\), \(dt_{\text {P}} = 0.005\). The Lennard-Jones parameters and the molecule mass are set to \(m = \sigma = \epsilon = 1.0\). The MD domain is filled with 28 \(\times \) 28 \(\times \) 28 particles, which yields a density of \(\rho \approx 0.813\) and a kinematic viscosity of \(\nu \approx 2.63\). The MD-60 and MD-120 scenarios are defined by doubling or quadrupling the domain sizes and particle numbers in all spatial directions, keeping everything else constant.
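For reference, the quoted density follows directly from \(28^3\) molecules of unit mass in a domain of volume \(30^3\):

$$\begin{aligned} \rho = \frac{28^3 \, m}{30^3 \, \sigma ^3} = \frac{21952}{27000} \approx 0.813. \end{aligned}$$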

For a fluctuating x-velocity signal \(u(\varvec{x}, t)\) we define the signal-to-noise ratio

$$\begin{aligned} \text {SNR} = 10 \log _{10} \left( \frac{\sum _{\forall \varvec{x}, t} \hat{u}(\varvec{x}, t)^2}{\sum _{\forall \varvec{x}, t} (\hat{u}(\varvec{x}, t) - u(\varvec{x}, t))^2} \right) \text {dB} \end{aligned}$$
(6)

with the analytical noiseless x-velocity \(\hat{u}\). Since the SNR is expressed on a logarithmic scale (it can take values of zero or below when the noise level equals or exceeds the signal), absolute differences in SNR correspond to relative changes of the squared signal amplitude ratio. We therefore define the gain of a noise reduction method as

$$\begin{aligned} \text {gain} = \text {SNR}_{\text {OUT}} - \text {SNR}_{\text {IN}}, \end{aligned}$$
(7)

with \(\text {SNR}_{\text {OUT}}\) and \(\text {SNR}_{\text {IN}}\) denoting the signal-to-noise ratios of denoised data and original fluctuating signal, respectively.
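A small sketch computing Eqs. (6) and (7) from flattened arrays of reference and measured velocities (all names are placeholders):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Signal-to-noise ratio in dB according to Eq. (6): uHat holds the analytical
// noiseless x-velocities, u the fluctuating ones, both flattened over all
// cells x and times t (assumed to be of equal, non-zero length).
double snrDb(const std::vector<double>& uHat, const std::vector<double>& u) {
  double signal = 0.0, noise = 0.0;
  for (std::size_t i = 0; i < uHat.size(); ++i) {
    signal += uHat[i] * uHat[i];
    const double diff = uHat[i] - u[i];
    noise += diff * diff;
  }
  return 10.0 * std::log10(signal / noise);
}

// Gain of a noise reduction method, Eq. (7).
double gainDb(const std::vector<double>& uHat,
              const std::vector<double>& uIn,
              const std::vector<double>& uOut) {
  return snrDb(uHat, uOut) - snrDb(uHat, uIn);
}
```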

4.1 Denoising Harmonically Oscillating Flow

To illustrate and quantify the filtering quality of MaMiCo's new POD noise filter component, we investigate a harmonically oscillating 3D Couette flow. We use the MD-30 scenario and set \(u_{\text {wall}}\) to a time-dependent sine signal. Since we want to investigate only the noise filter here, not the coupled simulation, we use synthetic (analytical) MD data with additive Gaussian noise instead of running a real MD simulation. This is not a physically valid flow profile, as it disregards the viscous shear forces caused by the time-dependent oscillating acceleration, but it is suitable for examining and demonstrating the noise filtering performance. We show the influence of varying the POD parameters \(k_{max}\) and N in Fig. 3, where the x-component of velocity is plotted over time (for clarity, in only one of the cells of the MD domain); the SNR value is computed over all cells.

Fig. 3.

SNR gain of our POD implementation in MaMiCo, for oscillating 3D Couette flow, using synthetic MD data. Maximum \(u_{\text {wall}}\) = 0.5. Red: x-component of velocity; Black: True noiseless signal for this cell (Color figure online)

The number \(k_{max}\) of POD modes used for filtering is considered a fixed simulation parameter in this paper. Note, however, that the optimal value of \(k_{max}\) depends on the flow features, and several methods have been proposed to choose \(k_{max}\) adaptively at runtime by analysing the eigenspectra [11, 26].

Figure 3f shows the best reduction of fluctuations, with an SNR gain of 17.05 dB compared to the input signal in Fig. 3a. A single POD mode is sufficient here, since the sine frequency is low and the flow is close to a steady state, so that the higher-order modes already describe noise components of the input.

Fig. 4.

Oscillating 3D Couette, coupled one-way into a real MD simulation instead of using synthetic data.

We validate the synthetic MD data test series by repeating the same experiment with a real one-way coupled molecular-continuum simulation. Figure 4a shows the noisy x-velocity signal in one of the cells. In Fig. 4b we additionally enable multi-instance sampling with 8 MD instances and observe a smooth sine output with an SNR gain of 19.84 dB.

4.2 Start-Up of Coupled Couette Flow

We employ a one-way coupled LBCouetteSolver \(\rightarrow \) SimpleMD simulation running on 64 cores in an MD-30 scenario. The MD quantities that would be returned to the continuum solver are collected on the corresponding LBCouetteSolver rank and compared to the analytical solution in Fig. 5.

Figure 5a shows a very high fluctuation level, because \(u_{\text {wall}} = 0.5\) is relatively low compared to the thermal fluctuations of the MD particles. Figures 5b and c compare multi-instance sampling and noise filtering. The 32-instance simulation is computationally more expensive, but yields a strong gain of 14.73 dB. Theoretically, multi-instance MD sampling with \(I = 32\) instances is expected to reduce the standard deviation of the thermal noise by a factor of \(\sqrt{I}\), and hence the squared noise amplitude by a factor of I, so the expected gain is \(10 \log _{10} (I)~\text {dB} = 15.05~\text {dB}\), which is in good agreement with our experimental result. The simulation with POD uses two modes here, which is necessary as a single mode does not capture the fast flow start-up. A smaller but comparable gain of 11.15 dB is obtained in this case, using far fewer computational resources (see Sect. 4.3). The best result is achieved using the combination of multi-instance sampling and noise filtering shown in Fig. 5d. This novel approach yields a signal-to-noise gain of 22.63 dB for this test scenario, so that the experimentally produced velocity values closely match the analytical solution.

Thus the new combined coupling approach features benefits for both performance and precision. Its ability to extract considerably smoother flow field quantities permits coupling on shorter time scales.

Fig. 5.

Couette startup flow profiles, one-way LB \(\rightarrow \) SimpleMD coupling, multi-instance MD versus noise filtering (Color figure online)

4.3 Performance and Scaling Tests

In Table 1 we investigate the performance of the noise filtering subsystem compared to the other components of the coupled simulation. POD always runs with the same number of cores as MD. The MaMiCo time includes only the effort for coupling communication; particle insertion and velocity imposition are counted as MD time. The table entries are sampled over 100 coupling cycles each, excluding initializations, using a Couette flow simulation on the LRZ Linux Cluster.

The filter parameter N (time window size) strongly influences the POD runtime, since \(C \in \mathbb {R}^{N \times N}\) and both the communication and the eigenvalue decomposition are performed on C. In practice this leads to a complexity on the order of \(\mathcal {O}(N^3)\). Thus, choosing a sufficiently small N is important to limit the computational demand of the method.

Table 1. Impact of the noise filter on overall coupled simulation performance. Ratios in the form \(\left[ \begin{array}{cc}\text {MD} & \text {LB} \\ \text {POD} & \text {MaMiCo}\end{array}\right] \), given as percentages of the total runtime spent in the respective simulation component.
Fig. 6.

Strong scaling of coupled simulation, MD-120, including noise filtering. All cores are used for LB, MD and POD, respectively.

However, the noise reduction runtime is always very low compared to that of the other simulation components. Given the high gain in signal-to-noise ratio achieved with relatively little computational effort, this is a very good result.

The scalability of the coupled simulation, including the noise filter, is evaluated on LRZ CoolMUC-3. The strong scaling tests in Fig. 6 are performed on a fixed MD-120 domain with up to 512 cores. We find that our new POD implementation does not significantly impede the scaling, even though it employs an MPI reduction operation. This operation is executed only once per coupling cycle, i.e. every 100 MD time steps, and is restricted to a single MD instance. These results demonstrate the suitability of our MD data post-processing approach and POD implementation for high-performance computing applications.

5 Conclusions

We have introduced a new noise filtering subsystem into MaMiCo that enables massively parallel particle data post-processing in the context of transient molecular-continuum coupling. Thanks to MaMiCo's modular design, it is compatible with any particle and continuum solver and can be used in conjunction with MD multi-instance sampling. Experiments with flow start-up profiles validate our coupled simulation. SNR considerations for oscillating flow demonstrate the filtering quality of our POD implementation. The noise filtering interface and implementation scale well and have only a minimal impact on the overall coupled simulation performance.

We observed SNR gains from the POD filter ranging roughly from 11 dB to 17 dB. This corresponds to a potential simulation performance increase, since the number of MD instances required for an equivalent gain could be reduced by a factor of 10 to 50. We point out, however, that the combination of POD and multi-instance sampling yields even smoother quantities and thus enables coupling on shorter time scales, while offering a higher level of parallelism than a POD-only approach.

As our interface is very generic and flexible, an obvious direction for future work is to conduct experiments with further noise filtering algorithms. The area of image processing in particular offers many promising methods that could be applied to CFD data instead of image data. Furthermore, additional data analytics and particle analysis tasks may be tackled in the future, such as flow profile extraction, pressure wave and density gradient detection, or specific molecular data collection modules, e.g. for the determination of radial distribution functions. Resilient MD computations via a fault-tolerant multi-instance system with the ability to recover from errors are also conceivable, as well as a machine learning based extension that extracts smoother quantities by learning and detecting prevalent flow features.