LagrangeFlux Schemes: Reformulating SecondOrder Accurate LagrangeRemap Schemes for Better NodeBased HPC Performance
Schémas Lagrangeflux : reformuler les schémas LagrangeProjection d’ordre deux pour améliorer la performance HPC au niveau nœud de calcul
^{1}
CMLA, ENS Cachan, Université ParisSaclay, CNRS, 94235
Cachan – France
^{2}
Maison de la Simulation, USR 3441, CEA Saclay, 91191
GifsurYvette – France
^{3}
CEA DAM DIF, 91297
Arpajon – France
^{4}
CEA Saclay, DEN, 91191
GifsurYvette – France
^{5}
CGG, 27 avenue Carnot, 91300
Massy – France
email: devuyst@cmla.enscachan.fr – thibault.gasc@cea.fr – renaud.motte@cea.fr – mathieu.peybernes@cea.fr – raphael.poncet@cgg.com
^{*} Corresponding author
Received:
8
December
2015
Accepted:
23
September
2016
In a recent paper [Poncet R., Peybernes M., Gasc T., De Vuyst F. (2016) Performance modeling of a compressible hydrodynamics solver on multicore CPUs, in “Parallel Computing: on the road to Exascale”], we have achieved the performance analysis of staggered Lagrangeremap schemes, a class of solvers widely used for hydrodynamics applications. This paper is devoted to the rethinking and redesign of the Lagrangeremap process for achieving better performance using today’s computing architectures. As an unintended outcome, the analysis has lead us to the discovery of a new family of solvers – the socalled Lagrangeflux schemes – that appear to be promising for the CFD community.
Résumé
Dans un article récent [Poncet R., Peybernes M., Gasc T., De Vuyst F. (2016) Performance modeling of a compressible hydrodynamics solver on multicore CPUs, in “Parallel Computing: on the road to Exascale”], nous avons effectué l’analyse de la performance d’un schéma de type Lagrange+projection à variables décalées ; cette classe de solveurs est très utilisée pour les applications d’hydrodynamique. Dans cet article, on s’intéresse à la reformulation des solveurs Lagrangeprojection afin d’améliorer leur performance globale sur architectures de calcul standards. De manière inattendue, l’analyse nous a conduit vers la découverte d’une nouvelle famille de solveurs – appelés schémas Lagrangeflux – qui apparaissent comme très prometteurs dans la communauté CFD.
© F. De Vuyst et al., published by IFP Energies nouvelles, 2016
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Motivation and Introduction
For complex compressible flows involving multiphysics phenomena like e.g. highspeed elastoplasticity, multimaterial interaction, plasma, gasparticles, etc., a Lagrangian description of the flow is generally preferred. To achieve robustness, some spatial remapping on a regular mesh may be added. A particular case is the family of the socalled Lagrange+remap schemes [1–3], also referred to as remapped Lagrange solvers that apply a remap step on a reference (Eulerian) mesh after each Lagrangian time advance. Legacy codes implementing remapped Lagrange solvers usually define thermodynamical variables at cell centers and velocity at cell nodes (Fig. 1). In Poncet et al. [4], we have achieved a multicore nodebased performance analysis of a reference Lagrangeremap hydrodynamics solver used in industry. By analyzing each kernel (i.e. elementary computing function) of the whole algorithm, using rooflinetype models [5] on one side and refined Execution Cache Memory (ECM) models [6, 7] on the other side, we have been able not only to quantitatively predict the performance of the whole algorithm – with relative errors in the single digit range – but also to identify a set of features that limit the whole performance. This can be roughly summarized into three points:

For typical mesh sizes of real applications, spatially staggered variables involve a rather big amount of communication to/from CPU caches and memory with low arithmetic intensity, thus lowering the whole performance;

Usual Alternating Direction (AD) strategies (see the appendix in [8]) or AD remapping procedures also generate too much communication with a loss of CPU occupancy;

For multimaterial flows using VOFbased interface reconstruction methods, there is a strong loss of performance due to some array indirections and noncoalescent data in memory. Vectorization of such algorithms is also not trivial.
Figure 1 “Legacy” staggered Lagrangeremap scheme: thermodynamical variables are located at cell centers (circles) whereas velocity variables are located at cell nodes (squares). a) Eulerian cell, b) Lagrange step, c) Remapping step. 
From these observations and as a result of the analysis, we decided to “rethink” Lagrangeremap schemes, with possibly modifying some aspects of the solver in order to improve nodebased performance of the hydrocode. We have looked for alternative formulations that reduce communication and improve both arithmetic intensity and SIMD (Single Instruction – Multiple Data) property of the algorithm. In this paper, we describe the process of redesign of Lagrangeremap schemes leading to higher performance solvers. Actually, this redesign methodology also gave us ideas of innovative Eulerian solvers. The emerging methods, named Lagrangeflux schemes, appear to be promising in the extended computational fluid dynamics community.
The paper is organized as follows. In Section 1, we first formulate the requirements for the design of better Lagrangeremap schemes. In Section 2 we give a description of the Lagrange step and formulate it under a finite volume form. In Section 3 we focus on the remap step which is reformulated as a finite volume scheme with pure convective fluxes. This interpretation is applied in Section 4 to build the socalled Lagrangeflux schemes. We also discuss the important issue of achieving second order accuracy (in both space and time). Performance comparisons between the reference Lagrangeremap and the proposed Lagrangeflux scheme are presented in Section 5. In Section 6 we comment the possible extension to multimaterial flow with the use of lowdiffusive and accurate interfacecapturing methods. We will close the paper by some concluding remarks, work in progress and perspectives.
1 Requirements
Starting from “legacy’’ Lagrangeremap solvers and related observed performance measurements, we want to improve the performance of these solvers by modifying some of their features but under some constraints and requirements:

A Lagrangian solver (or description) must be used (allowing for multiphysics coupling);

To reduce communication, we prefer using collocated cellcentered variables rather than a staggered scheme;

To reduce communication, we prefer using a direct multidimensional remap solver rather than split alternating direction projections;

The method can be simply extended to secondorder accuracy (in space and time);

The solver must be able to be naturally extended to multimaterial flows.
Before going further, let us comment the above requirements. The second requirement should imply the use of a cellcentered Lagrange solver. Fairly recently, Després and Mazeran in [9] and Maire et al. [10] (with highorder extension in [11]) have proposed pure cellcentered Lagrangian solvers based on the reconstruction of nodal velocities. In our study, we will examine if it is possible to use approximate and simpler Lagrangian solvers in the Lagrange+remap context, in particular for the sake of performance. The fourth assertion requires a full multidimensional remapping step, probably taking into account geometric elements (deformation of cells and edges) if we want to ensure highorder accuracy remapping. To summarize, our requirements are somewhat contradictory, and we have to find a good compromise between some simplificationsapproximations and a loss of accuracy (or properties) of the numerical solver.
2 Lagrange Step
As examples, let us consider the compressible Euler equations for twodimensional problems. Denoting i ∈ {1,2}, p and E the density, velocity, pressure and specific total energy respectively, the mass, momentum and energy conservation equations read(1)where , , π _{2} = (p, 0)^{ T }, π _{3} = (0, p)^{ T } and π _{4} = p u . For the sake of simplicity, we will use a perfect gas equation of state . The speed of sound c is given by .
For any volume V _{ t } that moves with the fluid, from the Reynolds transport theorem we havewhere ν is the normal unit vector exterior to . This leads to a natural explicit finite volume scheme in the form(2)
In (2), the superscript “L” indicates the Lagrange evolution of the quantity. Any Eulerian cell K is deformed into the Lagrangian volume at time , and into the Lagrangian volume K^{n+1,L} at time t^{n+1}. The pressure flux terms through the edges A are evaluated at time in order to get secondorder accuracy in time. Of course, that means that we need a predictor step for the velocity field at time (not written here for simplicity).
Notations
To simplify, in all what follows we will use the notation .
3 Rethinking the Remapping Step
The remapping step considers the remapping fields defined at cell centers K ^{ n+1,L } on the initial (reference) Eulerian mesh with cells K. Let us denote a linear operator that reconstructs piecewise polynomial functions from a discrete field U ^{ n+1,L } defined at cell centers of the Lagrangian mesh at time t ^{ n+1}. The remapping process consists in projecting the reconstructed field on piecewiseconstant functions on the Eulerian mesh, according to the integral formula(3)
Practically, there are many ways to consider the projection operation (3). One can assemble elementary projection contributions by computing the volume intersections between the reference mesh and the deformed mesh. But this procedure requires the computation of all the geometrical elements. Moreover, the projection needs local tests of projection with conditional branching (think about the very different cases of a compression and expansion). Thus the procedure is not SIMD and potentially leads to a loss of performance. The incremental remapping can also be interpreted as a transport/advection algorithm, as emphasized by Dukowicz and Baumgardner [12] that appears to be better suited for SIMD treatments.
Let us now write a different original formulation of the remapping step. In this step, there is no time evolution of any quantity, and in some sense we have ∂ _{ t } U = 0, that we write
We decide to split up this equation into two substeps:
Each convection problem is wellposed on the time interval [0, Δt^{n}]. Let us now focus into these two steps and the way to solve them.
3.1 Backward Convection in Lagrangian Description
After the Lagrange step, if we solve the backward convection problem (4) over a time interval Δt ^{ n } using a Lagrangian description, we have(6)
Actually, from the cell K^{n+1,L} we go back to the original cell K with conservation of the conservative quantities. For (conservation of mass), we haveshowing the variation of density by volume variation. For , it is easy to see that both velocity and specific total energy are kept unchanged in this step:
Thus, this step is clearly computationally inexpensive.
3.2 Forward Convection in Eulerian Description
From the discrete field defined on the Eulerian cells K, we then solve the forward convection problem over a time step Δt^{n} under an Eulerian description. A standard finite volume discretization of the problem will lead to the classical time advance scheme(7)for some interface values defined from the local neighbor values . We finally get the expected Eulerian values at time t^{n+1}.
Notice that from (6) and (7) we have also(8)thus completely defining the remap step under the finite volume scheme form (8). One can notice that neither mesh intersections nor geometric considerations are required to achieve this remapping process. The finite volume form (8) is now suitable for a straightforward vectorized SIMD treatment. From (8) it is easy to achieve secondorder accuracy for the remapping step by usual finite volume tools (MUSCL reconstruction + secondorder accurate time advance scheme for example).
3.3 Full Lagrange+Remap Time Advance
Let us note that the Lagrange+remap scheme is actually a conservative finite volume scheme: putting (2) into (8) gives for all :(9)that can also be written(10)
We recognize in (10) pressurerelated fluxes and convective fluxes that define the whole numerical flux.
3.4 Comments
The finite volume formulation (10) is attractive and seems rather simple at first sight. But we should not forget that we have to compute a Lagrangian velocity vector field where the variables should be located at cell nodes to return a wellposed deformation. Moreover, expression (10) involves geometric elements like the length of the deformed edges . Among the rigorous collocated Lagrangian solvers, let us mention the GLACE scheme by DesprésMazeran [9] and the cellcentered EUCCLHYD solver by Maire et al. [10]. Both are rather computationally expensive and their secondorder accurate extension is not easy to achieve.
Although it is possible to couple these Lagrangian solvers with the fluxbalanced remapping formulation, it is also of interest to think about ways to simplify or approximate the Lagrange step without losing secondorder accuracy. One of the difficulty in the analysis of Lagrangeremap schemes is that, in some sense, space and time are coupled by the deformation process.
Below, we derive a formulation that leads to a clear separation between space and time, in order to simply control the order of accuracy. The idea is to make the time step tend to zero in the Lagrangeremap scheme (method of lines [13]), then exhibit the instantaneous spatial numerical fluxes through the Eulerian cell edges that will serve for the construction of an explicit finite volume scheme. Because the method needs an approximate Riemann solver in Lagrangian form, we will call it a Lagrangeflux scheme.
4 Derivation of a SecondOrder Accurate LagrangeFlux Scheme
From the intermediate conclusions of the discussion 3.4 above, we would like to be free from any rather expensive collocated Lagrangian solver. However, such a Lagrangian solver seems necessary to correctly and accurately define the deformation velocity field at time .
In what follows, we are trying to deal with time accuracy in a different manner. Let us come back to the Lagrange+remap formula (10). Let us consider a “small’’ time step t that fulfills the usual stability CFL condition. We have
By making t tend to zero, (t > 0), we have A^{L}(t/2) → A, , , , then we get a semidiscretization in space of the conservation laws. That can be seen as a particular method of lines [13]:(11)
We get a classical finite volume methodwith a numerical flux Φ_{ A } whose components are(12)
In (11), pressure fluxes and interface normal velocities can be computed from an approximate Riemann solver in Lagrangian coordinates (for example the Lagrangian HLL solver, [14]). Then, the interface states should be computed from an upwind process according to the sign of the normal velocity . This is interesting because the resulting flux has similarities with the socalled Advection Upstream Splitting Method (AUSM) flux family proposed by Liou [15], but the construction here is different and, in some sense, justifies the AUSM splitting.
To get higherorder accuracy in space, one can use a standard MUSCL reconstruction + slope limiting process involving classical slope limiters like for example Sweby’s limiter function [16]:(13)with β ∈ [1, 2] for achieving second order accuracy. At this stage, because there is no time discretization, everything is defined on the Eulerian mesh and fluxes are located at the edges of the Eulerian cells. This is one originality of this scheme compared to the legacy staggered Lagrangeremap scheme that has to use variables defined on the Lagrangian cells.
To get highorder accuracy in time, one can then apply a standard highorder time advance scheme (RungeKutta 2 (RK2), etc.). For the secondorder Heun scheme for example, we have the following algorithm:

Compute the time step Δt ^{ n } subject to some CFL condition;

Predictor step. MUSCL reconstruction + slope limitation on primitive variables ρ, u and p. From the discrete states , compute a discrete gradient for each cell K and interpolated values at cell interfaces;

Use a Lagrangian approximate Riemann solver to compute pressure fluxes and interface velocities from the MUSCL reconstructed values at the interface;

Compute the upwind edge values according to the sign of ;

Compute the numerical flux as defined in (12);

Compute the first order predicted states :

Corrector step. MUSCL reconstruction + slope limitation: from the discrete values , compute a discrete gradient for each cell K and interpolated values at cell interfaces;

Use a Lagrangian approximate Riemann solver to compute pressure fluxes and interface velocities from the MUSCL reconstructed values at the interface;

Compute the upwind edge values according to the sign of ;

Compute the numerical flux as defined in (12);

Compute the secondorder accurate states at time t ^{ n+1}:
One can appreciate the simplicity of the numerical solver compared to the legacy staggered Lagrangeremap algorithm. The complexity of the latter mainly due to various kernel (function) calls and too much communications is detailed in [4]. Here the predictor and corrector kernel functions have similar programming codes and there is no intermediate variables to save in memory.
4.1 Lagrangian HLL Approximate Solver
A HLL approximate Riemann solver [14] in Lagrangian coordinates can be used to easily compute interface pressure and velocity. For a local Riemann problem made of a left state U _{ L } and a right state U _{ R }, the contact pressure p ^{⋆} is given by the formula(14)and the normal contact velocity u ^{⋆} by(15)leading to simple formulas easily implementable in the secondorder Heun time integration scheme.
4.2 Numerical Experiments
As example, we test the Lagrangeflux scheme presented in Section 4 on few onedimensional shock tube problems. We use a RK2 time integrator and a MUSCL reconstruction with the Sweby slope limiter given in (13).
4.2.1 Sod’s Shock Tube [17]
The initial data defined on space interval [0, 1] is made of two constant states and with initial discontinuity at x = 0.5. We successively test the method on two uniform mesh grids made of 100 and 400 cells, respectively. The final computational time is T = 0.23 and we use a CFL number equal to 0.25 and a Sweby limiter with coefficient β = 1.5. On Figure 2, one can observe a nice behavior of the Euler solver, with sharp discontinuities and a low numerical diffusion into the rarefaction fan even for the coarse grid.
Figure 2 Secondorder Lagrangeflux scheme on reference Sod’s shock tube problem on two different mesh grids, 100 and 400 respectively. The solid red line is the analytic solution and blue points are the numerical solution. 
4.2.2 TwoRarefaction Shock Tube
The second reference example is a case of two movingaway rarefaction fans under nearvacuum conditions [14]. It is known that the Roe scheme breaks down for this case. The related Riemann problem is made of the left state and right state . The final time of T = 0.16. We again test the method on a coarse mesh (200 points) and a fine mesh (2000 points). Numerical results are given in Figure 3. The numerical scheme appears to be robust especially in nearvacuum zones where both density and pressure are close to zero.
Figure 3 Secondorder Lagrangeflux scheme on a double rarefaction nearvacuum case on two different mesh grids, 200 and 2000 points respectively. The solid red line is the analytic solution. 
4.2.3 Case With Sonic Rarefaction and Supersonic Contact
The following shock tube case with initial data and generates a sonic 1rarefaction, a supersonic 2contact discontinuity and a 3shock wave. The final time is T = 0.16 and we use 400 mesh points, CFL = 0.25. Numerical results show a good capture of the rarefaction wave, without any nonentropic expansionshock (Fig. 4).
Figure 4 Shock tube problem with sonic rarefaction fan, 400 mesh points, final time T is T = 0.18, the solid red line is the analytic solution. 
4.2.4 Case of ShockShock Hypersonic Shock Tube
This last shock tube problem is a violent flow case made of two hitting fluids with and . Both 1wave and 3wave are shock waves, and the right state has a Mach number of order 40. Final time is T = 0.16, we use 400 grid points and the limiter coefficient β is here 1 (equivalent to the minmod limiter). One can observe a nice behavior of the solver: there is no pressure or velocity oscillations at the contact discontinuity, and the numerical scheme preserves the positivity of density, pressure and internal energy (Fig. 5).
Figure 5 Shockshock case with subsonichypersonic shock, 400 mesh points, final time T is T = 0.16, the solid red line is the analytic solution. 
5 Performance Results
In this section, we compare the new Lagrangeflux scheme to the reference (staggered) Lagrangeremap scheme in terms of Millions of Cell Updates Per second (denoted hereafter as MCUPs). Tests are performed on a standard 2 × 8 cores Intel Sandy Bridge server E52670. Each core has a frequency of 2.6 GHz, and supports Intel’s AVX (Advanced Vector Extension) vector instructions. For multicore support, we use the multithreading programming interface OpenMP.
In the reference staggered Lagrangeremap solver [4], thermodynamic variables are defined at grid cell centers while velocity variables are defined at mesh nodes. Due to this staggered discretization and the Alternating Direction (AD) remapping procedures, this solver is decomposed into nine kernels. This decomposition mechanically decreases the mean Arithmetic Intensity (AI) of the solver.
On the other hand, the Lagrangeflux algorithm consists in only two kernels with a relative high arithmetic intensity which leads to two ComputeBound (CB) kernels. In the first kernel, named PredictionLagrangeFlux(), an appropriate Riemann solver is called, face fluxes are computed and variables are updated for the prediction. The second kernel, named CorrectionLagrangeFlux(), is close in terms of algorithmic steps, since it also uses a Riemann solver, computes fluxes and updates the variables for the correction part of the solver.
In order to assess the scalability and absolute performance of both schemes, we present in Table 1 a performance comparison study. First, we notice that the baseline performance – i.e. the single core absolute performance without vectorization – is quite similar for the two schemes, as can be seen in the first column. However, we the Lagrangeflux scheme has a better scalability, due to both vectorization and multithreading: our Lagrangeflux implementation achieves a speedup of 31.1X with 16 cores and AVX vectorization (a 2X speedup from AVX vectorization, which is ideal for this solver with many divisions instructions, and an almost 16X perfect speedup from the multicore usage) whereas the reference Lagrangeremap algorithm reaches a speedup of only 14.8X. This difference is mainly due to the memorybound kernels composing the reference Lagrangeremap scheme. Indeed, speedups due to AVX vectorization and multithreading are not ideal for kernels with relatively low intensity since memory bandwidth is shared between cores.
Performance comparison between the reference Lagrangeremap solver and the Lagrangeflux solver in MCUPs, using different machine configurations. Scalability (last column) is computed as the speedup of the multithreaded vectorized version compared to the baseline purely sequential version. Tests are performed for fine meshes, such that kernel data lies in DRAM memory. The Lagrangeflux solver exhibits superior scalability, because it has — by design — better arithmetic intensity.
6 Dealing With Multimaterial Flows
Although this is not the aim and the scope of the present paper, we would like to give an outline of the possible extension of Lagrangeflux schemes to compressible multimaterial/multifluid flows, i.e. flows that are composed of different immiscible fluids and separated by free boundaries.
For pure Lagrange+remap schemes, usually VOFbased Interface Reconstruction (IR) algorithms are used (Young’s PLIC, etc.). After the Lagrangian evolution, for the cells that host more than one fluid, fluid interfaces are reconstructed. During the remapping step, one has to evaluate the mass fluxes per material. From the computational point of view and computing performance, this process generally slows down the whole performance because of many array indirections in memory and specific treatment into mixed cells along with the material interfaces.
If the geometry of the Lagrangian cells is not completely known (as in the case of Lagrangeflux schemes), we have to proceed differently. A possibility is to use Interface Capturing (IC) schemes, e.g. conservative Eulerian schemes that evaluate the convected mass fluxes through Eulerian cell edges. This can be achieved by the use of antidiffusive/lowdiffusive advection solvers in the spirit of DesprésLagoutière’s limiteddownwind scheme [18] of VoFire [19]. In a recent work [20], we have analyzed the origin of known artifacts and numerical interface instabilities for this type of solvers and concluded that the reconstruction of fully multidimensional gradients with multidimensional gradient limiters was necessary. Thus, we decided to use lowdiffusive advection schemes with a Multidimensional Limiting Process (MLP) in the spirit of [21]. The resulting method is quite accurate, shapepreserving and free from any artifact. We show some numerical results in the next two subsections. Let us emphasize that the interface capturing strategy perfectly fits with the Lagrangeflux flow description, and the resulting schemes are really suitable for vectorization (SIMD feature) with data coalescence into memory.
6.1 Interface Capturing for Pure Advection Problems
Let us first present numerical results on a pure scalar linear advection problem. The forwardbackward advection case proposed by Rider and Kothe [22] is a hatshaped function which is advected and stretched into a rotating velocity field, leading to a filament structure. Then by applying the opposite velocity field, one have to retrieve the initial disk shape. In Figure 6, we show the numerical solutions obtained on a grid 500^{2} for both the passive scalar field of variable z ∈ [0, 1] and the quantity z(1 − z) that indicates the numerical spread rate of the diffuse interface. One can conclude the good behavior of the method, providing both stability, accuracy and shape preservation.
Figure 6 Validation of the lowdiffusive interface capturing scheme on the KotheRider advection case, mesh 500^{2}. Plot of the passive variable z ∈ [0, 1] on the left column, and interface smearing indicator μ = z(1 − z) in the right column. a) At initial time t = 0, b) at time t = 3, c) at time t = 6, d) at time t = 9, e) at final time t = 12. 
6.2 ThreeMaterial Hydrodynamics Problem
We then consider our interface capturing method for multifluid hydrodynamics. Because material mass fractions are advected, that isone can use the advection solver of these variables but we prefer dealing with the conservative form of the equationsin order to enforce mass conservation [23]. It is known that Eulerian interfacecapturing schemes generally produce spurious pressure oscillations at material interfaces [24, 25]. Some authors propose locally non conservative approaches [26, 27] to prevent from any pressure oscillations. Here we have a full conservative Eulerian strategy involving a specific limiting process which is free from any pressure oscillation at interfaces, providing strong robustness. This will be explained in a next paper.
The multimaterial Lagrangeflux scheme is tested on the reference “triple point” test case, found e.g. in Loubère et al. [28]. This problem is a threestate twomaterial 2D Riemann problem in a rectangular vessel. The simulation domain is as described in Figure 7. The domain is split up into three regions Ω_{i}, i = 1, 2, 3 filled with two perfect gases leading to a twomaterial problem. Perfect gas equations of state are used with γ_{1} = γ_{3} = 1.5 and γ_{2} = 1.4. Due to the density differences, two shocks in subdomains Ω_{2} and Ω_{3} propagate with different speeds. This creates a shear along the initial contact discontinuity and the formation of a vorticity. Capturing the vorticity is of course the difficult part to compute. We use a rather fine mesh made of 2048 × 878 points (about 1.8 M cells). On Figure 8, we plot the density, pressure, temperature fields respectively and indicate the location of the three material zones. One can observe a nice capture of both shocks and contact discontinuities. The vortex is also captured accurately.
Figure 7 Geometry and initial configuration for the reference triplepoint case. 
Figure 8 Results on the multimaterial “triple point’’ case (perfect gases) using a collocated Lagrange+remap solver + lowdiffusive interface capturing advection scheme, mesh made of 2048 × 878 points. Final time is T = 3.3530. a) Density field, b) pressure field, c) temperature field, d) colored representation of material indicators. 
Concluding Remarks and Perspectives
This paper is primarily focused on the redesign of Lagrangeremap hydrodynamics solvers in order to achieve better HPC nodebased performance. We have reformulated the remapping step under a finite volume flux balance, allowing for a full SIMD algorithm. As an unintended outcome, the analysis has lead us to the discovery of a new promising family of Eulerian solvers – the socalled Lagrangeflux solvers – that show simplicity of implementation, accuracy, and flexibility with a highperformance capability compared to the legacy staggered Lagrangeremap scheme. Interface capturing methods can be easily plugged for solving multimaterial flow problems. Ongoing work is focused of the effective performance modeling, analysis and measurement of Lagrangeflux schemes with comparison of reference “legacy” Lagrangeremap solvers including multimaterial interface capturing on different multicore processor architectures. Because of the multicore+vectorization scalability of Lagrangeflux schemes, one can also expect highperformance on manycore coprocessors like Graphics Processing Units (GPU) or Intel MIC. This will be the aim of next developments.
References
 Hirt C.W., Amsden A.A., Cook J.L. (1974) An arbitrary Lagrangian–Eulerian computing method for all flow speeds, J. Comput. Phys. 14, 227–253. (In the text)
 Benson D. (1992) Computational methods in Lagrangian and Eulerian hydrocodes, Comput. Methods Appl. Mech. Eng. 99, 235–394.
 Youngs D. (2007) The Lagrangeremap method, in F.F. Grinstein, L.G. Margolin, W.J. Rider (eds), Implicit large Eddy simulation: Computing turbulent flow dynamics, Cambridge University Press. (In the text)
 Poncet R., Peybernes M., Gasc T., De Vuyst F. (2016) Performance modeling of a compressible hydrodynamics solver on multicore CPUs, in IOS Ebook: Parallel Computing: on the road to Exascale, Series “Advances in parallel computing”, Joubert G.R. et al. (ed.), pp. 449–458, DOI: 10.3233/9781614996217449. (In the text)
 Williams S., Waterman A., Patterson Roofline D. (2009) An insightful visual performance model for multicore architectures, Commun. ACM 52, 65–76. (In the text)
 Treibig J., Hager G. (2010) Introducing a performance model for bandwidthlimited loop kernels, Proceedings of the Workshop “Memory issues on Multi and Manycore Platform” at PPAM 2009, Lecture Notes in Computer Science 6067, 615–624. (In the text)
 Stengel H., Treibig J., Hager G., Wellein G. (2015) Quantifying performance bottlenecks of stencil computations using the ExecutionCacheMemory model, Proc. ICS’15, Proc. of the 29th ACM on Int. Conf. on Supercomputing, pp. 2072016, ACM, New York, ISBN: 9781450335591, DOI: 10.1145/2751205.2751240. (In the text)
 Colella P., Woodward P.R. (1984) The numerical simulation of twodimensional fluid flow with strong shocks, J. Comput. Phys. 54, 115–173. [NASA ADS] [CrossRef] [MathSciNet] (In the text)
 Després B., Mazeran C. (2005) Lagrangian gas dynamics in two dimensions and Lagrangian systems, Arch. Rational Mech. Anal. 178, 327–372. [CrossRef] [MathSciNet] (In the text)
 Maire P.H., Abgrall R., Breil J., Ovadia J. (2007) A cellcentered Lagrangian scheme for compressible flow problems, SIAM J. Sci. Comput. 29, 1781–1824. [MathSciNet] (In the text)
 Maire P.H. (2009) A highorder cellcentered Lagrangian scheme for twodimensional compressible fluid flows on unstructured meshes, J. Comput. Phys. 228, 2391–2425. [CrossRef] (In the text)
 Dukowicz J.K., Baumgardner J.R. (2000) Incremental remapping as a transport/advection algorithm, J. Comput. Phys. 160, 318–335. [CrossRef] (In the text)
 Schiesser W.E. (1991) The Numerical Method of Lines, Academic Press, ISBN 0126241309. (In the text)
 Toro E.F. (2009) Riemann solvers and numerical methods for fluid dynamics. A practical introduction, 3rd edn., Springer, ISBN 9783540252023, DOI: 10.1007/b79761. [CrossRef] (In the text)
 Liou M.S. (1996) A sequel to AUSM: AUSM+, J. Comput. Phys. 129, 2, 364–382. [NASA ADS] [CrossRef] [MathSciNet] (In the text)
 Sweby P.K. (1984) High resolution schemes using fluxlimiters for hyperbolic conservation laws, SIAM J. Numer. Anal. 21, 995–1011. [NASA ADS] [CrossRef] [MathSciNet] (In the text)
 Sod G.A. (1971) A survey of several finite difference methods for systems of nonlinear hyperbolic conservation laws, J. Comput. Phys. 27, 1–31. [NASA ADS] [CrossRef] [MathSciNet] (In the text)
 Després B., Lagoutière F. (2001) Contact discontinuity capturing schemes for linear advection and compressible gas dynamics, J. Sci. Comput. 16, 479–524. [CrossRef] (In the text)
 Després B., Lagoutière F., Labourasse E., Marmajou I. (2010) An antidissipative transport scheme on unstructured meshes for multicomponent flows, IJFV 7, 30–65. (In the text)
 De Vuyst F., Béchereau M., Gasc T., Motte R., Peybernes M., Poncet R. (2016) Stable and accurate lowdiffusive interface capturing advection schemes, Proc. of the MULTIMAT 2015 Conference Würsburg, arXiv:1605.07091 (preprint). (In the text)
 Park J.S., Kim C. (2011) Multidimensional limiting process for discontinuous Galerkin methods on unstructured grids, in Computational Fluid Dynamics 2010, Kuzmin A. (ed.), Springer, pp. 179–184, ISBN 9783642178832. [CrossRef] (In the text)
 Rider A.J., Kothe D.B. (1998) Reconstructing volume tracking, J. Comput. Phys. 141, 112–152. [CrossRef] (In the text)
 BernardChampmartin A., De Vuyst F. (2014) A low diffusive Lagrangeremap scheme for the simulation of violent airwater freesurface flows, J. Comput. Phys. 274, 19–49. [CrossRef] (In the text)
 Abgrall R. (1996) How to prevent pressure oscillations in multicomponent flow calculations: A quasi conservative approach, J. Comput. Phys. 125, 150–160. [CrossRef] [MathSciNet] (In the text)
 Saurel R., Abgrall R. (1999) A simple method for compressible multifluid flows, SIAM J. Sci. Comput. 21, 1115–1145. [CrossRef] [MathSciNet] (In the text)
 Farhat C., Rallu A., Shankaran S. (2008) A higherorder generalized ghost fluid method for the poor for the threedimensional twophase flow computation of underwater implosions, J. Comput. Phys. 227, 7640–7700. [CrossRef] (In the text)
 Bachmann M., Helluy P., Jung J., Mathis H., Müller S. (2013) Random sampling remap for compressible twophase flows, Comput. Fluids 86, 275–283. [CrossRef] (In the text)
 Loubère R., Maire P.H., Shashkov M., Breil J., Galera S. (2010) ReALE: A reconnectionbased arbitraryLagrangianEulerian method, J. Comput. Phys. 229, 4724–4761. [CrossRef] (In the text)
Cite this article as: F. De Vuyst, T. Gasc, R. Motte, M. Peybernes and R. Poncet (2016). LagrangeFlux Schemes: Reformulating SecondOrder Accurate LagrangeRemap Schemes for Better NodeBased HPC Performance, Oil Gas Sci. Technol 71, 64.
All Tables
Performance comparison between the reference Lagrangeremap solver and the Lagrangeflux solver in MCUPs, using different machine configurations. Scalability (last column) is computed as the speedup of the multithreaded vectorized version compared to the baseline purely sequential version. Tests are performed for fine meshes, such that kernel data lies in DRAM memory. The Lagrangeflux solver exhibits superior scalability, because it has — by design — better arithmetic intensity.
All Figures
Figure 1 “Legacy” staggered Lagrangeremap scheme: thermodynamical variables are located at cell centers (circles) whereas velocity variables are located at cell nodes (squares). a) Eulerian cell, b) Lagrange step, c) Remapping step. 

In the text 
Figure 2 Secondorder Lagrangeflux scheme on reference Sod’s shock tube problem on two different mesh grids, 100 and 400 respectively. The solid red line is the analytic solution and blue points are the numerical solution. 

In the text 
Figure 3 Secondorder Lagrangeflux scheme on a double rarefaction nearvacuum case on two different mesh grids, 200 and 2000 points respectively. The solid red line is the analytic solution. 

In the text 
Figure 4 Shock tube problem with sonic rarefaction fan, 400 mesh points, final time T is T = 0.18, the solid red line is the analytic solution. 

In the text 
Figure 5 Shockshock case with subsonichypersonic shock, 400 mesh points, final time T is T = 0.16, the solid red line is the analytic solution. 

In the text 
Figure 6 Validation of the lowdiffusive interface capturing scheme on the KotheRider advection case, mesh 500^{2}. Plot of the passive variable z ∈ [0, 1] on the left column, and interface smearing indicator μ = z(1 − z) in the right column. a) At initial time t = 0, b) at time t = 3, c) at time t = 6, d) at time t = 9, e) at final time t = 12. 

In the text 
Figure 7 Geometry and initial configuration for the reference triplepoint case. 

In the text 
Figure 8 Results on the multimaterial “triple point’’ case (perfect gases) using a collocated Lagrange+remap solver + lowdiffusive interface capturing advection scheme, mesh made of 2048 × 878 points. Final time is T = 3.3530. a) Density field, b) pressure field, c) temperature field, d) colored representation of material indicators. 

In the text 