Two-level domain decomposition methods for highly heterogeneous Darcy equations. Connections with multiscale methods

— Multiphase, compositional porous media flow models lead to the solution of highly heterogeneous systems of Partial Differential Equations (PDEs). We focus on overlapping Schwarz-type methods on parallel computers and on multiscale methods. We present a coarse space [23] that is robust even in the presence of such strong heterogeneities. The two-level domain decomposition approach is then compared to multiscale methods.


INTRODUCTION
Multiphase, compositional porous media flow models, used in reservoir simulation or basin modeling, lead to the solution of complex nonlinear systems of Partial Differential Equations (PDEs). These PDEs are typically discretized using a cell-centered finite volume scheme and a fully implicit Euler integration in time in order to allow for large time steps. After a Newton-type linearization, one ends up with the solution of a linear system at each Newton iteration, which represents up to 90 percent of the total simulation elapsed time. The corresponding pressure block matrix is related to the discretization of a Darcy equation with high contrasts and anisotropy in the coefficients. We focus on overlapping Schwarz-type methods on parallel computers and on multiscale methods.
Coarse spaces are instrumental in obtaining scalability for domain decomposition methods. For matrices arising from problems with smooth coefficients, it is possible to build efficient coarse spaces based on domain wise constant vectors, see [29] and references therein. For problems with high heterogeneities, these simple coarse spaces do not work well. Here, we present a recent coarse space [23] that is robust even when there are such heterogeneities. We achieve this by solving local generalized eigenvalue problems which isolate the terms responsible for slow convergence. Building efficient coarse spaces is closely related to multiscale methods which also aim to reduce the computational cost of large scale problems.
An outline of the paper is as follows. Section 2 is an introduction to one-level Schwarz methods; the material is basic but the presentation is quite new. In § 3, we present a recent spectral coarse space which adapts automatically to the heterogeneities of the problem. In § 4 we present results of large scale computations. In § 5, the approach is compared to multiscale methods. In Section 6, we conclude and discuss prospects for adapting the spectral coarse space to finite volume discretizations.

SCHWARZ METHODS
We start with the original Schwarz algorithm [27] at the continuous (i.e. partial differential equation) level, whose parallel version is named the Jacobi-Schwarz method (JSM). We introduce two variants that are at the origin of the popular additive Schwarz method (ASM) and restricted additive Schwarz (RAS [4]) algorithms. The first one has been the subject of hundreds of papers (see [29] and references therein). The second one is the default parallel solver of the parallel software package PETSc [2]. This presentation shows in a unified setting the connections between these three algorithms.

Three Schwarz Algorithms at the continuous level
Hermann Schwarz was a German analyst of the 19th century. He was interested in proving existence and uniqueness for the Poisson problem. At his time, there were no Sobolev spaces and no Lax–Milgram theorem. The only available tool was the Fourier transform, limited by its very nature to simple geometries. In order to consider more general situations, H. Schwarz devised an algorithm based on iteratively solving Poisson problems set on a union of simple geometries. Let the domain Ω be the union of a disk and a rectangle, see Figure 1, and consider the Poisson problem: find u : Ω → R such that

$$-\Delta u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega. \quad (1)$$

The Schwarz algorithm is an iterative method based on solving alternately subproblems in the domains Ω_1 and Ω_2.
Figure 1. A complex domain Ω made from the union of two simple geometries.

It updates $(u_1^n, u_2^n) \to (u_1^{n+1}, u_2^{n+1})$ by

$$-\Delta u_1^{n+1} = f \ \text{in } \Omega_1,\quad u_1^{n+1} = 0 \ \text{on } \partial\Omega_1 \cap \partial\Omega,\quad u_1^{n+1} = u_2^n \ \text{on } \partial\Omega_1 \cap \Omega_2;$$

then

$$-\Delta u_2^{n+1} = f \ \text{in } \Omega_2,\quad u_2^{n+1} = 0 \ \text{on } \partial\Omega_2 \cap \partial\Omega,\quad u_2^{n+1} = u_1^{n+1} \ \text{on } \partial\Omega_2 \cap \Omega_1. \quad (2)$$

H. Schwarz proved the convergence of the algorithm and thus the well-posedness of the Poisson problem in complex geometries.
With the advent of digital computers, this method also acquired a practical interest as an iterative linear solver. Subsequently, parallel computers became available and a small modification makes the algorithm suited to these architectures. It suffices to solve concurrently in all subdomains, i = 1, 2:

$$-\Delta u_i^{n+1} = f \ \text{in } \Omega_i,\quad u_i^{n+1} = 0 \ \text{on } \partial\Omega_i \cap \partial\Omega,\quad u_i^{n+1} = u_{3-i}^n \ \text{on } \partial\Omega_i \cap \Omega_{3-i}. \quad (3)$$

It is easy to see that if the algorithm converges, the solutions $u_i^\infty$, i = 1, 2, take the same values in the intersection of the subdomains. Indeed, in the overlap $\Omega_{12} := \Omega_1 \cap \Omega_2$, let $e^\infty := u_1^\infty - u_2^\infty$. By the last condition in (3), we know that $e^\infty = 0$ on $\partial\Omega_{12}$. By linearity of the equation, $e^\infty$ is harmonic. Thus, $e^\infty$ solves the homogeneous well-posed boundary value problem

$$-\Delta e^\infty = 0 \ \text{in } \Omega_{12}, \qquad e^\infty = 0 \ \text{on } \partial\Omega_{12},$$

and thus $e^\infty = 0$. Algorithms (2) and (3) act on the local functions $(u_i)_{i=1,2}$. In order to write algorithms that act on global functions in $H^1(\Omega)$, the space in which problem (1) is naturally posed, we need extension operators $E_i$ such that, for a function $w_i : \Omega_i \to \mathbb{R}$, $E_i(w_i) : \Omega \to \mathbb{R}$ is the extension of $w_i$ by zero outside $\Omega_i$. We also need partition of unity functions $\chi_i : \Omega_i \to \mathbb{R}$, $\chi_i \ge 0$ and $\chi_i(x) = 0$ for $x \in \partial\Omega_i$, such that

$$w = \sum_{i=1}^{2} E_i\big(\chi_i\, w_{|\Omega_i}\big) \quad (4)$$

for any function $w : \Omega \to \mathbb{R}$. This definition of a partition of unity is closer to the computer implementation than the classical one. There are two ways to write related algorithms that act on functions $u^n \in H^1(\Omega)$.
The first possibility is the following: let $u^n$ be an approximation of the solution of the Poisson problem (1); $u^{n+1}$ is computed by first solving local subproblems

$$-\Delta u_i^{n+1} = f \ \text{in } \Omega_i,\quad u_i^{n+1} = 0 \ \text{on } \partial\Omega_i \cap \partial\Omega,\quad u_i^{n+1} = u^n \ \text{on } \partial\Omega_i \cap \Omega_{3-i}, \quad (5)$$

and then gluing them together using the partition of unity functions:

$$u^{n+1} = \sum_{i=1}^{2} E_i\big(\chi_i\, u_i^{n+1}\big). \quad (6)$$

A second possibility consists in replacing the above formula by a simpler one not based on the partition of unity:

$$u^{n+1} = \sum_{i=1}^{2} E_i\big(u_i^{n+1}\big). \quad (7)$$

Starting from the original Schwarz algorithm (2), which is sequential, we thus have three continuous algorithms that are essentially parallel:
• Algorithm (3): Jacobi-Schwarz method (JSM)
• Algorithm (5)-(6): Restricted Additive Schwarz (RAS)
• Algorithm (5)-(7): Additive Schwarz Method (ASM)

These algorithms, although closely related, are different in nature. The JSM method acts on a pair of local functions $(u_1^n, u_2^n)$ whereas RAS and ASM act on a global function $u^n$. Note that in the overlapping region, RAS and ASM update the solution in different ways. Algorithm ASM may seem strange since it does not converge to the exact solution in the intersection $\Omega_1 \cap \Omega_2$. But its algebraic form, given by (10), has the advantage of being symmetric positive definite (SPD) when used as a preconditioner, as explained in the sequel. On the contrary, the algebraic counterpart of RAS, given by (9), is unsymmetric.
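As a concrete illustration (our own sketch, not code from the paper), the alternating Schwarz method (2) can be mimicked for the 1D Poisson problem with a standard finite-difference discretization; the grid size, overlap width and iteration count below are arbitrary choices:

```python
import numpy as np

# Toy alternating Schwarz for -u'' = 1 on (0,1), u(0) = u(1) = 0,
# with two overlapping subintervals and a 3-point finite-difference stencil.
n = 99                        # interior grid points
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.ones(n)

def solve_sub(lo, hi, left_bc, right_bc):
    """Solve -u'' = f on grid points lo..hi with given Dirichlet values."""
    m = hi - lo + 1
    A = (np.diag(2 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / h**2
    rhs = f[lo:hi + 1].copy()
    rhs[0] += left_bc / h**2       # Dirichlet data enters the right-hand side
    rhs[-1] += right_bc / h**2
    return np.linalg.solve(A, rhs)

# Overlapping subdomains: nodes 0..59 and 39..98 (overlap: nodes 39..59).
u = np.zeros(n)                    # current global iterate
for it in range(20):
    u[:60] = solve_sub(0, 59, 0.0, u[60])      # Dirichlet data from the right
    u[39:] = solve_sub(39, 98, u[38], 0.0)     # uses the fresh left values

u_exact = 0.5 * x * (1 - x)        # exact solution of -u'' = 1
print(np.max(np.abs(u - u_exact))) # error at the nodes after 20 sweeps
```

Each sweep contracts the error in the overlap by a fixed factor, so the iterate converges geometrically to the discrete (here also nodal-exact) solution.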

Schwarz Algorithms at the algebraic level
So far, we have given a continuous presentation of domain decomposition methods. In practice, these methods are used in their algebraic form to solve linear systems arising from the discretization of partial differential equations. We now give the matrix counterparts of these algorithms. For this, we first give a kind of dictionary to go from the continuous level to the discrete one:
- the counterparts of the domain Ω and of an overlapping decomposition $(\Omega_i)_{1\le i\le N}$ are a set of degrees of freedom $\mathcal{N}$ and a decomposition of it into overlapping subsets $(\mathcal{N}_i)_{1\le i\le N}$;
- the restriction of a function u : Ω → R to a subdomain Ω_i corresponds to $R_i U$, where $U \in \mathbb{R}^{\#\mathcal{N}}$ is a global vector and $R_i$ is a rectangular Boolean restriction matrix;
- similarly, the extension by zero $E_i(u_i)$ of a function $u_i : \Omega_i \to \mathbb{R}$ to a function Ω → R corresponds at the algebraic level to $R_i^T U_i$, where $R_i^T$ is the transpose of the matrix $R_i$ and $U_i \in \mathbb{R}^{\#\mathcal{N}_i}$ is a local vector.
- the counterparts of the partition of unity functions $\chi_i$ are non-negative diagonal matrices $D_i$ of size $\#\mathcal{N}_i \times \#\mathcal{N}_i$;
- after discretization, solving the Poisson problem (1) amounts to solving an SPD linear system

$$A\,U = F; \quad (8)$$

- solving a local subproblem in a subdomain Ω_i, as in equations (3) or (5), corresponds at the algebraic level to solving linear systems of the form $R_i A R_i^T\, U_i^{n+1} = F_i^n$.

We now define, at the algebraic level, the RAS and ASM algorithms, but not JSM since it is seldom used and is more complex to define. As for the counterpart of the RAS algorithm (5)-(6), we define

$$M_{RAS}^{-1} := \sum_{i} R_i^T D_i \big(R_i A R_i^T\big)^{-1} R_i, \quad (9)$$

so that the iterative RAS algorithm reads $U^{n+1} = U^n + M_{RAS}^{-1}(F - A\,U^n)$. As for the counterpart of the ASM algorithm (5)-(7), we define

$$M_{ASM}^{-1} := \sum_{i} R_i^T \big(R_i A R_i^T\big)^{-1} R_i, \quad (10)$$

so that the iterative ASM algorithm reads $U^{n+1} = U^n + M_{ASM}^{-1}(F - A\,U^n)$. As is well known, such fixed point methods are outperformed by Krylov-type iterative solvers such as the conjugate gradient (CG) algorithm or the generalized minimal residual method (GMRES), see the book by Y. Saad [26] and references therein. In our context, using these methods amounts to solving the linear system (8) by a CG algorithm preconditioned by the symmetric preconditioner $M_{ASM}$, or by a GMRES algorithm preconditioned by the unsymmetric preconditioner $M_{RAS}$. In both cases, the convergence properties are related to the spectral properties of the preconditioned operator $M_{ASM}^{-1}A$ or $M_{RAS}^{-1}A$. The restricted additive Schwarz method (RAS, see [3]) is the default parallel solver in the PETSc package. For the additive Schwarz method (ASM) many theoretical results have been derived, see [29] and references therein.
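To make the definitions (9)-(10) concrete, here is a small self-contained sketch (our own illustration, not code from the paper) using dense NumPy matrices for the 1D Laplacian and two overlapping subdomains; the index sets and overlap width are arbitrary:

```python
import numpy as np

# Algebraic ASM and RAS preconditioners for the 1D Laplacian.
# R1, R2 are Boolean restriction matrices to two overlapping index sets,
# D1, D2 form a discrete partition of unity (inverse multiplicity weights).
n = 40
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))

idx1, idx2 = np.arange(0, 24), np.arange(16, 40)   # overlap: nodes 16..23
R1, R2 = np.eye(n)[idx1], np.eye(n)[idx2]
d = np.zeros(n)
d[idx1] += 1
d[idx2] += 1                                       # multiplicity of each dof
D1, D2 = np.diag(1.0 / d[idx1]), np.diag(1.0 / d[idx2])

inv = np.linalg.inv
M_ASM = sum(R.T @ inv(R @ A @ R.T) @ R for R in (R1, R2))     # symmetric
M_RAS = (R1.T @ D1 @ inv(R1 @ A @ R1.T) @ R1
         + R2.T @ D2 @ inv(R2 @ A @ R2.T) @ R2)               # unsymmetric

# RAS as a stationary (fixed point) iteration: U <- U + M_RAS (F - A U).
F, U = np.ones(n), np.zeros(n)
for _ in range(200):
    U = U + M_RAS @ (F - A @ U)
print(np.linalg.norm(F - A @ U))   # small residual: the fixed point converges
```

In practice one would of course apply these preconditioners inside CG or GMRES, as the text explains, rather than as a stationary iteration.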

ADAPTIVE SPECTRAL COARSE SPACE
The domain decomposition methods presented so far were written for a two-subdomain decomposition. Their extension to an arbitrary number N of subdomains $(\Omega_i)_{1\le i\le N}$ is only a matter of notation: it suffices, in the definitions of the previous section, to sum over all subdomains from i = 1 to i = N. But when the number of subdomains is large, plateaus appear in the convergence of Schwarz domain decomposition methods. This is the case even for a regular problem such as the Poisson problem (1). The problem comes from the fact that the preconditioner lacks a global mechanism for exchanging information. The preconditioners RAS and ASM defined in the previous sections are called one-level methods: data are exchanged only between a subdomain and its direct neighbors, whereas the solution in each subdomain depends on the right-hand side in all subdomains. Let N_d denote the number of subdomains in one direction. Then, for instance, the leftmost subdomain of Figure 3 needs at least N_d iterations before receiving any information about the value of the right-hand side f in the rightmost subdomain. The length of the plateau is thus typically related to the number of subdomains in one direction and therefore to the notion of scalability as understood in high performance computing.
In order to achieve scalability of the domain decomposition (DD) method, we introduce two-level domain decomposition methods via a coarse space correction. The precise motivation and linear algebra setting are given in § 3.1 for a problem with smooth coefficients. A new approach, introduced in [22,23] and presented in § 3.2, is necessary to achieve scalability for arbitrary, highly heterogeneous coefficients. A condition number estimate supports the approach. The method is tested in § 3.4 on difficult heterogeneous test cases, including a channelized medium. In practice, the coarse space appears to be optimal, see Table 10 in § 3.4.1.

Need for a two-level method
When the number of subdomains is large, plateaus appear in the convergence of Schwarz domain decomposition methods. The remedy consists in the introduction of a two-level preconditioner via a coarse space correction, see Japhet, Nataf and Roux [19].
The problem and its cure are well illustrated in Figure 4 for a domain decomposition into 64 strips: the one-level method has a long plateau in the convergence, whereas with a coarse space correction convergence is quite fast. In Figure 2, we consider a 2D problem decomposed into 2 × 2, 4 × 4 and 8 × 8 subdomains (see Figure 3 for a decomposition into many subdomains). For each domain decomposition, we have two curves: one for the one-level method and one for the method with a coarse grid correction, denoted by M2. We see that for the one-level curves, the plateau has a size proportional to the number of subdomains in one direction. In two-level methods, a small problem, of size typically the number of subdomains, couples all subdomains at each iteration. It is through this mechanism that scalability can be achieved. From a condition number point of view, stagnation corresponds to a few very low eigenvalues in the spectrum of the preconditioned problem. Using the preconditioners M_ASM or M_RAS, we can remove the influence of the very large eigenvalues of the coefficient matrix, which correspond to high frequency modes. Indeed, it has been proved that for an SPD matrix, the largest eigenvalue of the system preconditioned by M_ASM is bounded by the number of colors needed to color the overlapping subdomains so that adjacent subdomains have different colors, see [29] or [26] for instance. But the small eigenvalues still exist and hamper the convergence. These small eigenvalues correspond to low frequency modes and carry global information. We need a suitable coarse space to deal with them efficiently.
A classical remedy consists in the introduction of a coarse grid or coarse space correction that couples all subdomains at each iteration of the iterative method. This is closely related to the deflation technique, classical in linear algebra, see Nabben and Vuik's paper [21] and references therein. Suppose we have identified the modes responsible for the slow convergence of the iterative method used to solve the linear system Ax = b with a preconditioner M, in our case a domain decomposition method. That is, we have some a priori knowledge of the small eigenvalues of the preconditioned system $M^{-1}A$. For a Poisson problem, these slow modes correspond to constant functions, which are in the null space (kernel) of the Laplace operator. For a homogeneous elasticity problem, they correspond to the rigid body motions. Let Z denote the rectangular matrix whose columns correspond to these slow modes. There are algebraic ways to incorporate this information in order to accelerate the domain decomposition method. We give here the presentation that is classical in domain decomposition methods. In the case where A is SPD, the starting point is to consider the minimization problem

$$\min_{\beta \in \mathbb{R}^{n_c}} \big\| y + Z\beta - A^{-1}b \big\|_A^2.$$

It corresponds to finding the best correction to an approximate solution y by a vector Zβ in the space spanned by the $n_c$ columns of Z. This problem is equivalent to

$$\min_{\beta \in \mathbb{R}^{n_c}} 2\,(Ay - b,\, Z\beta)_2 + (A Z\beta,\, Z\beta)_2,$$

whose solution is

$$\beta = \big(Z^T A Z\big)^{-1} Z^T (b - A y).$$

Thus, the correction term is $Z\beta = Z\,(Z^T A Z)^{-1} Z^T (b - A y)$. Let $R_0 := Z^T$ and let $r := b - Ay$ be the residual associated with the approximate solution y; the best correction belonging to the space spanned by the columns of Z reads

$$Z\beta = R_0^T \big(R_0 A R_0^T\big)^{-1} R_0\, r.$$

When using such an approach with an additive Schwarz method (ASM), it is natural to make the correction additive as well:

$$M_{ASM,2}^{-1} := R_0^T \big(R_0 A R_0^T\big)^{-1} R_0 + \sum_{i=1}^{N} R_i^T \big(R_i A R_i^T\big)^{-1} R_i, \quad (11)$$

where the $R_i$'s (1 ≤ i ≤ N) are the restriction operators to the overlapping subdomains. The structure of the two-level preconditioner $M_{ASM,2}^{-1}$ is thus the same as in the one-level method.
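The optimality of this correction can be checked numerically. The following small NumPy experiment (our own sanity check, with arbitrary random data) verifies that $\beta = (Z^T A Z)^{-1} Z^T r$ minimizes the A-norm of the corrected error:

```python
import numpy as np

# Sanity check: the coarse correction Z beta minimizes the A-norm error.
rng = np.random.default_rng(0)
n, nc = 20, 3
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                 # an SPD matrix
Z = rng.standard_normal((n, nc))            # a (random) coarse space
x = rng.standard_normal(n)                  # exact solution
b = A @ x
y = rng.standard_normal(n)                  # some approximation of x
r = b - A @ y                               # residual
beta = np.linalg.solve(Z.T @ A @ Z, Z.T @ r)

def err_A(beta_):
    """A-norm (squared) of the error after correction by Z @ beta_."""
    e = y + Z @ beta_ - x
    return e @ A @ e

# Any perturbation of beta increases the A-norm of the error:
print(all(err_A(beta) <= err_A(beta + 0.1 * rng.standard_normal(nc))
          for _ in range(100)))   # prints True
```

This is exactly the Galerkin (normal equation) characterization of the minimizer over the span of the columns of Z.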
Compared to the one-level Schwarz method, where only local subproblems have to be solved in parallel, the two-level method adds the sequential solution of a linear system with the matrix $R_0 A R_0^T$. This problem couples all subdomains at each iteration, but the matrix is a small O(N × N) square matrix and the extra cost is negligible compared to the gain. Indeed, in Table 1 we display the iteration counts for decompositions of the domain into an increasing number of subdomains. In Figure 4 (convergence curves with and without a coarse space correction for a decomposition into 64 strips), we see that without a coarse grid correction, the convergence curve of the one-level Schwarz method has a very long plateau that is bypassed by the two-level method.

We now give a precise definition of Z for a Poisson problem. This construction was introduced by Nicolaides [24]. We take Z so that it has a domain decomposition structure: Z is defined by vectors that have local support in the subdomains and such that the constant function 1 belongs to the space spanned by the columns of Z. Recall that we have a partition of unity in the following sense: let $D_i$, 1 ≤ i ≤ N, be matrices such that

$$\sum_{i=1}^{N} R_i^T D_i R_i = Id. \quad (12)$$

We define Z such that its i-th column is

$$Z_i := R_i^T D_i R_i\, \mathbf{1}, \quad (13)$$

where $\mathbf{1}$ is the vector full of ones. The structure of Z is thus

$$Z = \big(Z_1 \,|\, Z_2 \,|\, \cdots \,|\, Z_N\big). \quad (14)$$

The results of Figure 4 and Table 1 were obtained using this method. For problems of the type $-\mathrm{div}(\alpha \nabla u) = f$ with a smooth coefficient α, this coarse space gives good results. But for highly heterogeneous coefficients, there is still a plateau in the convergence of the solver. The results in Figure 5 correspond to a domain which has two layers with high values of α; the computational domain has a stripwise decomposition into 64 subdomains (Figure 5 shows long plateaus for the one-level methods, shorter plateaus for the Nicolaides coarse space and no plateau for the DtN coarse space).
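As an illustration (our own toy 1D construction, with an arbitrary discretization and overlap), the Nicolaides coarse space and the two-level additive Schwarz preconditioner can be assembled in a few lines; the condition number of the preconditioned operator drops once the coarse correction is added:

```python
import numpy as np

# Nicolaides coarse space (13) and two-level ASM (11) for the 1D Laplacian
# decomposed into N overlapping strips.
n, N = 120, 6
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))

# Overlapping strips with a 4-node overlap between neighbors.
subs = [np.arange(max(0, 20*i - 2), min(n, 20*(i+1) + 2)) for i in range(N)]
Rs = [np.eye(n)[s] for s in subs]
mult = np.zeros(n)
for s in subs:
    mult[s] += 1
Ds = [np.diag(1.0 / mult[s]) for s in subs]    # discrete partition of unity

# Column i of Z is R_i^T D_i R_i 1, cf. (13).
Z = np.column_stack([R.T @ D @ R @ np.ones(n) for R, D in zip(Rs, Ds)])

inv = np.linalg.inv
M1 = sum(R.T @ inv(R @ A @ R.T) @ R for R in Rs)   # one-level ASM
M2 = M1 + Z @ inv(Z.T @ A @ Z) @ Z.T               # two-level ASM

def kappa(M):
    ev = np.sort(np.linalg.eigvals(M @ A).real)    # real, positive spectrum
    return ev[-1] / ev[0]

print(kappa(M1), kappa(M2))   # the coarse space cuts the condition number
```

The smallest eigenvalues of the one-level preconditioned operator, which cause the plateau, are removed by the coarse correction, while the largest eigenvalue stays bounded by the coloring argument mentioned above.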
Two Schwarz methods are tested with either no coarse space correction, a Nicolaides coarse space, or the spectral coarse space defined in § 3.2, for a total of six curves. The curves with very long plateaus are the one-level Schwarz methods. The curves Z_Nico (pink curves) correspond to two-level Schwarz methods with a Nicolaides coarse space, equation (13); the plateau in the convergence is not as long but still exists. With the spectral coarse space of the next section, we automatically select two modes per subdomain and get the convergence curves Z_D2N (black curves).

Spectral coarse space for highly heterogeneous problems
We now propose a construction of the coarse space that is suitable for parallel implementation and efficient in accelerating the convergence for problems with highly heterogeneous coefficients of the type

$$-\mathrm{div}(\alpha \nabla u) = f, \quad (15)$$

with α a positive function. We still choose Z with a blockwise structure,

$$Z = \big(R_1^T D_1 W_1 \,|\, \cdots \,|\, R_N^T D_N W_N\big), \quad (16)$$

where N is the number of overlapping subdomains. But $W_i$ is now a rectangular matrix whose columns are based on the harmonic extensions of the eigenvectors corresponding to small eigenvalues of the Dirichlet-to-Neumann (DtN) map in each subdomain Ω_i. Remark that the sparsity of the coarse operator $E = Z^T A Z$ is a consequence of the sparsity of Z.
The nonzero components of E correspond to adjacent subdomains. More precisely, let us first consider, at the continuous level, the Dirichlet-to-Neumann map $\mathrm{DtN}_{\Omega_i}$. For a function $u_\Gamma$ defined on the interface boundary $\Gamma_i := \partial\Omega_i \setminus \partial\Omega$, let v be its α-harmonic extension in Ω_i, i.e. the solution of

$$-\mathrm{div}(\alpha \nabla v) = 0 \ \text{in } \Omega_i, \qquad v = u_\Gamma \ \text{on } \Gamma_i; \quad (17)$$

then

$$\mathrm{DtN}_{\Omega_i}(u_\Gamma) := \alpha\, \frac{\partial v}{\partial n}\Big|_{\Gamma_i}.$$

If the subdomain is not a floating one (i.e. $\partial\Omega_i \cap \partial\Omega \neq \emptyset$), we use on this part of the global boundary the boundary condition of the original problem, B(v) = 0. To construct the coarse space, we use the low frequency modes of the DtN operator:

$$\mathrm{DtN}_{\Omega_i}(u_\lambda) = \lambda\, \alpha\, u_\lambda, \qquad \lambda < \frac{1}{\mathrm{diam}(\Omega_i)}, \quad (18)$$

where diam(Ω_i) is the diameter of subdomain Ω_i. The rationale for this choice is that the condition number estimate of Theorem 3.2 is then similar to that of Theorem 3.1 for the Poisson problem. Note the weight α in the generalized eigenvalue problem (18). We first motivate our choice of a coarse space based on the DtN map. We write the original Schwarz method at the continuous level for a domain Ω decomposed in a one-way partitioning, see Figure 6 (fast or slow convergence of the Schwarz algorithm). The error $e_i^n$ between the
current iterate at step n of the algorithm and the solution $u_{|\Omega_i}$ ($e_i^n := u_i^n - u_{|\Omega_i}$) in subdomain Ω_i satisfies

$$-\mathrm{div}\big(\alpha \nabla e_i^{n+1}\big) = 0 \ \text{in } \Omega_i, \qquad e_i^{n+1} = e_{i\pm1}^{n} \ \text{on } \partial\Omega_i \cap \Omega_{i\pm1}. \quad (19)$$

On the 1D example sketched in Figure 6, we see that the rate of convergence of the algorithm is related to the decay of the harmonic functions $e_i^n$ in the vicinity of $\partial\Omega_i$, via the subdomain boundary condition. Indeed, a small value of this boundary condition leads to a smaller error in the entire subdomain thanks to the maximum principle.
Moreover, a fast decay of this value corresponds to a large eigenvalue of the DtN map, whereas a slow decay corresponds to a small eigenvalue of this map, because the DtN operator is related to the normal derivative at the interface and the overlap is thin. Thus the small eigenvalues of the DtN map are responsible for the slow convergence of the algorithm, and it is natural to incorporate the corresponding eigenvectors in the coarse space.
We now explain why we only keep eigenvectors with eigenvalues smaller than $1/\mathrm{diam}(\Omega_i)$ in the coarse space. We start with the constant coefficient case α = 1. In this case, the smallest eigenvalue of the DtN map is zero and it corresponds to the constant function 1. For a shape regular subdomain, the first positive eigenvalue is of order $1/\mathrm{diam}(\Omega_i)$, see [12]. Keeping only the constant function 1 in the coarse space leads to good numerical convergence, see Figure 4. In the case of high contrasts in the coefficient α, the smallest eigenvalue of the DtN map is still zero, but due to the variation of the coefficients there may be additional positive eigenvalues smaller than $1/\mathrm{diam}(\Omega_i)$. In order to obtain a convergence behavior similar to that of the constant coefficient case, it is natural to keep all eigenvectors with eigenvalues smaller than $1/\mathrm{diam}(\Omega_i)$.
To obtain the discrete form of the DtN map, we consider the variational formulation of (17). Define the local bilinear form

$$a_i(u, v) := \int_{\Omega_i} \alpha\, \nabla u \cdot \nabla v.$$

With a finite element basis $\{\varphi_k\}$, the coefficient matrix of a Neumann boundary value problem in subdomain Ω_i is

$$A^{(i)}_{kl} := a_i(\varphi_l, \varphi_k).$$

Let I (resp. $\Gamma_i$) be the set of indices corresponding to the interior (resp. boundary) degrees of freedom of Ω_i, and let $n_{\Gamma_i} := \#\Gamma_i$ be the number of interface degrees of freedom. Note that for the whole domain Ω, the coefficient matrix is given by the same bilinear form integrated over Ω. With block notation,

$$A^{(i)} = \begin{pmatrix} A_{II} & A_{I\Gamma_i} \\ A_{\Gamma_i I} & A^{(i)}_{\Gamma_i \Gamma_i} \end{pmatrix}.$$

But the matrix $A^{(i)}_{\Gamma_i \Gamma_i}$ refers to the matrix prior to assembly with the neighboring subdomains and thus cannot simply be extracted from the global coefficient matrix A. In problem (17), we use Dirichlet boundary conditions. Let $U \in \mathbb{R}^{n_{\Gamma_i}}$ and $u := \sum_{k\in\Gamma_i} U_k \varphi_k$. Let $v := \sum_{k\in I} V_k \varphi_k + \sum_{l\in\Gamma_i} U_l \varphi_l$ be the finite element approximation of the solution of (17). With $V_I := (V_k)_{k\in I}$, we have, with obvious notation,

$$A_{II} V_I + A_{I\Gamma_i} U = 0, \quad \text{i.e.} \quad V_I = -A_{II}^{-1} A_{I\Gamma_i} U. \quad (20)$$

A variational definition of the flux reads

$$\int_{\Gamma_i} \alpha\, \frac{\partial v}{\partial n}\, \varphi_k = a_i(v, \varphi_k), \qquad k \in \Gamma_i.$$

So the variational formulation of the eigenvalue problem (18) reads

$$a_i(v, \varphi_k) = \lambda \int_{\Gamma_i} \mathrm{tr}(\alpha)\, v\, \varphi_k \quad \text{for all } \varphi_k,\ k \in \Gamma_i, \quad (21)$$

where tr(α) is the restriction of $\alpha_{|\Omega_i}$ to $\Gamma_i$. Let $M_{\alpha,\Gamma_i}$ be the weighted mass matrix

$$\big(M_{\alpha,\Gamma_i}\big)_{kl} := \int_{\Gamma_i} \mathrm{tr}(\alpha)\, \varphi_k\, \varphi_l, \qquad k, l \in \Gamma_i.$$

The compact form of equation (21) is $A_{\Gamma_i I} V_I + A^{(i)}_{\Gamma_i\Gamma_i} U = \lambda\, M_{\alpha,\Gamma_i} U$. Using (20), the discrete form of (18) is therefore the generalized eigenvalue problem

$$\Big(A^{(i)}_{\Gamma_i\Gamma_i} - A_{\Gamma_i I}\, A_{II}^{-1}\, A_{I\Gamma_i}\Big)\, U = \lambda\, M_{\alpha,\Gamma_i}\, U. \quad (22)$$

Given an eigenpair $(U_\lambda, \lambda)$, we need its harmonic extension to the degrees of freedom of subdomain Ω_i, that is, the vector

$$\begin{pmatrix} -A_{II}^{-1} A_{I\Gamma_i} U_\lambda \\ U_\lambda \end{pmatrix}.$$

Actually, there is a more practical way to compute these eigenpairs directly. For subdomain Ω_i, we compute the lowest eigenvalues of the sparse generalized eigenvalue problem

$$A^{(i)} \begin{pmatrix} V_I \\ U \end{pmatrix} = \lambda \begin{pmatrix} 0 & 0 \\ 0 & M_{\alpha,\Gamma_i} \end{pmatrix} \begin{pmatrix} V_I \\ U \end{pmatrix}. \quad (23)$$

This can be done using standard linear algebra libraries such as ARPACK.

Oil & Gas Science and Technology - Rev. IFP Energies nouvelles
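A minimal 1D sketch of this construction (our own illustration, with dense matrices instead of ARPACK and a made-up coefficient α): the Schur complement of the local Neumann matrix onto the two interface nodes is the discrete DtN map, and its generalized eigenvalues with respect to the weighted interface mass matrix are the quantities compared to $1/\mathrm{diam}(\Omega_i)$:

```python
import numpy as np
from scipy.linalg import eigh

# 1D subdomain with one high-permeability layer; P1 stiffness matrix of
# -(alpha u')' with natural (Neumann) boundary conditions.
m = 50                                    # interior dofs; 2 interface dofs
h = 1.0 / (m + 1)
alpha = np.ones(m + 2)
alpha[20:30] = 1e6                        # heterogeneous coefficient (nodal)
a_mid = 0.5 * (alpha[:-1] + alpha[1:])    # alpha at element midpoints
Afull = np.zeros((m + 2, m + 2))
for k, ak in enumerate(a_mid):            # assemble element by element
    Afull[k:k+2, k:k+2] += ak / h * np.array([[1.0, -1.0], [-1.0, 1.0]])

G = [0, m + 1]                            # interface (boundary) dofs
I = list(range(1, m + 1))                 # interior dofs
# Schur complement = discrete DtN map on the two interface nodes.
S = (Afull[np.ix_(G, G)]
     - Afull[np.ix_(G, I)] @ np.linalg.solve(Afull[np.ix_(I, I)],
                                             Afull[np.ix_(I, G)]))
Mg = np.diag(alpha[G]) * h                # lumped alpha-weighted interface mass
lam, U = eigh(S, Mg)                      # generalized DtN eigenpairs
print(lam)                                # keep modes with lam < 1/diam(Omega_i)
```

The smallest eigenvalue is (numerically) zero, since constants are in the kernel of the Neumann operator; the harmonic extension of each retained eigenvector is then recovered as in (20).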
The step-by-step procedure for constructing the rectangular matrices $W_i$ of the coarse space matrix Z, see (16), is summed up in Algorithm 1.

Algorithm 1: Construction of the spectral coarse space
1. In parallel for all subdomains 1 ≤ i ≤ N, compute the lowest eigenpairs $(U_\lambda, \lambda)$ of the generalized eigenvalue problem (23), keeping the $m_i$ eigenvectors whose eigenvalues satisfy $\lambda < 1/\mathrm{diam}(\Omega_i)$.
2. Let Z be defined as in (16) with, for each 1 ≤ i ≤ N, $W_i$ the rectangular matrix whose $m_i$ columns are the retained (harmonically extended) eigenvectors.
3. Set $R_0 := Z^T$ and compute the coarse matrix $E := R_0 A R_0^T$.

The two-level preconditioner is then given by eq. (11). We use Z_D2N to denote the coarse space constructed by this method; its construction is fully parallel. Similarly, we write Z_Nico for the method of Nicolaides and the corresponding coarse space. Let us remark that when a subdomain does not touch the boundary of Ω, the lowest eigenvalue of the DtN map is zero and the corresponding eigenvector is a constant vector; in that case, Z_Nico and Z_D2N coincide. As we shall see in the next section, when a subdomain contains several jumps of the coefficient, taking Z_Nico is not efficient and it is necessary to take Z_D2N with more than one mode per subdomain.
This construction has been analyzed in [7]. We first recall a classical result for a "Nicolaides type" coarse space Z as in (13)-(14). We have, see [29]:

Theorem 3.1. Let $M_{ASM,2}$ be the two-level additive Schwarz method with the "Nicolaides" coarse space. Then, for α = 1, we have the condition number estimate

$$\kappa\big(M_{ASM,2}^{-1} A\big) \le C \Big(1 + \frac{H}{\delta}\Big),$$

where δ is the size of the overlap between the subdomains, H is the subdomain size, and C does not depend on the number of subdomains.
But, for α discontinuous, C would depend on the jumps of α.
Let Z be the coarse space built via Algorithm 1. Under technical assumptions on α, we prove:

Theorem 3.2. Under a monotonicity assumption on α in the overlapping regions, we have the condition number estimate

$$\kappa\big(M_{ASM,2}^{-1} A\big) \le C \Big(1 + \max_{1\le i\le N} \frac{1}{\delta_i\, \lambda^i_{m_i+1}}\Big),$$

where $\delta_i$ is the size of the overlap of subdomain Ω_i, $\lambda^i_{m_i+1}$ is the first eigenvalue of the DtN map in Ω_i not included in the coarse space, and C is independent of the jumps of α and of the number of subdomains.
Note that if α = 1 and we take only one mode per subdomain ($m_i = 1$), then for a regular interface $\lambda^i_2 \simeq 1/H_i$ (see [12]) and we recover the "classical" estimate. In the general case, if the number $m_i$ of modes associated with subdomain Ω_i is chosen so that $\lambda^i_{m_i+1} \ge 1/H_i$, the convergence rate is analogous to that of the constant coefficient case.

Comparison with a volumic spectral coarse space
The DtN spectral coarse space makes use of eigenvectors of the local Dirichlet-to-Neumann maps. There is thus a clear relationship with recent works by Galvis and Efendiev [9,10,13-15], where the coarse space is based on eigenvectors of the volumic eigenvalue problem

$$-\mathrm{div}(\alpha \nabla u) = \lambda\, \alpha\, u. \quad (24)$$

The drawback of their approach is that the coarse space may be too large. This is easy to see in 1D. In Figure 7, we represent the function α in a subdomain Ω_i with many discontinuities inside the subdomain (a 1D example with many high-heterogeneity islands). But whatever the number of discontinuities, the DtN map is a two-by-two matrix: it has exactly two eigenvectors, so the coarse space contains at most two vectors per subdomain. By contrast, the size of the volumic spectral coarse space is equal to the number of high-heterogeneity islands. This phenomenon also holds in the 2D case. In Figure 8 we show a permeability field with high-heterogeneity islands (a 2D example with many high-heterogeneity islands). For this case, only 4 eigenvalues of the DtN map are smaller than 1.3e-4 whereas the other ones are larger than 0.9, while twenty volumic eigenvalues of eq. (24) are smaller than 3.8e-3 and the others are larger than 150. Numerical tests show that in this case four eigenvectors are enough to obtain an efficient coarse space. In their papers, Galvis and Efendiev noted this fact and use a rather involved procedure to get rid of the useless eigenvectors. In our case, the method adapts automatically to the permeability field. In Figure 9 we show typical DtN and volumic eigenvectors.
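The 1D comparison above can be reproduced with a small (hypothetical) NumPy/SciPy experiment: with four high-permeability islands inside one subdomain, the volumic eigenproblem (24) has one small eigenvalue per island, while the DtN map remains a 2 × 2 matrix regardless of the number of islands:

```python
import numpy as np
from scipy.linalg import eigh

# 1D subdomain with four high-permeability islands.
m = 199                                   # interior nodes; m+2 dofs in total
h = 1.0 / (m + 1)
alpha = np.ones(m + 1)                    # element-wise coefficient
islands = (20, 60, 100, 140)
for lo in islands:
    alpha[lo:lo + 20] = 1e6               # each island spans 20 elements

K = np.zeros((m + 2, m + 2))              # Neumann stiffness of -(alpha u')'
for k, ak in enumerate(alpha):
    K[k:k+2, k:k+2] += ak / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
am = np.zeros(m + 2)                      # alpha-weighted lumped mass
am[:-1] += alpha
am[1:] += alpha
M = np.diag(am * h / 2)

lam_vol = eigh(K, M, eigvals_only=True)   # volumic spectrum, cf. (24)

G, I = [0, m + 1], list(range(1, m + 1))
S = (K[np.ix_(G, G)] - K[np.ix_(G, I)] @
     np.linalg.solve(K[np.ix_(I, I)], K[np.ix_(I, G)]))   # 2x2 discrete DtN
lam_dtn = eigh(S, np.diag(am[G] * h / 2), eigvals_only=True)

# Many small volumic modes (one per island), but the DtN map stays 2x2:
print(np.sum(lam_vol < 1e-2), lam_dtn.size)
```

The small volumic eigenvalues correspond to modes that are nearly constant on each island; in 1D the DtN map only ever sees the two endpoints, which is exactly the point made in the text.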

First Numerical tests
We solve the model problem (15) on the domain Ω = [0, 1]² using standard continuous, piecewise linear (P1) finite elements. The diffusion coefficient α is a function of x. The boundary condition is u = 0 on the left side of the boundary and ∂u/∂n = 0 on the remainder. The corresponding discretizations and data structures were obtained using the software FreeFem++ [17] in connection with the METIS partitioner [20]. We test the standard additive Schwarz (ASM) and the restricted additive Schwarz (RAS) preconditioners with and without coarse space, in particular comparing the new coarse space, based on harmonic extensions of eigenvectors of the local DtN operators, with the standard coarse space, that is, the piecewise constant space of Nicolaides [24] (Figure 9 shows eigenvectors of the DtN map (left) and of the volumic operator (right), FreeFem++ plots). In the tables and figures, +Nico means the use of the Nicolaides coarse space (14) and +DtN the use of the spectral coarse space defined in § 3.2 (Algorithm 1). We test the method on (fairly irregular) overlapping partitions into N subdomains. These overlapping partitions are built by adding layers to non-overlapping ones obtained, e.g., via the graph partitioner METIS (see Figure 13).
In Table 2, we test robustness with respect to the heterogeneities. The domain Ω contains layers with jumps in the coefficients ranging from 1 to 10^6, and we use 32 subdomains. The iteration counts depend only weakly on the size of the jumps in the coefficients. In Figure 10 and Table 3, we show how many eigenvalues were selected in the coarse space. In Table 4, we vary the domain decomposition for the same permeability field. We now present a selection of difficult test cases in a more systematic way, with so-called inclusions and channels.
We solve two test cases with known difficulties. The diffusion coefficient α is highly heterogeneous and takes values between 1 and approximately 2 × 10 6 and contains both high-permeability inclusions and channels. First of all we will analyze the performance of the method by increasing the number of channels and then by increasing the number of inclusions.

We use the ASM preconditioner within conjugate gradients (CG) and the RAS preconditioner within GMRES, and in each case we stop the iteration when the relative residual is smaller than 10^{-6}. We start with only inclusions and add the channels one by one as shown in Figure 14 (Test Problem 1). When there are no channels, α varies between 1 and 10^6, as indicated by the colors in Figure 14. With all three channels present, α varies between 1 and 2.8 × 10^6. The corresponding convergence results are given in Table 5. Our algorithm performs significantly better: the piecewise constant coarse space has virtually no effect on the performance of either ASM or RAS, leading to iteration numbers that differ little from the results without any coarse grid in all four cases. Our new coarse space, on the other hand, is fully robust to the coefficient variation and to the addition of channels, and it leads to a gain of at least a factor 8 compared to the one-level method in all cases. The situation is similar if we use a deflation-based coarse grid correction [21] with the same coarse spaces (see Table 6); in that case, however, the absolute numbers of iterations are reduced by almost a factor 2. Our theory applies equally to this case (see e.g. [16] for details), but we will not include any further numerical results with deflation-based coarse grid correction. We also give details of the coarse space that we build with our automatic selection strategy: for each number of channels we give min_j m_j and max_j m_j, as well as the global coarse space size n_H = Σ_j m_j and the average number of modes included per subdomain, n_H/N. For comparison, we also include the total number n_{Γ_j} of eigenmodes of the discrete DtN operator on each subdomain. We note that adding channels does not have a big influence on the size of the coarse space; we only need three additional eigenvectors in the case of three channels compared to the case of no channels.
Then, using the same domain and the same partition, we successively add inclusions without any channels present, as shown in Figure 15 (Test Problem 2). The results are in Table 8. Again, the piecewise constant coarse space does not work at all for this test problem. The DtN-based coarse space is almost completely robust to an increase in the number of inclusions and again requires significantly fewer iterations than the one-level method in all cases. Note that the subdomain partition (cf. Figure 13) is not aligned with the inclusions at all (cf. Figure 15). In Table 9 we see that also for this test problem the coarse space size grows only very slowly with the number of inclusions; even in the hardest case, n_H is only 53 (by comparison, the dimension n of V_{h,0}, and thus of A, is 25600).

Practical optimality of the spectral coarse space
The last series of tests, in Table 10, aims to show that the number m_j of eigenvectors per subdomain chosen by our automatic algorithm is optimal in some sense (Table 10 reports iteration numbers when reducing or increasing the number m_j of coarse basis functions per subdomain given by the automatic selection strategy). For Test Problem 1 with one channel (see Figure 14), we first reduce the number of coarse basis functions per subdomain by one; this has a huge influence on the iteration count. Then we add one basis function per subdomain and notice that this has much less effect. This suggests that the selection process we have designed is indeed a good compromise between enriching the coarse space and keeping the coarse problem reasonably sized.

ADAPTIVE COARSE SPACE ON HPC PLATFORMS
Results in this section are based on a method related to the DtN coarse space, namely the GenEO method. The principle of this coarse space construction is similar, in that the coarse space is built by solving local eigenvalue problems: it suffices to change the right-hand side in the generalized eigenvalue problem (23); see [28] for more details. The GenEO coarse space is in practice quite close to the DtN coarse space. Its main advantage is that it works not only for scalar PDEs but also for systems of PDEs, such as the elasticity system. When applied to scalar PDEs, the DtN and GenEO coarse spaces are almost identical and give very similar results. As a result, in order to have a general purpose code, we focused our HPC developments and tests on the GenEO method.

Results in this section were obtained on Curie, a Tier-0 system for PRACE (Partnership for Advanced Computing in Europe) composed of 5 040 nodes, each with two eight-core Intel Sandy Bridge processors clocked at 2.7 GHz. The interconnect is an InfiniBand QDR full fat tree. We want here to assess the capability of our framework to scale:
1. strongly: for a given global mesh, the number of subdomains increases while the local mesh sizes are kept constant (i.e. local problems get smaller and smaller);
2. weakly: the number of subdomains increases while the local meshes are refined (i.e. local problems have a constant size).
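The two regimes translate into two different metrics, which the following lines compute from timings reported in this section (2D elasticity for strong scaling, 2D diffusion for weak scaling).

```python
# Strong scaling: same global problem, more processes.
t_1024, t_8192 = 227.0, 31.0          # seconds, 2D elasticity runs
speedup = t_1024 / t_8192             # measured speedup, ~7.3x
ideal = 8192 / 1024                   # ideal speedup, 8x
strong_eff = speedup / ideal          # ~0.92, i.e. quasi-optimal

# Weak scaling: constant local problem size, more processes.
weak_eff = 141.0 / 172.0              # 2D diffusion runs, ~0.82
```
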
We don't time the generation of the mesh and of the partition of unity. The important procedures here are the assembly and factorization of the local stiffness matrices, the solution of the generalized eigenvalue problems, the construction of the coarse operator and the time elapsed for the convergence of the Krylov method. The Krylov method used is GMRES; it is stopped when the relative residual error is smaller than ε = 10^-6 in 2D and 10^-8 in 3D. All the following results were obtained using an LDL^T factorization of the local solvers A_i^δ and of the coarse operator E using MUMPS (with an MPI communicator set to MPI_COMM_SELF or to the communicator created by our library binding master processes, respectively).

First, the system of linear elasticity with highly heterogeneous elastic moduli is solved with a minimal geometric overlap of one mesh element. Its variational formulation reads: find u ∈ V such that

∫_Ω λ (∇·u)(∇·v) + 2µ ε(u) : ε(v) dx = ∫_Ω f · v dx for all v ∈ V,

where λ and µ are the Lamé parameters, with µ = E / (2(1 + ν)) (E being Young's modulus and ν Poisson's ratio). They are chosen to vary between two sets of values, (E_1, ν_1) = (2 · 10^11, 0.25) and (E_2, ν_2) = (10^8, 0.4).
ε is the linearized strain tensor and f the volumetric forces (here, we just consider gravity). Because of the overlap and the duplication of unknowns, increasing the number of subdomains also slightly increases the number of unknowns, even though the number of mesh elements (triangles or tetrahedra in the case of FreeFem++) stays the same. In 2D, we use piecewise cubic basis functions on an unstructured global mesh made of 110 million elements, and in 3D, piecewise quadratic basis functions on an unstructured global mesh made of 20 million elements. This yields a symmetric system of roughly 1 billion unknowns in 2D and 80 million unknowns in 3D. The geometry is a simple [0; 1]^d × [0; 10] beam (d = 1 or 2) partitioned with METIS.
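For reference, both Lamé parameters can be computed from (E, ν). The expression for µ is given in the text; the expression for λ below is the standard one and is an assumption here.

```python
# Lamé parameters from Young's modulus E and Poisson's ratio nu.
# mu = E / (2(1 + nu)) is given in the text; the formula for lambda is
# the standard isotropic-elasticity expression (assumed, not stated above).
def lame(E, nu):
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return lam, mu

lam1, mu1 = lame(2e11, 0.25)   # stiff material (E1, nu1)
lam2, mu2 = lame(1e8, 0.4)     # soft material  (E2, nu2)
```
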
Solving the 2D problem initially on 1 024 processes takes 227 seconds; on 8 192 processes, this time is reduced to 31 seconds (quasi-optimal speedup). With that many subdomains, the coarse operator E is of size 121 935 × 121 935. It is assembled and factorized in 7 seconds by 12 master processes. For the 3D problem, the wall-clock time is initially 373 seconds. At peak performance, near 6 144 processes, the time is reduced to 35 seconds (super-optimal speedup). Then, the coarse operator is of size 92 160 × 92 160 and is assembled and factorized by 16 master processes in 11 seconds.

Figure 16
Linear elasticity test cases. 2D on the left, 3D on the right. Strong scaling
Moving on to the weak scaling properties of our framework, the problem we now solve is a scalar diffusion equation with highly heterogeneous coefficients (varying from 1 to 10^5) on [0; 1]^d (d = 2 or 3). Its variational formulation reads: find u ∈ V such that

∫_Ω κ ∇u · ∇v dx = ∫_Ω f v dx for all v ∈ V,

where κ is the heterogeneous diffusion coefficient. The targeted number of unknowns per subdomain is kept constant at approximately 800 000 in 2D and 120 000 in 3D (once again with P_3 and P_2 finite elements, respectively).

Figure 17
Diffusion equation test cases. 2D on the left, 3D on the right. Weak scaling

In 2D, the initial extended system (with the duplication of unknowns) is made of 800 million unknowns and is solved in 141 seconds. Scaling up to 12 288 processes yields a system of 10 billion unknowns solved in 172 seconds, hence an efficiency of 141/172 ≈ 82%. In 3D, the initial system is made of 130 million unknowns and is solved in 127 seconds. Scaling up to 8 192 processes yields a system of 1 billion unknowns solved in 152 seconds, hence an efficiency of 127/152 ≈ 83%.

CONNECTIONS WITH MULTISCALE METHODS
Multiscale methods are an active field of research, for finite element methods see [11] (and references therein) and for multiscale finite volume methods see for example [18].
In § 5 we compare them with our two-level spectral coarse space. In § 5.1, we first recall basic facts on multiscale discretizations and their difficulties with arbitrary channelized flows (§ 5.1.1). Although the goals of multiscale methods and DD methods are different, they have many related features that we compare in § 5.2. In particular, both methods build coarse basis functions. The superiority of the spectral coarse space comes from the fact that the number and the shape of the basis functions adapt automatically to the heterogeneities of the medium, even for channelized media. This is not always the case for multiscale methods.

Figure 18
Fine mesh Ω_h, coarse mesh Ω_H and a dual coarse cell around point (x_i, y_j).
Consider a problem set on a fine grid Ω_h (see Figure 18) that is too large to be solved directly. We approximate u_h via a coarse problem set on a coarse mesh Ω_H. Defining a multiscale method involves three steps: - pre-computation of multiscale basis functions; - global formulation at the coarse level; - reconstruction of a fine scale solution.
There are of course many variants to deal with these topics and we don't try to give a complete review of the subject. We present here basic material in order to compare multiscale methods with our DtN coarse space. In particular, we shall see that the DtN approach is more general and systematic.

Multiscale basis functions
The preferred and most common technique is to use oversampling, see [11]. For simplicity, we start with the original non-oversampling approach.
We consider a structured two-dimensional grid. A coarse element is typically denoted by K. Let (x_i, y_j) be a coarse grid vertex. We recall the construction of the corresponding coarse basis function φ_{H,i,j}. For both the Multiscale Finite Element Method (MsFEM) and the Multiscale Finite Volume (MsFV) method, a standard choice is to solve the fine scale equation on the four neighboring coarse elements K_{i±1/2,j±1/2}, see Figure 18:

L_h φ_{i±1/2,j±1/2} = 0 in K_{i±1/2,j±1/2}, φ_{i±1/2,j±1/2} = g_{i±1/2,j±1/2} on ∂K_{i±1/2,j±1/2}, (29)

where g_{i±1/2,j±1/2} is a piecewise affine function such that g_{i±1/2,j±1/2}(x_i, y_j) = 1 and which vanishes at the three other vertices of ∂K_{i±1/2,j±1/2}. Then, the function φ_{H,i,j} is defined by taking the restrictions of φ_{i±1/2,j±1/2} to the coarse elements adjacent to the coarse grid vertex (x_i, y_j). This construction presents unwanted boundary layer effects. In order to fix this problem, the functions φ_{i±1/2,j±1/2} are computed on a coarse cell K^δ_{i±1/2,j±1/2} enlarged with a few layers of fine elements, see Figure 18. Then a coarse basis function φ_{H,i,j}(x, y) is computed as a linear combination of the restrictions of the functions φ_{i±1/2,j±1/2} to K_{i±1/2,j±1/2}. This leads to a non-conforming basis. When the coefficients of the operator L_h are sufficiently smooth, this basis is adequate. This procedure is called oversampling.
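A 1D analogue of the cell problems (29) makes the construction concrete: on one coarse cell, solve the fine scale equation -(a u')' = 0 with affine boundary data g (value 1 at one vertex, 0 at the other). The fine grid and coefficient below are illustrative; the construction described above is 2D over the four neighboring coarse cells.

```python
import numpy as np

# One coarse cell [0, 1] subdivided into m fine cells; coefficient a is
# piecewise constant per fine cell, with a high-contrast inclusion.
m = 40
x = np.linspace(0.0, 1.0, m + 1)
a = np.where((x[:-1] > 0.4) & (x[:-1] < 0.6), 1e4, 1.0)

# Interior tridiagonal system for -(a u')' = 0 with u(0) = 1, u(1) = 0.
A = (np.diag(a[:-1] + a[1:])
     - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1))
b = np.zeros(m - 1)
b[0] = a[0] * 1.0                      # lift the Dirichlet value u(0) = 1

phi = np.concatenate(([1.0], np.linalg.solve(A, b), [0.0]))
```

The resulting basis function is flat across the stiff inclusion (the discrete flux a_i (phi_{i+1} - phi_i) is constant along the cell), which is exactly how the multiscale basis "sees" the fine scale coefficient.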
When the coefficients are heterogeneous across these edges (left picture of Figure 19), the basis functions should see the heterogeneities. For this purpose, the piecewise linear Dirichlet boundary conditions are replaced by oscillatory boundary conditions obtained by solving a reduced elliptic problem along the boundary of the coarse cell. The Dirichlet data must be in the kernel of the tangential part of the partial differential operator in eq. (29). An algebraic implementation of this construction was proposed in [25] and [31].
Note that for finite volume schemes for problems with high anisotropies, the cell problems (29) can also be modified by replacing Dirichlet boundary conditions (BC) by Neumann BC on some parts of ∂K^δ_{i±1/2,j±1/2}, see § 6.3 of [18].

Multiscale basis functions for channelized permeability distributions
When the problem has strong heterogeneities, typically three situations occur, as shown in Figure 19: 1. isolated heterogeneities; 2. one heterogeneous channel; 3. several heterogeneous channels.

Figure 19
Left: isolated heterogeneities, Middle: one channel, Right: two channels

The first case is well treated by multiscale methods as recalled above. For the second case, it has been noticed that the purely local construction might not be sufficient: "It has been shown that the accuracy of purely local methods may be low if the permeability field has structures with very long correlation lengths" quoted from [18]. This is the case for instance with channels, see the middle and right pictures of Figure 19. In order to fix this problem, iterative constructions of the coarse basis functions have been proposed, see [5, 6, 8]. Iterations take place between the coarse scale global flow and the fine scale local flow. A coarse space is first built with the oversampling method. It is used to obtain a coarse and then a fine grid solution. Then, the coarse basis functions are corrected by taking the coarse edge values of this solution as Dirichlet boundary conditions in equation (29). This procedure stops with some convergence criterion. To our knowledge, this technique is not supported by theoretical approximation results. The last case, with several heterogeneous channels, seems to be a concern even for this approach. Indeed, in this case the appropriate coarse space functions depend on the flow conditions: "The introduction of wells may additionally change global flow significantly and the coarse properties generated from the two generic global flows might lose accuracy in some cases. For such problems, the T can be recomputed, based on the actual well configuration and flow rates, using a local-global procedure analogous to that applied here. The overall issue of robustness with respect to global boundary conditions is complex and will be addressed in detail in a future paper." quoted from [5].
The problem comes from the fact that the number of coarse basis functions attached to a cell should be at least equal to the number of channels crossing the cell, see [30]. But in multiscale methods, even in the more algebraic ones such as [31], the number of degrees of freedom per aggregate is prescribed in advance. For a scalar problem, only one coarse basis function is assigned to a coarse grid vertex. It is thus not possible to cover all possible flow configurations.

Coarse problem
This step consists in approximating the fine scale solution u_h by defining a suitable coarse problem whose solution, denoted by u_H, belongs to the space spanned by the coarse basis functions (φ_{H,i,j})_{i,j}. We first consider a finite element formulation and then finite volume approximations.
For a finite element method, a Galerkin approach is usually used. For all i, j, let us denote by Z_{i,j} the vector of the components of φ_{H,i,j} in the fine FEM basis. We collect all these vectors in a rectangular matrix Z. Let A_h denote the matrix associated with the fine FEM, so that the matrix form of the fine problem reads

A_h U_h = F_h, (31)

where U_h are the components of the solution u_h in the fine FEM basis. Let us define A_H := Z^T A_h Z and the coarse problem: find u_H := Σ_{i,j} U_{H,i,j} φ_{H,i,j} such that

A_H U_H = Z^T F_h.

This way, the coarse approximation U_H satisfies a variational formulation in the coarse space spanned by the coarse basis functions φ_{H,i,j}. For finite volume methods, the Galerkin approach can be used as well, but then the conservativity and monotonicity of the initial finite volume scheme are lost. In order to recover them, a dual coarse mesh is introduced (see Figure 18) and the coarse solution u_H := Σ_{i,j} U_{H,i,j} φ_{H,i,j} is sought such that conservativity is satisfied on the boundaries of the dual cells. Typically, a 9-point stencil is thus obtained, and for anisotropic problems the monotonicity of the finite volume scheme on the fine mesh is lost on the coarse problem. Then a modified 7-point stencil is sought that still ensures conservativity, see [18].
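In matrix terms, the Galerkin coarse solve and its characteristic property (the residual of the coarse approximation is orthogonal to the coarse space) can be sketched as follows; the fine matrix and the two coarse basis vectors are random illustrative stand-ins.

```python
import numpy as np

# Illustrative SPD fine-scale matrix A_h and right-hand side F_h.
rng = np.random.default_rng(1)
n = 30
G = rng.standard_normal((n, n))
A_h = G @ G.T + n * np.eye(n)
F_h = rng.standard_normal(n)

# Two "coarse basis" columns collected in Z (stand-ins for the phi_{H,i,j}).
Z = np.zeros((n, 2))
Z[: n // 2, 0] = 1.0
Z[n // 2 :, 1] = 1.0

A_H = Z.T @ A_h @ Z                    # coarse operator A_H = Z^T A_h Z
U_H = np.linalg.solve(A_H, Z.T @ F_h)  # coarse solve
u_coarse = Z @ U_H                     # coarse approximation on the fine grid
```

By construction, Z^T (F_h - A_h u_coarse) = 0: the coarse approximation satisfies the variational formulation restricted to the coarse space.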

Fine scale solution
This step is actually optional since u_H already contains fine scale information via the coarse basis functions φ_{H,i,j}. In MsFEM, one can further improve the solution by solving local Dirichlet boundary value problems in each coarse element K_{i±1/2,j±1/2}. Thus u_H could in principle be used, for instance, to solve a transport equation at the fine level. But the method to compute u_H is not conservative, which is then a big drawback. In multiscale finite volume methods, the reconstruction is based on solving Neumann problems in each coarse cell so that local conservativity is satisfied. As a result, the fine scale solution is not continuous across the edges of the coarse elements.

Comparison with the DtN two-level Schwarz method
The foremost difference between multiscale methods and two-level domain decomposition methods is the goal itself. In the former, one wants to approximate the solution of the fine scale problem, whereas in the latter one wants to solve the fine scale equations (28), or equivalently equation (31). In this respect, multiscale methods are competitors to homogenization or upscaling methods, see [11]. But in contrast to these methods, multiscale methods don't lead to some kind of averaged PDE model. They are a framework providing a cheap way, via a coarse solve, to approximate the solution u_h of a (too) large scale system of equations. In this respect, they can be seen as approximate two-level solvers and could also be used as preconditioners for Krylov type methods such as CG, GMRES or BiCGSTAB. They are thus naturally comparable to two-level DD methods such as the DtN approach described above. This has been noticed by several authors, see [31], [1] or [25] and references therein. There, the multiscale approach is simply a framework to provide an adequate coarse space. Thus, we compare the coarse basis function constructions and give indications of their relative efficiency as preconditioners.
Moreover, the involved tools have some similarities.

Oversampling and overlapping
For both methods, the fine mesh is decomposed into aggregates of fine elements, but:
- in MsFEM or MsFV, the aggregates consist of some dozens of elements,
- whereas in DDM, subdomains may be quite large.
In both cases, the construction of the coarse problem is essentially parallel, but:
- in multiscale methods, we have a fine-grain parallelism,
- in DDM, we have a coarse-grain parallelism.
Oversampling is very reminiscent of overlapping in DDM. In both approaches, coarse basis functions are harmonic functions in overlapping aggregates (subdomains in DDM and extended coarse cells in oversampling multiscale methods). In order to use them in a coarse problem, they have to be extended to functions defined on the whole domain Ω_h. In multiscale methods, this is done via a procedure which is somewhat "brutal", since the resulting function is not even continuous on Ω_h. In DD methods, the local coarse space functions are multiplied by a kind of partition of unity (the local matrices (D_i)_{1≤i≤N}, see formula (12)) before the extension by zero to the whole domain Ω_h, so that the resulting function is continuous on Ω_h. In [10], the authors use partition of unity functions in MsFEM methods.
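The partition-of-unity weighting can be sketched in a few lines: weight each unknown by the inverse of the number of subdomains containing it, so that the locally weighted contributions, extended by zero, sum back to the global function. The index sets below are illustrative; the actual D_i of formula (12) may use other weightings with the same summation property.

```python
import numpy as np

# Two overlapping index sets on a fine grid of n unknowns (illustrative).
n = 20
subdomains = [np.arange(0, 12), np.arange(8, 20)]   # overlap on 8..11

# Multiplicity of each unknown, then the diagonal weights of each D_i.
count = np.zeros(n)
for idx in subdomains:
    count[idx] += 1.0
D = [1.0 / count[idx] for idx in subdomains]

# Partition-of-unity property: sum_i R_i^T D_i R_i v recovers v exactly.
v = np.arange(n, dtype=float)
recombined = np.zeros(n)
for idx, d in zip(subdomains, D):
    recombined[idx] += d * v[idx]
```

This is the step that makes the DD coarse functions continuous on Ω_h, in contrast with the discontinuous extension used in classical oversampling.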

Local and global effects
Coarse basis functions are used to define a coarse problem and they are harmonic in the aggregates. But:
- in multiscale methods, they mimic finite element basis functions: only one such function per aggregate;
- in the spectral coarse space of § 3.2, the number of coarse basis functions per aggregate is not prescribed a priori.

Figure 20
Medium with channels

In multiscale methods, the number of degrees of freedom per aggregate is prescribed in advance. For a scalar problem, only one coarse basis function is assigned to a coarse grid vertex. It has been explained in § 5.1.1 that even for sophisticated multiscale methods, this may not be enough for channelized media with changing flow conditions, whereas the spectral coarse space construction works well for arbitrary channel configurations, see Figure 20, as we have seen in § 3.4.

CONCLUSION AND PROSPECTS
After having introduced Schwarz domain decomposition methods, we have presented the spectral coarse space introduced in [23] and later analyzed in [7]. It is practically optimal in the sense that a larger coarse space does not bring much improvement while a smaller one performs poorly, see § 3.4.1. Moreover, the method adapts automatically to the heterogeneities of the problem: if necessary, more than one coarse basis function is allowed per aggregate. This construction is supported by a theoretical condition number estimate independent of the heterogeneities of the physical problem, see Theorem 3.2. For coarse spaces built using multiscale methods, such a theorem cannot hold since only one degree of freedom is allowed per aggregate, see § 5.2. This is why these methods have problems with channelized permeability distributions. A cure proposed in [10] is to use a suitable spectral coarse space as a basis for an MsFEM method. Our DtN coarse space could be used in MsFEM methods in the same manner.
The spectral coarse space was developed, tested and analyzed in the finite element framework. In reservoir or basin simulations, finite volume discretizations are usually preferred to finite element discretizations. The extension of the spectral coarse space of § 3 to a finite volume discretization is thus mandatory for its use in subsurface modeling. As explained in § 3.2, the rationale behind this coarse space is written in terms of the original model, i.e. in terms of partial differential equations. The foundation of the method thus does not depend on the discretization scheme. Therefore, the definition and implementation of the spectral coarse space in a finite volume discretization will demand some work but can definitely be done. It would improve the method introduced in [31] by selecting, in a provably robust (see Theorem 3.2) and practically optimal (see § 3.4.1) manner, more efficient coarse spaces when the channelized character of the permeability distribution makes it necessary.