A Review of Recent Advances in Discretization Methods, a Posteriori Error Analysis, and Adaptive Algorithms for Numerical Modeling in Geosciences

Two research subjects in geosciences which lately underwent significant progress are treated in this review. In the first part, we focus on one key ingredient for the numerical approximation of the Darcy flow problem, namely the discretization of diffusion terms on general polygonal/polyhedral meshes. We present different schemes and discuss in detail their fundamental numerical properties such as stability, consistency, and robustness. The second part of the paper is devoted to error control and adaptivity for model problems in geosciences. We present the available a posteriori estimates guaranteeing the maximal overall error and show how the different error components can be identified. These estimates are used to formulate adaptive stopping criteria for linear and nonlinear solvers, time step choice adjustment, and adaptive mesh refinement. Numerical experiments illustrate such entirely adaptive algorithms.


INTRODUCTION
Recently, there has been an increased interest and significant progress in two subjects related to the numerical approximation of geosciences problems: the conception of novel discretization schemes for diffusion terms on almost arbitrary polygonal/polyhedral meshes and the development of a posteriori error estimates and of adaptive algorithms. The study of new schemes is a key ingredient to simulate more realistic models including complex geometric features and physical properties. The use of a posteriori-driven algorithms is a promising way of compensating the increased computational cost for complex models. This paper provides an overview of some recent advances in both fields. The material is organized as follows.
In Section 2 we introduce the basic model of geosciences treated in this work, the compositional multi-phase Darcy flow problem. Its sub-models, the single-phase steady, single-phase unsteady, and two-phase immiscible incompressible unsteady Darcy flow problems will also be considered in the paper in order to pertinently illustrate individual issues.
Section 3 summarizes some recent advances in discretization schemes for diffusion terms on general polygonal/polyhedral meshes. After discussing some general properties that are relevant both from the theoretical and practical point of view, we briefly present three families of numerical methods which have received extensive attention over the last few years. More specifically, Section 3.3 is devoted to multi-point finite volume (and mixed finite element) methods, Section 3.4 presents a few examples of lowest-order variational methods, while Section 3.5 focuses on discontinuous Galerkin methods. For all the methods we provide a concise introduction stating the main principles, some examples of actual schemes, and discuss their properties in detail. In the discussion we pay special attention to practical issues concerning the implementation and/or in the integration into existing codes.
In Section 4 we then present fully computable, guaranteed a posteriori error estimates successively for the three submodels mentioned above and for the compositional model itself. These estimates allow to certify the error committed in a numerical approximation. Moreover, they enable to distinguish and estimate separately the different error components, such as the spatial discretization error, the temporal discretization error, the linearization error, or the algebraic solver error. This distinction then gives rise to entirely adaptive algorithms, where in addition to the common time step choice and adaptive mesh refinement, the linear and nonlinear iterative solvers are steered by adaptive stopping criteria. We shall see that this typically leads to important computational savings.

THE COMPOSITIONAL DARCY MODEL
The compositional Darcy model describes the flow of several fluids through a porous medium occupying the space region Ω ⊂ R d , d = 2, 3, (assumed to be fixed for simplicity of exposition) over the time interval (0, t F ). We consider a system where matter is present in different phases from the set P = {p}, each containing one or more components from the set C = {c}. The number of phases and components are respectively denoted by N P and N C . A synthetic description of the system which accounts for the fact that a component may only be present in selected phases is provided by the binary component-phase matrix M = (m cp ) c∈C, p∈P such that, for all c ∈ C and all p ∈ P, For all c ∈ C we denote by P c ⊂ P the set of phases in which the component c is present. Symmetrically, for all p ∈ P, C p ⊂ C denotes the set of components present in the phase p. The governing equations are inferred from the general principles of mass and energy conservation supplemented by a suitable set of algebraic closure relations. For the sake of simplicity, it is assumed in what follows that all the phases are present. When this is not the case, the model can be modified as outlined in the work of Coats et al. [28], where an additional, discrete-valued unknown accounting for the phases present in each point of the domain is added associated to a flash calculation to enforce local equilibrium. In what follows we also assume that the temperature is fixed and uniform and that no energy source or sink is present, so that the energy balance is trivially verified.
Following [28], the unknowns of the models are the reference pressure P, the saturations S p defined as the volumetric fraction occupied by the phase p ∈ P, and the molar fractions C p,c of each component c ∈ C in the phases p ∈ P c in which it is present. It is convenient, for all p ∈ P, to define the vector of molar fractions C p := (C c,p ) c∈C p . While other choices are possible for the set of unknowns, this one has the advantage of lending itself to discretizations with arbitrary levels of implicitness in the time integration schemes, and allegedly milder nonlinearities. For all p ∈ P, the phase pressure P p is obtained by adding the capillary pressure to the reference pressure, The reference pressure P can be chosen equal to the pressure of a given phase. In this case, the corresponding capillary pressure is identically zero. A more general choice consists in using as a reference pressure a linear combination of phase pressures.
The tensor-valued absolute permeability and the porosity of the medium are denoted by K and φ, respectively. hal-00783068, version 1 -31 Jan 2013 D A. Di Pietro and M Vohralík / A review of discretization methods, a posteriori analysis, and adaptive algorithms 3 For each phase p ∈ P the following properties are relevant to the model (the usual dependence on the unknowns of the model is provided in brackets): (i) molar density ζ p (P p , C p ); (ii) mass density, ρ p (P p , C p ); (iii) viscosity, µ p (P, C p ); (iv) capillary pressure, P c p (S p ); (v) relative permeability, k r p (S p ).
To account for the presence of injection or production wells, for all c ∈ C we denote by q c a source field defined on the space-time domain Ω × (0, t F ). A detailed treatment of well models is out of the scope of the present review. The mass balance for each component yields where n c denotes the number of moles for the component c and, for all p ∈ P, the average phase velocity is given by Darcy's law ( g denotes the oriented gravity acceleration), The pore conservation principle states that the sum of the saturations is equal to one in each point of the space-time domain, as expressed by the following algebraic equation: Phase conservation requires that the sum of the molar fractions of the components present in a given phase be equal to one, c∈C p C c,p = 1 ∀p ∈ P.
Additional algebraic laws are obtained by enforcing the equality of component fugacities, which corresponds to assuming thermodynamic equilibrium in mass transfer between phases. For simplicity of exposition this topic is not addressed here, and we refer to [4,28] for further details. Finally, we assume that the system of PDEs (1) is supplemented with no flow boundary conditions and that suitable initial conditions are derived, e.g., by an equilibrium computation.

General considerations
One of the key ingredients of numerical methods for the compositional Darcy problem of Section 2 is the discretization of the diffusive terms −K∇P p , p ∈ P, appearing in the expression of the average phase velocity (2). From the model standpoint, the permeability field K displays strong heterogeneities reflecting the different mineral composition of geological layers. In addition, the upscaling of fine scale heterogeneities or of extensive fracturing can result in full permeability tensors with large anisotropy ratios. From the discretization standpoint, mesh generation is often performed in a separate stage, and is focused on integrating physical and geometric data from the seismic analysis. As a result, fairly general meshes can be encountered, featuring, e.g., nonmatching interfaces corresponding to geological faults or general polyhedral elements resulting from the degeneration of hexahedral cells in eroded layers. This is notably the case in basin modeling, where deposition and erosion as well as fracturing must be accounted for owing to the long time scales. In reservoir modeling, polyhedral elements may also be present in near wellbore regions, where the use of radial meshes can be prompted by (qualitative) a priori knowledge of the solution. Nonconforming h-refinement can also appear at specific locations where the resolution needs to be increased or when moving fronts are present; cf., e.g., Chainais-Hillairet et al. [27].
Identifying an appropriate discretization of diffusive terms is not an easy task, since several and often mutually contradictory requirements come into play. The most relevant can be summarized as follows: (i) Consistency on general polyhedral meshes and for heterogeneous anisotropic diffusion tensors. It is well known that the classical Two-Point Finite Volume method (TPFV) is consistent only on superadmissible meshes for which the line segments joining the center of a cell and the barycenters of its faces are K-orthogonal to the corresponding face (cf. [54, Lemma 2.1] for further details).
(ii) Robustness with respect to the heterogeneity and anisotropy ratios of the permeability tensor. Technically speaking, robustness is achieved if error estimates are available that are uniform with respect to K; cf., e.g., [38] for a discussion on the robustness of a diffusion-advection model in the presence of impermeable regions. In practice, this means that the discretization error can be bounded in terms of the product of a constant independent of the physical parameters and a power of the meshsize. Of course, this is only possible if the method is appropriately designed and a suitable error measure is chosen.
(iii) Stability, i.e., the ability of the scheme to prevent the amplification of numerical errors. While the consistency requirement has been well assimilated by practitioners (who sometimes formulate it in terms of a patch test, cf., e.g., [90,Chapter 10]), this is often not the case for the equally important stability requirement. In what follows we will mainly focus on stability in an energy-like or similar norm, which is a sufficient requirement for convergence in the linear case. Most of the modern schemes successfully embed this principle. In the nonlinear case, however, tighter forms of stability are required, which are not easy to obtain at a discrete level. A particularly relevant form for degenerate parabolic problems is the discrete maximum principle, which essentially states that, under suitable conditions, the extrema of hal-00783068, version 1 -31 Jan 2013 the solution are to be found on the boundary of the domain. This topic is not addressed in detail in the following discussion. For further insight on the role of the maximum principle in proving the convergence of discretization schemes for the Darcy problem, the reader may consult, e.g., the Ph.D. thesis of Michel [68,Chapter 6] and the references therein. A particularly instructive convergence study of a finite volume (FV) method on superadmissible meshes for unsteady advection-diffusion problems is carried out by Gallouët et al. [57]. A Discrete-Duality Finite Volume (DDFV) method for a large class of nonlinear degenerate hyperbolicparabolic problems is considered by Andreianov et al. [8] under assumptions on the mesh that allow to infer a discrete maximum principle. Numerical enforcement of the maximum principle by nonlinear corrections is considered, e.g., by Cancès et al. in [25], to which we also refer for an up-todate bibliographic section on this topic.
(iv) Low computational cost both in terms of CPU time and parallel communications. This is a key requirement in industrial codes, since competition is often based on reducing the simulation time rather than on improving the resolution of the model. The demand for faster simulators reflects in different ways on the design of numerical schemes. First, with the notable exception of the energy equation in basin modeling, it is generally admitted that lowest-order method are the sole offering an acceptable trade-off between precision and simulation time. More generally, a great care must be spent to avoid the explosion of the number of unknowns while still meeting the previous requirements. Second, the stencil of the scheme should be as compact as possible in order to limit the amount of data exchange in parallel executions. Indeed, on the one hand, the trend for computer manufacturers is to increase computing power by adding multiple cores on a single processor rather than increasing the speed of each core; on the other hand, the users of commercial simulators are often interested in increasing the complexity of the model rather than the mesh resolution. As a result, achieving parallel efficiency requires to handle situations where heavy computations are performed on (relatively) few cells on each processor. In such circumstances, boundary cells and, hence, parallel data exchanges, have a major impact on the overall simulation time. Another related aspect is the availability of efficient parallel preconditioners for the linear systems arising from the discretization and the linearization of the Darcy problem. Although this topic is not addressed in detail here, it is usually acknowledged that (relatively) standard preconditioners such as the algebraic multigrid Boomer AMG available in the HYPRE library [55] perform well when FV or FV-like methods are used and mild heterogeneities are present, while this is not always the case when discontinuous Galerkin (dG) discretizations are employed. The issue of devising good preconditioners for highly heterogeneous problems is an active field of research. For a discussion on this topic as well as for an up-to-date bibliographic section see, e.g., the recent works of Scheichl et al. [77] and Havé et al. [60]. When it comes to dG methods, one of the very few contributions available is the work of Ayuso de Dios and Zikatanov [12].
(v) Local conservation on the computational mesh. This is a somewhat controversial point, since local conservation does not seem mandatory from the analysis point of view. Moreover, possibly after minor modifications or after a reconstruction postprocessing procedure, most discretization methods can exhibit conservation properties, cf., e.g., the discussion in Vohralík and Wohlmuth [83] for mixed finite element and nonconforming finite element methods on general polygonal meshes, Di Pietro [33] for cell centered Galerkin methods, Eymard et al. [54] and Di Pietro and Lemaire [37] for nonconforming finite element and generalized FV methods, Di Pietro and Ern [35,Chapter 4] for dG methods, and, e.g., Ern and Vohralík [52] and the references therein for conforming finite element methods. For the purposes of the present work this property will hence be intended as the availability of a simple expression for the flux and the Darcy velocity (2) rather than its sheer existence. Note that the possibility of reconstruction of locally conservative Darcy fluxes is a key ingredient for the a posteriori analysis of Section 4, see Assumptions 2, 7, and 11 therein.
(vi) Integrability in existing simulators. The introduction of this latter point is essentially dictated by practical considerations. The lifespan of an industrial simulator is usually of ten years or more, and it is possibly only slightly shorter when it comes to large-scale academic codes. During this time, innovations are usually incremental and major changes to incorporate new schemes may only be considered if a clear tradeoff can be identified. As a result, coping with existing codes can play a major role in deciding which numerical method is best-suited for the application at hand. The most widely used industrial reservoir simulators are based on traditional FV methods, and modifications to include radically different schemes are not necessarily possible or economically viable. Similar considerations also apply to large-scale academic simulators. The advances both in the understanding of discretization methods and in the flexibility of programming languages have prompted recent projects to build upon generic bricks that allow to easily modify the numerical formulation. In the context of geosciences, an example is provided by the recent work of Di Pietro et al. [36,39] based on the proprietary platform Arcane [58] and inspired by similar tools for finite element methods [73], whereas an open source example is provided by the DuMuX project [56].

Model problem and notation
In the rest of this section we briefly review some relevant advances in the development and analysis of numerical hal-00783068, version 1 -31 Jan 2013 D A. Di Pietro and M Vohralík / A review of discretization methods, a posteriori analysis, and adaptive algorithms 5 schemes for diffusive terms, and highlight the characteristics of each method based on the points listed Section 3.1. For the sake of simplicity, our main focus is on the steady model problem, −∇·(K∇u) = f in Ω, where Ω denotes the spatial domain and f models a source term. Problem (5) coincides with the single-phase, singlecomponent (N P = N C = 1) Darcy problem provided gravitational effects are neglected. In this case, u represents the (unique) phase pressure. The interest in problem (5) is not merely theoretical, as it is used in practice as a starting point to infer an expression for the diffusive terms −K∇P p , p ∈ P, in the compositional model of Section 2.
In what follows we denote by T h = {T } a mesh, that is to say, a collection of open polyhedra called cells or elements such that T ∈T h T = Ω. The notion of general mesh is related to the range of element shapes and arrangements for which the numerical scheme possesses the mathematical requirements of consistency and stability. As such, its precise definition depends on the scheme itself. Moreover, it is often not possible to provide an optimal definition of admissible mesh, but only formulate sufficient conditions based on computable quantities. In practice, we would like to be able to treat (at least) all meaningful degenerate elements obtained by suppressing edges of a quadrilaterally-faced hexahedron. Such elements are encountered in basin modeling as a result of erosion. In reservoir modeling we may also be interested in allowing more general polyhedral elements to discretize the near wellbore region. A mesh T h is primarily caracterized by a linear dimension h corresponding to the largest diameter of its elements. We say that a method is convergent if a suitable measure of the error tends to zero when h does so. From a mathematical viewpoint, convergence is equivalent to stability for a consistent method (this result is sometimes referred to as the Lax-Richtmyer theorem).
An important notion for all the discretization methods discussed in what follows is that of interface, which defines the way two elements can come into contact. Here, a basic requirement is that nonmatching interfaces should be supported, i.e., two neighboring elements should be allowed to share only portions of faces. In basin modeling, nonmatching interfaces may be used to represent faults; in reservoir modeling they may appear as a consequence of nonconforming h-adaptivity, i.e., the increase of the local mesh resolution obtained by subdiving a mesh element leaving its neighbors untouched. The definition of interface may vary from one method to another. As an example, interfaces are connected and planar for the SUSHI method of [54], while they can be defined as the intersection of two elements (hence non necessarily planar and possibly non connected) when it comes to dG methods; cf. [ For every element T ∈ T h we denote by F T the set of faces that lie on the boundary of T . For every interface F ∈ F i h we choose an arbitrary but fixed orientation for the unit normal n F and enumerate the elements T 1 , T 2 ∈ T h such that F ⊂ ∂T 1 ∩∂T 2 so that the outward unit normal n T 1 ,F coincides with n F .
In what follows we assume that K is piecewise constant on T h , i.e., jumps in the permeability can only occur at interfaces. In practice, this assumption is always verified in geological modeling since the computational mesh is used as a support for the physical properties.

Principles
A class of schemes that is nowadays very popular in the oil industry is that of Multi-point Finite Volume Methods (MPFV), independently introduced in the 90s by Aavatsmark et al. [2] and Edwards and Rogers [45]. The key idea of MPFV methods is to recover consistency on general meshes by extending the dependence of diffusive fluxes to cell unknowns other than the ones associated to the cells sharing a face. The coefficient associated to each cell unknown is usually obtained by solving a local problem. In what follows we exemplify these ideas by outlining the Gmethod proposed by Agélas et al. [5], which generalizes the L-method of Aavatsmark et al. [3]. For a survey of other constructions we refer to Aavatsmark [1]. We cite, in particular, the O-method, for which a convergence analysis under very general assumptions on the permeability tensor has been recently proposed Agélas and Masson [1].
The FV discretization of problem (5) reads where f T := |T | −1 d T f and (Φ T,F ) T ∈T h , F∈F T are numerical fluxes which satisfy the following local conservation property: The single-valued quantity Φ F is termed interface flux. In the TPFV method, the interface flux only depends on the (scalar-valued) cell unknowns u T 1 and u T 2 which are meant to approximate representative values of the solution in the cells T 1 and T 2 , respectively. More specifically, for all T ∈ T h we identify a point x T ∈ T away from the boundary of T to which the value u T is associated. For all T ∈ T h and all F ∈ F T , denote by d T,F the orthogonal distance between x T and F. Using a finite difference approximation of the directional derivative along n F and enforcing relation (7) we infer hal-00783068, version 1 -31 Jan 2013   6 Oil & Gas Science and Technology -Rev. IFP Energies nouvelles To remedy this lack of consistency, Aavatsmark et al. [3] propose a local reconstruction based on d + 1 cells which share a same node usually referred to as the L-method. In Figure 1a we show a two-dimensional example where these faces are denoted by F 1 and F 2 . The key idea is to reconstruct a piecewise affine function on the gray patch only depending on the values of the unknowns u T , u T 1 , and u T 2 and of the permeability field in the cells T , T 1 , and T 2 . This piecewise affine reconstruction is obtained by enforcing pressure continuity and flux conservation across F 1 and F 2 . The interface fluxes Φ F 1 and Φ F 2 are then obtained replacing the exact solution u by the piecewise affine reconstruction in the expression −(K∇u) |T ·n F i , i ∈ {1, 2}. It can be shown that this reconstruction requires the solution of a d×d linear system. An explicit expression for the entries of the system is provided in [5, Lemma 3.1], to which we refer for a more formal and detailed presentation. When the permeability field is heterogeneous, this construction outperforms a Lagrange interpolation based on the cell values u T , u T 1 , u T 2 in terms of consistency, since the resulting piecewise gradient embeds a dependence on the jumps of K via (7). It is a simple matter to realize that, for a given interface F ∈ F i h , there are multiple choices for a second face to perform the construction outlined above; cf. Figure 1b. As a result, several different flux expressions are in principle available. The key idea of the G-method is to define Φ F as a linear combination of all possible fluxes with weights chosen in such a way as to enhance a selected property for the method. A criterion geared towards increased stability is proposed in [5,Section 3.4]. In a different context where the construction is used as a trace interpolator, an accuracyoriented criterion is discussed in [32, Section 2.3].

A numerical example in basin modeling
To assess the properties of the G-method method and provide a comparison with the other methods discussed in what follows, we consider the benchmark problem in basin modeling originally proposed in [5]. The results are obtained using the unified implementation discussed in [39]. Convergence is studied on a mesh family obtained by successive refinements of the two-dimensional basin mesh depicted in Figure 2, which contains both quadrangular and triangular elements as a result of erosion. We consider the following analytical solution: with suitable right-hand side f . The anisotropy ratio is taken equal to 0.1, corresponding to a permeability which is ten times smaller in the vertical than in the horizontal direction. Figure 2: Two-dimensional stratigraphic mesh. The actual aspect ratio is x:y = 10:1 In Figure 3 we evaluate the performance of several schemes including the G-method with respect to different metrics. Accuracy is evaluated in terms of the error on the pressure (L 2 -error) and on its gradient (H 1 -error), both computed using the cell center as a quadrature node. We emphasize that the error on the gradient is perhaps the most significant measure, since it closely relates to fluxes, which are the quantities of interest in oil-related problems. The order of convergence is classically expressed relating the error to the meshsize h, and it measures how fast the error tends to zero as the meshsize does so. When dealing with solutions that are sufficiently regular, one can expect that the error scales as a power of h, corresponding to the different slopes of the curves in the log-log plots of Figures 3a and 3b. Since the comparison includes schemes which feature a different number of unknowns for a given mesh, a fairer comparison consists in relating the error to the number of unknowns N DOF , which we do in Figures 3c and 3d. The memory consumption can be evaluated by plotting the error as a function of the number of nonzero entries in the linear system corresponding to the discretization of problem (5), which is the contents of Figures 3e and 3f.

Discussion
We conclude by summarizing the features of MPFV methods in terms of the points identified in Section 3.1: (i) Consistency. The comparison in Figures 3a-3d shows that, although MPFV are consistent by construction, the lack of an embedded stability mechanism sometimes results in the loss of convergence. This is the case, e.g., for the Lmethod on the first three levels of mesh refinement. Moreover, while MPFV methods stand the competition when the L 2 -error is considered, this is often not the case for the H 1error.
(ii) Robustness. The local problems introduced to construct numerical fluxes in MPFV methods embed a dependence on the permeability tensor. In a way, this adds to the robustness of the method since accuracy may be retained in the heterogeneous anisotropic case. In some cases, however, the conditioning of the local problems may be dramatically affected by rough permeability tensors, resulting in highly inaccurate reconstructions. For some methods, the local problems may even be ill-posed, and backup strategies must be devised; for the G-method cf., e.g., the discussion in [5, Section 3.1]; for the O-method cf., e.g., the discussion in [84,Example 3.10], and [83,Remark 4.2].
(iii) Stability. As we have already mentioned, convergence may sometimes be lost owing to the absence of an intrinsic stability mechanism in MPFV methods. Although stability can be proved in some circumstances (cf., e.g., [5,Lemma 3.4]), this typically requires assumptions on the mesh and on the permeability tensor that are either too stringent or difficult to check in practice.
(iv) Low computational cost. This is one of the key advantages of MPFV methods, which feature only one unknown per cell as is the case for the classical TPFV method. A major difference is, however, that the stencil is typically extended to the neighbors in the sense of nodes. While this is acceptable in most of the cases (most notably for quasihexahedral meshes), it can sometimes lead to very large stencils when tetrahedral meshes are used. The difference with respect to other methods can be appreciated, e.g., in Figures 3e and 3f. As regards the solution of the result-ing linear system, one point that deserves to be mentioned is that AMG preconditioners may be less efficient that in the TPFV case since the global matrix is no longer an M-matrix. When stability is lost, the presence of eigenvalues with opposite sign may significantly affect the solution of the linear system (even when the solution remains unique).
(v) Local conservation. MPFV are classical finite volume methods, hence they inherently provide a simple expression for interface fluxes.
(vi) Integrability in existing simulators. A common practice in finite volume codes is to express the links between cells in terms of a graph. MPFV methods naturally fit this approach, since the sole difference with respect to the TPFV scheme lies in the number of connections. In this respect they are the easiest method to integrate in existing simulators. A possible difficulty that deserves to be mentioned is that the stencil of MPFV methods includes neighbors in the sense of nodes, which may require to redesign the communication patterns in parallel codes.

Mixed finite element methods
We would like to emphasize here that mixed finite element methods, in particular the lowest-order Raviart-Thomas scheme, see Raviart and Thomas [74] or Brezzi and Fortin [23], can be viewed as a member of the MPFV family. Indeed, following Younès et al. [89] and Vohralík [84], they can likewise be implemented with one unknown per mesh element and local flux expressions can be obtained upon solution of local problems on patches of elements. They are in particular tightly related to the MPFA O-method, see [84]. Moreover, they can easily be defined on general polygonal/polyhedral meshes, see Vohralík and Wohlmuth [83]. At the same time, they do not suffer from the loss of convergence as discussed in point (i) above and as observed in Figure 3, neither they exhibit stability problems as those discussed in point (iii) above. A detailed discussion of these issues can be found in [83].

Principles
In recent years, several new methods have been proposed that successfully address the stability issues of MPFV methods. These methods include, in particular, the Mimetic Finite Difference (MFD) methods of Brezzi et al. [20][21][22], the Hybrid Finite Volume (HFV) methods of Eymard et al. [53,54], the Mixed Finite Volume (MFV) method of Droniou and Eymard [42], the finite volume vision of mixed finite elements [83,84], and the cell centered Galerkin methods introduced in [31,32]. The close relation between these methods has recently been investigated in [43]; see also [44], hal-00783068, version 1 -31 Jan 2013 D A. Di Pietro and M Vohralík / A review of discretization methods, a posteriori analysis, and adaptive algorithms 9 where a different formalism is presented leading to Gradient Schemes; [83] gives yet another equivalence viewpoint. Another emerging formalism that deserves to be mentioned is that of Compatible Discrete Operators recently proposed by Bonelle and Ern [18]. In what follows we collectively refer to these methods as Variational Lowest-Order (VLO) methods. While this naming is by no means standard, it underlines the fact that, unlike classical finite volume methods, they are inspired by the weak formulation of the problem.
These schemes rely on a weak or variational formulation and often share significant similarities with mixed or nonconforming Finite Element (FE) methods. To illustrate some important ideas we focus on two examples that are related to nonconforming FE methods. In what follows we assume for the sake of simplicity that the source term f is square-integrable. Moreover, zero pressure boundary conditions are considered, so that the natural space for the solution is V := H 1 0 (Ω) (the space of square-integrable functions with square-integrable derivatives that vanish on ∂Ω). The weak formulation of problem (5) Problem (9) classically admits a unique solution as a consequence of the Poincaré inequality, where C Ω > 0 only depends on the spatial domain Ω. Inequality (10) states that the L 2 -norm of the gradient is a norm and not just a seminorm in V. In other words, if a function v ∈ V is such that ∇v L 2 (Ω) d = 0, then v is the null function. This allows, in particular, to infer a stability result for the bilinear form a provided the permeability tensor K is uniformly elliptic. This means that diffusion occurs along every direction at every point of a cell.
To formulate a convergent approximate version of (9) it is necessary to devise a bilinear form a h which (i) provides a good approximation of a, i.e., is consistent possibly up to an error which decreases with the meshsize h; (ii) is stable based on a discrete version of (10). A key ingredient for obtaining these properties is to design a suitable approximation of the gradient.

Hybrid finite volume and cell centered Galerkin methods
In this section we present two examples of VLO methods which, up to minor modifications, were proposed in [54] and [32], respectively.
A remark that can be exploited to design a gradient approximation is that Green's formula (cf., e.g., [7, Theorem 3.2.1]) together with the planarity of faces yields, for all T ∈ T h and v smooth enough, (11) where, for X ⊂ Ω, ϕ X denotes the average value of ϕ on X, while n T,F is the unit normal to F pointing out of T . Formula (11) suggests that, introducing the face unknowns v F := (v F ) F∈F h and interpreting them as average values over faces, a gradient approximation is given by the piecewise It is assumed henceforth that v F = 0 for all F ∈ F b h , which amounts to strongly enforcing the zero pressure boundary condition. A drawback of the gradient approximation defined by (12) is that it does not satisfy a discrete version of (10), that is to say, one can have G h (v F ) L 2 (Ω) d = 0 even if v F is not null. This is the case, e.g., for the mesh  Figure 4, where a (tedious) hand calculation shows that the matrix of the linear system obtained enforcing G h (v F ) |T = 0 for all T ∈ T h has kernel dimension equal to 2.

Stabilizing using residuals
A possible strategy to recover a discrete Poincaré inequality consists in adding a consistent subgrid correction to the expression (12). For every cell T ∈ T h we fix one interior point x T ∈ T such that T is star-shaped with respect to x T . We introduce the cell unknowns with P T,F denoting the F-based pyramid with apex x T (cf. Figure 5) and where d T,F denotes again the orthogonal distance between x T and F, x F is the barycenter of F, and η > 0 is a userdefined parameter. The discretization of (9) reads: Find u : It can be shown that the bilinear form a hfv h admits the following alternative expression: To interpret the correction (15), we introduce the piecewise affine reconstruction R h such that, for all T ∈ T h , where v := (v T , v F ). Plugging (18) into (15) yields As a result, since |P T, , the term in the second line of (17) can be alternatively written which shows that it is nothing but a least squares penalization of the difference between u F and R h (u) F . The consistency of this term stems from the fact that it vanishes when the exact solution is piecewise affine on T h . Different choices are possible for the penalty parameter η in (15). The choice η = √ d is advocated in [54] since it allows to recover the TPFV method on superadmissible meshes, whereas it has been recently shown in [37] that the choice η = d leads to interesting analogies with the Crouzeix-Raviart element. The fact that the discrete gradient (14) satisfies a discrete Poincaré inequality has been proved in [

Stabilizing using jumps
Starting from (18), an alternative way of recovering a discrete Poincaré inequality is to introduce a least-square penalization of interface jumps inspired by the work of Arnold [10]. For all F ∈ F i h with F ⊂ ∂T 1 ∩ ∂T 2 we introduce the jump and (weighted) average operators defined by where λ i := K |T i n F ·n F represents the permeability in the normal direction. For the sake of brevity, on boundary faces we conventionally set ϕ = {ϕ} = ϕ. Stability hinges in this case on the following discrete Poincaré inequality valid for piecewise H 1 functions on T h (cf. [19] and also [35, where ∇ h denotes the element-by-element broken gradient operator and h F is the face diameter. The penalization of jumps is realized by the bilinear form where λ har = λ 1 λ 2 /(λ 1 + λ 2 ) on interfaces and λ har = λ on boundary faces, while η > 0 is a user-dependent parameter. Let V h denote the vector space of cell-and face-DOFs, and let V ccg h := R h (V h ) be the space of piecewise affine functions obtained from the reconstruction (18). The discretization of problem (5) reads: where hal-00783068, version 1 -31 Jan 2013 D A. Di Pietro and M Vohralík / A review of discretization methods, a posteriori analysis, and adaptive algorithms

11
The terms in the first line of (21) are responsible for consistency and stability, whereas those in the second line respectively ensure consistency and symmetry. The bilinear form (21) was introduced in the context of domain decomposition methods for degenerate advection-diffusion problems by Burman and Zunino [24]. A general analysis for dG methods for degenerate advection-diffusion problems inspired by similar mechanisms was later established in [38].

Discussion
It is useful to summarize the features of VLO methods with respect to the points identified in Section 3.1. (ii) Robustness. Unlike MPFV methods, VLO methods do not inherently require local constructions which may lead to ill-posed problems. On the other hand, the underlying discrete space has some degree of global regularity (e.g., the continuity of face-averaged values proved in [37]), which narrows the range of singularities in the exact solution that can be accurately represented with respect to dG methods (cf. Section 3.5.1 for an example).
(iii) Stability. As discussed in Section 3.4.2, stability is assured by introducing penalty terms that allow to control the L 2 -norm of discrete functions in terms of the L 2 -norm of the gradient. However, tighter forms of stability such as the discrete maximum principle are generally not available.
(iv) Low computational cost. While the methods presented in Section 3.4.2 have more unknowns than cell centered finite volume methods, several reduction strategies are available. As already mentioned, one possibility consists in interpolating face unknowns in terms of cell unknowns. Although the local constructions required for interpolation have similar problems as the ones used in MPFV methods, this can be fixed by locally maintaining face unknowns as proposed in [54]. In general, the stencils of the resulting methods are larger than those of MPFV methods. When possible, a second strategy to reduce the number of unknowns consists in performing hybridization to solve the discrete problems in terms of face unknowns only. The linear systems resulting from VLO discretizations can be solved efficiently with standard preconditioners when mild homogeneities are present. Unlike MPFV methods, the embedded stability ensures that the matrices are definite.
(v) Local conservation. When interface unknowns are kept, the local conservation properties of LOV methods can be formulated in terms of numerical fluxes whose expression can be obtained analytically. In this case, the interface unknowns act as Lagrange multipliers of the flux continuity constraint. For further details we refer, e.g., to [54] and [33].
(vi) Integrability in existing simulators. Even when a simple expression for the numerical flux is available, LOV methods are less easily integrated in existing numerical codes when compared to MPFV methods. Indeed, dealing with interface unknowns, whether they are kept or interpolated, may require substantial modifications to the data structures of the code as well as to the way parallelism is handled.

Principles
The key idea of dG methods is to search the approximate solution in a space of piecewise polynomial functions that are fully discontinuous at interfaces, i.e., for an integer k ≥ 1, where P k d (T ) denotes the restriction to T of polynomial functions of total degree ≤ k. The main advantage of considering fully discontinuous functions is that sharp gradients or singularities affect the numerical solution only locally, which is not the case when considering discrete spaces endowed with some form of global regularity. This feature was first recognized by Reed and Hill in 1973 [75], who introduced a dG discretization of a steady neutron transport problem. The first analysis for steady first-order PDEs is due to Lesaint and Raviart [65][66][67]. However, dG methods only reached popularity in the 90s, when Cockburn and Shu considered their application to time-dependent hyperbolic PDEs in conjunction with explicit Runge-Kutta schemes [29,30]. For PDEs with diffusion, dG methods originate from the work of Nitsche on boundary-penalty methods in the early 70s [70,71] and the use of Interior Penalty (IP) techniques to weakly enforce continuity conditions imposed on the solution or its derivatives across interfaces, as in the work of Babuška [13], Baker [14], Wheeler [88], and Arnold [9]. In the late 90s, following the success of Runge-Kutta dG methods applied to hyperbolic problems, a new interest arose in dG formulations of diffusion terms. An extension of the techniques of Cockburn and Shu to problems with diffusion was considered by Bassi and Rebay [15] in the context of compressible flows. A unified analysis of dG methods for diffusive terms can be found in the work of Arnold et al. [11], while a unified analysis encompassing both diffusive and hyperbolic PDEs has been derived by Ern and Guermond [47][48][49].
The use of dG methods in geosciences has been considered in several works, mainly focusing on reactive transport, where advection terms play an important role; cf., e.g., Sun and Wheeler [79,80] or Bastian et al. [17] and references therein. In this context, a key point is to ensure the robustness of the method in the vanishing or zero permeability limit. This problem has been addressed by Houston et al. [62], and Di Pietro et al. [38]. The latter work provides the backbone for the discussion in the following section.

Degenerate advection-diffusion
Following [38] we show how the features of dG methods can be exploited to construct a discretization which is robust with respect to vanishing (or even zero) permeability. We modify (5) to include advection and reaction terms, where β is an incompressible vector-valued velocity field, µ > 0 is a reaction coefficient, while (n denotes here the unit normal vector pointing out of Ω).
To include the case when the permeability tensor vanishes along one direction, the boundary condition is only enforced on the portion of ∂Ω where either normal diffusion is present, or where the advective flow enters the domain. Problem (22) is representative of a class of reactive transport models encountered, e.g., in CO 2 storage simulation. It has been shown in [38] that the solution to (22) features singularities along the discontinuities of the permeability field. In particular, jumps occur when the advection field flows from a nonpermeable to a permeable region, as shown in Figure 6. Jump singularities are not naturally handled by methods such as the ones described in Section 3.4.2, since the gradient reconstruction (12) is inherently based on the approximation of single-valued traces. On the contrary, discontinuities can be captured by dG methods provided they occur at element boundaries and not inside elements. To make sure that the appropriate interface conditions are automatically selected, (i) diffusive penalization of interface jumps should only occur when the permeability in the normal direction is nonzero on both sides of an interface. This is the case, e.g., for the bilinear form a (ii) advective penalization of interface jumps should incorporate a mechanism to enforce boundary conditions on the inflow portion of ∂Ω * and interface condition on the interface between permeable and nonpermeable regions. It has been shown in [38] that this is the case when upwind fluxes are considered, corresponding to the bilinear form The discretization of problem (22) An important difference with respect to the methods of Section 3.4.2 is that, this time, the discrete problem is formulated for an arbitrary order k ≥ 1. As an example, convergence results for the problem described in Figure 6 are provided in Figure 7.

Discussion
We briefly revise the features of dG methods with respect to the points identified in Section 3.1.
(i) Consistency. Since dG methods are (nonconforming) finite element methods, consistency can be interpreted as an orthogonality property for the numerical error u−u h . This has the important consequence that higherorder approximations can be considered, since the convergence rate is not limited by the consistency error. The use of high-order methods in subsoil modeling, although not common, can be justified when complex flow patterns are present, as shown, e.g., by Scovazzi et al. [78].  (ii) Robustness. As shown in the previous section, it is possible to design dG methods that are robust with respect to variations of the physical parameters of the problem. Indeed, the additional flexibility resulting from the use of fully discontinuous polynomial spaces allows to represent singularities in the solution which would otherwise affect the overall precision. More generally, coarse features can be expected to have only local effects, leaving the numerical solution in the far field unperturbed.
(iii) Stability. The stability of dG method is inherent to the use of penalty terms which allow to control the jumps of the discrete solution at interfaces. A relevant point in the example of Section 3.5.2 is that these terms can be finely tuned to avoid unnecessary (or unphysical) numerical diffusion. As is the case for all of MPFA and LOV methods, the discrete maximum principle is not available in general.
(iv) Low computational cost. Discontinuous Galerkin methods are usually the most expensive among the methods considered in this review. In fact, a fully discontinuous polynomial representation requires to introduce as many cell DOFs as the coefficients of a polynomial in P k d . Moreover, the resulting linear systems are usually more difficult to solve, although standard preconditioners are still usable when mild heterogeneities are present.
(v) Local conservation. Although this point is not detailed here for the sake of conciseness, it has been long known that dG methods enjoy local conservation properties expressed in terms of continuous numerical fluxes; see, e.g., [35,Section 4.3.4].
(vi) Integrability in existing simulators. Discontinuous Galerkin methods are generally difficult to integrate into existing finite volume codes, while this is generally easier for finite element codes. One point in favor of dG methods is that the connectivity is analogous to that of the TPFV method, which therefore can be used as a model to design communications in parallel implementations.

A POSTERIORI ERROR ANALYSIS AND ADAPTIVE ALGORITHMS
This last section of the paper is devoted to a posteriori error estimates and adaptive algorithms for the considered geosciences problems. The use of a posterioridriven algorithms seems to be a promising way of compensating the increased computational cost for complex models. Our presentation is done along the recent contributions [26, 40, 46, 50-52, 59, 63, 85-87]. The basic idea can be traced back at least to the Prager and Synge equality [72] and has been used in a posteriori error estimation previously; we refer for a general orientation to the monographs of Ladevèze [64], Verfürth [81], Ainsworth and Oden [6], Neittaanmäki and Repin [69], and Repin [76]. Rather engineering approaches have also been previously; let us in particular refer to [27] and to the references therein.

General considerations
A posteriori error estimates aim at giving bounds on the error between the known numerical approximation, say u hτ , and the unknown exact solution, say u, that can be computed in practice, once the approximate solution u hτ is known. They typically take the form where η n T = η n T (u hτ ) is a quantity linked to the discrete time t n and mesh element T , computable from u hτ , called an ele-hal-00783068, version 1 -31 Jan 2013 ment estimator. In (25), |||·||| is some space-time error measure, often the energy norm. The estimate (25) is written directly for an unsteady problem; for steady problems, we simply set N := 1 and leave out the temporal indices n. Then, |||·||| is only a space error measure and we use the notation u h for the approximate solution. Detailed notation is given below.
One may formulate the following six properties describing an optimal a posteriori error estimate: (i) ensure that (25) holds and that the element estimators η n T are fully computable from u hτ (guaranteed upper bound); (ii) ensure that, for all 1 ≤ n ≤ N and all T ∈ T n h , η n T represents a lower bound for the actual error on the time interval (t n−1 , t n ] and in the vicinity of the element T , up to a generic constant; this means that there exists a constant C > 0 such that where T T stands for the element T and its neighbors (local efficiency); (iii) ensure that the effectivity index defined as the ratio of the estimated and actual error, goes to one as the computational effort grows (asymptotic exactness); (iv) guarantee the three previous properties independently of the parameters of the problem and of their variations (robustness); (v) give estimators η n T which can be evaluated locally (only performing calculations in the element T or in its neighborhood T T ) (small evaluation cost); (vi) distinguish and estimate separately the different error components (error components identification).
Property (i) above allows to give a truly computable upper bound on |||u−u hτ ||| and thus to certify the error committed in a numerical simulation. Property (ii) enables to predict the localization of the error. It is possible to satisfy it entirely for steady problems. It then enables to detect the areas of the computational domain Ω where the error is large. Knowing such areas, one can concentrate more effort therein, by performing an adaptive mesh refinement. For unsteady problems, one typically only arrives at which justifies theoretically the localization of the error in time but not in space. Property (iii) ensures the optimality of the upper bound; if the error is quite small and the estimator predicts a large value, it may well satisfy properties (i) and (ii), but is probably not very useful as it overestimates highly the error. Property (iv) is one of the most important in practice. In real-life problems, parameters and coefficients such as the domain size, final simulation time, space and time steps, the permeability tensor K, the porosity φ, the viscosities µ, the sources q, or the nonlinear state functions for nonlinear problems may be very large or small or vary abruptly; an estimator satisfying property (iv) ensures that its results will be equally good in all situations. Property (v) then guarantees that the computational cost needed for the evaluation of the estimators η n T will be much smaller than the cost required to obtain the approximate solution u hτ itself (recall that typically a global problem needs to be solved in order to obtain the approximate solution for steady problems and one such a problem needs to be solved at each time step for implicit time discretizations of unsteady problems). Finally, the numerical error u − u hτ typically consists of several error components. The first one is the discretization error, which further splits into temporal (for unsteady problems) and spatial errors. These result from the approximation properties of the time stepping procedure and of the numerical scheme, and by the current temporal and spatial meshes. Another typical error component is the algebraic error, linked to the imprecision in the solution of the associated systems of linear algebraic equations. For nonlinear problems, the linearization error, linked to incomplete convergence of iterative nonlinear solvers such as the Newton method, arises equally. Property (vi) is essential for the identification of the discretization (spatial and temporal), linearization, and algebraic errors and for entire adaptivity, relying not solely on adaptive mesh refinement but employing crucially adaptive stopping criteria for linear and nonlinear solvers.
In the subsequent sections, we will illuminate the current knowledge on a posteriori error estimates for geosciences problems. We start by the model steady linear problem (5) and arrive up to the compositional Darcy flow model discussed in Section 2. The error estimates are derived under very general assumptions that allow to cover all the discretization methods discussed in Section 3, taking advantage of the unified framework developed in [50,51,59], see also the references therein.

The single-phase steady Darcy flow
Let us first consider the single-phase Darcy flow (5). In order to make the presentation independent of the numerical scheme at hand, we suppose that u h is a piecewise regular, typically piecewise polynomial function on the mesh T h . This in particular allows for u h being nonconforming, hal-00783068, version 1 -31 Jan 2013 D A. Di Pietro and M Vohralík / A review of discretization methods, a posteriori analysis, and adaptive algorithms 15 i.e., not contained in the energy space H 1 0 (Ω) (not continuous).

Controlling a posteriori the error
To give an a posteriori error estimate for the generic approximation u h we, following [51,59,85], henceforth assume that we are able to construct two functions s h and σ h such that: Assumption 1 (Potential reconstruction). There exists a scalar function s h ∈ H 1 0 (Ω), termed potential reconstruction. Assumption 2 (Equilibrated flux reconstruction). There exists a vector function σ h ∈ H(div, Ω) such that termed equilibrated flux reconstruction.
The two above assumptions mimic two essential properties of the exact solution of (5). First, the exact pressure u is continuous in the sense that it belongs to the space H 1 0 (Ω). Second, the exact (Darcy) flux given by −K∇u belongs to the space H(div, Ω), which implies, in particular, that its normal component is continuous across interfaces. Last, for the exact flux it holds that its divergence is equal to the source term f . Assumptions 1 and 2 mimic these properties on the discrete level. In conforming numerical methods such as vertex-centered finite volumes or conforming finite elements, the approximate solution itself satisfies u h ∈ H 1 0 (Ω). Then we simply set s h := u h . Similarly, in fluxconforming numerical methods such as cell-centered finite volumes or mixed finite elements, a discrete flux satisfying Assumption 2 is readily available and can be taken for σ h .
Suppose f ∈ L 2 (Ω), K symmetric, bounded, and uniformly positive definite, and recall that the energy (semi-)norm for (5) is then given by, for a function v piecewise H 1 on T h , |||v||| := K 1/2 ∇ h v L 2 (Ω) d . Then we have, see [51,72,85]: Theorem 3 (A posteriori error estimate, steady single phase flow). Let u be the exact (weak) solution of (5), let u h be its arbitrary (piecewise regular) approximation, and let Assumptions 1 and 2 hold. Then where the nonconformity estimators η NC,T are given by the flux estimators η F,T are given by η F,T := K 1/2 ∇ h u h + K −1/2 σ h T , and the residual estimators η R,T are given by In Theorem 3, C P,T is the constant from the Poincaré inequality, equal to 1/π whenever the element T is convex, and c K,T is the smallest eigenvalue of the permeability tensor K on the element T .
Remark 4 (Estimators of Theorem 3). The estimator η NC,T of Theorem 3 is related to the H 1 0 (Ω)-constraint on the pressures and evaluates the possible departure of u h from H 1 0 (Ω). The estimator η F,T is related to the constitutive law saying that the flux is given by −K∇u (this is precisely the Darcy law (2) when the gravitational effects are neglected) and to the H(div, Ω)-constraint on the fluxes and evaluates the departure of −K∇ h u h from H(div, Ω). Finally, the last estimator η R,T is related to the strong form (5) and to the condition of flux being in equilibrium with the sources.
It follows from Theorem 3 that the a posteriori estimate for the energy norm of the approximation error for problem (5) is certified, so that property (i) of Section 4.1 is satisfied. With appropriate choices of the reconstructions s h and σ h , it can be shown that also the properties (ii)-(v) hold true; property (iii) may not hold fully (but effectivity indices below two are usually observed), and, similarly, property (iv) does not hold with respect to the inhomogeneities and anisotropies of the diffusion tensor K unless specific adaptations are made, see [86] and the references therein. Finally, the estimate of Theorem 3 assumes that the system of linear equations associated to the given numerical method is solved exactly. Identification of the algebraic and discretization errors in the spirit of property (vi), leading to stopping criteria for iterative liner solvers, was undertaken in [63]. Further details can be found in [51,85] and the references therein.

Refining adaptively the mesh
If follows from the fact that property (ii) of Section 4.1 is satisfied that the estimators of Theorem 3 allow to predict the spatial distribution of the error. This is illustrated in Figure 8: in its left part, the actual error distribution over the mesh elements is plotted, whereas in its right part, the estimators for the cell-centered TPFV scheme approximation of (5) are shown. We can see that our prediction matches nicely the reality. It is then natural to refine the mesh adaptively, around those elements where the estimators predict a high error value. Such a concept is crucial especially in presence of singularities in the exact solution: then the mesh can be almost exclusively refined in such places, as we can witness it in Figure 9. Both these examples, as well as the one in Figure 10, are given for problem (5) with a model domain Ω = (−1, 1) × (−1, 1), zero source term f , isotropic but inhomogeneous diffusion tensor K being multiple of the identity tensor in the first and third quadrant and by the identity tensor in the other quadrants, and with an inhomogeneous Dirichlet boundary condition instead of the homogeneous one. This problem admits an analytical solution featuring a singularity at the origin (see [85] and the references therein). We consider two cases = 5 and = 100 corresponding to Figure 8 and Figure 9, respectively.
Finally, in Figure 10, we first asses the precision of our estimators: we plot both the energy error |||u − u h ||| and the estimate of Theorem 3, in the left part for = 5 and in the right part for = 100. We see that our estimators overestimate the actual error only mildly. Second, Figure 10 compares the situation of classical uniform mesh refinement with the adaptive mesh refinement based on our estimators. We see that the same precision can be achieved for significantly fewer unknowns in the adaptive case with respect to the uniform one. Equivalently, the error for the same number of unknowns is much smaller in the adaptive case. Actually, the error decrease in function of the number of unknowns is very slow (given by the regularity of the solution) in the uniform case, whereas it recovers the best possible speed in the adaptive case.

The single-phase unsteady Darcy flow
To lay the foundations for time-dependent problems, we consider the unsteady version of the model problem (5). For the sake of simplicity, we consider for the theoretical developments K to be an identity matrix, so that we look for u such that for some given initial pressure u 0 . Let t n , 0 ≤ n ≤ N, be a strictly increasing sequence of discrete times such that 0 = t 0 and t N = t F . We introduce the time intervals I n := (t n−1 , t n ] and the time steps τ n := t n − t n−1 for all 1 ≤ n ≤ N. On each t n , we suppose a (possibly different) mesh T n h . Again, to make the presentation as general as possible, and include all the space discretization schemes discussed in Section 3, we suppose that u hτ is such that u n h := u hτ (·, t n ) is piecewise regular (typically piecewise polynomial) on T n h , and that u hτ is piecewise affine and continuous with respect to time. We follow in our presentation [50,51].

Controlling a posteriori the error
Let the source function f ∈ L 2 (Ω × (0, t F )) be for simplicity piecewise constant in time, where we denote f n := f | I n , and let the initial condition u 0 ∈ L 2 (Ω). The exact solution lies in the space Y := {y ∈ X; ∂ t y ∈ X }, with X := L 2 (0, t F ; H 1 0 (Ω)) and X = L 2 (0, t F ; H −1 (Ω)). The space-time energy norm is given by, for y ∈ X, We extend it to only piecewise regular functions in space while replacing the usual gradient ∇ by the broken one ∇ h . It appears impossible so far to obtain the property (ii) (even local in time but global in space) for the energy norm. For this reason we, following Verfürth [82], augment the energy norm by a dual norm of the time derivative as with ∂ t y X := Then property (ii), local in time but global in space, can be obtained.
We suppose that the temporal discretization is fully implicit, backward Euler. In order to once again make the presentation independent of the spatial discretization scheme at hand, we make the following equivalents of Assumptions 1 and 2: Assumption 5 (Potential reconstruction). There exists a scalar function s hτ , piecewise affine in time and satisfying s n h := s hτ (·, t n ) ∈ H 1 0 (Ω), such that, for all 1 ≤ n ≤ N and for all T ∈ T n h , where s n hτ := s hτ | I n and u n hτ := u hτ | I n . We call s hτ a potential reconstruction.
Remark 6 (Condition (31)). Condition (31) is necessary as we shall estimate the error in the augmented · Y -norm of (30). For a similar estimate in the · X -norm of (29), it would not be necessary.
We then have, see [50,51]: Theorem 8 (A posteriori error estimate, unsteady single phase flow). Let u be the exact (weak) solution of (28) and let u hτ be its arbitrary piecewise regular in space and piecewise affine and continuous in time approximation. Let Assumptions 5 and 7 be satisfied. Then, with, for all 1 ≤ n ≤ N, the spatial and temporal error estimators given by (η n sp ) 2 := T ∈T n h 3 τ n 9(η n R,K + η n F,K ) 2 + (η n NC,2,K ) 2 For all T ∈ T n h , the residual estimator, the flux estimator, and the nonconformity estimators are given by Finally, the initial condition estimator is given by It follows from Theorem 8 that the augmented norm error in approximation of problem (28) is certified by the a posteriori error estimate (32), so that property (i) of Section 4.1 is satisfied. With appropriate choices of the reconstructions s hτ and σ hτ , it can be shown that also the properties (ii)-(v) hold true; (ii) is local in time but unfortunately only global in space, whereas property (iii) typically only gives effectivity indices around five. Importantly, property (iv) do holds with respect to the final time t F . This is illustrated in Figure 11. We consider (28) posed on Ω := (0, 3) × (0, 3) with K = 0.5I (I being the identity matrix), f = 0, and with u 0 and an inhomogeneous Dirichlet boundary condition given by the exact solution u(x, t) = e x e y e t e 3 . Three square meshes of Ω with 10 × 10, 30 × 30, 90 × 90 grids and associated time steps 0.3, 0.1, 0.3333 are considered. A vertex-centered finite volume scheme with backward Euler time stepping is tested in the left part of Figure 11 for t F = 1.5 and in the right part for t F = 3. The results confirm experimentally that the effectivity indices (overestimation factors) are independent of the final time. For illustration, we give the effectivity indices also for the energy norm in Figure 12. Although in this case we have no theoretical support, we numerically observe efficiency and the same robustness; moreover, here the effectivity indices are closer to the optimal value of one.

Adaptivity: mesh and time step (de)refinement
The two main error components in the present case are the spatial and temporal ones. Thanks to Theorem 8, we can identify them in the spirit of property (vi) of Section 4.1. Such a result is a basic theoretical ingredient for adaptivity in unsteady problems, where both the spatial meshes T n h and the time steps τ n can be refined and derefined during the simulation. An example of a resulting adaptive algorithm and numerical illustrations of the computational benefits of such an space-time adaptive approach can be found in [61].

The two-phase unsteady Darcy flow
We now move in our presentation further to the simplest multi-phase flow model: we consider a simplification of the compositional model of Section 2 with only two phases present and one component identified with each phase. The results of Sections 4.2 and 4.3 were recently extended to such a case in [26] for vertex-centered finite volume discretizations and in [87] in a general, discretization schemeindependent setting. We focus here particularly on property (vi) from Section 4.1 and on its practical benefits; all the mathematical details can be found in [26,87], which develop the ideas of [46,52,63].

Controlling a posteriori the error
Suppose again an implicit Euler time discretization. In analogy with Sections 4.2 and 4.3, to give an a posteriori error estimate for general approximations (S p,hτ , P p,hτ ), p ∈ P fixed, without specifying the spatial discretization scheme, we make the two following assumptions: Assumption 9 (Pressure reconstructions). There exist two scalar functions s 1,hτ , s 2,hτ , piecewise affine in time and satisfying s n 1,h := s 1,hτ (·, t n ) ∈ H 1 0 (Ω), s n 2,h := s 2,hτ (·, t n ) ∈ H 1 0 (Ω). We call s 1,hτ , s 2,hτ pressure reconstructions. Remark 10 (Assumption 9). In a proper mathematical formulation of the two-phase flow model, see [26] and the references therein, there are two quantities which posses the same continuity as the weak potential in (28): these are the global pressure and the complementary pressure (Kirchhoff transform). For nonconforming discretizations, where the discrete versions of these quantities are not continuous, the scalar functions s 1,hτ , s 2,hτ represent their continuous reconstructions. Alternatively, when one knows for instance that the reference pressure P and the capillary pressure of a phase p should physically be continuous, then s 1,hτ , s 2,hτ may represent their reconstructions.
For the following result, we suppose that we are on a certain time step t n , 1 ≤ n ≤ N, that some iterative linearization  (e.g., the Newton one) has been applied to the system of nonlinear algebraic equations resulting from the given numerical method and that we are on its step k, and that in order to solve the arising system of linear equations, some iterative linear solver has been applied, with the current step i. The corresponding saturation-pressure approximation couple on the time interval I n is denoted by (S n,k,i p,hτ , P n,k,i p,hτ ). Theorem 12 (A posteriori error estimate, two-phase flow). Let (S p , P p ) for one chosen p ∈ P be the exact (weak) saturation and pressure. Let (S p,hτ , P p,hτ ) be their arbitrary piecewise regular in space and piecewise affine and continuous in time approximations. Let Assumptions 9 and 11 be satisfied. Let a time step t n , 1 ≤ n ≤ N, a linearization step k ≥ 1, and an algebraic solver step i ≥ 1 be given. Then |||(S p −S n,k,i p,hτ , P p − P n,k,i p,hτ )||| I n ≤ η n,k,i sp +η n,k,i tm +η n,k,i lin +η n,k,i alg , (33) where η n,k,i sp , η n,k,i tm , η n,k,i lin , and η n,k,i alg are respectively the spatial, temporal, linearization, and algebraic error estimators.
The precise form of the error measure ||| · ||| I n , as well as of the error estimators η n,k,i sp , η n,k,i tm , η n,k,i lin , η n,k,i alg , can be found in [26,87].

Adaptivity: stopping the linear and nonlinear solvers and (de)refining the mesh and time step
Theorem 12 enables once again to control the overall error in a numerical approximation of the given problem. In addition to Sections 4.2 and 4.3, however, it allows to identify the different error components in the spirit of property (vi) from Section 4.1. It is thus suitable for designing an entirely adaptive algorithm, with adaptive stopping criteria for both linear and nonlinear solvers, adaptive time step choice, and adaptive mesh refinement and derefinement. We now illustrate numerically such adaptive procedures for a cellcentered TPFV discretization of the immiscible incompressible two-phase flow. The time discretization is fully implicit coupled, the Newton method is used for the linearization of the arising systems of linear equations, and the GMRes linear solver with Jacobi (diagonal) preconditioning is employed (the examples are taken from [87]).
In the left part of Figure 13, for a fixed discrete time and Newton step, we track the dependence of the different estimators of Theorem 12 on the GMRes iterations. We see that all η n,k,i sp , η n,k,i tm , and η n,k,i lin stabilize after a few GMRes iterations, whereas η n,k,i alg as expected decreases with GM-Res iterations. Classically, one would wait until the algebraic relative residual becomes very small, say smaller than 10 −13 . In the present case, this requires 1530 GMRes iterations. Our adaptive stopping criterion instead says that it is sufficient that the algebraic error estimate η n,k,i alg is some usergiven constant γ alg smaller than the sum η n,k,i sp + η n,k,i tm + η n,k,i lin , expressing that there is no need to continue with algebraic solver iterations once the algebraic error does not influence significantly the overall error. For γ alg = 10 −3 , such a criterion only requires 435 GMRes iterations. In the right part of Figure 13, we then plot the different estimators as function of the Newton iterations, at the same discrete time. 11 iterations are necessary to reach the classical stopping criterion requiring the L ∞ difference between two consecutive pressure and saturation approximations to be smaller or equal to 10 −11 , whereas only 3 iterations are sufficient to arrive at the adaptive stopping criterion η n,k,i lin ≤ γ lin (η n,k,i sp + η n,k,i tm ) with γ lin = 10 −3 .
The overall gains achievable thanks to our approach are then illustrated in Figure 14. In its left part, we plot the number of necessary Newton iterations on each time step for both the adaptive and classical stopping criteria. We can see that they are considerably smaller in the adaptive case. It is to be emphasized that in particular much fewer Jacobian matrix assemblies are necessary in our approach. In the right part of Figure 14 the cumulative number of GMRes iterations is given as function of time. From this last graph, we can conclude that in the adaptive approach the number of cumulative GMRes iterations is approximately 12-times smaller compared to that in the classical one.
The above usage of a posteriori error estimates seems to be rather new. More common is the procedure already described in Section 4.3.2, consisting in equilibrating of η n,k,i sp with η n,k,i tm and in equilibrating of the individual element components η n,k,i sp,T of η n,k,i sp through adaptive time and space mesh refinement and derefinement. Adaptive algorithms and numerical experiments in such a spirit are given in [87]. A simplified example can be summarized as follows: Algorithm 1 (An entirely adaptive algorithm).
2. Set up the system of nonlinear algebraic equations on time t n .
3. (a) Initialize the iterative linearization (typically by the last available approximations). Set k = 1. (b) Set up the system of linear algebraic equations on linearization step k. (c) i. Initialize the iterative linear solver (typically by the last available approximations). Set i = 1. ii. Perform one or several linear solver steps (in the latter case increase i appropriately). This gives the approximation (S n,k,i p,hτ , P n,k,i p,hτ ). iii. From the numerical method at hand, build the pressure reconstructions s 1,hτ , s 2,hτ and the equilibrated phase flux reconstructions σ p,hτ , p ∈ P.
iv. Evaluate the estimators η n,k,i sp , η n,k,i tm , η n,k,i lin , η n,k,i alg . v. Check the convergence criterion for the linear solver η n,k,i alg ≤ γ alg (η n,k,i sp + η n,k,i tm + η n,k,i lin ). (34) If this criterion is not satisfied, set i := i + 1 and go back to step 3(c)ii.
(d) Check the convergence criterion for the nonlinear solver η n,k,i lin ≤ γ lin (η n,k,i sp + η n,k,i tm ).
If this criterion is not satisfied, set k := k + 1 and go back to step 3b. 4. Check whether the spatial and temporal errors are comparable in the sense that η n,k,i sp ≈ η n,k,i tm , whether the spatial errors are equally distributed in the computational domain in the sense that η n,k,i sp,T are comparable for all T ∈ T n h , and whether the total error is small enough in the sense that η n,k,i sp + η n,k,i tm + η n,k,i lin + η n,k,i alg ≤ ε n , where ε n is a user-given criterion for the maximal error on the time interval I n . If this is the case, and t n < t F , set n := n + 1 and go to step 2. If not, refine the time step τ n and/or the space mesh T n h and go to step 2. Remark 13 (Computational cost). The above algorithm is essentially a very standard resolution algorithm. Its basic novel ingredient are the estimators η n,k,i sp , η n,k,i tm , η n,k,i lin , η n,k,i alg ; according to property (v) of Section 4.1, their evaluation price is very small. The simplest adaptivity to implement in an existing program is then that of the adaptive stopping criteria (34) and (35). It basically consists in replacing two source code lines. It is more demanding to implement (36), (37), and (38).

The compositional unsteady Darcy flow
We finally return back to the compositional multi-phase Darcy model introduced in Section 2. We are currently in [41] adapting the developments presented in Section 4.4 to this setting.

Controlling a posteriori the error
In [41], we have in particular derived an equivalent of Theorem 12 for the compositional multi-phase flow. Thus the overall error can be controlled and moreover its individual components identified.

Adaptivity: stopping the linear and nonlinear solvers and (de)refining the mesh and time step
Finally, similarly to Section 4.4.2, entirely adaptive algorithms were proposed for the compositional multi-phase flow case in [41]. Results similar to those reported in Figures 13 and 14 were obtained, enabling in particular substantial computational gains just by employing adaptive stopping criteria for linear and nonlinear solvers. To illustrate the capability of our estimators to likewise detect the distribution of the spatial error, typically concentrated around a moving front, we plot in Figure 15 the element contributions of the spatial estimator η n,k,i sp at two different discrete times. Refining and derefining the mesh adaptively while following the front (and choosing adaptively the time step size) is likely to still increase the computational attractiveness of our approach. Such algorithms represent for us a subject of current work. One example of a simpler model problem with however still many characteristic difficulties and entire adaptivity successfully put in place is [40].