An Optimization Strategy Based on the Maximization of Matching-Targets' Probability for Unevaluated Results

— An Optimization Strategy Based on the Maximization of Matching-Targets’ Probability for Unevaluated Results — The Maximization of Matching-Targets’ Probability for Unevaluated Results (MMTPUR), technique presented in this paper, is based on the classical probabilistic


INTRODUCTION
Optimization or inverse problems are usually associated with the minimization of an Objective Function (OF).This OF represents the mismatch between the targets to reach and the corresponding numerical function values or simulation results.Moving to a probabilistic optimization framework makes it possible to extend this OF minimization process to the maximization of the posterior distribution of the unknown parameters or variables.This posterior distribution depends on prior uncertainties on both parameters and targets (Tarantola, 2005).The bridge between this OF minimization problem and the associated classical probabilistic framework relies on assumptions about tolerances relative to targets and numerical function errors.The key points of the methodology described in this paper consist: -in approximating these tolerances by stochastic variables; and, -in considering Gaussian processes (Sacks et al., 1989) to model numerical function errors.
Based on the classical probabilistic optimization framework, we developed a technique called Maximization of Matching-Targets' Probability for Unevaluated Results (MMTPUR).Briefly, the numerical function values that have not been evaluated are considered as stochastic functions: a Gaussian process model is computed for each simulation result associated with a target.The posterior distribution of the unknown parameters is then built, accounting for all of these Gaussian process models.New points are iteratively added by maximizing this distribution.
Gaussian process models are widely used either to approximate responses of costly numerical models or to perform optimization.Jourdan (2000) and Marrel (2008) use Gaussian processes to approximate key outputs of numerical models from a fixed number of simulations.These approximations can also be used to perform uncertainty management, for instance to propagate input uncertainty towards output results and to perform sensitivity analysis.
Adaptive design strategies have also been proposed instead of fixed design to improve Gaussian process approximations for given responses (Scheidt et al., 2007;Busby, 2009).Thus, new simulation points are iteratively added with various strategies, in domains either where approximations are poor, where responses are complex, or where information is required.The overall objective is to get a good approximation of the responses of interest over the entire uncertain domain.One specific feature of the approach developed by Scheidt et al. (2007) is to add pilot points, at each iteration, in order to improve the Gaussian process model.These pilot points are fictitious data used to refine the final model.
As described in Schonlau (1997), Villemonteix et al. (2009) and Vazquez and Bect (2010), Gaussian process properties, with characterization of mean and covariance, can also be used for local or global optimization.In this case, the required OF is approximated by a Gaussian process and an improvement function corresponding to the potential reduction of the OF is defined.The classical OF minimization is then turned into the maximization of the improvement expectation, known as the expected improvement or gain.Adding a global optimization criterion to the expected improvement method yields the generalized expected improvement method.For a specific case the generalized expected improvement method corresponds to maximize the probability to be lower than the lowest available values of the OF.Busby and Feraille (2008) and Feraille and Marrel (2012) employed Gaussian processes within a probabilistic optimization framework to handle history-matching problems in reservoir engineering.Within this context, the OF quantifies the mismatch between target production data and the corresponding numerical responses.These authors proposed to approximate the OF by a Gaussian process using an adaptive algorithm and focused on areas where the OF was low (which is the usual ultimate history-matching goal).This OF approximation provided the posterior distribution of the unknown parameters in a reasonable number of simulations.
The main differences between the MMTPUR technique and the methods previously introduced are listed hereafter.First, we approximate using different Gaussian process models several responses of interest instead of a single one.Then, all the models are locally improved from a sequence of adjustments within uncertain domains where the simulation results are in agreement with the targets.This is not performed over the entire uncertain domain.Finally, the optimization strategy entails the maximization of the posterior distribution in accounting for several Gaussian process models (one for each required numerical function result which is associated with a given target) rather than a single model.
In the following sections, we focus first on the classical probabilistic optimization framework.We then go into the details of the Gaussian process technique, which is used to approximate the output results of a simulator based upon response surface modeling.We show how to take advantage of this add-on to the classical probabilistic formulation of inverse problems: the Gaussian process technique is used to estimate the uncertainty in numerical function values that have not been evaluated.The last section presents the MMTPUR algorithm and several application examples.

PROBABILISTIC INVERSE PROBLEM FRAMEWORK
Inverse problems can be solved using optimization techniques.Briefly, an iterative process is implemented to identify a set of parameters x providing a numerical response y(x) as close as possible to target t.One or several values of x as well as their corresponding simulation results y(x) have to be determined.The probabilistic framework described below was proposed to solve this problem (Tarantola, 2005).It entails the following data, requirements and assumptions: -targets T ={ T i }, T i being the i-th target, are associated with an uncertainty model characterized by probability density function p T (t), where t is a realization of T. As an example, in reservoir engineering historymatching process, the simulation model aims at simulating the production profiles for a given reservoir while the targets can be measured production profiles.
Uncertainty is linked to the accuracy in measurements; -the parameters of the simulation model, X, are associated with an a priori uncertainty model described by probability density function p X (x), x being a realization of X. Considering a Gaussian uncertainty model described by mean parameter values l X and covariance operator C X , the a priori probability density function is expressed as: The C integration domain is related to the targets' space, namely: [À1;+ 1] i .
One often considers that the y value depends deterministically on the x value.Therefore, p Y(x) (y) is a Dirac probability density function d({y(x)}), and the likelihood function becomes: Besides, assuming a Gaussian uncertainty model for T, characterized by the mean target values, l T , and a covariance operator, C T , the likelihood function is expressed as: Finally, we obtain the following relationship for the posterior probability density function: This framework is used to determine x values (or realizations of X) that maximize the a posteriori probability of X.It consists in finding x values that maximize p X =Y x=t ðÞ or that minimize Objective Function OF(x).The minimization of the Objective Function can be performed referring to many techniques, said to be global or local depending on whether they converge to global or local minima.For instance, the following types of methods can be highlighted: -the local gradient-based methods (Tarantola, 2005), which include quasi-Newton and Gauss-Newton families; -the global evolutionary and genetic methods (Hansen, 2006); -the global response surface-based methods (Schonlau, 1997;Villemonteix et al., 2009;Vazquez and Bect, 2010).
In the remainder of this paper, we propose to use the known simulation results to approximate p Y(x) (y) from several Gaussian processes, instead of considering a Dirac probability density function.Before proceeding, the key points of the Gaussian process technique are recapped in the following section.We then show how this technique can be used to estimate probability density function p Y(x) (y) for unevaluated results of the numerical model.Finally, we discuss what this approach implies for the posterior distribution of X.

RESPONSE APPROXIMATION USING GAUSSIAN PROCESS
Kriging was introduced in Geostatistics (Matheron, 1963).It is nowadays widely used as an approximation to model output functions {y(x)} (i.e., the responses provided by numerical simulators) with respect to input {x =( x 1 , ..., x d )}.This kriging approximation is a Gaussian Process (GP) characterized by a mean and a covariance structure.It can be shown that the GP or kriging mean is the best linear unbiased estimator.
In other words, we have a set of values for input parameters x =( x 1 , ..., x d ), d being the number of parameters, and we derive the y(x) values from the simulator.Then, we consider y(x) as a realization of the following random function Y(x) (Sacks et al., 1989) which is a GP: Yx ðÞ¼hx ðÞÀZx ðÞÀUx ðÞ ð 7Þ where: h(x) is the mean of the response.Its shape is a priori known: hx ðÞ¼ P k j¼0 b j h j ðxÞ.Note that h(x) can be seen as the trend of the response.h j (x) are elementary functions or terms associated with the shape of h and b are coefficients.Quite often, h j are the terms associated with a polynomial of order 2 including all or part of the following terms: constant, linear terms (x i ), first order interaction terms (x i x j ), quadratic terms (x i ).
Employing to more complex h j functions can also be envisaged; -Z(x) corresponds to the stochastic part of model Y.It is characterized by a zero mean and covariance where d is the dimension of x (number of parameters) and {r,(p i ) i = 1, ..., d , (k i ) i = 1, ..., d } are the covariance parameters.Assuming that Z is stationary (meaning that covariance does not depend on x values) makes it possible to perform the computations described hereafter; -U(x) is an independent noise that can be added to the random part of model Y.It is known as the nugget effect in geostatistics.Its usefulness appears when dealing with none-deterministic simulator y.Effectively, it allows for introducing a discontinuity at the origin into the covariance of model Y(x).U(x)i sa Gaussian random variable with zero mean and covariance and variance equal to e 2 ¼ r 2 Á s ; s !0. Coefficient s represents the contribution of the nugget effect.When U(x) is used, the covariance of the stochastic part of model Y becomes: with d the Dirac's delta function.
Note that in what follows we will not consider nugget effect.
For a given sample of realizations, denoted SA =(Y(x i )=y i ) i = 1, ..., n with n the number of points, it can be shown that the conditional mean and variance of Gaussian process Y(x) given SA are: Functions h j are selected on the basis of a priori knowledge.If such information is not available, then a constant value is taken.Coefficients b as well as hyperparameters k =( k i ) i = 1, ..., d , p =( p i ) i = 1, ..., d and r are estimated by maximizing the logarithm of the following likelihood function: Thus, estimate of: Instead of using only this optimal value of the hyperparameters, it is possible to consider their posterior probability density functions that account for the previously defined likelihood lðr; b; k; pÞ (Kennedy and O' Hagan, 2001).In such a case, the main drawback is the computational cost.For this reason, we preferred to disregard the Bayesian kriging technique.
An example of a GP Y(x) is displayed in Figure 1 for a one dimensional case with no nugget effect.The circles are elements of the SA realization sample.A Gaussian density function / is defined for each x with mean EY x ðÞ =SA ½ and standard deviation var Yx ðÞ =SA ½ ðÞ 1=2 .A continuous line shows the mean of Y(x) whereas dotted lines provide the 99% confidence interval (mean±3std).The unknown realizations of Y(x) should fall in this confidence interval associated with a probability of 99%.We observe that the realizations of the SA sample are accurately reproduced by Y(x).Effectively, for these points, x i 2 SA: EY x i ðÞ =SA ½ ¼ y i and due to the absence of nugget effect var Yx i ðÞ =SA ½ ¼ 0. Thus, the probability density function obtained for these points is a Dirac: d yx i ðÞ fg ðÞ .

GAUSSIAN PROCESS USED WITHIN PROBABILISTIC INVERSE PROBLEM FRAMEWORK
As stated above within the "Probabilistic inverse problem framework" section, simulation results In what follows, we consider that each Y i is a GP with no nugget effect.Thus when, the simulation is not run for x.Results y i (x) are unknown and Y i GP yields an estimation of the probability density function of p Y i x ðÞ y i ðÞ , as: where u is a Gaussian probability distribution function.
The likelihood probability density function for y(x) results, assuming targets and simulation results to be independent, can be written as: Similarly, it is shown that the posterior distribution of X is: Let us focus on the particular cases listed below.-each T i respects the Gaussian law This actually corresponds to the convolution at zero of the two Gaussian probabilities, p T i and p Y i x ðÞ ; -each T i respects the Uniform law defined between [a where U Y i x ðÞ is the cumulative density function of the Gaussian law p Y i x ðÞ .It is worthwhile mentioning that the independence assumption between {Y i (x)} leads to simplified computations and good results as shown in the examples below.However, this assumption can be removed provided we compute the GP of Y i accounting for the covariance of several dependent {Y i (x)}.This can be done using cokriging (Cressie, 1993) technique.

MAXIMIZATION OF MATCHING-TARGETS' PROBABILITY FOR UNEVALUATED RESULTS (MMTPUR) TECHNIQUE
The main feature of the MMTPUR technique consists in approximating the simulator or numerical function for unknown results associated with targets T with GP.These GP are used to compute the posterior distribution of X.Then, this posterior distribution is maximized, leading to updated x values.The successive steps of the MMTPUR technique are recapped in Figure 2 and explained hereafter:

Step 1 and 2: Initial Design
We first build the initial set of (x j ) j = 1, ..., n values with (y i j ) j = 1, ..., n, i = 1, ..., p .n stands for the number of points, p for the number of targets and m is the current iteration index.y i j is associated with the j-th simulation result related to the i-th target T i .Even if it is theoretically possible to consider a single point within the initial sample, space-filling techniques like Latin hypercube sampling (McKay et al., 1979;Fang et al., 2006) are preferred to obtain the n point set.The objective is to end up with initial samples investigating the entire domain associated with the possible variations in parameters (x).

4.2
Step 3: Build GP Y i m for Each Target i Using SA m At this point, the GPs Y i m = Y i /SA m are built using the current sample SA m (see Eq. 8, 9).

4.3
Step 4: Add k New Points (x* j ) j = 1, ..., k to Maximize the Posterior Distribution of X As shown in Equation ( 12), the posterior distribution of X is: where: / l Y i ; r Y i ðÞ is the Gaussian probability density function of the current Y i m GP (Fig. 1) with mean and standard deviation obtained from Equations (8, 9); and p T i is the probability density function for target T i , targets being assumed to be independent.
To select the k new points (x* i ) i = 1, ..., k to be added to the current SA m sample, we apply the following iterative strategy: 1. select the first new point x* 1 as the one that maximizes p X =Y x=t ðÞ .Several optimization techniques can be selected to solve this maximization problem, such as genetic algorithms, gradient-based optimizations, etc., or a combination of these techniques; 2. add fictitiously x* 1 to sample SA m and associate the mean value derived from Y i m : EY i m xÃ 1 ðÞ ½ to the fictitious y i value for any i; 3. go back to step 1 and repeat until k new points are selected.
Selecting the number k of new points is a very practical issue and depends on simulation capabilities, such as computing power (cluster, network) or the number of available simulator licenses.For example, if 10 simulation jobs can be simultaneously run, k can be set to 10.However, if the simulation jobs can be launched only sequentially, k will be set to 1.

Step 5: Build Sample SA m+1
The k new (x* j ) j = 1, ..., k points are simulated to obtain (y* i j ) j = 1, ..., k, i = 1, ..., p .Then, the sample is updated: -a maximum number of simulations affordable, or -a threshold value for the maximum of the posterior distribution of X.For instance, if this maximum is lower than the threshold, the optimization is stopped.It means that the expected probability to match the targets is very low when adding this point.When one of these criteria is reached, the optimization process is stopped.Otherwise, index m is incremented and we go back to step 3.

APPLYING MMTPUR TO AN ANALYTICAL FUNCTION WITH ONE TARGET
The results, derived from the MMTPUR technique for a one-dimensional case, x2R, and one target, are shown in Figure 3.In this example, the simulator output y 0 (x)isa known analytical function.Associated target T 0 follows the probability density function p T 0 t 0 ðÞ .More precisely, we consider that: y 0 is the following analytical function: is a uniform law between 0 and 1; -there is no prior probability density function for x.MMTPUR technique is used to determine x values that give y 0 (x) belonging to interval [0, 1].
As explained above, an initial sample (steps 1 and 2), SA 0 , is needed.It corresponds to the 10-circled points in Figure 3.These points being known, we move to the following steps: -aG PY 0 is built to approximate y 0 (step 3); -Y 0 is then used to find a new x value in order to get new y 0 (x) in agreement with target T 0 and associated p T 0 t 0 ðÞ (step 4); -the SA sample is updated (step 5) and the above steps are repeated.
The results obtained for iterations 5, 10, 15 and 20 (resp., 15, 20, 25 and 30 points) are displayed in Figure 3. Crosses indicate the added points.The evolution of the Y 0 GP is also shown: its mean is represented by a black MMTPUR results for a one-dimensional case with one target.

M. Feraille / An Optimization Strategy Based on the Maximization of Matching-Targets'
Probability for Unevaluated Results line and its mean plus or minus 3 9 the standard deviation, by dotted lines.
The new x points (crosses) lead to y 0 (x) values that mainly fall within the [0, 1] interval (Fig. 3).It means that the y 0 (x) values respect the T 0 distribution, as required.In addition, Y 0 turns out to be accurate in areas where tolerance can be respected, thus approximating very well the analytical function y 0 at these locations.

APPLYING MMTPUR TO A FLUID-FLOW MODEL
In this core flood example, we apply the MMTPUR technique to a fluid-flow model depending on 4 parameters.The selected targets are synthetic cumulative oil and gas production profiles.To check the soundness of the MMTPUR technique, we compared the simulation results corresponding to a parameters' posterior sampling of 1 000 points with the specified targets and related tolerances.

Test Case Description
The core flood numerical model was built with the commercialized software PumaFlow TM .This one-dimensional model encompasses 62 cells.The rock sample was initially saturated with oil and irreducible water and a "pseudo-well" named PRODB was located at the outlet boundary to produce oil and gas.Porosity (Fig. 4) was measured in the laboratory: it ranges between $24% to $32%.Permeability and capillary pressure were then derived from porosity based upon analytical models.We simulated three successive periods with different pressure drops between the outlets (Fig. 5).
The synthetic measurements displayed in Figure 5 were simulated for a randomly-drawn forward porosity model.They consist of the cumulative oil and gas productions at the outlet.

Parameters and Targets Definition
The synthetic measurements described above are now considered as reference data.At this point, we assume that the 4 parameters listed below are unknown: -the 2 Corey exponents associated with the gas-oil relative permeability curves: "gas Corey exp." and "oil Corey exp."; -a global permeability multiplier: "K mult." and -a global gas capillary pressure multiplier: "gas CP mult.".The minimum, maximum and reference values for each of these parameters are reported in Table 1.The reference synthetic production data were generated using the reference values given in the table.
We selected 3 targets among the available synthetic data: -the two cumulative oil and gas productions corresponding to the last time and -the cumulative oil production at 10 hours (36 000 seconds).
The MMTPUR technique calls for the definition of tolerances for each selected targets.They were described by uniform probability laws over fixed ranges with ±2% variations (Fig. 6, 7).For simplicity, we disregarded prior probability p X x ðÞ .

MMTPUR Results
The initial design encompasses 5 simulations randomly selected on the basis of Latin hypercube sampling.The MMTPUR procedure was launched and stopped after the sequential addition of 6 other simulations.Figure 6 and Figure 7 show that none of the initial simulations (black curves) provide responses respecting all targets.This is especially true for gas production, which is associated with a target much larger than the initial simulation results.The MMTPUR technique makes it possible to identify simulations (green curves) that approximately respect the required targets.
As described previously for the 4th step of the MMTPUR technique, the posterior parameters' distribution is built at each iteration (Eq. 13).In what follows the distribution obtained at the last iteration, are jointly used with a Markov-Chain Monte-Carlo algorithm (Geyer, 1992) to build a 1 000 points parameters' posterior sampling.The marginal histograms are displayed in Figure 8.These histograms are all in agreement with the reference values.Moreover, the wide spread of the gas capillary pressure multiplier indicates that the influence of this parameter for matching the identified targets is minor compared to the others.Tolerance interval defined by ±2% relative to the reference value 05 0 000 500 000 0 100 000 150 000 200 000 1 000 000 2 000 000 3 000 000 Cumulative gas production (10 3 cm 3 ) Time (sec) MMTPUR simulation results displayed with synthetic cumulative gas production data (symbol "*").Black curves are associated with the 5 initial simulation results.Green curves correspond to the 6 simulation results iteratively added during MMTPUR procedure.

M. Feraille / An Optimization Strategy Based on the Maximization of Matching-Targets' Probability for Unevaluated Results
For validation purposes, we ran the simulations for the 1 000 points of the parameters posterior sampling.The results are displayed in Figure 9 for oil production and in Figure 10 for gas production.We also plotted the histograms of the 1 000 simulation results for each target.The black lines indicate the chosen tolerance intervals.
The tolerance intervals for the 3 targets are almost fully respected by the 1 000 simulations.We observe that the range of the histogram obtained for the last time of oil production is narrow compared to the associated tolerance interval (Fig. 9).This can be explained by correlations between simulation results.This 4-dimensional example highlights that the MMTPUR technique is able to define all possible simulations consistent with the tolerance intervals chosen for the 3 targets from only 11 simulations.This means that the posterior distributions built for parameters are good and lead to satisfactory results.

MAXIMIZATION OF MATCHING-TARGETS' PROBABILITY FOR UNEVALUATED RESULTS TECHNIQUE IN AN UNCERTAIN CONTEXT
The MMTPUR technique can be extended to optimization in an uncertain context.In this case, the uncertain domain is determined by two different types of parameters, controllable and uncontrollable, grouped in x =( x c , x u ).The controllable parameters, x c , can be adjusted to determine optimal values.On the contrary, the uncontrollable parameters, x u , are intrinsically uncertain.Thus, the goal is to find the optimal values of x c which make it possible to fit specific targets while accounting for the uncertainty related to x u .
As an example, a practical problem in reservoir engineering consists in identifying the location of a well with known liquid and/or gas rates while satisfying particular targets on oil and water productions, given uncertainties about uncontrollable parameters, such as correlation lengths or the average porosity of a facies.
We associate: a priori uncertainty on the controllable parameters, through: p X c x c ðÞ ; -uncertainty on the uncontrollable parameters, through: p X u x u ðÞ .The uncertainty associated with the uncontrollable parameters can be accounted for while calculating likelihood p Y =X c t=x c ðÞ instead of p Y =X t=x ðÞ .Thus: where the integration domain X u is related to the x u space.
As a result, the posterior uncertainty on X c is derived from the following equation: It is then possible to add new points x c while maximizing the posterior uncertainty relative to the controllable parameters X c to get simulation results in agreement with target definition, which accounts for the uncertainty associated with X u .

CONCLUSION
In this paper we introduce, a new optimization technique, the Maximization of Matching-Targets' Probability for Unevaluated Results, particularly adapted when getting one evaluation of the numerical function is very time consuming.MMTPUR is based upon an add-on to the classical probabilistic optimization framework considering the numerical function values that have not been evaluated as stochastic functions.The associated uncertainty model is a Gaussian process for each required numerical function result (i.e., associated with each target).It makes it possible to estimate probability density functions for each required unevaluated results.Therefore, the posterior distribution of the parameters as given by the MMTPUR approach, allows us to account for the error estimations related to the unevaluated function values or simulation results.As described, it is then further used to iteratively find and simulate new points that maximize this posterior distribution.
MMTPUR technique was also applied and presented on different simple cases with one or several targets producing good results.Moreover, an extension of the MMTPUR technique was described for a specific uncertain context.We believe that this technique could be extended to solving other problems, such as historymatching in reservoir engineering, or optimization with targets to reach.

M
. Feraille / An Optimization Strategy Based on the Maximization of Matching-Targets' Probability for Unevaluated Results Figure 1Example of a GP Y(x).

1-
Figure 2 Successive steps of the MMTPUR technique.
Figure 3 Figure 4 Description of the numerical fluid-flow model.The color represents the porosity property.
Figure6MMTPUR simulation results displayed with synthetic cumulative oil production data (symbol "o").Black curves are associated with the 5 initial simulation results.Green curves correspond to the 6 simulation results iteratively added during MMTPUR procedure.
Figure 8Marginal histogram parameters computed using the 1 000 points of the posterior sampling.Each plot is between parameter ranges.Arrows indicate the reference value for each parameter.