Ensemble-based method with combined fractional flow model for waterflooding optimization

Proxy models are widely used to estimate parameters such as interwell connectivity in the development and management of petroleum fields due to their low computational cost and not require prior knowledge of reservoir properties. In this work, we propose a proxy model to determine both oil and water production to maximize reservoir profitability. The approach uses production history and the Capacitance and Resistance Model based on Producer wells (CRMP), together with the combination of two fractional flow models, Koval [Cao (2014) Development of a Two-phase Flow Coupled Capacitance Resistance Model. PhD Dissertation, The University of Texas at Austin, USA] and Gentil [(2005) The use of Multilinear Regression Models in patterned waterfloods: physical meaning of the regression coefficient. Master’s Thesis, The University of Texas at Austin, USA]. The proposed combined fractional flow model is called Kogen. The combined fractional flow model can be formulated as a constrained nonlinear function fitting. The objective function to be minimized is a measure of the difference between calculated and observed Water cut (Wcut) values or Net Present Values (NPV). The constraint limits the difference in water cuts of the Koval and Gentil models at the time of transition between the two. The problem can be solved using the Sequential Quadratic Programming (SQP) algorithm. The parameters of the CRMP model are the connectivity between wells, time constant and productivity index. These parameters can be found using a Nonlinear Least Squares (NLS) algorithm. With these parameters, it is possible to predict the liquid rate of the wells. The Koval and Gentil models are used to calculate the Wcut in each producer well over the concession period which in turn allows to determine the accumulated oil and water productions. To verify the quality of Kogen model to forecast oil and water productions, we formulated an optimization problem to maximize the reservoir profitability where the objective function is the NPV. The design variables are the injector and producer well controls (liquid rate or bottom hole pressure). In this work the optimization problem is solved using a gradient-based method, SQP. Gradients are approximated using an ensemble-based method. To validate the proposed workflow, we used two realistic reservoirs models, Brush Canyon Outcrop and Brugge field. The results are shown into three stages. In the first stage, we analyze the ensemble size for the gradient computation. Second, we compare the solutions obtained with the three fractional flow models (Koval, Gentil and Kogen) with results achieved directly from the simulator. Third, we use the solutions calculated with the proxy models as starting points for a new high-fidelity optimization process, using exclusively the simulator to calculate the functions involved. This study shows that the proposed combined model, Kogen, consistently generated more accurate results. Also, CRMP/Kogen proxy model has demonstrated its applicability, especially when the available data for model construction is limited, always producing satisfactory results for production forecasting with low computational cost. In addition, it generates a good warm start for high fidelity optimization processes, decreasing the number of simulations by approximately 65%.


Introduction
Development and management of petroleum fields generally require estimation of many parameters (e.g., permeability, porosity, oil, water cut, etc.), so the best production scheme for each field can be applied. In reservoir simulation, physics-based models are used to obtain these parameters, but in most cases, they are difficult to be applied due to its high computational cost. An alternative to deal with this problem is to use proxy models. These models have been widely used for decision making process in recent years, and present as a powerful technique to analyze reservoir dynamics with a low computational cost and a quick response.
During the development of oil fields, one of the most used methods to maintain reservoir pressure and improve oil recovery is the waterflooding method. For this method, there are several injectors wells operating simultaneously in the field that inject water into the reservoir to increase its pressure, and the amount of water injected by a certain injector may influence the flow rate of one or more producer wells. At the surface facilities, liquid production rates are separated in, e.g., water, gas and oil rates, and can be processed to obtain valuable and useful information about the relationship between producer and injector wells. In any active field, production and injection flow rates are generally the most abundant data available, and can be used to optimize oil recovery when employed to forecast injection flow rates, determine when and where drill new wells, improve the production design and estimate interwell connectivity.
There are several different techniques, statics and dynamics, to estimate the communication between wells. Some Static techniques are interference test (Du et al., 2002;Maizeret, 2013) and tracer test (Zhang et al., 2001). As example of Dynamic techniques, we have streamlines analyses (Datta-Gupta and King, 2007), flow diagnostics (Moyner et al., 2014) and signal processing techniques (Wang et al., 2020) and the Capacitance Resistance Model (CRM) by Yousef et al. (2005). Some traditional techniques, e.g. the interference test, require in some level reservoir characterization or fluid distribution description to run, what may limit their applicability. In this context, the CRM shows as an alternative to estimate inter-well connectivity, once only production data is needed to run the model. It also can be used to qualitatively indicate the presence of channel, sealing faults, and allocate wells (Weber, 2009). Usually its parameters are used to forecast field production. CRM is a data-driven model that can carry out field history matching and heterogeneities detections only using available field production history data. Sayarpour (2008) details the CRM development and shows an analytical solution based on the superposition in time and space for three different reservoir control volumes: the entire field (CRMT); the drainage volume of a single producer (CRMP), and the volume between an injectorproducer pair (CRMPI). For example, Lins et al. (2017) applied the CRM to compute the connectivity between layers in a compartmentalized reservoir, and Mirzayev and Jensen (2019) estimate the connectivity in a low permeability reservoir using pseudo-wells.
In this study, we apply the CRMP model, widely used in recent studies (Lins et al., 2017;Tueros et al., 2018), that describes the relationship between a pair of injector and producer with only one parameter called connectivity. This model has three parameters: connectivity, that represents the rate fraction of a given injector that influences the liquid rate of a specified producer; time constant, that quantifies the degree of fluid storage between wells; and productivity index, that indicates how the pressure changes in the pore drainage volume around a producer affect its rate.
The CRMP parameters can be obtained using different strategies: Nonlinear Regression (Albertoni and Lake, 2003;Gentil, 2005;Kaviani and Valkó, 2010;Liang et al., 2007;Sayarpour et al., 2007), Nonlinear Least Squares method (Holanda et al., 2015;Lins et al., 2017;Moreno, 2013;Tueros et al., 2018;Weber et al., 2009) and Neural Networks (Liu et al., 2020;Panda and Chopra, 1998). Here, we used the nonlinear least squared method. Therefore, with CRMP parameters and injection flow rates, the total liquid rates of producer wells can be predicted, and, if well type and location are kept fixed, the injected water volume can be optimized (Holanda et al., 2015;Hong et al., 2017;Jafroodi and Zhang, 2011;Sayarpour et al., 2009;Weber et al., 2009). Theorical arguments about the physical meanings and the strategies to obtain CRM parameters can be found in (Gentil, 2005;Noetinger, 2016). For example, Shahamat et al. (2016) used the capacitance and resistance concept and developed a mathematical model for compartmentalized reservoirs and detailed analytic solutions.
Over the past decade, CRMP strategy has been used in decision making process for waterflooding operations, but as every method, it has its advantages and disadvantages. As main advantage it presents the use of injection and production data as input to estimate fluid dynamics in the reservoir, without any prior knowledge of its physical and fluid properties. However, as a disadvantage CRMP cannot separate oil and water rates from output results.
The main objective of this work is to maximize the profitability of the field, and for this, it is necessary to know oil production, and water injection and production of wells. However, the CRMP cannot separate oil and water produced volumes. Therefore, fractional flow models can be used for this task. The most used models in the literature are Koval (Cao, 2014) and Gentil (Gentil, 2005).
The Koval model has been used for cases where the wells have low water production, immature fields (Cao, 2014), but, Hong et al. (2017) proposed a modified model for mature fields. Gentil (2005) proposed a model for mature fields with Wcut above 50%. In this work, we propose to combine the fractional flow models, Kogen, to address the weaknesses of the Koval and Gentil models using Wcut as a transition parameter between both models. The parameters of Koval and Gentil models can be obtained by NLS.
The SQP algorithm is used to determine Kogen parameters, and well control trajectories in the waterflooding problem (Alim, 2013;Pinto et al., 2020;Tueros et al., 2020). SQP is a gradient-based algorithm, so derivatives are required. Computing derivatives by finite differences in large scale problem becomes computationally prohibitive. A viable alternative is to use the ensemble-based method proposed by Chen et al. (2009). This method has been widely used in reservoir engineering by (Chen and Oliver, 2012;Chen and Reynolds, 2016;Do and Reynolds, 2013;Tueros et al., 2015;Zhang et al., 2018). Tueros et al. (2018) proposed refinement techniques on the ensemble-based method. The main advantages of the ensemble-based method are reduced number of simulations in the optimization process and readily coupling with reservoir simulators.
As mentioned earlier, we use the combined Kogen model to obtain oil and water cumulative productions and the ensemble-based method for calculating derivatives in the optimization process. Both strategies are applied to two reservoir models existing in the literature. We observed that the proposed strategy reduces the computational cost and the number of iterations and function evaluations during the optimization process.

Optimization problem formulation
In this section we present the waterflooding problem, optimization algorithms used to solve the problem, approximate computation of gradient and smoothing strategies to avoid wild changes of controls in the optimized trajectories.

Waterflooding problem
Here, the waterflooding problem with capacity operations may be formulated as an optimization problem with linear constraints where design variables are the controls of injector wells: In equation (1), the NPV is the objective function, d is the discount rate and s t is the reference time period at control cycle t. q t = the vector of rates at control cycle t and q w,t = injection rate of the well w at control cycle t. I = set of indices to the N I injectors, the life-cycle period subdivided into N t control cycles and Q inj,max = the maximum allowed total injection rate of the field. Superscripts l and u represent the lower and upper bounds of design variables. The cash flow at the control cycle t represents the oil revenue minus the cost of the water injection and production, given by: where P = set of indices to the N P producers, Ds t = time interval of the tth control cycle; q op w;t and q wp w;t are average oil and water rates at the wth production well at tth control cycle, respectively; q wi w;t = average water rate at the wth injector well at tth control cycle; r o = oil price, c wp and c wi = cost of water producer and injected, respectively. Indices op, wp and wi denote the oil, water production and water injection phases, respectively. On the other hands, in the problem, equation (1), we can also impose nonlinear constraints as maximum field liquid production.
In this study, the control variables are the injection rates in each control cycle which are normalized by the respective maximum injection rate of the well, q w,max : With normalization of controls in equation (3), we consider the following alternative formulation: Maximize : RðxÞ ! subject to : P w2I x w;t where x = column vector that contains all injection well controls; NPV = normalization factor of the function f(x) normally taken as NPV(x 0 ); x 0 = initial point; if x w,t = 0 means the tth control has zero flow and the well is closed at cycle t and the value 1 is the maximum allowed rate for injector wells.

Optimization algorithms
In this study, we use two optimization algorithms to solve our problems. Among the gradient-based optimization algorithms for solving large-scale problem, one of the most efficient both in number of iterations and function evaluation required in the optimization process is the SQP algorithm Liu et al., 2018). A detailed description of the SQP can be found in Biegler (2010). In addition, the SQP solves a subproblem, in each iteration to approximate the original problem. Another method that used is the Nonlinear Least Squares that is detailed below.

Nonlinear Least Squared method
In this work, we use the NLS method to solve the uncoupled problem of Section 4.1. In this method, the objective function is defined as: where T i (x) = is the difference between the numerical reservoir simulator and the approximate model responses.
For example, the real value may be the liquid rate of a given producer well obtained by the simulator and the approximate value is obtained using the CRMP model. By minimizing this function, we find values for the parameters that best correlate the approximate model and the real response. The gradient vector and Hessian matrix of the objective function in equation (6) are given by: Assuming that T i (x) tends to zero near the solution, then the first term of matrix of second derivatives (Eq. (6)) will tend to zero, and the Hessian matrix of the second derivatives f(x) can be approximated using only the first order derivatives of T i x ð Þ. The NLS is generally based on the Gauss-Newton approximation, which displays a quadratic convergence ratio near the solution, for those cases where the Hessian approximation is accurate, that is, when the residuals tend to zero in solution.
The advantage of using this method is that due to its mathematical structure it allows obtaining the Hessian matrix using only first-order derivatives. Moreover, by exploring this problem structure, the second-order convergence characteristic of Newton's algorithm can be obtained using only first-order derivatives.

Computation of the gradient by ensemble-based method
The SQP algorithm required the gradient vector in the optimization process. Here, we approximate the gradient using the ensemble-based method by Chen and Oliver (2012). The gradient vector computed by ensemble-based method is a nonlocal estimate considering the subspace spanned by the adopted control realizations that has a strong dependence on ensemble size (Chen and Oliver, 2012). The advantages are the reduced number of simulations in the optimization process, and the use of the simulator as a black-box and being therefore independent of the specific adopted software. It is necessary to generate different control realizations that must be simulated to compute the NPV. In this work, we use the steps proposed by Chen et al. (2009), , Su and Oliver (2010), Tueros et al. (2018) to generate the realizations of time-correlated controls: 1. Use Spherical covariance function (Do and Reynolds, 2013;Tueros et al., 2015) to generate the prior covariance matrix, C, for control variables, avoiding abrupt changes with time. 2. Compute the Cholesky decomposition of the covariance matrix, LL T = C. 3. Inform the initial values of the controls to be perturbed, x 0 2 R N x (this is only informed on the first iteration, for the following iterations use the optimal point of the previous iteration), where N x = total number of control variables. 4. Adopt a standard deviation to produce perturbations of the control variables d. The magnitude of the perturbations set by d needs to be balanced not to be too small or too large, which is problem dependent (Su and Oliver, 2010). 5. Create a vector of independent random variables with zero mean and unit variance, Z. 6. A realization of the vector of control variables, x r 2 R N x , is given by: The second term in equation (7) generates smooth control realizations for gradient computation. This does not guarantee smoothness of controls during the optimization process . In this study, the realizations of control variables for ensemble-based satisfy all constraints (e.g., wells and capacity allowed).

Computation of the gradient using sensitivity matrix
The following steps are used to calculate the sensitivity matrix and to obtain the ensemble-based approximate gradient : 1. Generate a matrix X 2 R N x ÂN r containing the set of all feasible control realizations using equation (7), where Nr = ensemble size. In this work, we used ensemble size with 10, 20 and 30 realizations.
2. Compute the ensemble-based covariance matrix C xx 2 R N x ÂN x (Chen and Oliver, 2012). 3. Calculate the ensemble-based cross-covariance matrix between controls and well NPV's. 4. Compute the sensitivity matrix as the product of the pseudo-inverse of the covariance matrix by the crosscovariance matrix. 5. The approximate gradient is the sum of the columns of the sensitivity matrix.
The accuracy of the sensitivity matrix is strongly dependent on ensemble size. However, using large sizes increases computational cost during the optimization process.

Smoothing factor
In this study, we use the smoothing technique proposed by Tueros et al. (2020) devised to avoid abrupt jumps in control trajectories during the optimization process. The strategy is based on adding a new term to the objective function of equation (4). The optimization problem with the smoothing term can be expressed as: where N v = total number of controls by wells; b = smoothing factor. We obtain the original optimization problem, if this factor is zero. In this study, we used b = 10 À3 for all examples as recommended by Tueros et al. (2020). The gradient of the smoothing term can be easily computed analytically.

Illustrative example
To develop an understanding of the strategies proposed, we present a synthetic model based on the Brush Canyon Outcrop (BCO; Oliveira, 2006), with an internal sealing fault (Fig. 1). The cells that model fault have permeability, porosity, and transmissibility set to zero. The model has six layers, with 55 Â 53 active cells. The horizontal permeability varies from 157 to 2592 mD (Fig. 1b). The vertical permeability is 30% of the horizontal permeability, porosity varies between 16% and 28%, and viscosity is 0.11 cP. The reservoir has twelve wells, seven producers and five injectors, and wells are completed vertically (Fig. 1a). Producer wells operate with constant BHP (11 000 kPa). Injector rate has an operational range between 0 and 2500 m/day. The lifetime of the reservoir is 24 years, 4 for production history and 20 for the optimization process. The wells opening during production history is shown in Table 1.

Capacitance and resistance model
This work uses the Capacitance and Resistance Model based on Producer wells (CRMP) proposed by Sayarpour et al. (2007), which is derived from the material balance equation (oil and water) and does not depend on well locations to quantify the communication (connectivity) between a pair of injector and producer wells. The CRMP has three parameters that collectively produce a good representation of the reservoir flow dynamics. Its great advantage is not to require any geological or fluid reservoir characterization to construct the model, using only the available data from the field production history.

CRMP
In waterflooding operations, there are multiple producers and injectors operating simultaneously and, consequently, a given producer can have its production rate influenced by more than one injector at a certain period. CRMP considers injector rates and BHP of producers as inputs signals that perturb the reservoir dynamic, generating the total liquid production rates as outputs (Fig. 2). The derivative continuity equation for a producer well j is expressed as shown by equation (9) and, it is developed from the material balance equation. Its parameters and derivation can be found in Sayarpour et al. (2007): where s j = time constant for a producer j; q j (t) = total fluid production of producer j at time t; k ij = connectivity for a pair injector/producer well; I i (t) = rate of injector i at time t; J j = productivity index of producer j; P wf,j = bottom hole pressure of producer j; c t = total compressibility; V p = drainage pore volume of a producer j (connected porous volume). Integrating equation (9) over a discrete time period, Dt, we obtain the total liquid rate of a producer j, at reservoir conditions, at a certain time k, which can be written as (Holanda et al., 2015;Lins et al., 2017;Sayarpour et al., 2007;Weber, 2009): where, the first term in the right-hand side corresponds to the initial response of the production rate associated with primary depletion, the second is associated with the injection contribution and the last term with the influences of BHP changes in the producers. Figure 3 shows a schematic representation of the effect of injector wells on the liquid rate of a given producer in waterflooding operations. We can also observe that for a certain field, there will be multiple producers and injectors operating simultaneously and, subsequently, more than one injector well can influence the production of a given producer well at a certain period. Gentil (2005) and Yousef et al. (2005) discuss the physical meaning of the connectivity and time constant parameters, respectively, and their insights were used for a better understanding of the CRMP model assumptions.  Wells in production (11 000 kPa) and injection (2500 m 3 /day) 1 1 P-5 2 9 0 I -5 3 180 P-5, P-7 -I-5, I-4 4 270 P-5, P-6, P-7 -I-3, I-4, I-5 5 360 P-3, P-5, P-6, P-7 -I-2, I-3, I-4, I-5 6 420 ALL 7 500 ALL 8 900 ALL 9 1461 ALL To determine the CRMP modal parameters k ij , s j , J j , i = 1. . .N I ; j = 1 . . . N p , we solve the following constrained nonlinear least squares problem (Weber, 2009 where, q obs j;k = observed liquid rate of the producer j at time step k and q cal j;k = liquid production rate calculated by the CRMP for the producer j at time step k. The observed data is obtained from the actual field production history, but here we use a commercial reservoir simulator CMG-IMEX (2018) to generate them. It is important to highlight that the number of unknown parameters depends if BHP is considering constant or variable. In this work, we consider constant minimum BHP in all producer wells.
The parameters of the CRMP problem are obtained by solving two optimization problems: an uncoupled problem, where only limit constraints are imposed, and another that considers all constraints, coupled. The uncoupled problem response serves as a warm start point for the other problem and has the advantage of reducing the number of iterations during the optimization process in the coupled problem.
The parameters are computed using one realization of controls, knowing that the approximation depends mainly on the available data (Albertoni and Lake, 2003). The following steps are followed for the parameter computation (Lins et al., 2017;Tueros et al., 2018;Weber, 2009): 1. Set random BHP values for producers and injection rates on the simulator. These values are set to change every 30 days during the simulation period (Holanda et al., 2015;Tueros et al., 2018). 2. Use production data after all producer's present go through water breakthrough. 3. Using NLS method to solve the uncoupled problem of equation (12) for each producer well, j = 1 . . . N p , to warm start the main coupled problem.
If a pair of injector/producer wells presents a high connectivity value, it evidences the presence of a channel or a fracture between them. However, a low connectivity value indicates the existence of a low permeability zone or barrier in the interwell region. Table 2 presents the parameter results for BCO model. Wells P-1 and P-4 present high connectivity values with respect to I-1 and I-4, which are the injector wells at the same side of the fault. Although, their connectivities related to the remaining injectors are smaller with exception for the pair P-1/I-2 with a high interwell permeability zone that increases its connectivity value.
Regarding the time constant, P-4 presents the highest value which is explained by its location in the reservoir. This well is surrounded by low permeability regions and its closest injector is I-4. By contrast, P-2 shows the lowest value for the time constant indicating that the pressure changes in its surroundings are felt in shorter amount of time, as a result of the proximity to I-2 and I-3.
For the other producer wells, the same analysis was made to validate the results. Also, a history match and rate prediction were conducted to check the CRMP parameters quality and reliability. The time window for history matching was chosen such that water breakthrough had already occurred at all producers to reduce effects of nonlinearities in the CRM parameters (Holanda et al., 2015). For this case, the chosen time window was between 0 and 2200 days. The curves for history match and rate prediction for the wells P-1 and P-5 are shown in Figure 4. From the results, one can notice that CRMP model captures the reservoir dynamics well using only production data.

Fractional flow models
In the previous section, the applied CRMP model computes total liquid rate of producer wells as a single phase. However, in most of the field operations and waterflooding optimization problems is important to separate oil and water rates.
From the liquid rate, q j,k , it can be obtained water production rate, q wp j;k , and oil rate, q op j;k , of producer j at time k as: q wp j;k ¼ q j;k Wcut j;k and q op j;k ¼ q j;k 1 À Wcut j;k where Wcut j,k = water cut of producer j at time k.
In equation (13) it is necessary to calculate the Wcut of producer wells. Therefore, we proposed a combined fractional flow model (Koval + Gentil) to adjust Wcut curves of producer wells. Koval, Gentil and Kogen fractional flow models are presented in the next sections. (2008) and it is derivative from the Koval Theory (1963), which is used to capture the effects of water fingering generated by reservoir heterogeneity and unfavorable mobility ratio between displacing and displaced fluid. Cao et al. (2015) combine the CRMP and Koval models to separate oil and water production. This strategy shows a good forecast when the reservoir is immature.

Koval model was introduced by Sayarpour
From the Buckley-Leverett equation, modifying the viscosities ratios and after some rewriting, we obtain the following equation to determine the Wcut (Cao et al., 2015;Hong et al., 2017): where kval j = Koval factor of producer j; t D,j = the dimensionless time for a producer j; CI j,k = the cumulative water injected contribution from all injectors to producer j during a certain period Dt and, V p,j = the drainage pore volume of producer j. We assume that the water injection rate of an injector is constant over Dt and equal to I j,k , respectively. Using the connectivity parameters, k ij , from CRMP, the CI j,k is defined as: The dimensionless time, t D,j , may be interpreted as the volume of cumulative water injected in terms of pore volumes, implying that we must identify the contributions from all injectors to the production of a certain producer j in a reservoir, which can be determined using the CRMP model as shown in equation (15) (Cao, 2014). The modal parameters kval j and V p,j , can be found by solving the following constrained nonlinear least squared problem: where, V p, field = the total pore volume of the field. In this work, the kval l = 1. It is important to highlight that if V p, field is not available, the constraint can be neglected. In this work the problem is solved with limit constraint only unlike procedures proposed by Cao et al. (2015) and Hong et al. (2017). Therefore, parameters can be obtained independently for each well. Figure 5 compares Wcut values obtained by the simulator and Koval model for wells P-1, P-4 and P-5. We can see a good forecast in the early years and the remaining time shows loss of quality. Therefore, using this Wcut provides a strong inaccuracy in the calculation of oil rate. Similar results were also observed in the works of Cao (2014) and Hong et al. (2017).

Gentil model
In Section 4.1 it was discussed a fractional model that performs better when Wcut is small and the reservoir did not achieve significant water breakthrough. However, for large multi-wells reservoir, the time that breakthrough is achieved for each producer well will vary significantly, with some wells producing high amount of water whereas others do not. This diverse system highlights the necessity of a data driven model applicable when the Water-Oil Ratio (WOR) is greater than 0.5, on other words, when the producer wells have high Wcut value.
The Wcut j,k can be obtained from water-oil ratio as: where WOR j,k = Water-Oil Ratio of producer j at time k. Gentil (2005) proposed an empirical power law to fit the historical data by assuming a nonlinear relationship between water-oil ratio and the cumulative water injected, CI j,k (Eq. (15)) and express as Liang et al. (2007): where, a j and b j are the fractional flow model parameters for each producer j. This expression can be transformed into a linear form given by Gentil (2005) and express as: Equation (19), has no constraints that couple the Gentil model parameters for all producer wells, so their fit can be determined independently to history values of the water/ oil rate natural log, LWOR j,k . Equation (19) is valid only for later time data (e.g. the water cut must be greater than 0.5 (Gentil, 2005). This happens because applying the natural logarithm to equation (18), the linear relationship between water oil ratio and "CI" is only observed when Wcut is large. This model is usually fitted to a smaller time window than the CRMP model.
The decision variables in the Gentil model can be obtained by solving the following nonlinear least squared problem: Minimize where LWOR obs j;k = observed natural log of the water-oil rate for producer j at time step k. LWOR cal j;k = the log of the calculate water/oil rate for producer j at time step k, equation (19), and koil = the initial value of the time window used to calculate the parameters. A similar procedure was used by Weber (2009). Figure 6 shows the results obtained for wells P-1, P-4 and P-5. It can be noticed that even though Gentil approach is applied for mature waterflood reservoirs, if the Wcut change with time is not so abrupt, the model is capable to capture the Wcut behavior for values lower than 0.7. However, most results confirm that analysis data window should be in late horizon time, where the Wcut is greater than 0.7 in which case numerical errors are consistently smaller.

Kogen model
In Figure 7a we compare the water cut obtained by the simulator with Koval and Gentil models. It can be observed that Koval shows good fit for initial times and Gentil for later times. To overcome the limitations of Koval and Gentil, we propose the combined Kogen model based on the previously mentioned models. Kogen fits well to immature and mature fields, using only available production history data and CRMP parameters. Kogen main parameter is based on the difference in water cuts of the Koval and Gentil models at the moment of transition between the two (Fig. 7b), hereafter referred to as t s .
The combined Kogen model can be formulated as a constrained nonlinear curve fitting, expressed as: In equation (21) the objective function, MisFit j represents a measure of the difference between calculated and observed Wcut or NPV. The nonlinear constraint is the difference in Wcut values of the Koval, Wcut j,sk , and Gentil, Wcut j,sg , models at the time of transition, t s . In Equation (21), a is the jump allowed between Koval and Gentil models. In this work we used a = 20%. The other parameters are associated with the Koval and Gentil models. The problem is solved using the SQP algorithm and its gradient is obtained by finite differences. As commented previously MisFit j function can be defined by either of the following two expressions: where, NPV obs j;k = net present value of the producer j observed by simulator in the step time k; NPV cal j;k = net present value of the producer j calculated in the step k; Wcut obs j;k = water cut of the producer j observed by simulator in the step time k. Wcut cal j;k = water cut of the producer j calculated in the step k.
The performance of both expressions for misfit function are compared in Figures 8 and 9. It can be observed the Wcut-based function performs slightly better than the NPV-based.
Results show that the proposed strategy is able to better approximate water cut with the simulator result, when compared to the individual Koval and Gentil models. Even though Kogen has more variables than the other models, it is fast and requires only minimal information.
Furthermore, in order to be able to evaluate waterflood performance we develop a combined fractional flow model that together with CRMP technique generates an efficient proxy model, which produces promising starting points for robust optimal management problems.

Examples
Two synthetic reservoirs, BCO and Brugge models, are used to assess the proposed strategies, the proxy model CRMP + Fractional flow models, Koval, Gentil and Kogen, for function evaluation during the optimization process.
The optimization algorithm of choice is the SQP available in MATLAB Optimization Toolbox (2014). The gradient is computed by an ensemble-based method. Commercial simulator IMEX is used to generate high fidelity field production data.
The results statistics of the objective function are presented in box-plots. The limits of the box-plot are 25% and 75% percentile, the line (red) in the box-plot is the median, the extreme lines are the maximum and minimum values, and the crosses are the outliers. The economic parameters used to calculate the NPV for the all examples are oil revenue, 80 $/bbl, water injected and produced price, 5$/bbl, and annual discount rate of 10% per year.

Example 1. BCO model
The BCO model described in Section 2.6 is used in three different studies using an ensemble-based method for the optimization process. The first two use the same random starting controls. First, we analyze the most appropriate ensemble size for computing the gradient vector by comparing optimization results, using simulator to evaluate production response. Second, we make a comparative study of the optimization results using the fractional flow models against the previous best results. Finally, we use the best solution of the fractional flow models as a warm start point for a new high-fidelity optimization process in order to assess the quality of proxy-based results. The objective is to maximize NPV for a period of 24 years, four for history and 20 for optimization. Well controls are changed every six months, i.e., there are 40 control cycles. The total number of variables is 200. The rate of injector wells can vary from 0 to 2500 m 3 /day, while the maximum allowed field injection rate is 7000 m 3 /day.

BCO by ensemble-based method
In this exercise, we analyze the size of the ensemble for calculating the gradient vector. The initial NPV for this case is 3.991 Â 10 9 . The ensemble-based method is of a stochastic nature, so to have efficient comparisons, 10 optimization process runs were performed. Figure 10 compares optimization results obtained with different ensemble sizes. The first plot (left) was obtained with an ensemble of Nr = 10 and the others with 20 and 30 respectively. The median obtained with Nr = 30 was 5.45 Â 10 9 and shows a small variability in the results. The best response was 5.507 Â 10 9 , requiring 1400 calls from the simulator during the optimization process. The gain compared to the medians of the plots Nr = 10 and Nr = 20 was 5% and 2%, respectively. From the results presented in Figure 10, we observe that when we increase the size of the ensemble, from 20 to 30, the gain is small. For these reasons, we adopt Nr = 30 for the other examples in this work.
In Figure 11 we show the optimal trajectories of the wells for the best result obtained in Figure 10. The smoothing strategy helped to reduce abrupt changes in the trajectory of the controls. The injectors I-4 and I-5 are kept at maximum rate. Injectors I-2 and I-3 remain closed for the first few years.

Comparison of different fractional flow models
With the injection rates and CRMP parameters we predict the liquid rates for the production wells. To calculate the NPV, it is necessary to determine the oil and water rates.
In this example, we analyze two scenarios: first, we compare the performance of the three fractional flow models, Koval, Gentil and Kogen, to separate the oil and water rates during the optimization process. Finally, the best solution is compared with the simulator production results. The initial NPV obtained with the Koval, Gentil and Kogen models are 3.351 Â 10 9 ; 3.873 Â 10 9 and 3.941Â10 9 , respectively (3.991 Â 10 9 , using the simulator). Figure 12a shows the optimization results using the fractional models (Koval, Gentil and Kogen; from left to right) in three box plots. The median for Kogen is 4.975 Â 10 9 , showing a gain of 20.2% and 2.65% with respect to the median values of Koval and Gentil. The best response for each of the models in Figure 12a is 4.188 Â 10 9 ; 4.878 Â 10 9 and 5.015 Â 10 9 . Figure 12b shows the iteration histories for the best result of each fractional model. The best NPV by Kogen was obtained in 8 iterations, while Koval and Gentil in 7 and 13 iterations, respectively. Finally, Kogen proved to be the most efficient.
Another case studied was to compute optimum NPV using Kogen and replacing the ensemble-based method with the Finite Difference (FD) method for calculating the gradient. The final NPV was 5.388 Â 10 9 in 30 iterations. The FD solution improved the NPV but at larger number of Kogen calls.
In Table 3 we compare the response of the fractional flow models with the results obtained directly by the simulator. In all cases we observed that there is a reduction in NPV.

Ensemble-based optimization with a warm start
In this case we use the best solution for fractional flow models as a starting point for high-fidelity optimization (ensemble-based method using simulator response). The comparison of the results is shown in the four box plots of Figure 13. For example, the curve for Koval represents the results after high-fidelity optimization with the best result obtained by Koval model. We find that the results are consistent with those obtained in Section 5.1.1. The main advantage of the strategy is the reduction in the number of simulator calls during the optimization process. For example, the best responses using Koval, Gentil and Kogen as warm starts were obtained in 12, 10 and 9 iterations with 540, 420 and 420 simulator calls, respectively, which are significantly less than the 1400 simulations used to the best response of Figure 10. In the following examples, we use only the Kogen model.

Example 2. Brugge model
The example is based on the Brugge model with single completion per well. A detail description of the model is documented in Peters et al. (2009). We used the realization #28 (Fig. 14). The objective is to maximize NPV for a period of 20 years, using the preceding 10 years for production history. All wells operate from the very first day. An explanation about the connectivity parameters computation of the Brugge model can be found in Oliveira et al. (2020).
All 400 design control variables are the rates of the 10 injectors well using six months control steps. The rate of injector wells can vary from 0 to 4000 bbl/day, while the maximum field total injection capacity is 30 000 bbl/day. The producer wells operate with constant BHP of 1000 psi.
Below we present two studies: first, in order to establish the appropriate ensemble size, we compare high fidelity optimization results varying the number of ensemble members for computing the gradient vector. Second, we use the ensemble-based method with Kogen model to maximize NPV and the best response is used as a warm start point for high-fidelity optimization. The initial well controls were randomly selected.

Brugge model by ensemble-based optimization
In this exercise, the initial NPV is 1.12 Â 10 9 . The results of high-fidelity optimization are shown in two box plots of Figure 15. The leftmost plot is obtained for ensemble of Nr = 20 and the other with Nr = 30. The median value with Nr = 30 is 1.54 Â 10 9 and shows small variability in responses. The best NPV is 1.555 Â 10 9 , where 1330 simulator calls were needed in 30 iterations. The gain obtained with Nr = 30 is 3% with respect to the median value for Nr = 20. Henceforth we adopt Nr = 30. Figure 16 shows optimal control trajectory for the best result. We observed that optimal controls do not show abrupt jumps.

Ensemble-based optimization with Kogen model
Initial NPV using is 1.01 Â 10 9 which is close to 1.12 Â 10 9 using the simulator responses. The leftmost box plot of Figure 17 refers to results obtained using the same random initial controls of the previous section. The other plot uses the controls of the best solution as starting point. The median value of the first plot is 1.30 Â 10 9 and maximum response is 1.348 Â 10 9 requiring 15 iterations. Using simulator results we obtain an NPV of 1.328 Â 10 9 . High-fidelity simulator results are used for the second plot. The median     NPV is 1.537 Â 10 9 , while the best result 1.543 Â 10 9 was obtained in 10 iterations with 480 simulator calls.

Discussion and conclusion
Fractional flow models present in the literature are used specifically for immature or mature fields. In this work, we present a general combined model, Koval + Gentil, called Kogen, to separate oil and water from total liquid production of a producer well, using only the production data and CRMP parameters. The performance of the proposed model was compared with the fractional flow models, Koval and Gentil, for optimization processes. The main advantage of Kogen model is its capacity to forecast the reservoir production during the whole concession period. Two strategies were presented to determine Kogen model parameters: a misfit function based on water cut and another based on NPV. The results show that the first presented a slightly better adjustment. The adopted maximum water cut difference at the switch time between Koval and Gentil models was 20%.
The combination of CRMP and Kogen models, proxy CRMP-KOGEN, works as a tool for forecasting oil and water production, which is essential for optimizing the field production. The results were used as a warm start point for high-fidelity optimization, which uses the simulator, in order to decrease the number of functions evaluations.
Two examples from the literature, BCO and Brugge, were used to evaluate the advantages of the proposed strategy. The algorithm selected to solve the optimization problem was the SQP, and the ensemble-based method was used to calculate the gradient vector.  A parametric study was developed to define the most appropriate size of ensemble for the high-fidelity optimization process. It was observed that with Nr = 30 the gains obtained were 2% and 3% when compared with results for Nr = 20, in the BCO and Brugge models, respectively. Therefore, it is recommended to use 30 realizations per ensemble for future examples. With this value, water injection optimizations were performed using proxy models with CRMP coupled to the three fractional flow models comparing final NPV values. Proxy-Kogen model showed gains of 20.2% and 2.65% with respect to the proxy models with Koval and Gentil fractional flow models, respectively. When the optimal controls of proxy models were run in the simulator it was observed a reduction of about 3.22% in NPV values.
When the proxy-Kogen model result was used as a warm start for the high-fidelity optimization process, we observed gains of 15% in the final response. This strategy also allowed to reduce the number of simulations by approximately 65%, during the high-fidelity optimization process.