Regular Article
An integrated Shannon Entropy and reference ideal method for the selection of enhanced oil recovery pilot areas based on an unsupervised machine learning algorithm
^{1}
Department of Petroleum Engineering, Amirkabir University of Technology, Tehran 1591634311, Iran
^{2}
Department of Chemical and Petroleum Engineering, Sharif University of Technology, Tehran 1458889694, Iran
^{*} Corresponding author: mehdi@aut.ac.ir
Received:
12
January
2021
Accepted:
10
November
2021
Pilotscale enhanced oil recovery in hydrocarbon field development is often implemented to reduce investment risk due to geological uncertainties. Selection of the pilot area is important, since the result will be extended to the full field. The main challenge in choosing a pilot region is the absence of a systematic and quantitative method. In this paper, we present a novel quantitative and systematic method composed of reservoirgeology and operationaleconomic criteria where a cluster analysis is utilized as an unsupervised machine learning method. A field of study will be subdivided into pilot candidate areas, and the optimized pilot size is calculated using the economic objective function. Subsequently, the corresponding Covariance (COV) matrix is computed for the simulated 3D reservoir quality maps in the areas. The areas are optimally clustered to select the dominant cluster. The operationaleconomic criteria could be applied for decision making as well as the proximity of each area to the center of dominant cluster as a geologicalreservoir criterion. Ultimately, the Shannon entropy weighting and the reference ideal method are applied to compute the pilot opportunity index in each area. The proposed method was employed for a pilot study on an oil field in south west Iran.
© S.M. Motahhari et al., published by IFP Energies nouvelles, 2021
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Maximizing the Recovery Factor (RF) and economic profit are the main goals of Enhanced Oil Recovery (EOR) methods. Pilot design plays a vital role in the road map of EOR planning and risk reduction. Therefore, EOR pilots are conducted to reduce uncertainty in EOR performance [1–4]. In the last 50 years, a significant number of EOR field pilots has been studied and the gained experience has been used in the oil and gas industry [5]. The results of the implementation of EOR pilots may increase consciousness regarding hydrocarbon field behavior and EOR scenarios designed for entire hydrocarbon field.
Pilot area selection is one of the imperative factors in designing an EOR [6]. The objective of the pilot location studies is determining how to narrow down the pilot candidate areas from the field extent to only an optimum area of interest.
Three methodologies have been presented in the literature for pilot area selection. The first category presents some qualitative insights regarding the pilot area, such as the necessity of using predictive reservoir simulation models [4, 7–10]. These studies do not provide any quantitative methods to select a pilot area. In the second category, the reservoir area is filtered to select the pilot area according to some merit parameters [11–13]. These studies are based on reservoir past performance. In the final category, the reservoir is divided into a number of areas and the pilot area is selected on the basis of a semiquantitative approach. In a previous study [14], a hydrocarbon field was divided into six square kilometers parts and each part labeled as a locator. Consequently, locators were classified based on permeability patterns and the number of waterinvaded zones. Permeability patterns were described as the combination of the average permeability of the different layers. The average value was marked per layer as shown in Figure A1 (Appendix A). The locators were chosen as pilot areas from a class that has a maximum number of dominated pattern.
According to other studies [15, 16] that have been carried out, the field was divided into six areas as shown in Figure A2 (Appendix A). The size of the areas can be arbitrary. A semiquantitative approach was chosen to rank each of the six areas for pilot implementation. A list of selection criteria, including the qualitative ranking (low, medium, and high) in each area is shown in Figure A3 (Appendix A). Although this approach tries to determine the part with the best properties, each selected part is not representative of the whole field. Moreover, predefined patterns such as permeability patterns were applied to identify dominated pattern in this category.
In addition to the methodological approaches, criteria based on reservoirgeology and operationaleconomic issues could be considered. According to the geological criteria of the reservoir, the pilot area should have alike behavior to the reservoir to be an appropriate representative and achieve the EOR goals [4, 12, 14]. However, an area with moderate to good reservoir properties is another selection criterion. For example, having remaining oil saturation in the range of 0.2–0.5 could lead to an appropriate pilot area [11]. According to other studies [7, 8], high remaining oil saturation has been applied to locate pilot areas. In addition, a lower ratio of vertical to horizontal permeability and higher lateral permeability continuity have also been a basis for pilot area selection [15, 16].
Operationaleconomic criteria refer to uncertainty and economic aspects of pilot area selection. An area with a higher accuracy in analysis of pilot production performance is suitable from an operational point of view. Moreover, a lower cost of pilot implementation is desirable too. The level of knowledge, the interaction of pilot and production area, the distance between a candidate area and surface facilities and the number of existing applicable wells for running a pilot are all important operationaleconomic criteria. For example, a pilot area could be selected based on the quality and quantity of the available petrophysical data [4, 14, 15]. In addition, less interaction between the pilot and production areas could lead to a more reliable interpretation of the gathered data [4, 11–13]. Besides, the closeness of the pilot to existing facilities could lead to economic aspects [12–14]. Finally, existing wells may be used to reduce pilot implementation costs [13]. The pilot operation will assist in receiving relevant operational and technical experience and establishing required quality control [17].
It is therefore apparent that having a systematic and quantitative approach is necessary to choose the optimum pilot area. Such an approach should consist of a better representative of reservoir features along with operationaleconomic criteria. In this paper, a methodology is presented to select the optimal size and location of the pilot for an EOR project. To do so, a pilot opportunity index is introduced. The proposed index is defined based on integration of reservoir quality maps (from reservoir simulations), fuzzy clustering, Shannon entropy, Analytical Hierarchy Process (AHP) and the Reference Ideal Method (RIM) as a multicriteria decisionmaking method. Finally, the presented method is utilized to a real reservoir case and the optimum pilot area is thereby determined.
2 Methodology
The three key steps for choosing the location and size of a pilot established are:
Pilot candidate areas specification;
Reservoir representativeness quantification and calculating its value for each pilot candidate area;
Pilot opportunity index calculation and selecting the optimum pilot region.
Figure 1 illustrates the workflow of the planned scheme.
Fig. 1 Workflow of the planned scheme. 
2.1 Pilot candidate areas
In the model of hydrocarbon reservoir, the field is segmented into pilot candidate regions of an equal size. Then, the fivespot pattern are utilized to locate wells in each region.
Afterwards, a sensitivity analysis is performed for the Net Present Value (NPV) in regard to the production and injection wells distance, which reflects the size of the pilot. The NPV is the most commonly used economic objective function and is shown in equation (1):
where t_{1} is the start time of production, T is the overall time of econometry, P_{o} is the unit price of oil, q_{o} is the oil production rate, is the unit cost of handling the produced water, is the water production rate, is the unit cost of injecting water, is the water injection rate, C_{o} is the unit operating cost per barrel of produced oil and r is the discount rate. R(t_{0}) is the initial capital expenditure, as shown in equation (2), where C_{facility} is the cost of facility installation, n_{well} is the entire number of injection and production wells and C_{well} is the drilling cost of a well. Consequently, the optimal pilot size has the highest NPV. Finally, the size and number of pilot candidates are determined.
2.2 Impression of reservoir representativeness
Each candidate region has a reservoir dynamic behavior. Considering all areas, it is possible to classify the behavior. The class which has the highest number of areas is the representative of reservoir behavior. The behavior of each area is characterized by two arrays of cell RF values and the covariance matrix of the residual oil volume in cells positioned in each candidate region.
2.2.1 Quality maps
The various static and dynamic parameters could be combined together to form the quality maps. Due to dynamic parameters such as oil saturation, the quality maps change during the production period [18, 19]. The quality map is applied to solve decision making problems as an auxiliary analysis with parameters like cumulative oil production [20].
The 3D timedependent quality maps of residual oil volume is computed by using equation (3) to aggregate various parameters:
where A_{i,j}, h_{k}, ∅_{i,j,k} and So_{i,j,k,t} are the surface area, the height, the porosity and the simulated oil saturation of cell (i, j, k) in the given time period, respectively. Equation (4) is applied to calculate the RF in each cell based on these quality maps:
where t_{s} and t_{e} are the beginning and termination times of the simulation, correspondingly.
Equation (5) is applied to illustrate the covariance matrix of the residual oil volume for a typical candidate region. For instance, is a set of annual residual oil volume sequence from t_{s} to t_{e} in the first cell of the area:
2.2.2 Clustering
The number of clusters of reservoir behaviors could be specified by Fuzzy clustering. Clustering is a subset of unsupervised machine learning [21] and data mining [22] methods. It is used to assess the flow field in hydrocarbon reservoirs [23–25].
The similar objects are placed in a same group using clustering analysis. Equation (6) [26] is utilized to define the objective of fuzzy clustering. The number of clusters and cluster center are signified by c and c_{i}. A multidimensional space is denoted by X that includes n candidate areas. For the jth candidate region in the cluster i, the membership degree is indicated by u_{ij}.
Furthermore, v and d_{ij} are identified as the fuzzifier and the Euclidean distance of the jth candidate region from the c_{i}.
The optimal number of clusters is determined using two cluster validity indices: (1) XB (Xie and Beni) [27] and (2) fuzzysilhouette [28] as illustrated in equations (9) and (10)–(14), correspondingly.
The higher square distance between cluster centers (a_{j}) is more favorable, as well as lower mean square distance between a candidate area and its cluster center (b_{j}). Therefore, the lower and higher values of the XB and fuzzysilhouette are the result of highquality clustering:
2.2.3 Reservoir Similarity Index
For a candidate area, the intensity of the reservoir representativeness is indicated by the Reservoir Similarity Index (RSI). Therefore, the RSI is composed of the two normalized distances: 1) d(RF array, RFcenter) and 2) d(COV array, COVcenter). The RFcenter and COVcenter are the centers of the clusters that have the maximum size after conducting clustering on the RF array and the covariance matrix of the residual oil volume, correspondingly. The RF array and COV array are the Recovery Factor and Covariance matrix arrays of the residual oil volume of the candidate area. The weights of the two normalized distances are w_{1} and w_{2}. The RSI is calculated as the sum of the product of weight and the normalized value for multiplicative inverse of the two criteria using equation (15):
The Shannon entropy technique is utilized to determine the w_{1} and w_{2}.
2.2.4 Shannon entropy
The information content of an event as the concept of entropy was introduced by Shannon in information theory [29]. Shannon entropy is used to optimize data acquisition of the hydrocarbon fields [30]. Furthermore, the Shannon entropy can quantity the relative importance of criteria in a multicriteria problem [31]. The more important criteria have greater dispersion and their effects are higher on the RSI value. Considering m alternatives and n criteria, a decision matrix S of m × n elements is shown in equation (16), where S_{ij} is the jth criterion value for the ith alternative (candidate area).
Equation (17) is applied to normalize the decision matrix S. Then, equation (18) is utilized to compute the Shannon entropy. Afterward, equation (19) is employed to calculate the weight of each criterion:
2.3 Pilot opportunity index
The attractiveness of pilot implementation in a candidate area is indicated by the Pilot Opportunity Index (POI). Two significant features for POI computations are reservoir geological and operationaleconomic. Reservoir geological factors correspond to the reservoir representativeness intensity. Operationaleconomic factors refer to the reliability of the pilot performance results and the implementation cost of the pilot.
Therefore, the POI is based on the RSI, the level of knowledge, the interference between internal and external production of a candidate area, the distance from the surface facilities to candidate area, and the number of existing applicable wells for running a pilot in an area. Hence, the RIM in multicriteria decisionmaking methods is utilized to quantify the POI. Moreover, criteria weights are calculated based on the Analytical Hierarchy Process (AHP). The higher the POI is, the more desirable the pilot implementation in an area is. The pilot area is an area with the highest POI value.
2.3.1 Decision criteria
The criteria to evaluate the POI for each candidate area include:
RSI: In a candidate area, this criterion shows the value of the reservoir representativeness. Higher values of this criterion are preferable.
Interference between internal and external production of a candidate area: Pilot production performance could be affected considerably by nearby existing production wells. Minimizing this interference leads to a more accurate pilot production performance analysis. The drainage radius, the distance from a well where its normal pressure gradient approaches zero (i.e. zero fluid flux), is applied to quantify this issue [32]. In other words, there would be pressure interference in case the distance between two wells is less than two times the drainage radius. Therefore, to determine the interaction between the candidate area and the existing wells, two criteria are established, namely, the number of existing wells around a candidate area with a distance less than two times the drainage area, known as “interfered wells”, and the average distance between these wells and a candidate area. A lower number of interfered wells and higher values of the average distance are more favorable.
Level of knowledge: A higher quality and quantity of available data in a candidate area lead to a lower uncertainty in pilot design and evaluation. The variogram range, the distance limit beyond which the data are no longer spatially correlated, is applied to quantify this criterion [33]. The properties in cells that contain an existing well are known while the properties of other cells can be estimated by the known cell value. The estimation is possible in the case of the spatial distance between known and unknown cells being smaller than the variogram range. Furthermore, the error of estimation decreases as the distance decreases. Therefore, to determine the level of knowledge in a candidate area, two criteria are established, namely, the total number of existing wells inside and outside an area with a distance less than the variogram range to an area, known as “adjacent wells”, and the average distance between these wells and a candidate area. A higher number of total wells and lower values of the average distance are more favorable.
Distance between a candidate area and surface facilities: the lower the distance, the lower the cost. The closest route from areas to the surface facilities is the most desirable path.
Number of existing applicable wells for running a pilot in an area: This is the number of existing wells in a candidate area that matches the fivespot pattern in terms of location. Higher values of this criterion are more preferable due to less capital expenditure to drill new wells that fit the pattern in an area.
2.3.2 AHP
Analytical Hierarchy Process (AHP) combines mathematics and intuition to express the comparative importance of each criterion in a rational and consistent approach [34, 35]. Table 1 is applied to make the pairwise comparisons matrix among criteria.
In the pairwise comparisons matrix, i rows are compared with the j column for n number of criteria such that the main diagonal is equal to one. The results of the pairwise comparison on criteria are summarized in an (n × n) matrix in which elements w_{ij} (i, j = 1, 2, … n) are shown by equation (20):
The pairwise comparisons matrix is normalized by dividing the value of each element with the sum of the corresponding column. After normalization, the relative weight of each criteria is calculated by computing the average of each row.
The accuracy of the AHP output is related with the consistency of the pairwise comparison judgment. A Consistency Ratio (CR) is calculated by dividing the Consistency Index (CI) for the judgment by the Random Consistency Index (RI) for the corresponding random matrix using equations (21) and (22). The RI is in accordance with the degree of consistency, as shown in Table 2 for a different number of criteria. Moreover, the Cl is based on the largest eigenvalue (λ_{max}) and the number of criteria (n), as shown in equation (21). The weighted sum vector is calculated with a multiplication of the pairwise comparisons matrix of criteria in the vector of relative magnitudes. Then, the consistency vector is computed by dividing the magnitudes of the weighted sum vector of criteria by the vector of relative magnitudes. The largest eigenvalue (λ_{max}) is calculated by averaging elements of this vector:
The CR above 0.1% or 10% indicates that the pairwise comparisons should be revised, while smaller numbers mean comparison are more consistent.
2.3.3 RIM
Multi Criteria Decision Making (MCDM) is applied to solve selection issues including multiple criteria and alternatives for petroleum field development problems [36]. The RIM [37] multicriteria decisionmaking technique is applied to prioritize the alternatives (candidate areas) with respect to the criteria outlined in Section 2.3.1.
Equation (24) is applied to normalize the decision matrix S, as illustrated by equation (23). The range of values and ideal values for each criterion are shown by [A, B] and [C, D], respectively. Equation (25) is utilized to compute the weighted normalized matrix. Equation (26) is applied to calculate the distances between the ideal alternatives and a candidate area. In conclusion, equation (27) is employed to compute the score of each area:
3 Results and discussion on a case study
The studied field has been producing since 2006 and 31 wells have been drilled so far. The field’s oil in place is around 4 billion standard barrels. The number of grids in the three main directions x, y and z is 88 × 275 × 22, respectively. The field has 15 different rock types. Variogram model and its parameters are calculated for the porosity as shown in Figure 2. The variogram range is equal to 1500 m. Furthermore, drainage radius is 320 m according to interpretation of well test in the field.
Fig. 2 The variogram model of porosity values. 
A historymatched model is applied to predict production scenarios by reservoir numerical simulation. In case of enhancing the history match, it is proposed to use the integration of multistart simulated annealing with genetic algorithm [38].
Waterflooding is planned in the field to increase production from 20,000 to 110,000 barrels per day. Moreover, the maximum allowable bottom hole pressure in injection wells and the minimum flowing bottom hole pressure in production wells are 5880 and 3290 psia, correspondingly.
The pilot project is required prior to the design and field implementation of EOR. Hence, it is necessary to determine the size and location of the pilot area. The method established in this research was utilized to choose the best pilot region for this field.
Initially, the reservoir model is subdivided into equal size pilot candidate areas. In each area, production and injection wells are located according to a fivespot well pattern. The sensitivity analysis on the NPV is utilized to determine the optimum pilot dimensions. Data to compute the NPV via equations (1) and (2) are illustrated in Table B1 (Appendix B). The result of varying distance between production and injection wells, which reflects the pilot size on the NPV and cumulative oil production, is illustrated in Figure 3. There is the highest of NPV in case of well distance is 900 m.
Fig. 3 Determining pilot size by sensitivity analysis. 
The positions of the injection and production wells, along with existing wells in the reservoir, are illustrated in Figure 4 according to this distance. Moreover, reservoir segmentation resulted in 72 candidate regions, as illustrated in Figure A4 (Appendix A). Therefore, the number of grids for a candidate area in the directions x, y and z is 9 × 9 × 22, respectively. If an area does not contain all the 22 reservoir layers, it is removed. Therefore, six regions were removed. Consequently, there are 66 valid candidate regions in the case study reservoir.
Fig. 4 Fivespot patterns in the case study reservoir. 
Running of the production and injection well pattern in the simulator leads to oil saturation computation. With equation (3), the product of the pore volume in oil saturation resulted in the quality maps of the residual oil volume as shown in Figure A5 (Appendix A). By using equations (4) and (5), 1782member array of the recovery factor and 1588653member array of the covariance between cells are allocated to an area. By using equations (6)–(8), the candidate areas are clustered. The values of the hyperparameter (fuzzifier v) and convergence criterion are 1.5 and 1e–9 in fuzzy clustering, respectively.
By using equations (9)–(14), fuzzy clustering of the candidate areas are validated as shown in Figure 5. According to Figure 5, if the number of clusters is considered to be two and five, the clustering of the covariance between cells (Sil.F_COV and XB_COV) and the recovery factor (Sil.F_RF and XB_RF) arrays are optimal, accordingly.
Fig. 5 Determining the optimal number of clusters with cluster validity indexes. 
According to the preceding graph, clustering of candidate regions leads to two and five clusters for covariance matrixes and recovery factor arrays, respectively. Figure 6 shows membership of clusters based on the recovery factor arrays. Moreover, Figure 7 shows membership of clusters based on the covariance matrixes. Each point on these two charts corresponds to a candida region. Additionally, the * symbol shows centers of the clusters. Also, Tables B2 and B3 (Appendix B) show results of clustering such as cluster size. The dominant clusters are cluster 5 and 2 for recovery factor and the covariance matrixes, respectively.
Fig. 6 Clustering result using the colorcoding graph for the RF arrays. 
Fig. 7 Clustering result using the colorcoding graph for the COV arrays. 
Figure A6 (Appendix A) shows the distribution map of the clusters in the field with reference to Figures 6 and 7. The candidate areas in the same cluster are close to each other with respect to location. Therefore, there is a spatial relationship between the reservoir behavior, which is characterized by the covariance of the residual oil volume and the RF, and the position in the field.
The covariancecenter and RFcenter are utilized to show the cluster centers. Averaging all regions in dominant clusters leads to determination of these centers. Afterward, the distance from these two centers to candidate areas is computed as displayed in Table B4 (Appendix B).
Prior to computing the RSI value of each region, the weights of two criteria: (1) the covariancecenter distance from an area and (2) the RFcenter distance from an area are determined. Using equations (16)–(19) besides the criteria values in Table B4 (Appendix B), the weight of the first criterion is 0.313 while the weight of the second criterion is 0.686. The sum of product of weight and normalized value for multiplicative inverse of the two criteria has resulted in a RSI value that is shown in Figure A7 (Appendix A). For instance, the candidate area numbers 29, 34 and 33 have the highest reservoir representativeness intensity with RSI values equal to 0.99, 0.92 and 0.82, respectively.
To calculate the POI of each candidate area, the criteria of RSI, the distance from the surface facilities to candidate area, the number of interfered wells, the mean distance from these wells to candidate area, number of existing applicable wells for running a pilot, number of adjacent wells, and finally the mean distance from these wells to candidate area have been considered with the weight factors of 0.33, 0.145, 0.145, 0.055, 0.055, 0.09 and 0.18, respectively, according to the AHP weighting method. The pairwise comparison matrix (Eq. (20)) and weights of the criteria are shown in Table B5 (Appendix B). In this method, every pair of criteria is compared with each other with respect to the expert judgments. In each comparison this question must be answered: which criterion is more important and how much? Table 1 is applied to express the rating preferences between each pair of criteria. For instance, experience and judgment strongly favor the RSI over the number of adjacent wells with importance intensity value equal to 5 as shown in Table B5 (Appendix B). Furthermore, the consistency ratio is computed to verify the reliability between the pairwise judgements using equations (21) and (22). The values of the largest eigenvalue (λ_{max}), the random consistency index (see Tab. 2), the Consistency Ratio (CR), and the Consistency Index (CI) are 7.06, 1.35, 0.008, and 0.011, respectively. The Consistency Ratio of 0.011 shows that the pairwise comparisons are consistent.
The POI is computed for each region utilizing the Reference ideal method via equations (23)– (27) and the aforementioned criteria values, as shown in Figure A7 (Appendix A). For example, the values of criteria that contributed to the POI for candidate area numbers 8, 45, 34 and 29 are shown in Table 3.
Weight factors of criteria and their values for candidate areas numbers 8, 45, 34 and 29 to calculate POI values.
The values of calculated POI in these regions are equal to 0.73, 0.66, 0.64 and 0.59, accordingly. Therefore, Pilot execution in these regions is relatively desirable. In Table 3, the average distance between interfered wells and a candidate area is indicated by the expression of NA when the number of interfered wells is equal to zero.
According to Table 3 and Figure A7 (Appendix A), for example, although candidate area number 29 has the highest amount of RSI value and its distance between a candidate area and surface facilities is relatively low in comparison with the other candidates (i.e. 8 and 45), candidate area number 8 has the highest POI value and it is suggested for running the pilot because candidate area number 29 does not have any existing applicable wells for the implementation of a pilot. Furthermore, candidate area number 29 has two interfered wells that considerably reduce the accuracy of interpretation of gathered data after running a pilot in it.
Figure A8 (Appendix A) shows contour maps of the RSI and POI values on top of the reservoir. There is a spatial relationship between the covariance matrix for the residual oil volume as well as recovery factor and location in the field as shown in the figure. Therefore, there is a spatial relationship between the RSI and the location in the field because RSI is based on covariance matrix for the residual oil volume as well as recovery factor.
Figure A9a (Appendix A) shows assigned numbers of candidate regions. In Figure A9b (Appendix A), Ranking of each region derived from RSI is shown. Moreover, the pilot area number 29 has the highest reservoir representative intensity value. In addition, in Figure A9c (Appendix A), each candidate region is ranked according to its POI value. Therefore, the candidate area number 8 has the first priority to be selected as the pilot area. It should be noted that the POI value is the most influential parameter for selecting the best candidate area, because it already covers RSI and other factors.
A sensitivity analysis is performed to evaluate how changes in the weights assigned to the criteria would change the ranking of the pilot candidate areas. Table B6 (Appendix B) shows sensitivity analysis by preference of certain criteria through 7 scenarios. In Table B6 (Appendix B), the A–G is a criteria name as shown in Table B5 (Appendix B). The S1 is a base case scenario in which the weight of the criteria is determined using AHP method, as shown in Table B5 (Appendix B). In scenarios S2 to S8, the weight of criteria A to G is increased by 0.2 compared to the base case scenario, respectively. For instance, the weight of the RSI has increased by 0.2 while the ratio of the weight is the same for other criteria [39] in scenario S2 as shown in Table B6 (Appendix B).
By making a comparison between the rankings of each candidate area number (e.g. 8, 45, 26 and 34) in different scenarios, it is demonstrated that candidate area number 8 is ranked first and second in six out of eight scenarios as shown in Table 4. Candidate area number 8 could be selected as pilot area based on the rankings obtained by using different weight of criteria. Therefore, candidate area number 8 is clearly dominant compared to other candidate areas.
Ranking of candidate areas numbers 8, 45, 34 and 29 for different weight criteria scenarios.
4 Conclusion
Based on this research, six important conclusions have been found:
An innovative quantitative and systematic approach consisting of reservoirgeology and operationaleconomic criteria has been presented.
A cluster analysis as an unsupervised machine learning method could be utilized to quantify reservoir representative intensity values.
MCDMs as decisionmaking methods have been utilized to integrate the factors, i.e., the distance from the surface facilities to candidate area, the number of interfered wells, the mean distance from these wells to candidate area, number of existing applicable wells for running a pilot, number of adjacent wells, and the mean distance from these wells to candidate area, in addition to RSI criteria with each other, and thereby, a new index called the POI has been presented.
A model of reservoir is subdivided into 66 pilot candidate regions in accordance with optimized pilot size. Then, corresponding covariance matrix and recovery factor arrays of the areas are optimally clustered in 5 and 2 clusters based on simulated annual 3D reservoir quality maps. A hybrid integration of the AHPShannon Entropy and RIM resulted in the highest value of POI in area number 8. Candidate area number 8 has been selected as pilot area in this case study since it has the POI value of 0.73. There is no interfering well affecting the reliability of pilot results in the area. Furthermore, it has one existing applicable well for running a pilot that reduces execution of pilot cost.
Finally, the proposed procedure has the capability to be extended to all enhanced oil recovery processes. Several EOR processes such as chemical flooding and miscible gas can be modeled to calculate remaining oil volume covariance matrix and RF array. Also, other screening criteria could be identified to determine POI according to the combination of the Shannon Entropy AHP and MCDM methods.
The validity of the results taken from the presented methodology has been assessed by the following stages: (1) numerical reservoir modeling, (2) fuzzy clustering and (3) criteria weighting. In order to validate the simulated annual remaining oil volume, a historymatched reservoir model is utilized in this study. In addition, the validity indices of fuzzysilhouette and XieBeni are utilized to validate the best cluster numbers. Furthermore, the criteria weighting results are validated by the Consistency Ratio (CR) in the AHP method.
Moreover, the proposed procedure could be developed using other criteria and well patterns.
Appendix A
Fig. A1 Permeability ranges of the most represented patterns of one of the reservoirs and the associated pie chart showing their areal distribution on the field [14]. 
Fig. A4 Segmentation of the reservoir with allocated area numbers. 
Fig. A5 Remaining oil volume maps at (a) beginning, (b) 10, (c) 20, and d) 30 years after waterflooding for the 15th layer of the reservoir. 
Fig. A6 The distribution map of the clusters in the field based on: (a) RF arrays. (b) COV arrays. 
Fig. A7 Values of RSI (blue) and POI (red) in each candidate area. 
Fig. A8 Contour maps of the: (a) RSI and (b) POI values. 
Fig. A9 (a) Number of candidate regions. (b) Ranking of each region derived from RSI. (c) Ranking of each region derived from POI. 
Appendix B
Data for determining pilot size.
Clustering outcomes for recovery factor arrays.
Clustering outcomes for covariance arrays.
Distance between candidate areas and center of dominant cluster for Recovery Factor (RFcenter) and covariance between cells (covariancecenter) arrays.
The pairwise comparison matrix and weights of the criteria using AHP.
Scenarios with different criteria weights.
References
 Hu W. R. (2008) Necessity and feasibility of PetroChina mature field redevelopment, Pet. Explor. Dev. 35, 1, 1–5. [Google Scholar]
 AlQenae A., Chetri H., Kumar P.R., Orjuela J. (2018) Tracking the performance of strategically significant EOR Pilot: Zooming into interwell connectivity, in: Abu Dhabi International Petroleum Exhibition & Conference, OnePetro [Google Scholar]
 Chen P., Selveindran A., Kumar C., Saloma Y., Bose S., Balasubramanian S., Thakur G. (2019) CO_{2}EOR and carbon storage in Indian oilfields: from laboratory study to pilot design, in: SPE Western Regional Meeting, OnePetro. [Google Scholar]
 Teletzke G.F., Wattenbarger R.C., Wilkinson J.R. (2010) Enhanced oil recovery pilot testing best practices, SPE Reserv. Evaluation Eng. 13, 01, 143–154. [Google Scholar]
 Liu Z.X., Liang Y., Wang Q., Guo Y.J., Gao M., Wang Z.B., Liu W.L. (2020) Status and progress of worldwide EOR field applications, J. Pet. Sci. Eng. 193, 107449. [Google Scholar]
 Babadagli T. (2020) Philosophy of EOR, J. Pet. Sci. Eng. 188, 106930. [Google Scholar]
 Sandoval J.R., Pérez H., Maya G., Castro R., Muñoz E., Colmenares K., León J., Sánchez F., Villadiego D., Manrique E., Romero J., Izadi M. (2010) Dina Cretáceos Field chemical EOR: from screening to pilot design, in: SPE Latin American and Caribbean Petroleum Engineering Conference, OnePetro. [Google Scholar]
 Chen X., Feng Q., Wu X., Zhao G. (2016) A pilot numerical simulation case study for chemical EOR feasibility evaluation, J. Pet. Explor. Prod. Technol. 6, 2, 297–307. [Google Scholar]
 Alfarge D., Wei M., Bai B., Alsaba M. (2018) Lessons learned from IOR pilots in Bakken formation by using numerical simulation, J. Pet. Sci. Eng. 171, 1–15. [Google Scholar]
 Taqi F., Ahmad K., Garcia J.G., Zhang I., Zijlstra E., Ayyad H., Sullivan M. (2019) Interference pressure transient test for permeability anisotropy evaluation in shallow unconsolidated reservoir undergoing EOR polymer flood pilot, in: SPE Kuwait Oil & Gas Show and Conference, OnePetro. [Google Scholar]
 Chai C.F., Adamson G., Lo S.W., Agarwal B., Ritom S., Du K., Azizan N. (2011) St. Joseph Chemical EOR Pilot – a key derisking step prior to offshore ASP full field implementation, in: SPE Enhanced Oil Recovery Conference, OnePetro. [Google Scholar]
 Ozen O., Wahlheim T.A., Attia T., Barrios L., Bin Majid M.N., Wilkinson J. (2014) Dukhan field CO_{2} injection EOR pilot: Reservoir modeling and planning, in: International Petroleum Technology Conference, OnePetro. [Google Scholar]
 Ali H.A., Musa T.A., Doroudi A. (2015) Chemical enhanced oil recovery pilot design for Heglig Main FieldSudan, in: SPE Saudi Arabia Section Annual Technical Symposium and Exhibition, OnePetro. [Google Scholar]
 Saniez J., VandeBeuque S., Ekpenyong D.E., Bastos N., Wantong P., Salley B., AlYafei A. (2012) State of the art of geoscience and reservoir integrated study for EOR CO_{2} Pilot Implementation: example of a Giant Carbonate Reservoir of Arabian Gulf UAE, in: Abu Dhabi International Petroleum Conference and Exhibition, OnePetro. [Google Scholar]
 AlDhuwaihi A.S., Abdullah M.B., Tiwari S., AlMurayri M.T., AlMayyan H., Shahin G.T., Shukla S. (2017) Fitforpurpose chemical EOR ASP modeling strategy to guide pilot development decisions for a giant reservoir in North Kuwait, in: SPE/IATMI Asia Pacific Oil & Gas Conference and Exhibition, OnePetro. [Google Scholar]
 Sharma S., Kamal D., AlMaraghi E., AlMahrooqi S., Winkler M. (2016) Miscible gas EOR pilot design decisions driven by linking EOR performance parameters to uncertainties – a Kuwait Field Example, in: SPE EOR Conference at Oil and Gas West Asia, OnePetro. [Google Scholar]
 Prasad D., Pandey A., Kumar M.S., Koduru N. (2014) Pilot to fullfield polymer application in one of the largest onshore field in India, in: SPE Improved Oil Recovery Symposium, OnePetro. [Google Scholar]
 Da Cruz P.S., Horne R.N., Deutsch C.V. (2004) The quality map: a tool for reservoir uncertainty quantification and decision making. SPE Reserv. Evaluation Eng. 7, 01, 6–14. [Google Scholar]
 Martini R.F., Schiozer D.J., Nakajima L. (2005) Use of quality maps in reservoir management, J. Braz. Soc. Mech. Sci. Eng. 27, 4, 463–468. [Google Scholar]
 da Cruz Schaefer B., Sampaio M.A. (2020) Efficient workflow for optimizing intelligent well completion using production parameters in realtime, Oil Gas Sci. Technol. – Rev. IFP Energies nouvelles 75, 69. [Google Scholar]
 James G., Witten D., Hastie T., Tibshirani R. (2017) An introduction to statistical learning with applications in R, Springer. [Google Scholar]
 Gorunescu F. (2011) Data mining: concepts, models and techniques, Vol. 12, Springer Science & Business Media. [Google Scholar]
 Mohaghegh S.D. (2020) Subsurface analytics: contribution of artificial intelligence and machine learning to reservoir engineering, reservoir modeling, and reservoir management, Pet. Explor. Dev. 47, 2, 225–228. [Google Scholar]
 Hu J.I.A., Lihui D.E.N.G. (2018) Water flooding flowing area identification for oil reservoirs based on the method of streamline clustering artificial intelligence, Pet. Explor. Dev. 45, 2, 328–335. [Google Scholar]
 Vaseghi F., Ahmadi M., Sharifi M., Vanhoucke M. (2021) Generalized MultiScale Stochastic Reservoir Opportunity Index for enhanced well placement optimization under uncertainty in green and brownfields, Oil Gas Sci. Technol. – Rev. IFP Energies nouvelles 76, 41. [Google Scholar]
 Borgelt C. (2013) Objective functions for fuzzy clustering, Comput. Intell. Intell. Data Anal. 3–16. [Google Scholar]
 Xie X.L., Beni G. (1991) A validity measure for fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell. 13, 8, 841–847. [Google Scholar]
 Subbalakshmi C., Krishna G.R., Rao S.K.M., Rao P.V. (2015) A method to find optimum number of clusters based on fuzzy silhouette on dynamic data set, Proc. Comp. Sci. 46, 346–353. [Google Scholar]
 Shannon C.E. (1948) A mathematical theory of communication, Bell Syst. Tech. J. 27, 3, 379–423. [Google Scholar]
 Abellan A., Noetinger B. (2010) Optimizing subsurface field data acquisition using information theory, Math. Geosci. 42, 6, 603–630. [Google Scholar]
 Zhang Y., Li P., Wang Y., Ma P., Su X. (2013) Multiattribute decision making based on entropy under intervalvalued intuitionistic fuzzy environment, Math Prob. Eng. 2013, 526871. [Google Scholar]
 Chaudhry A. (2004) Oil well testing handbook, Elsevier. [Google Scholar]
 Pyrcz M.J., Deutsch C.V. (2014) Geostatistical reservoir modeling, Oxford University Press. [Google Scholar]
 Saaty T.L., Peniwati K. (2013) Group decision making: drawing out and reconciling differences, RWS Publications. [Google Scholar]
 Samad A.M., Hifni N.A., Ghazali R., Hashim K.A., Disa N.M., Mahmud S. (2012) A study on school location suitability using AHP in GIS approach, in: 2012 IEEE 8th International Colloquium on Signal Processing and its Applications, IEEE, pp. 393–399. [Google Scholar]
 Li Q., Zhang J., Deng B., Chang J., Li H., Liu S., Xu X. (2011) Grey decisionmaking theory in the optimization of strata series recombination programs of high watercut oilfields, Pet. Explor. Dev. 38, 4, 463–469. [Google Scholar]
 Cables E., Lamata M.T., Verdegay J.L. (2016) RIMreference ideal method in multicriteria decision making, Inform. Sci. 337, 1–10. [Google Scholar]
 Maschio C., Schiozer D.J. (2019) Integration of geostatistical realizations in data assimilation and reduction of uncertainty process using genetic algorithm combined with multistart simulated annealing, Oil Gas Sci. Technol. – Rev. IFP Energies nouvelles 74, 73. [Google Scholar]
 Alinezhad A., Amini A. (2011) Sensitivity analysis of TOPSIS technique: the results of change in the weight of one attribute on the final ranking of alternatives, J. Optim. Indus. Eng. 4, 7, 23–28. [Google Scholar]
All Tables
Weight factors of criteria and their values for candidate areas numbers 8, 45, 34 and 29 to calculate POI values.
Ranking of candidate areas numbers 8, 45, 34 and 29 for different weight criteria scenarios.
Distance between candidate areas and center of dominant cluster for Recovery Factor (RFcenter) and covariance between cells (covariancecenter) arrays.
All Figures
Fig. 1 Workflow of the planned scheme. 

In the text 
Fig. 2 The variogram model of porosity values. 

In the text 
Fig. 3 Determining pilot size by sensitivity analysis. 

In the text 
Fig. 4 Fivespot patterns in the case study reservoir. 

In the text 
Fig. 5 Determining the optimal number of clusters with cluster validity indexes. 

In the text 
Fig. 6 Clustering result using the colorcoding graph for the RF arrays. 

In the text 
Fig. 7 Clustering result using the colorcoding graph for the COV arrays. 

In the text 
Fig. A1 Permeability ranges of the most represented patterns of one of the reservoirs and the associated pie chart showing their areal distribution on the field [14]. 

In the text 
Fig. A2 Areal division of the reservoir into six areas [15]. 

In the text 
Fig. A3 Qualitative ranking of the selection criteria in each area [16]. 

In the text 
Fig. A4 Segmentation of the reservoir with allocated area numbers. 

In the text 
Fig. A5 Remaining oil volume maps at (a) beginning, (b) 10, (c) 20, and d) 30 years after waterflooding for the 15th layer of the reservoir. 

In the text 
Fig. A6 The distribution map of the clusters in the field based on: (a) RF arrays. (b) COV arrays. 

In the text 
Fig. A7 Values of RSI (blue) and POI (red) in each candidate area. 

In the text 
Fig. A8 Contour maps of the: (a) RSI and (b) POI values. 

In the text 
Fig. A9 (a) Number of candidate regions. (b) Ranking of each region derived from RSI. (c) Ranking of each region derived from POI. 

In the text 