An integrated Shannon Entropy and reference ideal method for the selection of enhanced oil recovery pilot areas based on an unsupervised machine learning algorithm

. Pilot-scale enhanced oil recovery in hydrocarbon ﬁ eld development is often implemented to reduce investment risk due to geological uncertainties. Selection of the pilot area is important, since the result will be extended to the full ﬁ eld. The main challenge in choosing a pilot region is the absence of a systematic and quantitative method. In this paper, we present a novel quantitative and systematic method composed of reser-voir-geology and operational-economic criteria where a cluster analysis is utilized as an unsupervised machine learning method. A ﬁ eld of study will be subdivided into pilot candidate areas, and the optimized pilot size is calculated using the economic objective function. Subsequently, the corresponding Covariance (COV) matrix is computed for the simulated 3-D reservoir quality maps in the areas. The areas are optimally clustered to select the dominant cluster. The operational-economic criteria could be applied for decision making as well as the proximity of each area to the center of dominant cluster as a geological-reservoir criterion. Ultimately, the Shannon entropy weighting and the reference ideal method are applied to compute the pilot opportunity index in each area. The proposed method was employed for a pilot study on an oil ﬁ eld in south west Iran.


Introduction
Maximizing the Recovery Factor (RF) and economic profit are the main goals of Enhanced Oil Recovery (EOR) methods. Pilot design plays a vital role in the road map of EOR planning and risk reduction. Therefore, EOR pilots are conducted to reduce uncertainty in EOR performance [1][2][3][4]. In the last 50 years, a significant number of EOR field pilots has been studied and the gained experience has been used in the oil and gas industry [5]. The results of the implementation of EOR pilots may increase consciousness regarding hydrocarbon field behavior and EOR scenarios designed for entire hydrocarbon field.
Pilot area selection is one of the imperative factors in designing an EOR [6]. The objective of the pilot location studies is determining how to narrow down the pilot candidate areas from the field extent to only an optimum area of interest.
Three methodologies have been presented in the literature for pilot area selection. The first category presents some qualitative insights regarding the pilot area, such as the necessity of using predictive reservoir simulation models [4,[7][8][9][10]. These studies do not provide any quantitative methods to select a pilot area. In the second category, the reservoir area is filtered to select the pilot area according to some merit parameters [11][12][13]. These studies are based on reservoir past performance. In the final category, the reservoir is divided into a number of areas and the pilot area is selected on the basis of a semi-quantitative approach. In a previous study [14], a hydrocarbon field was divided into six square kilometers parts and each part labeled as a locator. Consequently, locators were classified based on permeability patterns and the number of water-invaded zones. Permeability patterns were described as the combination of the average permeability of the different layers. The average value was marked per layer as shown in Figure A1 (Appendix A). The locators were chosen as pilot areas from a class that has a maximum number of dominated pattern.
According to other studies [15,16] that have been carried out, the field was divided into six areas as shown in Figure A2 (Appendix A). The size of the areas can be arbitrary. A semi-quantitative approach was chosen to rank each of the six areas for pilot implementation. A list of selection criteria, including the qualitative ranking (low, medium, and high) in each area is shown in Figure A3 (Appendix A). Although this approach tries to determine the part with the best properties, each selected part is not representative of the whole field. Moreover, predefined patterns such as permeability patterns were applied to identify dominated pattern in this category.
In addition to the methodological approaches, criteria based on reservoir-geology and operational-economic issues could be considered. According to the geological criteria of the reservoir, the pilot area should have alike behavior to the reservoir to be an appropriate representative and achieve the EOR goals [4,12,14]. However, an area with moderate to good reservoir properties is another selection criterion. For example, having remaining oil saturation in the range of 0.2-0.5 could lead to an appropriate pilot area [11]. According to other studies [7,8], high remaining oil saturation has been applied to locate pilot areas. In addition, a lower ratio of vertical to horizontal permeability and higher lateral permeability continuity have also been a basis for pilot area selection [15,16].
Operational-economic criteria refer to uncertainty and economic aspects of pilot area selection. An area with a higher accuracy in analysis of pilot production performance is suitable from an operational point of view. Moreover, a lower cost of pilot implementation is desirable too. The level of knowledge, the interaction of pilot and production area, the distance between a candidate area and surface facilities and the number of existing applicable wells for running a pilot are all important operational-economic criteria. For example, a pilot area could be selected based on the quality and quantity of the available petrophysical data [4,14,15]. In addition, less interaction between the pilot and production areas could lead to a more reliable interpretation of the gathered data [4,[11][12][13]. Besides, the closeness of the pilot to existing facilities could lead to economic aspects [12][13][14]. Finally, existing wells may be used to reduce pilot implementation costs [13]. The pilot operation will assist in receiving relevant operational and technical experience and establishing required quality control [17].
It is therefore apparent that having a systematic and quantitative approach is necessary to choose the optimum pilot area. Such an approach should consist of a better representative of reservoir features along with operationaleconomic criteria. In this paper, a methodology is presented to select the optimal size and location of the pilot for an EOR project. To do so, a pilot opportunity index is introduced. The proposed index is defined based on integration of reservoir quality maps (from reservoir simulations), fuzzy clustering, Shannon entropy, Analytical Hierarchy Process (AHP) and the Reference Ideal Method (RIM) as a multicriteria decision-making method. Finally, the presented method is utilized to a real reservoir case and the optimum pilot area is thereby determined.

Methodology
The three key steps for choosing the location and size of a pilot established are: 1. Pilot candidate areas specification; 2. Reservoir representativeness quantification and calculating its value for each pilot candidate area; 3. Pilot opportunity index calculation and selecting the optimum pilot region. Figure 1 illustrates the workflow of the planned scheme.

Pilot candidate areas
In the model of hydrocarbon reservoir, the field is segmented into pilot candidate regions of an equal size. Then, the five-spot pattern are utilized to locate wells in each region. Afterwards, a sensitivity analysis is performed for the Net Present Value (NPV) in regard to the production and injection wells distance, which reflects the size of the pilot. The NPV is the most commonly used economic objective function and is shown in equation (1): where t 1 is the start time of production, T is the overall time of econometry, P o is the unit price of oil, q o is the oil production rate, C prod w is the unit cost of handling the produced water, C prod q is the water production rate, C inj w is the unit cost of injecting water, q inj w is the water injection rate, C o is the unit operating cost per barrel of produced oil and r is the discount rate. R(t 0 ) is the initial capital expenditure, as shown in equation (2), where C facility is the cost of facility installation, n well is the entire number of injection and production wells and C well is the drilling cost of a well. Consequently, the optimal pilot size has the highest NPV. Finally, the size and number of pilot candidates are determined.

Impression of reservoir representativeness
Each candidate region has a reservoir dynamic behavior. Considering all areas, it is possible to classify the behavior. The class which has the highest number of areas is the representative of reservoir behavior. The behavior of each area is characterized by two arrays of cell RF values and the covariance matrix of the residual oil volume in cells positioned in each candidate region.

Quality maps
The various static and dynamic parameters could be combined together to form the quality maps. Due to dynamic parameters such as oil saturation, the quality maps change during the production period [18,19]. The quality map is applied to solve decision making problems as an auxiliary analysis with parameters like cumulative oil production [20].
The 3-D time-dependent quality maps of residual oil volume is computed by using equation (3) to aggregate various parameters: where A i,j , h k , ; i,j,k and So i,j,k,t are the surface area, the height, the porosity and the simulated oil saturation of cell (i, j, k) in the given time period, respectively. Equation (4) is applied to calculate the RF in each cell based on these quality maps: where t s and t e are the beginning and termination times of the simulation, correspondingly. Equation (5) is applied to illustrate the covariance matrix of the residual oil volume for a typical candidate region. For instance, Vo cell#;1 is a set of annual residual oil volume sequence from t s to t e in the first cell of the area:

Clustering
The number of clusters of reservoir behaviors could be specified by Fuzzy clustering. Clustering is a subset of unsupervised machine learning [21] and data mining [22] methods. It is used to assess the flow field in hydrocarbon reservoirs [23][24][25].
The similar objects are placed in a same group using clustering analysis. Equation (6) [26] is utilized to define the objective of fuzzy clustering. The number of clusters and cluster center are signified by c and c i . A multi-dimensional space is denoted by X that includes n candidate areas. For the jth candidate region in the cluster i, the membership degree is indicated by u ij .
Furthermore, v and d ij are identified as the fuzzifier and the Euclidean distance of the jth candidate region from the c i .  8 i; 1 i c ð Þ and 0 u ij 1 À Á and ð7Þ The optimal number of clusters is determined using two cluster validity indices: (1) XB (Xie and Beni) [27] and (2) fuzzy-silhouette [28] as illustrated in equations (9) and (10)- (14), correspondingly. The higher square distance between cluster centers (a j ) is more favorable, as well as lower mean square distance between a candidate area and its cluster center (b j ). Therefore, the lower and higher values of the XB and fuzzysilhouette are the result of high-quality clustering: inter rs j; k ð Þ > 0; 1 r < s cg; inter rs d jk

Reservoir Similarity Index
For a candidate area, the intensity of the reservoir representativeness is indicated by the Reservoir Similarity Index (RSI). Therefore, the RSI is composed of the two normalized distances: 1) d(RF array, RF-center) and 2) d(COV array, COV-center). The RF-center and COV-center are the centers of the clusters that have the maximum size after conducting clustering on the RF array and the covariance matrix of the residual oil volume, correspondingly. The RF array and COV array are the Recovery Factor and Covariance matrix arrays of the residual oil volume of the candidate area. The weights of the two normalized distances are w 1 and w 2 . The RSI is calculated as the sum of the product of weight and the normalized value for multiplicative inverse of the two criteria using equation (15): The Shannon entropy technique is utilized to determine the w 1 and w 2 .

Shannon entropy
The information content of an event as the concept of entropy was introduced by Shannon in information theory [29]. Shannon entropy is used to optimize data acquisition of the hydrocarbon fields [30]. Furthermore, the Shannon entropy can quantity the relative importance of criteria in a multi-criteria problem [31]. The more important criteria have greater dispersion and their effects are higher on the RSI value. Considering m alternatives and n criteria, a decision matrix S of m Â n elements is shown in equation (16), where s ij is the jth criterion value for the ith alternative (candidate area). Equation (17) is applied to normalize the decision matrix S. Then, equation (18)  ; ð16Þ s ij ði ¼ 1; :::; m; j ¼ 1; :::; nÞ; ð17Þ

Pilot opportunity index
The attractiveness of pilot implementation in a candidate area is indicated by the Pilot Opportunity Index (POI). Two significant features for POI computations are reservoir geological and operational-economic. Reservoir geological factors correspond to the reservoir representativeness intensity. Operational-economic factors refer to the reliability of the pilot performance results and the implementation cost of the pilot. Therefore, the POI is based on the RSI, the level of knowledge, the interference between internal and external production of a candidate area, the distance from the surface facilities to candidate area, and the number of existing applicable wells for running a pilot in an area. Hence, the RIM in multi-criteria decision-making methods is utilized to quantify the POI. Moreover, criteria weights are calculated based on the Analytical Hierarchy Process (AHP). The higher the POI is, the more desirable the pilot implementation in an area is. The pilot area is an area with the highest POI value.

Decision criteria
The criteria to evaluate the POI for each candidate area include: RSI: In a candidate area, this criterion shows the value of the reservoir representativeness. Higher values of this criterion are preferable. Interference between internal and external production of a candidate area: Pilot production performance could be affected considerably by nearby existing production wells. Minimizing this interference lead to a more accurate pilot production performance analysis. The drainage radius, the distance from a well where its normal pressure gradient approaches zero (i.e. zero fluid flux), is applied to quantify this issue [32]. In other words, there would be pressure interference in case the distance between two wells is less than two times the drainage radius. Therefore, to determine the interaction between the candidate area and the existing wells, two criteria are established, namely, the number of existing wells around a candidate area with a distance less than two times the drainage area, known as "interfered wells", and the average distance between these wells and a candidate area. A lower number of interfered wells and higher values of the average distance are more favorable. Level of knowledge: A higher quality and quantity of available data in a candidate area lead to a lower uncertainty in pilot design and evaluation. The variogram range, the distance limit beyond which the data are no longer spatially correlated, is applied to quantify this criterion [33]. The properties in cells that contain an existing well are known while the properties of other cells can be estimated by the known cell value. The estimation is possible in the case of the spatial distance between known and unknown cells being smaller than the variogram range. Furthermore, the error of estimation decreases as the distance decreases. Therefore, to determine the level of knowledge in a candidate area, two criteria are established, namely, the total number of existing wells inside and outside an area with a distance less than the variogram range to an area, known as "adjacent wells", and the average distance between these wells and a candidate area. A higher number of total wells and lower values of the average distance are more favorable. Distance between a candidate area and surface facilities: the lower the distance, the lower the cost. The closest route from areas to the surface facilities is the most desirable path. Number of existing applicable wells for running a pilot in an area: This is the number of existing wells in a candidate area that matches the five-spot pattern in terms of location. Higher values of this criterion are more preferable due to less capital expenditure to drill new wells that fit the pattern in an area.

AHP
Analytical Hierarchy Process (AHP) combines mathematics and intuition to express the comparative importance of each criterion in a rational and consistent approach [34,35]. Table 1 is applied to make the pairwise comparisons matrix among criteria.
In the pairwise comparisons matrix, i rows are compared with the j column for n number of criteria such that the main diagonal is equal to one. The results of the pairwise comparison on criteria are summarized in an (n Â n) matrix in which elements w ij (i, j = 1, 2, . . . n) are shown by equation (20): . . . :

Á Á Á w nn
The pairwise comparisons matrix is normalized by dividing the value of each element with the sum of the corresponding column. After normalization, the relative weight of each criteria is calculated by computing the average of each row. The accuracy of the AHP output is related with the consistency of the pairwise comparison judgment. A Consistency Ratio (CR) is calculated by dividing the Consistency Index (CI) for the judgment by the Random Consistency Index (RI) for the corresponding random matrix using equations (21) and (22). The RI is in accordance with the degree of consistency, as shown in Table 2 for a different number of criteria. Moreover, the Cl is based on the largest eigenvalue (k max ) and the number of criteria Very strong importance 9 Extreme importance (n), as shown in equation (21). The weighted sum vector is calculated with a multiplication of the pairwise comparisons matrix of criteria in the vector of relative magnitudes. Then, the consistency vector is computed by dividing the magnitudes of the weighted sum vector of criteria by the vector of relative magnitudes. The largest eigenvalue (k max ) is calculated by averaging elements of this vector: The CR above 0.1% or 10% indicates that the pairwise comparisons should be revised, while smaller numbers mean comparison are more consistent.

RIM
Multi Criteria Decision Making (MCDM) is applied to solve selection issues including multiple criteria and alternatives for petroleum field development problems [36]. The RIM [37] multi-criteria decision-making technique is applied to prioritize the alternatives (candidate areas) with respect to the criteria outlined in Section 2.3.1. Equation (24) is applied to normalize the decision matrix S, as illustrated by equation (23). The range of values and ideal values for each criterion are shown by [A, B] and [C, D], respectively. Equation (25) is utilized to compute the weighted normalized matrix. Equation (26) is applied to calculate the distances between the ideal alternatives and a candidate area. In conclusion, equation (27)   ; ð25Þ y ij À Á 2 s i ¼ 1; :::; m; j ¼ 1; :::; n ð Þ ; ð26Þ 3 Results and discussion on a case study The studied field has been producing since 2006 and 31 wells have been drilled so far. The field's oil in place is around 4 billion standard barrels. The number of grids in the three main directions x, y and z is 88 Â 275 Â 22, respectively. The field has 15 different rock types. Variogram model and its parameters are calculated for the porosity as shown in Figure 2. The variogram range is equal to 1500 m. Furthermore, drainage radius is 320 m according to interpretation of well test in the field.
A history-matched model is applied to predict production scenarios by reservoir numerical simulation. In case of enhancing the history match, it is proposed to use the integration of multi-start simulated annealing with genetic algorithm [38].
Waterflooding is planned in the field to increase production from 20,000 to 110,000 barrels per day. Moreover, the maximum allowable bottom hole pressure in injection wells and the minimum flowing bottom hole pressure in production wells are 5880 and 3290 psia, correspondingly.
The pilot project is required prior to the design and field implementation of EOR. Hence, it is necessary to determine  the size and location of the pilot area. The method established in this research was utilized to choose the best pilot region for this field.
Initially, the reservoir model is subdivided into equal size pilot candidate areas. In each area, production and injection wells are located according to a five-spot well pattern. The sensitivity analysis on the NPV is utilized to determine the optimum pilot dimensions. Data to compute the NPV via equations (1) and (2) are illustrated in Table B1 (Appendix B). The result of varying distance between production and injection wells, which reflects the pilot size on the NPV and cumulative oil production, is illustrated in Figure 3. There is the highest of NPV in case of well distance is 900 m.
The positions of the injection and production wells, along with existing wells in the reservoir, are illustrated in Figure 4 according to this distance. Moreover, reservoir segmentation resulted in 72 candidate regions, as illustrated in Figure A4 (Appendix A). Therefore, the number of grids for a candidate area in the directions x, y and z is 9 Â 9 Â 22, respectively. If an area does not contain all the 22 reservoir layers, it is removed. Therefore, six regions were removed. Consequently, there are 66 valid candidate regions in the case study reservoir.
Running of the production and injection well pattern in the simulator leads to oil saturation computation. With equation (3), the product of the pore volume in oil saturation resulted in the quality maps of the residual oil volume as shown in Figure A5 (Appendix A). By using equations (4) and (5), 1782-member array of the recovery factor and 1588653-member array of the covariance between cells are allocated to an area. By using equations (6)-(8), the candidate areas are clustered. The values of the hyper-parameter (fuzzifier v) and convergence criterion are 1.5 and 1e-9 in fuzzy clustering, respectively. By using equations (9)- (14), fuzzy clustering of the candidate areas are validated as shown in Figure 5. According to Figure 5, if the number of clusters is considered to be two and five, the clustering of the covariance between cells (Sil.F_COV and XB_COV) and the recovery factor (Sil.F_RF and XB_RF) arrays are optimal, accordingly.
According to the preceding graph, clustering of candidate regions leads to two and five clusters for covariance matrixes and recovery factor arrays, respectively. Figure 6 shows membership of clusters based on the recovery factor arrays. Moreover, Figure 7 shows membership of clusters based on the covariance matrixes. Each point on these two charts corresponds to a candida region. Additionally, the * symbol shows centers of the clusters. Also,    Tables B2 and B3 (Appendix B) show results of clustering such as cluster size. The dominant clusters are cluster 5 and 2 for recovery factor and the covariance matrixes, respectively. Figure A6 (Appendix A) shows the distribution map of the clusters in the field with reference to Figures 6 and 7. The candidate areas in the same cluster are close to each other with respect to location. Therefore, there is a spatial relationship between the reservoir behavior, which is characterized by the covariance of the residual oil volume and the RF, and the position in the field.
The covariance-center and RF-center are utilized to show the cluster centers. Averaging all regions in dominant clusters leads to determination of these centers. Afterward, the distance from these two centers to candidate areas is computed as displayed in Table B4 (Appendix B).
Prior to computing the RSI value of each region, the weights of two criteria: (1) the covariance-center distance from an area and (2) the RF-center distance from an area are determined. Using equations (16)- (19) besides the criteria values in Table B4 (Appendix B), the weight of the first criterion is 0.313 while the weight of the second criterion is 0.686. The sum of product of weight and normalized value for multiplicative inverse of the two criteria has resulted in a RSI value that is shown in Figure A7 (Appendix A). For instance, the candidate area numbers 29, 34 and 33 have the highest reservoir representativeness intensity with RSI values equal to 0.99, 0.92 and 0.82, respectively.
To calculate the POI of each candidate area, the criteria of RSI, the distance from the surface facilities to candidate area, the number of interfered wells, the mean distance from these wells to candidate area, number of existing applicable wells for running a pilot, number of adjacent wells, and finally the mean distance from these wells to candidate area have been considered with the weight factors of 0.33, 0.145, 0.145, 0.055, 0.055, 0.09 and 0.18, respectively, according to the AHP weighting method. The pairwise comparison matrix (Eq. (20)) and weights of the criteria are shown in Table B5 (Appendix B). In this method, every pair of criteria is compared with each other with respect to the expert judgments. In each comparison this question must be answered: which criterion is more important and how much? Table 1 is applied to express the rating preferences between each pair of criteria. For instance, experience and judgment strongly favor the RSI over the number of adjacent wells with importance intensity value equal to 5 as shown in Table B5 (Appendix B). Furthermore, the consistency ratio is computed to verify the reliability between the pairwise judgements using equations (21) and (22). The values of the largest eigenvalue (k max ), the random consistency index (see Tab. 2), the Consistency Ratio (CR), and the Consistency Index (CI) are 7.06, 1.35, 0.008, and 0.011, respectively. The Consistency Ratio of 0.011 shows that the pairwise comparisons are consistent.
The POI is computed for each region utilizing the Reference ideal method via equations (23)- (27) and the aforementioned criteria values, as shown in Figure A7 (Appendix A). For example, the values of criteria that contributed to the POI for candidate area numbers 8, 45, 34 and 29 are shown in Table 3.
The values of calculated POI in these regions are equal to 0.73, 0.66, 0.64 and 0.59, accordingly. Therefore, Pilot execution in these regions is relatively desirable. In Table 3, the average distance between interfered wells and a candidate area is indicated by the expression of NA when the number of interfered wells is equal to zero. According to Table 3 and Figure A7 (Appendix A), for example, although candidate area number 29 has the highest amount of RSI value and its distance between a candidate area and surface facilities is relatively low in  comparison with the other candidates (i.e. 8 and 45), candidate area number 8 has the highest POI value and it is suggested for running the pilot because candidate area number 29 does not have any existing applicable wells for the implementation of a pilot. Furthermore, candidate area number 29 has two interfered wells that considerably reduce the accuracy of interpretation of gathered data after running a pilot in it. Figure A8 (Appendix A) shows contour maps of the RSI and POI values on top of the reservoir. There is a spatial relationship between the covariance matrix for the residual oil volume as well as recovery factor and location in the field as shown in the figure. Therefore, there is a spatial relationship between the RSI and the location in the field because RSI is based on covariance matrix for the residual oil volume as well as recovery factor. Figure A9a (Appendix A) shows assigned numbers of candidate regions. In Figure A9b (Appendix A), Ranking of each region derived from RSI is shown. Moreover, the pilot area number 29 has the highest reservoir representative intensity value. In addition, in Figure A9c (Appendix A), each candidate region is ranked according to its POI value. Therefore, the candidate area number 8 has the first priority to be selected as the pilot area. It should be noted that the POI value is the most influential parameter for selecting the best candidate area, because it already covers RSI and other factors.
A sensitivity analysis is performed to evaluate how changes in the weights assigned to the criteria would change the ranking of the pilot candidate areas. Table B6 (Appendix B) shows sensitivity analysis by preference of certain criteria through 7 scenarios. In Table B6 (Appendix B), the A-G is a criteria name as shown in Table B5 (Appendix B). The S-1 is a base case scenario in which the weight of the criteria is determined using AHP method, as shown in Table B5 (Appendix B). In scenarios S-2 to S-8, the weight of criteria A to G is increased by 0.2 compared to the base case scenario, respectively. For instance, the weight of the RSI has increased by 0.2 while the ratio of the weight is the same for other criteria [39] in scenario S-2 as shown in Table B6 (Appendix B).
By making a comparison between the rankings of each candidate area number (e.g. 8, 45, 26 and 34) in different scenarios, it is demonstrated that candidate area number 8 is ranked first and second in six out of eight scenarios as shown in Table 4. Candidate area number 8 could be selected as pilot area based on the rankings obtained by using different weight of criteria. Therefore, candidate area number 8 is clearly dominant compared to other candidate areas.

Conclusion
Based on this research, six important conclusions have been found: 1. An innovative quantitative and systematic approach consisting of reservoir-geology and operationaleconomic criteria has been presented. 2. A cluster analysis as an unsupervised machine learning method could be utilized to quantify reservoir representative intensity values. 3. MCDMs as decision-making methods have been utilized to integrate the factors, i.e., the distance from the surface facilities to candidate area, the number  Moreover, the proposed procedure could be developed using other criteria and well patterns.        Appendix B Table B1. Data for determining pilot size. 1 Price of oil ($/BBL) 55 2