Optimized Random Vector Functional Link network to predict oil production from Tahe oil field in China

In China, the Tahe Triassic oil field block 9 reservoir was discovered in 2002 by drilling wells S95 and S100. The distribution of the reservoir sand body is not clear. Therefore, it is necessary to study and predict oil production from this oil field. In this study, we propose an improved Random Vector Functional Link (RVFL) network to predict oil production from the Tahe oil field in China. The Spherical Search Optimizer (SSO) is applied to optimize the RVFL and enhance its performance, with the SSO working as a local search method that improves the parameters of the RVFL. We used a historical dataset of this oil field from 2002 to 2014, collected by a local partner. The proposed model, called SSO-RVFL, was evaluated through extensive comparisons with several optimization methods. The outcomes showed that SSO-RVFL achieved accurate predictions and that the SSO outperformed several optimization methods.


Introduction
The Tahe Triassic oil field block 9 reservoir is strongly heterogeneous, with widely distributed interlayers and complex connectivity. Carbonate reservoirs in China have varied accumulation patterns, complex structures, and different reservoir conditions [1]. The Tahe oil field contains several types of fluids, including normal, heavy, waxy, and condensate oils. As with other marine oils in the Tarim Basin, the origin and accumulation of the oils in the Tahe oil field are still strongly debated [2][3][4]. The fracture-cavity type carbonate oil reservoirs of Block 9 belong to the paleokarst reservoirs with bottom water [5,6].
Characterizing the reservoir is challenging because of its heterogeneity and its complex, varying interlayer structure [1], which result from frequent channel changes and serious intersections, vertical and horizontal lithology changes, and the limited extent and poor continuity of the interlayers. Many oil fields are abandoned with a remaining oil saturation of more than 30% [7]. As the production rate of existing fields starts to decline, enhanced oil recovery techniques are needed to produce residual oil and gas [8]. Understanding the distribution of remaining oil after water-flooding has become a hot topic, because it guides development adjustments and helps improve the oil recovery rate [9].
The Tahe Triassic oil field block 9 reservoir was discovered in 2002 by drilling wells S95 and S100. The distribution of the complex interlayers and of the reservoir sand body is not clear. Therefore, many studies have been carried out on the characteristics of the reservoir in block 9. A total of 36 wells have been drilled in the reservoir, of which 34 are producing wells (3 flowing wells and 31 pumping wells). The daily fluid production is 1265.5 t/d, the daily oil production is 287.2 t/d, the average single-well production is 8.4 t/d, the integrated water cut is 77.3%, the oil production rate is 0.92%, the total oil production is $193.65 \times 10^{4}$ t, and the degree of recovery is 22.9%. The annual decline is 10.34%, and the natural decline is 14.24%.
Moreover, it is very important to analyze the production history of the Tahe oil field in order to carry out the necessary prediction and statistical analysis of its production. Several previous studies have predicted oil production using different techniques. For example, in [10], the authors first generated the dataset using the "OmegaLand" program, then applied a preprocessing operation to normalize (or de-normalize) the data before passing it through an Artificial Neural Network (ANN) to build the final model for predicting the optimum pressure and temperature. In [11], the Solver evolutionary algorithm was used with the proposed methodology as an optimization tool on the dataset of the Reshadat oil field to solve several non-linear optimization challenges and to improve flow-rate prediction. Guevara et al. [12] applied three different machine learning methodologies to generate the final models. First, the datasets of the vertical and horizontal wells went through cleaning, filtering, and other preprocessing steps to prepare them for training; then, Generalized Additive Models (GAMs) and Shape-Constraint Additive Models (SCAMs) were used to obtain the final models. Furthermore, Kriging was used to generate petrophysical maps by analyzing specific values of interest. In [13], Shut-In (SI) was used as a preprocessing step on a dataset collected from Alberta to predict future shale-gas production. SI deletes or excludes all zero values, because zero input values badly affect both the construction and the predictions of the Long Short Term Memory (LSTM) model [14]. LSTM is able to analyze various features together to predict future production rates. In addition, dropout, fully connected layers, and L2 regularization were used to improve performance and avoid overfitting.
Furthermore, various methods and approaches have been proposed in the literature for oil production optimization and forecasting tasks [15]. Monteiro et al. [16] applied both data analysis and multiphase flow simulation to address uncertainties in oil flow rate forecasting. They evaluated the proposed method with production test data from 13 production wells at a representative Brazilian offshore field. Maschio and Schiozer [17] studied the integration of geostatistical realizations in the data assimilation and uncertainty reduction process using a hybrid optimization algorithm that combines the Genetic Algorithm (GA) with multi-start Simulated Annealing (SA). The main idea of this hybrid algorithm is to apply the GA to find the best solution, which then becomes the starting point of the multi-start SA with geostatistical realizations. They evaluated the proposed method with a benchmark of the Namorado Field, Brazil. Also, Schiozer et al. [18] presented 12 steps to support model updating and production optimization under uncertainties using the UNISIM-I-D benchmark of the Namorado Field, Brazil. In this paper, we present a model to predict the oil production from the Tahe oil field based on an optimized Random Vector Functional Link (RVFL) network [19]. The Spherical Search Optimizer (SSO) is applied to optimize the RVFL network.
Artificial neural networks have been utilized in various applications and research domains, such as classification, regression, and forecasting problems. The Random Vector Functional Link (RVFL) network is a randomized type of neural network that was studied in [19][20][21]. The RVFL simplifies the training of the neural network by generating the weights between the input layer and the hidden layer randomly.
It has therefore been applied in various applications, including prediction problems. Lian et al. [22] used the RVFL to predict intervals for landslide displacement. In [23], the authors applied an RVFL network to forecast short-term solar power. Ren et al. [24] employed an RVFL network to forecast wind power ramps. Tang et al. [25] used an RVFL network to forecast crude oil prices. Yu et al. [26] utilized an RVFL network to build a model for forecasting fashion sales. In [27], Qiu et al. proposed an RVFL-based approach for forecasting wind power ramps. Furthermore, the RVFL has been employed in various other research domains, such as [28][29][30][31][32][33][34][35].
The SSO is a recent Metaheuristic (MH) optimization method proposed by Zhao et al. [36]. Unlike other MH approaches that use a basic search style, the SSO relies on a spherical search style and integrates several search styles to avoid the limitations of other metaheuristics [36]. Naji Alwerfali et al. [37] used it for image segmentation through multilevel thresholding; there it was hybridized with the Sine Cosine Algorithm (SCA) and showed promising performance compared with several optimization algorithms.
The main contribution points of the current study are as follows:
The rest of this study is arranged as follows. The preliminaries of the RVFL and SSO are presented in Section 2. The proposed SSO-RVFL is described in Section 3. Section 4 presents evaluation experiments, comparisons, and results. Finally, we conclude the study in Section 5.

SSO
In this section, we present the preliminaries of the Spherical Search Optimizer (SSO) and define its basic operators [36]. Two solutions, X and Y, are selected from the population by the tournament selection method. The spherical search operators are then applied to update X, as defined in the following equations: in which i, j, and k are randomly selected integers representing the dimensions, and p is the set of these integers (i.e., p = {i, j, k}). Here, $\|\cdot\|_2$ represents the $\ell_2$ norm (i.e., the Euclidean distance), $F \in [0, 1]$ is a scaling factor, and $\theta \in [0, \pi]$ represents the angle between X and the Z-axis.
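The selection step mentioned above can be illustrated with a short sketch. It shows only a generic tournament selection for a minimization problem, not the spherical search operators themselves; the function names, the tournament size `k`, and the rule that the two selected solutions must be distinct are assumptions made for illustration and are not taken from [36].

```python
import random


def tournament_select(fitness, k=2, rng=random):
    """Return the index of the best (lowest-fitness) of k randomly drawn candidates."""
    if not 1 <= k < len(fitness):
        raise ValueError("k must satisfy 1 <= k < population size")
    candidates = rng.sample(range(len(fitness)), k)
    return min(candidates, key=lambda idx: fitness[idx])


def select_pair(fitness, k=2, rng=random):
    """Pick two distinct individuals, each through its own tournament (assumed rule)."""
    first = tournament_select(fitness, k, rng)
    second = tournament_select(fitness, k, rng)
    while second == first:  # redraw until the two indices differ
        second = tournament_select(fitness, k, rng)
    return first, second


# Hypothetical fitness values (lower is better) for a population of four solutions.
fitness = [5.0, 1.0, 3.0, 4.0]
x_idx, y_idx = select_pair(fitness, k=2, rng=random.Random(0))
```

A larger `k` increases selection pressure toward the current best solutions, while `k = 1` reduces to uniform random selection.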

Random Vector Functional Link network
In this section, the basic mathematical definition of the Random Vector Functional Link (RVFL) network is introduced [19]. In general, the RVFL is a special kind of Single Layer Feedforward Neural Network (SLFNN); however, the RVFL contains a direct link that connects the input layer with the output layer, as described in Figure 1. This link helps the RVFL avoid the over-fitting problem faced by the traditional SLFNN. In addition, the main role of the functional link is to extend the feature space so that a desired function can be extrapolated. The RVFL receives data (X) with $N_s$ samples, represented as pairs $(x_i, y_i)$, where $y_i$ is the target variable. The input data are then passed through the hidden nodes (also called enhancement nodes), and the output of each hidden node is computed as: In equation (5), $\beta_j$ refers to the weight between the input layer and the hidden nodes, and $\alpha_j$ is the bias, while S is the scale factor that is computed during the optimization process. The final output of the RVFL is then calculated as: where W is the output weight and F is a matrix that consists of the input samples $F_1$ and the output of the hidden layer $F_2$:

$$F_1 = \begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Nn} \end{bmatrix} \tag{8}$$

The output weight is then updated using equation (9) or equation (10), which represent the Moore-Penrose pseudo-inverse and ridge regression, respectively: where I is the identity matrix, C is the trade-off parameter, and $\dagger$ denotes the Moore-Penrose pseudo-inverse.
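The training procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the implementation used in this study: the sigmoid activation, the sampling ranges for the random input weights ([-S, S]) and biases ([0, S]), the ridge form $(F^T F + I/C)^{-1} F^T y$, and the names `rvfl_fit` and `rvfl_predict` are all assumptions.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rvfl_fit(X, y, n_hidden=10, scale=1.0, C=None, seed=0):
    """Fit an RVFL: random fixed hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    # Input-to-hidden weights and biases are drawn once and never trained.
    W_in = rng.uniform(-scale, scale, size=(X.shape[1], n_hidden))
    bias = rng.uniform(0.0, scale, size=n_hidden)
    H = _sigmoid(X @ W_in + bias)      # enhancement-node outputs (F2)
    F = np.hstack([X, H])              # direct link (F1) next to F2
    if C is None:
        # Moore-Penrose pseudo-inverse solution.
        W_out = np.linalg.pinv(F) @ y
    else:
        # Ridge-regression solution with trade-off parameter C (assumed form).
        W_out = np.linalg.solve(F.T @ F + np.eye(F.shape[1]) / C, F.T @ y)
    return W_in, bias, W_out


def rvfl_predict(X, model):
    """Apply a fitted RVFL to new inputs."""
    W_in, bias, W_out = model
    F = np.hstack([X, _sigmoid(X @ W_in + bias)])
    return F @ W_out
```

Because only the output weights are solved for, training reduces to a single linear least-squares problem, which is what makes repeated evaluation of many candidate configurations affordable.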

The proposed SSO-RVFL method
In this section, the oil prediction model based on an RVFL modified using the SSO is introduced. The developed SSO-RVFL starts by constructing a set X of $N_{con}$ solutions, each representing a configuration of the $N_{par}$ parameters of the RVFL. This can be formulated as

$$x_{ij} = l_j + rand \times (u_j - l_j), \quad i = 1, \ldots, N_{con}, \quad j = 1, \ldots, N_{par},$$

where $u_j$ and $l_j$ are the upper and lower boundaries of the search space for the jth parameter, respectively. For clarity, assume that $x_i$ has the following elements:

$$x_i = [N_h, \text{Bias}, \text{AF}, \text{mode}],$$

where $N_h$ refers to the number of hidden neurons, and "Bias" represents the parameter that determines whether or not a bias is used in the output neurons. "AF" indicates the activation function used inside the RVFL network; in this study we applied tribas, sign, hardlim, radbas, sin, and sig. "mode" is the approach used to compute the weights: the Moore-Penrose pseudo-inverse or regularized least squares. For example, $x_i = [200, 1, 3, 1]$ means that the number of neurons is 200 and that a bias is used, whereas 3 and 1 mean that the sig function and the Moore-Penrose pseudo-inverse are applied, respectively. The next step is to divide the dataset into training and testing parts, apply the training part to the current configuration, and compute the quality of its output using the following equation: where $\hat{Y}$ is the predicted output and Y is the original output. Thereafter, the best configuration is determined, and the current population X is updated using the operators of the SSO given in Section 2.1. The updating process is repeated until the termination criteria are reached.
Finally, the testing part is applied to the best RVFL configuration, and the output for the testing set is evaluated using different performance measures. The steps of SSO-RVFL are given in Figure 2.
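The initialization and the interpretation of a configuration vector can be sketched as follows. The bounds in `BOUNDS` are hypothetical, since the text does not give the search ranges. The mapping from the integer AF code to an activation name is also not fully specified, so the sketch leaves the codes as integers. The rounding and clipping rule in `decode` is likewise an assumption.

```python
import random

# Hypothetical (lower, upper) bounds for [N_h, Bias, AF, mode].
# AF has six candidate functions; mode 1 is the pseudo-inverse (per the example
# in the text) and mode 2 is assumed to be regularized least squares.
BOUNDS = [(10, 300), (0, 1), (1, 6), (1, 2)]


def init_population(n_con, bounds, rng=random):
    """x_ij = l_j + rand * (u_j - l_j) for every solution i and parameter j."""
    return [[l + rng.random() * (u - l) for (l, u) in bounds]
            for _ in range(n_con)]


def decode(x, bounds):
    """Round each continuous value to an integer code and clip it to its bounds."""
    codes = [min(max(int(round(v)), l), u) for v, (l, u) in zip(x, bounds)]
    n_hidden, bias, af_code, mode_code = codes
    return {"n_hidden": n_hidden, "bias": bool(bias),
            "af_code": af_code, "mode_code": mode_code}


population = init_population(5, BOUNDS, rng=random.Random(42))
example = decode([200, 1, 3, 1], BOUNDS)  # the configuration used in the text
```

Keeping the population continuous and decoding only at evaluation time lets a continuous optimizer such as the SSO search over what are effectively discrete choices.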

Experiment
This section presents the comparison results between the SSO-RVFL method and other methods.

Study area
Block 9 of the Tahe oil field lies in the Tabei Uplift within the Tarim Basin of northwestern China [38]. It was discovered in 2002 by drilling wells S95 and S100. It is located between 84°13′9″ and 84°18′52″ east longitude and between 41°15′56″ and 41°16′4″ north latitude, about 60 km southwest of Luntai County in the Xinjiang Uygur Autonomous Region of China, as shown in Figures 3-5.

Dataset description
The data were provided by the SINOPEC Oil Company, Northwest Branch, China. The production history of the oil field covers a period of 12 years (between 2002 and 2014). The data used in this paper are the raw production data of the actual Tahe oil field; daily production data were collected from almost 38 wells. The daily, monthly, and yearly production varied from one well to another according to the date of the first day of production in the well. Initial production of the field began in two wells: S95 in February 2002 and S100 in September 2002. By 15 November 2014, all wells had been closed.

Performance metric
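In this study, a set of performance metrics is used to assess the quality of the techniques in predicting oil production. These metrics are:
Coefficient of Determination ($R^2$):

$$R^2 = 1 - \frac{\sum_{i=1}^{N_s} (Y_i - Y_{p,i})^2}{\sum_{i=1}^{N_s} (Y_i - \bar{Y})^2},$$

where $\bar{Y}$ represents the average of Y.
Root Mean Square Error (RMSE):

$$RMSE = \sqrt{\frac{1}{N_s} \sum_{i=1}^{N_s} (Y_{p,i} - Y_i)^2},$$

in which $Y_p$ and Y represent the predicted and original values, respectively.
Mean Absolute Error (MAE):

$$MAE = \frac{1}{N_s} \sum_{i=1}^{N_s} |Y_{p,i} - Y_i|,$$

where $N_s$ is the sample size of the data.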
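For reference, the three measures can be computed with a few lines of plain Python. This sketch assumes the standard definitions, with $R^2$ taken as one minus the ratio of the residual sum of squares to the total sum of squares; the function names and the sample values are illustrative only and are not data from the Tahe oil field.

```python
import math


def r2_score(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed standard form)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_pred):
    """Root mean square error over N_s samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y))


def mae(y, y_pred):
    """Mean absolute error over N_s samples."""
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)


# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 5.0]
scores = (r2_score(y_true, y_hat), rmse(y_true, y_hat), mae(y_true, y_hat))
```

RMSE penalizes large individual errors more strongly than MAE, so reporting both gives a rough indication of whether the error is dominated by a few poorly predicted samples.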

Results and discussion
This experiment was conducted to show the effectiveness of the SSO-RVFL algorithm in predicting oil production. We compared the proposed SSO-RVFL with two other modified RVFL models based on the Whale Optimization Algorithm (WOA-RVFL) and the Salp Swarm Algorithm (SSA-RVFL). The results are listed in Table 1, and the prediction results for all algorithms are plotted in Figures 6-11. The results in Table 1 are averages. Table 1 shows that, on the training set, the SSO-RVFL algorithm performed like SSA-RVFL in the $R^2$, RMSE, and MAE measures, whereas the results of WOA-RVFL were worse than those of the other algorithms.
On the testing set, the SSO-RVFL algorithm outperformed the other algorithms in all measures: it obtained an $R^2$ of 0.75, whereas SSA-RVFL and WOA-RVFL each obtained an $R^2$ of 0.63. The lowest RMSE and MAE values were also obtained by the proposed SSO-RVFL algorithm, whereas the other two algorithms showed the same results and ranked second. In terms of the CPU time (s) required to find the optimal configuration, the SSO took less time than the other two algorithms, WOA and SSA.
These results indicate that the proposed SSO-RVFL algorithm is more effective than the compared methods.
Furthermore, Figures 6-8 depict the prediction results of all algorithms on the training set. These figures show that all algorithms performed similarly in both the prediction curves and the error curves. The same was observed for SSA-RVFL and WOA-RVFL in the training set, as shown in Figures 10 and 11, but these curves differ slightly from those in Figure 9, which shows the output of the training set for the proposed algorithm.
In general, the SSO effectively improves the performance of the RVFL in forecasting oil production because the different search operators of the SSO algorithm can effectively explore the search space to produce appropriate parameters for training the RVFL. In the future, this algorithm will be improved and evaluated in more fields.

Conclusion
This paper proposes an efficient prediction model that uses an improved version of the Random Vector Functional Link (RVFL) network to estimate and predict the oil production from the Tahe oil field in China. A new optimization approach, the Spherical Search Optimizer (SSO), was applied to optimize the RVFL and enhance its parameters. A historical dataset of the production of the Tahe oil field, collected from 2002 to 2014 and provided by a local partner, was used to train and test the model. Moreover, we carried out extensive experiments and comparisons between the proposed model and other optimization methods, including WOA and SSA. The evaluation showed that the SSO outperformed the other optimizers in enhancing the RVFL for analyzing and predicting the oil production from this oil field. Given the promising performance of the SSO-RVFL, it may be used in various optimization tasks, such as data clustering, image classification, and other forecasting tasks.