Efficient estimation of hydrolyzed polyacrylamide (HPAM) solution viscosity for enhanced oil recovery process by polymer flooding

Polymer applications have progressively increased in science and engineering, including chemistry, pharmacology, and chemical and petroleum engineering, owing to their attractive properties. Among all types of polymers, partially Hydrolyzed Polyacrylamide (HPAM) is one of the most widely used, especially in chemistry and in chemical and petroleum engineering. The capability of HPAM to increase solution viscosity is the key parameter in its successful applications; thus, the viscosity of the HPAM solution must be determined in any study. Experimental measurement of HPAM solution viscosity is time-consuming and can be expensive at elevated temperatures and pressures, which is not desirable for engineering computations. In this communication, Multilayer Perceptron neural network (MLP), Least Squares Support Vector Machine approach optimized with Coupled Simulated Annealing (CSA-LSSVM), Radial Basis Function neural network optimized with Genetic Algorithm (GA-RBF), Adaptive Neuro Fuzzy Inference System coupled with Conjugate Hybrid Particle Swarm Optimization (CHPSO-ANFIS) approach, and Committee Machine Intelligent System (CMIS) were used to model the viscosity of HPAM solutions. Then, the accuracy and reliability of the developed models were investigated through graphical and statistical analyses, trend prediction capability, outlier detection, and sensitivity analysis. As a result, it was found that the MLP and CMIS models give the most reliable results, with determination coefficients (R²) of more than 0.98 and Average Absolute Relative Deviations (AARD) of less than 4.0%. Finally, the suggested models can be applied for efficient estimation of the viscosity of aqueous HPAM solutions in the simulation of polymer flooding into oil reservoirs.


Introduction
Polymer flooding is one of the popular technologies for Enhanced Oil Recovery (EOR). It is used to control the mobility ratio of the displacing phase to the displaced fluid, which is a function of viscosity and relative permeability [1]. In detail, the addition of polymer modifies the properties of the injected water by increasing its viscosity and decreasing its relative permeability, even at low polymer concentrations, which leads to a decrease in the mobility ratio [2]. This means that the polymer solution can sweep the oil through the porous medium in a piston-like manner for a longer period of time before viscous fingering or early breakthrough occurs. As a consequence, a more stable displacement and a higher sweep efficiency can be achieved during this process [2,3].
Partially hydrolyzed polyacrylamides (HPAMs) are among the most extensively employed polymers in petroleum engineering, especially in EOR processes, owing to their moderately good solubility in water and reasonable cost [4]. HPAMs are synthesized from acrylamide monomers; they are negatively charged linear-chain macromolecules in which some of the monomers are hydrolyzed by an electrolyte [5]. The large viscosities of HPAM solutions are caused by repulsion forces among the negatively charged polymer chains, which depend on chain extension and the degree of acrylamide hydrolysis [6].
The salt concentration and hardness have a paramount impact on the rheological properties of polymers. When a monovalent salt solution such as Na⁺Cl⁻ is in contact with the polymer, electrostatic attraction forces arise between the negative charges of the polymer chains and Na⁺, which leads to the so-called coiling up of the polymer [7,8]. As a result, the polymer might precipitate and the solution viscosity could decrease. In the presence of divalent cations such as Ca²⁺, Ba²⁺, and Sr²⁺, degradation and precipitation of the polymer are much more severe, which leads to a considerably larger decrease in the viscosity of the HPAM solution than that caused by monovalent cations [8,9].
In addition to a high-salinity environment, polymer degradation is the other major concern at harsh reservoir conditions. Based on its origin, degradation can be mechanical, chemical, or biological [10,11]. Mechanical degradation occurs when the polymer solution is injected at high flow rates or passes through porous media of high permeability; in this case, the HPAM solution viscosity decreases [12]. Biological degradation frequently happens with biopolymers such as Xanthan. Chemical degradation takes place via oxidation reactions and accelerates at high temperatures in the porous medium [10,11]. It is worth mentioning that the impact of temperature is not restricted to chemical degradation; temperature also affects the hydrolysis degree of the polymer, causing precipitation. The other issues associated with the application of polymers in porous media are hydrodynamic retention [13,14] and macromolecular adsorption [1]. From an operational point of view, injection of a highly viscous fluid through the wellbore into the reservoir leads to an extreme pressure drop. Therefore, polymers that behave as non-Newtonian, shear-thinning fluids are employed to preclude this phenomenon.
A large number of investigations have been conducted to study the impact of polymer concentration, shear rate, temperature, salt concentration, HPAM hydrolysis degree, and hardness on the solution viscosity, which is the main rheological property of HPAM solutions [5,7,15-23]. Some researchers have undertaken experimental investigations to examine the performance of Hydrophobically Associating Polyacrylamide (HAPAM) [24] and HPAM/Cr(III) [25] in order to improve the performance of HPAM components in EOR. Apart from experimental measurements and empirical and semi-empirical methodologies, different approaches based on mechanical models [26], continuum theories [27], and molecular theories [28,29] have been used to calculate the viscosity and to explain some of the phenomena observed in the laboratory.
Experimental measurements of HPAM solution viscosity are valuable; however, they are expensive and time-consuming. Additionally, conducting experiments that cover wide ranges of conditions is not normally feasible in practice. Numerical models, in turn, normally solve Partial Differential Equations (PDEs) and use different iterative approaches to converge to the target solution [30,31]. The physical and mathematical nature of these models requires simplifying assumptions, which introduce errors into the results. In addition, numerical models are not always user-friendly and need advanced knowledge of the physics of the model and of mathematics. The empirically derived equations in the literature were mostly established on the basis of a limited range of data at specific conditions [30,31]. The other shortcoming of the existing correlations is that they do not take into account the effects of all the input variables on the model output; therefore, their application to other conditions may lead to noticeable uncertainties in the estimated output. Consequently, it is crucially important to develop robust modeling techniques on the basis of the most comprehensive database existing in the literature.
Soft computing is a robust computational technology that can handle and optimize highly non-linear engineering problems. Artificial Neural Network (ANN)-based algorithms are the early versions of soft computing methods and have been proven to produce precise estimations; however, the main deficiency of ANN-based approaches is the non-reproducibility of results, which is caused mainly by variation of the stopping criteria and random initialization of the networks [32]. To address this issue, improved soft computing methods, namely Support Vector Machine (SVM)-based schemes, which are supervised machine learning methods, have been presented. The chief advantages of SVM-based schemes over classical algorithms are fewer adjustable parameters, a lower probability of over-fitting, no need to determine the network topology in advance, satisfactory generalization performance, and no need to select the number of hidden nodes [32]. In recent years, several investigators have successfully used SVM-based algorithms as a robust tool in a wide range of petroleum and chemical engineering problems [30,33-37]. In addition to the SVM framework, the Adaptive Neuro Fuzzy Inference System (ANFIS) is another strong approach for precise estimation in different industrial and engineering applications. In this methodology, fuzzy logic and neural network based systems are combined so that the disadvantages of both are overcome. Successful uses of ANFIS modeling have been reported in the open literature by several researchers [38-41].
In this work, MLP neural network, GA-RBF neural network, CSA-LSSVM, CHPSO-ANFIS, and CMIS modeling approaches have been used to model the viscosity of HPAM solutions over a wide range of operational conditions. To the best of the authors' knowledge, there is no report in the open literature on modeling HPAM solution viscosity by the aforementioned approaches. These methodologies provide an accurate and reliable estimation of the target parameter. The empirical correlations available in the literature are not as accurate as the smart techniques proposed in this study. Moreover, the empirical correlations have many tuning parameters to be determined, whereas the smart tools applied in this study use the smallest number of tuning parameters, which makes them superior to the existing models. The larger the number of model parameters, the greater the risk of overfitting the model to the experimental data. Graphical and statistical analyses have been carried out on the results to check the validity of the proposed models. In addition, the trend estimation capability of the proposed models was checked and confirmed. Finally, outlier detection and sensitivity analyses were performed to check the validity of the proposed models.

Least Squares Support Vector Machine (LSSVM)
Support Vector Machines (SVM) have been used as a robust tool for function approximation in a wide range of engineering applications [42-44]. Generally, the ordinary SVM method uses quadratic programming subject to inequality constraints to solve function approximation problems, and its convergence is very slow. This scheme consumes a great deal of time and memory [45], which makes it applicable only to small problems with a small number of input data points. A new type of SVM, called Least Squares SVM (LSSVM), was developed by Suykens and Vandewalle [46] to reduce the complexity of the process and increase its convergence speed.
This was achieved by replacing the inequality constraints of the ordinary SVM with equality constraints in LSSVM, which can be solved more rapidly by an iterative process [45-47]. This makes LSSVM more desirable for problems with a large number of experimental data and for cases in which time and memory are limited. As mentioned earlier, the final goal of LSSVM is to find the optimum separating hyperplane. If the vector defining the hyperplane is denoted by $w$, LSSVM minimizes the following cost function [45]:

$$J(w, e) = \frac{1}{2} w^{T} w + \frac{1}{2}\,\mu \sum_{k=1}^{N} e_k^{2} \tag{1}$$

subject to the following linear constraints:

$$y_k = w^{T} g(x_k) + b + e_k, \qquad k = 1, \dots, N \tag{2}$$

where $e$ is the error variable, $\mu \geq 0$ is the regularization constant, $g(x)$ is the mapping function, $w$ and $b$ are the weight vector and the bias term, respectively, $N$ is the number of training data points, and superscript $T$ denotes the transpose. Coupling the two equations results in the following Lagrangian [45]:

$$L(w, b, e, \beta) = \frac{1}{2} w^{T} w + \frac{1}{2}\,\mu \sum_{k=1}^{N} e_k^{2} - \sum_{k=1}^{N} \beta_k \left[ w^{T} g(x_k) + b + e_k - y_k \right] \tag{3}$$

in which $\beta_k$ are the Lagrangian multipliers. Setting the derivatives of the Lagrangian to zero gives the optimality conditions:

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{k=1}^{N} \beta_k\, g(x_k), \qquad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{k=1}^{N} \beta_k = 0, \qquad \frac{\partial L}{\partial e_k} = 0 \Rightarrow \beta_k = \mu\, e_k, \qquad \frac{\partial L}{\partial \beta_k} = 0 \Rightarrow w^{T} g(x_k) + b + e_k - y_k = 0 \tag{4}$$

In a linear regression problem, in which the relation between the dependent and independent variables is linear, the LSSVM equation is as follows [45]:

$$y = \sum_{k=1}^{N} \beta_k\, x_k^{T} x + b \tag{5}$$

To use the last equation for nonlinear problems, kernel functions must be used. Introducing kernel functions into Eq. (5) gives [45]:

$$y = \sum_{k=1}^{N} \beta_k\, K(x_k, x) + b \tag{6}$$

where $K(x_k, x)$ is the kernel function, that is, the product of $g(x)$ and $g(x_k)$ in the feature space [30,45]:

$$K(x_k, x) = g(x_k)^{T} g(x) \tag{7}$$

The Gaussian Radial Basis Function (RBF) kernel has been more attractive to researchers than the other kernels and is defined as follows [42]:

$$K(x_k, x) = \exp\left( -\frac{\lVert x_k - x \rVert^{2}}{\sigma^{2}} \right) \tag{8}$$

where $\sigma^{2}$ is the squared bandwidth. In this study, this parameter is optimized using an external optimization technique, namely Coupled Simulated Annealing (CSA). More information about CSA is presented in the open literature [41].
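The following is a minimal sketch of LSSVM regression with a Gaussian RBF kernel, assuming the standard dual formulation in which the weight vector and the error variables are eliminated, leaving one linear system for the bias and the Lagrangian multipliers. The function names (`rbf_kernel`, `solve_linear`, `lssvm_fit`, `lssvm_predict`) and the arguments `mu` and `sigma2` are illustrative choices, not the toolbox used by the authors, and the CSA tuning step is not included.

```python
import math

def rbf_kernel(a, b, sigma2):
    """Gaussian RBF kernel exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / sigma2)

def solve_linear(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def lssvm_fit(X, y, mu, sigma2):
    """Return (bias, multipliers) for LSSVM regression.

    Assumed dual system (w and e eliminated):
        sum_k beta_k = 0
        sum_j beta_j K(x_j, x_k) + bias + beta_k / mu = y_k
    """
    n = len(y)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / mu  # regularization term on the diagonal
        A.append(row)
    z = solve_linear(A, [0.0] + list(y))
    return z[0], z[1:]

def lssvm_predict(x, X, bias, beta, sigma2):
    """Kernel expansion: bias + sum_k beta_k K(x_k, x)."""
    return bias + sum(b * rbf_kernel(xk, x, sigma2) for b, xk in zip(beta, X))
```

A large `mu` penalizes the training errors heavily, so the fitted function passes close to the training points; a smaller `mu` gives a smoother fit. In the study both hyperparameters are tuned by CSA rather than set by hand.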

Adaptive Neuro Fuzzy Inference System (ANFIS)
Jang and Chuen-Tsai [48] first proposed a novel intelligent approach that combines Fuzzy Inference Systems (FIS) with Artificial Neural Networks (ANN). With this methodology, the advantages of fuzzy systems and neural networks are obtained simultaneously. After that, Takagi and Sugeno [49] extended the ANFIS mathematical strategy to highly nonlinear and complex problems. Two training algorithms are used for this purpose: back propagation and hybrid. In the back propagation scheme, the premise and consequent parameters are adjusted via the gradient descent method, whereas the hybrid algorithm updates the consequent parameters by least squared error approximation [50]. A typical ANFIS structure has five layers, numbered 1 to 5. The structure contains modifiable and fixed nodes, which act as rules and Membership Functions (MF). Takagi and Sugeno [49] proposed two rules for the development of an ANFIS model as follows:

$$\text{Rule 1: if } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ then } f_1 = m_1 x + n_1 y + r_1 \tag{9}$$

$$\text{Rule 2: if } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ then } f_2 = m_2 x + n_2 y + r_2 \tag{10}$$

where $x$ and $y$ are the inputs, $A_1$, $A_2$, $B_1$, and $B_2$ are linguistic labels, and $m_1$, $m_2$, $n_1$, $n_2$, $r_1$, and $r_2$ are the consequent parameters. The five layers of the ANFIS strategy are the fuzzification, rule, normalization, defuzzification, and output layers [39]. In the fuzzification layer, the input data are converted to linguistic expressions. In the rule layer, the degree of truth of the expressions constructed in the previous layer is determined. In the next layer, the firing strengths obtained from the antecedent layers are normalized. The fourth layer, called the consequent layer, provides a linguistic term for the output parameter. In the final layer, the outputs of the individual rules are combined into a single numerical output using an averaging method [38].
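As an illustration of the five-layer flow described above, the sketch below evaluates a two-rule, two-input first-order Sugeno system. It assumes Gaussian membership functions and a product rule for the firing strength; neither choice is stated in the text, and the names `gaussian_mf` and `sugeno_two_rule` are hypothetical.

```python
import math

def gaussian_mf(value, center, width):
    """Gaussian membership grade (an assumed membership-function shape)."""
    return math.exp(-0.5 * ((value - center) / width) ** 2)

def sugeno_two_rule(x, y, premise, consequent):
    """Forward pass of a first-order Sugeno system.

    premise:    list of ((center, width) for A_i, (center, width) for B_i)
    consequent: list of (m_i, n_i, r_i) so that f_i = m_i*x + n_i*y + r_i
    """
    # Layers 1-2: fuzzify the inputs, then combine the grades of each rule
    # into a firing strength (product T-norm, assumed here).
    strengths = [gaussian_mf(x, *a) * gaussian_mf(y, *b) for a, b in premise]
    # Layer 3: normalize the firing strengths so they sum to one.
    total = sum(strengths)
    normalized = [w / total for w in strengths]
    # Layer 4: evaluate the linear consequent of each rule.
    rule_outputs = [m * x + n * y + r for m, n, r in consequent]
    # Layer 5: weighted average of the rule outputs.
    return sum(w * f for w, f in zip(normalized, rule_outputs))
```

Training (hybrid, back propagation, or PSO) would adjust the `premise` and `consequent` values; this sketch only shows how a prediction is formed once they are fixed.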
It is notable that ANFIS method can be optimized through the Conjugate Hybrid Particle Swarm Optimization (CHPSO). More information about CHPSO algorithm is presented in the open literature [41,51].

Multilayer Perceptron (MLP) neural network
MLP is a particular neural network structure with three types of layers: input, hidden, and output. Each layer contains a number of neurons. The number of neurons in the hidden layers can be optimized by trial and error or by an intelligent approach such as the least squares method. System errors are back-propagated through the network, and the weights and biases are estimated iteratively over successive epochs [52]. Under-training and over-training of the network depend strongly on the number of epochs. An under-trained or over-trained network will poorly forecast the outputs of the test data set.

Radial Basis Function (RBF) neural network
This method approximates functions iteratively and has the same applications as the MLP method. The network has only three layers and is capable of handling highly noisy input data [53]. Its internal structure is simpler than that of MLP; it uses a feed-forward structure with a supervised training technique. In this framework, a nonlinear transformation is applied to the weighting vector of the hidden layer through the activation function f(r). This network has three major properties [54]:
- It can approximate a multivariable continuous function with good accuracy if enough units are present.
- The unknown coefficients enter the network linearly, which improves its optimal prediction capability.
- The solution obtained is the best solution with respect to minimization of the cost function, and oscillation around the best solution is controlled.
There are some differences between the two neural network systems mentioned above, MLP and RBF, which include [55]:
- The MLP structure is more complicated than that of the RBF method.
- The training process of RBF is easier than that of MLP because of its simple structure of three layers with only one hidden layer.
- The MLP network performs a global approximation of the input data, with the outputs estimated by all neurons, whereas the RBF network acts locally on the input data, with only certain units determining the outputs.
- The two systems discriminate differently. In MLP networks, cluster classification is performed using hyper-surfaces, whereas in RBF networks hyper-spheres are used for this purpose.
It is notable that RBF network can be optimized through the Genetic Algorithm (GA) technique. More information about GA strategy is presented in the open literature [41].

Committee Machine Intelligent System (CMIS)
The Committee Machine (CM) was first introduced by Nilsson [56] as a way of performing supervised learning tasks. A CM has a parallel structure in which different experts run simultaneously and their results are combined to find a better solution than that of any individual expert [57,58]. CMs fall into two distinct categories, static and dynamic structures, which are fully defined in the literature [57]. In this study, ensemble averaging, which is one of the static structures, is used. After the intelligent network of experts has been created, a method is needed for combining the individual results to find the optimum solution [59-61]. This can be linear averaging using a weighted average [62,63], which was used in this study to increase the contribution of the better experts to the final solution. Note that a modified version of the weighted average, with a bias term for better fitness, was used in this study. The genetic algorithm, which is a widely applicable method for finding optimum solutions and whose details have been discussed previously, was used to determine the weights. The Mean Square Error (MSE) between the CMIS results and the experimental values was used as the objective function for the genetic algorithm. The MSE can be defined as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( w_0 + \sum_{j=1}^{m} w_j\, y_{ij}^{\mathrm{pred}} - y_i^{\mathrm{exp}} \right)^{2} \tag{11}$$

in which $n$ and $m$ are the number of data points and the number of experts, respectively, $w_j$ is the weight of the $j$-th expert, $y_{ij}^{\mathrm{pred}}$ is the HPAM solution viscosity estimated by the $j$-th expert for the $i$-th data point, $y_i^{\mathrm{exp}}$ is the corresponding experimental viscosity, and $w_0$ is the bias term.
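The combination step described above can be sketched as a weighted average of the expert outputs plus a bias term, together with the MSE that the genetic algorithm would minimize. The function names and the example numbers are hypothetical; the GA search itself is not shown.

```python
def cmis_predict(expert_preds, weights, bias):
    """Committee output for one data point: bias + sum_j w_j * y_j."""
    return bias + sum(w * p for w, p in zip(weights, expert_preds))

def cmis_mse(expert_matrix, targets, weights, bias):
    """Mean square error between committee outputs and target values.

    expert_matrix[i][j] is the prediction of expert j for data point i.
    """
    n = len(targets)
    return sum(
        (cmis_predict(row, weights, bias) - t) ** 2
        for row, t in zip(expert_matrix, targets)
    ) / n
```

An optimizer such as a GA would treat `weights` and `bias` as the decision variables and `cmis_mse` as the cost. Allowing a bias term means the weights do not have to sum exactly to one.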

Data gathering
The accuracy and reliability of a model depend completely on the accuracy and comprehensiveness of the data set used for its development and testing [64-75]. In this study, the HPAM solution viscosity was treated as a function of seven input variables:

HPAM solution viscosity = f (degree of hydrolysis, HPAM molecular weight, HPAM concentration, temperature, salt concentration, water hardness, shear rate) (12)

The final data set used for developing and testing the models contained 403 data points, which were gathered from the most accurate and reliable literature sources [5,21,76]. In a model development process, the data set needs to be divided into two sets for development and testing of the models. For this reason, the gathered data set was randomly divided into a train data set containing 80% of the data points and a test data set containing the remaining 20%, used for training and testing the models, respectively. The train data set was used for building the models, and the test data set was used for checking their validity and generality and for detecting overfitting and under-fitting problems. A proposed model is valid and reliable if its statistical parameters for the train and test data sets are nearly the same, which shows that the model is well designed. It is crucial that the ranges of the data in the two data sets be nearly the same. The minimum, maximum, and average of the parameters in each of the data sets are given in Table 1. As shown in this table, the minimum, maximum, and average of the data in each data set are nearly the same and are consistent with the main data set, which confirms the appropriate division of the data set.
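A random 80/20 division and the range check described above can be sketched as follows. The function names, the fixed seed, and the representation of each data point as a tuple are assumptions made for illustration; the authors' actual splitting routine is not given in the text.

```python
import random

def split_train_test(records, train_fraction=0.8, seed=0):
    """Randomly split records into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed only for reproducibility
    shuffled = records[:]          # leave the caller's list untouched
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def column_summary(records, col):
    """Return (minimum, maximum, average) of one column of the records."""
    values = [r[col] for r in records]
    return min(values), max(values), sum(values) / len(values)
```

With 403 data points this gives 322 training and 81 test points. Comparing `column_summary` for every input and for the viscosity across the two subsets is one way to verify that both cover nearly the same range, as done in Table 1.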

LSSVM
In this study, the LSSVM modeling approach developed by Pelckmans et al. [77] and Suykens et al. [78] was used for modeling the HPAM solution viscosity. In the CSA-LSSVM model development, the radial basis kernel function, which is the most widely used kernel function [42,79], was used. In addition, the MSE between the model results and the experimental values was used as the objective function. CSA was used to optimize the parameters of the LSSVM model, which resulted in optimum values of 22.97105 and 0.000361 for γ and σ, respectively.

CHPSO-ANFIS
For the development of the CHPSO-ANFIS model, after the train and test data have been prepared, the first step is to create an initial FIS. MATLAB provides three different methods for this. The first method, genfis1, uses grid partitioning and consumes a great deal of time and memory. The second method, genfis2, which was used in this study, relies on subtractive clustering. The third method, genfis3, which is not used in this study, generates an FIS using Fuzzy C-Means (FCM) clustering by extracting a set of rules that models the data behavior.
As mentioned earlier, genfis2 uses subtractive clustering to generate a Sugeno-type initial FIS. In this method, the number of rules and the antecedent membership functions are determined using the subclust function. After that, the consequent equations of each rule are determined using least squares estimation. These membership functions include a set of fuzzy rules to cover the feature space. The most important parameter of this method is the radius of influence, which lies between 0 and 1. Lower values of this parameter result in better performance of the FIS; however, they make the FIS more complex, and the training stage consumes more time and memory. In this study, this parameter is assumed to lie between 0.7 and 1. It is possible to find the optimum value by trial and error; however, it is more rigorous to use an optimization method. In this study, GA was used to determine the optimum value of the radius of influence such that the generated FIS has acceptable accuracy and does not become too complex for the training stage. For the data set used in this study, the optimum value of the radius of influence was determined to be 0.8485. The convergence of the radius of influence is depicted in Figure 1. In this figure, the vertical axis is the MSE and the horizontal axis is the generation number of the GA optimization.
With this value, 9 rules were generated. The membership functions are depicted in Figure 2. As shown in Figure 2, all the values of the input parameters are normalized between −1 and 1. There are 7 input parameters in this study.
The next step is to tune the initial FIS to reach the best solution, i.e. the minimum MSE between the target and output values. MATLAB provides two methods to train the FIS, namely the back propagation and hybrid methods. Another way to train the initial FIS is to use optimization methods, in which case the MF parameters are regarded as the tuning parameters. In this study, a population-based optimization algorithm, namely the Particle Swarm Optimization (PSO) algorithm, was used. PSO treats the MF parameters as the tuning parameters and the MSE between the target and the ANFIS output as the cost function. The method keeps searching the search space for the best tuning parameters until the stopping criteria are reached.
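For readers unfamiliar with PSO, a minimal global-best variant is sketched below. It is a generic illustration, not the authors' implementation: the inertia and acceleration coefficients, the swarm size, the box bounds, and the function name `pso_minimize` are assumed values, and `cost` stands in for the MSE of the ANFIS as a function of its MF parameters.

```python
import random

def pso_minimize(cost, dim, bounds, n_particles=20, n_iter=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize cost(position) over a box using global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_cost = [cost(p) for p in pos]        # personal best costs
    g = min(range(n_particles), key=lambda i: best_cost[i])
    gbest_pos, gbest_cost = best_pos[g][:], best_cost[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Move the particle and keep it inside the box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < gbest_cost:
                    gbest_cost, gbest_pos = c, pos[i][:]
    return gbest_pos, gbest_cost
```

In the ANFIS setting, `dim` would be the number of membership-function parameters and a fixed iteration count would be one possible stopping criterion.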
In this study, a combination of the hybrid method (provided by MATLAB) and the PSO method was used. The method is called Conjugate Hybrid PSO ANFIS (CHPSO-ANFIS) and proceeds in a determined number of stages. In the first stage, the initial FIS undergoes hybrid training three successive times. Afterwards, the resulting FIS undergoes PSO training once. It should be noted that individual hybrid and PSO training steps comprise 10 and 200 epochs, respectively. One stage is completed after the PSO training, and the resulting tuned FIS is passed to the next stage. This process continues until the stopping criteria are reached. The parameters of the ANFIS training functions used in the CHPSO method are listed in Table 2. Figure 3 shows the performance of the CHPSO-ANFIS. The vertical and horizontal axes are the cost function (i.e., the MSE between the target and output values) and the number of stages, respectively. Overfitting is controlled during the training: the best stage is the one at which the MSE values for the train and test data sets are balanced. In this figure, the solid red line is the best MSE of the initial FIS. As is evident, there are two large jumps in the MSE value for both the train and test data sets, at the 3rd and 13th stages. However, the best stage was determined to be the 36th stage. After this stage, although the MSE for the training data set continues to decrease, the MSE for the testing data set gradually increases, which indicates overfitting. The final membership functions for each input are illustrated in Figure 4. The MFs change considerably after training, and the MSE of the CHPSO-ANFIS is much lower than that of the initial FIS. For a better comparison, the statistical parameters of the initial FIS and the CHPSO-trained FIS are listed in Table 3.

MLP
It has been demonstrated that any nonlinear function can be estimated efficiently by an MLP structure with one hidden layer [80]. Thus, in this study, only one hidden layer was selected so as to reduce the calculation time. The suggested MLP model has seven neurons in its input layer and one neuron in its output layer, corresponding to the seven input variables and the one output parameter, respectively. The MSE, used as the Objective Function (OF), was evaluated while the number of hidden neurons was varied from 4 to 25, and the number of neurons with the lowest MSE was selected. In this study, the best performance of the MLP was achieved with six neurons in the hidden layer.

GA-RBF
The best performance of the RBF network is achieved when the tuning parameters, which control the accuracy of the model, take their optimal values. These parameters are the Maximum Neurons Number (MNN) and the spread. The GA technique was used to tune these values. First, 60 arbitrary solution pairs were generated and ranked on the basis of the MSE between the measured data and the model estimates. The best values of the MNN and spread parameters were obtained after 30 generations. As a result, the MNN and spread values are 173 and 1.15612, respectively.

CMIS
For CMIS development, the GA method was employed to find the optimum weights of the model according to the following equation: where l, m, n, r and s were found to be −0.031781, 0.201029, 0.641679, 0.2011969, and 0.004969, respectively.

Models validation
In this section, the accuracy and validity of the proposed models are investigated through statistical and graphical analyses. The graphical agreement between the HPAM solution viscosity values estimated by the proposed models and the experimental values is shown in Figure 5. As shown in this figure, there is good agreement between the experimental and estimated viscosities, with a high determination coefficient of 0.9826 for the CMIS model. This figure also shows that the estimates of the CSA-LSSVM and GA-RBF models are more scattered around the 45° line than those of the other models. In addition, it shows that the CMIS model has a good estimation capability for lower HPAM solution viscosity values; its accuracy decreases slightly, but remains reliable, for higher viscosity values. For further investigation of the reliability of the proposed models, the relative error distributions of the model results against the corresponding experimental values are shown in Figure 6. As shown in this figure, the data points for the CMIS model are highly concentrated around the zero-error horizontal line, which confirms the relative superiority of this model over the other proposed models.
The statistical evaluation of the proposed models is given in Table 4 through several statistical parameters, including the coefficient of determination (R²), Average Relative Deviation (ARD), Average Absolute Relative Deviation (AARD), Root Mean Square Error (RMSE), minimum Relative Deviation (min RD), and maximum Relative Deviation (max RD), for the train, test, and total data sets. The two most important parameters for the evaluation of any model are AARD and R². When the AARD is close to zero and R² is near unity, it can be concluded that the suggested model is well-predictive and efficient. According to Table 4, the CMIS and GA-RBF models are, respectively, the most accurate and the least accurate techniques for predicting the HPAM solution viscosity in this study. Moreover, the statistical quality measures show that MLP is the second most precise model after CMIS.
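The statistical parameters named above can be computed as in the sketch below. The exact definitions used by the authors are not reproduced in the text, so the commonly used forms are assumed here: relative deviations are taken as (predicted − experimental)/experimental in percent, and R² as one minus the ratio of the residual to the total sum of squares. The function name `regression_metrics` is illustrative.

```python
import math

def regression_metrics(experimental, predicted):
    """Return R2, ARD (%), AARD (%), RMSE, min RD (%), and max RD (%)."""
    n = len(experimental)
    # Relative deviation of each point, in percent (assumed sign convention).
    rel_dev = [100.0 * (p - e) / e for e, p in zip(experimental, predicted)]
    mean_exp = sum(experimental) / n
    ss_res = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "ARD": sum(rel_dev) / n,
        "AARD": sum(abs(d) for d in rel_dev) / n,
        "RMSE": math.sqrt(ss_res / n),
        "min_RD": min(rel_dev),
        "max_RD": max(rel_dev),
    }
```

Computing the same dictionary separately for the train and test subsets makes it easy to check that the two sets of statistics are close, which is the criterion used here for a well-designed model.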

Models trend estimation capability
One of the most important aspects of the reliability and estimation capability of a model is its ability to estimate the true trend of viscosity changes as different parameters vary. This is a critical point in examining the usefulness of any developed model. In this section, the trend predictability of the best developed models (i.e., MLP and CMIS) is investigated. Figure 7 shows the trend estimation capability of the CMIS and MLP models with respect to HPAM concentration, temperature, salt concentration, water hardness, and shear rate. As shown in these figures, the trends have been captured successfully by both models; however, the proposed CMIS model is a slightly better tool than the proposed MLP model. In dealing with polymer solutions, the behavior of the solution with respect to the shear rate is very important. For this reason, a model is reliable only if it can estimate the variation of the HPAM solution viscosity with shear rate. Figure 7e shows the estimation capability of the proposed MLP and CMIS models for shear rate changes. As shown in this figure, both proposed models have estimated the trend successfully; however, the proposed CMIS model has a better estimation performance. For a deeper investigation, Figure 7e was redrawn with a logarithmic x-axis (shear rate), as shown in Figure 8. This figure clearly confirms the better performance of the proposed CMIS model compared with the MLP model. For shear rates above 10, the CMIS model exhibits a decreasing, linear trend in the estimated HPAM solution viscosity.

Outlier detection
In any modeling study, the accuracy and the reliability of the proposed model is completely dependent on the accuracy of used experimental data, as mentioned earlier [81]. The erroneous data lower the accuracy and applicability of the proposed models; thus, these data must be specified. Fortunately, these data normally behave in a different scheme rather than the bulk of data, which could be distinguished by using outlier detection methods. Normally, the outlier detection is done graphically through William's plot [42,[82][83][84]. This graphical approach needs standardized residual values and hat values. The outliers will be detected using two limits including leverage and residual limits. The hat values for each of the data points are the diagonal values of H matrix, which can be calculated as follows [82][83][84]: in which, X is a matrix containing m rows and n columns. The parameter m is the number of data points and n is the number of input parameters. The residual values are the difference between the estimated values using proposed model and experimental values. The standardized residual values can be calculated using Eq. (15) as follows [82,83]: in which SR i is the standardized residual value of i-th data point, m exp is the experimental HPAM solution viscosity, m pred is the estimated HPAM solution viscosity using the proposed model, MSE is the mean square error, and H i is the hat value of the i-th data point. The warning leverage value is normally denoted by H * and from the leverage point of view, the data that applies in 0 h H, in which h is the hat value of the data point. This limit is indicated by a red dashed line in William's plot shown in Figure 9. The H * can be calculated using the following equation: in which n is the number of input parameters and m is the number of data point. In this study, H * was calculated to be equal to 0.06045. 
The other limit, the residual limit, is normally set to a radius of 3, which means that the standardized residual values should lie between −3 and 3. From the residual point of view, the data points with $-3 \le SR \le 3$ are reliable for the developed model; these bounds are shown by two green dashed lines in Figure 9. Considering both the leverage and residual limits, the data points that satisfy $0 \le h \le H^*$ and $-3 \le SR \le 3$ are regarded as valid, and the more points that fall in this region, the more reliable the model. These data points are shown by blue stars in Figure 9. Figure 9 shows the William's plots for the proposed CSA-LSSVM, CHPSO-ANFIS, MLP, GA-RBF and CMIS models over the entire database used for modeling. This figure shows that only 2.23% of the data (9 data points), 1.24% (5 data points), 1.99% (8 data points), 0.74% (3 data points) and 1.24% (5 data points) are suspected outliers for the proposed CSA-LSSVM, CHPSO-ANFIS, MLP, GA-RBF and CMIS models, respectively. These fractions are very small; the major portion of the data is therefore concentrated in the valid region bounded by $0 \le h \le H^*$ and $-3 \le SR \le 3$. According to the outlier analysis presented in Figure 9, the models developed in this study therefore operate almost entirely within the valid range.
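The quantities behind a William's plot can be sketched in a few lines. The example below is illustrative and runs on synthetic data, not on the database of this study. It assumes that the residual is taken as experimental minus predicted, that MSE is the plain mean of the squared residuals, and that $H^* = 3(n+1)/m$; the function name `williams_plot_flags` and all demo values are hypothetical.

```python
import numpy as np

def williams_plot_flags(X, y_exp, y_pred, sr_limit=3.0):
    """Hat values, standardized residuals, warning leverage and validity mask."""
    X = np.asarray(X, dtype=float)
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    m, n = X.shape  # m data points, n input parameters
    # Diagonal of H = X (X^T X)^-1 X^T, computed without forming the m x m matrix.
    hat = np.einsum("ij,jk,ik->i", X, np.linalg.pinv(X.T @ X), X)
    residual = y_exp - y_pred          # assumed sign convention
    mse = np.mean(residual ** 2)       # assumed definition of MSE
    sr = residual / np.sqrt(mse * (1.0 - hat))
    h_star = 3.0 * (n + 1) / m         # warning leverage
    # A point is valid if it lies inside both the leverage and residual limits.
    valid = (hat <= h_star) & (np.abs(sr) <= sr_limit)
    return hat, sr, h_star, valid

# Synthetic demonstration: 200 points, 5 inputs, one deliberately bad prediction.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_exp_demo = rng.uniform(1.0, 100.0, size=200)
y_pred_demo = y_exp_demo + rng.normal(scale=0.1, size=200)
y_pred_demo[0] += 5.0  # gross error injected at index 0

hat, sr, h_star, valid = williams_plot_flags(X_demo, y_exp_demo, y_pred_demo)
outlier_fraction = 1.0 - valid.mean()
```

Plotting `sr` against `hat`, with a vertical line at `h_star` and horizontal lines at ±3, reproduces the layout of such a plot; the injected error at index 0 falls outside the residual band.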

Sensitivity analysis
A model can be considered reliable when its sensitivity to the input parameters is the same as the sensitivity of the experimental values to them. This approach was first used by Chen et al. [85] for the sensitivity analysis of models; it shows the degree and the sign of the effect of each input parameter on the output values. In this approach, Equation (17) is used to calculate the relevancy factor of each input parameter with respect to the estimated output:
$$r \left( I_k, \mu \right) = \frac{\sum_{i=1}^{n} \left( I_{k,i} - \bar{I}_k \right) \left( \mu_i - \bar{\mu} \right)}{\sqrt{\sum_{i=1}^{n} \left( I_{k,i} - \bar{I}_k \right)^{2} \sum_{i=1}^{n} \left( \mu_i - \bar{\mu} \right)^{2}}} \tag{17}$$
In this equation, $\mu$ is the HPAM solution viscosity, $I_k$ is the k-th input parameter, $I_{k,i}$ is the i-th value of the k-th parameter, $\bar{I}_k$ and $\bar{\mu}$ are the average values of the k-th input parameter and the HPAM solution viscosity, respectively, and n is here the number of data points. Figure 10 shows the calculated normalized relevancy factors of all the input parameters for the experimental target values as well as for the proposed CSA-LSSVM, CHPSO-ANFIS, MLP, GA-RBF and CMIS models. This figure shows that the sensitivities of the proposed models are nearly the same as the sensitivities of the experimental values, which confirms the reliability of all the proposed models in this study. As is evident from this figure, temperature is the most influential parameter on HPAM solution viscosity, and its effect is negative, as is normally expected for HPAM solution viscosity.
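The relevancy factor is the Pearson correlation between an input column and the viscosity, so it can be computed for all inputs at once. The sketch below is a minimal illustration on synthetic data; the function name `relevancy_factor`, the two input columns and all coefficients are made up for the example and are not taken from this study.

```python
import numpy as np

def relevancy_factor(inputs, viscosity):
    """Relevancy factor of each input column with respect to viscosity."""
    inputs = np.asarray(inputs, dtype=float)        # shape (n_points, n_inputs)
    viscosity = np.asarray(viscosity, dtype=float)  # shape (n_points,)
    d_in = inputs - inputs.mean(axis=0)             # deviations from the input means
    d_mu = viscosity - viscosity.mean()             # deviations from the mean viscosity
    numerator = d_in.T @ d_mu
    denominator = np.sqrt(np.sum(d_in ** 2, axis=0) * np.sum(d_mu ** 2))
    return numerator / denominator

# Synthetic data: viscosity decreases with temperature, increases with concentration.
rng = np.random.default_rng(1)
temperature = rng.uniform(25.0, 90.0, size=300)
concentration = rng.uniform(500.0, 3000.0, size=300)
viscosity = (80.0 - 0.7 * temperature + 0.01 * concentration
             + rng.normal(scale=1.0, size=300))

r = relevancy_factor(np.column_stack([temperature, concentration]), viscosity)
```

Applying the same function once to the experimental viscosities and once to each model's predictions gives the paired values that a comparison like Figure 10 relies on; the sign of each entry indicates the direction of the effect.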

Conclusion
In this study, a Multilayer Perceptron (MLP) neural network, a Least Squares Support Vector Machine optimized with Coupled Simulated Annealing (CSA-LSSVM), a Radial Basis Function neural network optimized with a Genetic Algorithm (GA-RBF), an Adaptive Neuro Fuzzy Inference System coupled with Conjugate Hybrid Particle Swarm Optimization (CHPSO-ANFIS), and a Committee Machine Intelligent System (CMIS) were used to model the viscosity of HPAM solutions over a wide range of operational conditions. The accuracy of the proposed models was investigated through statistical and graphical analyses, which show that the models are accurate and reliable. The CMIS model was found to give the most accurate estimates of HPAM solution viscosity. Trend analysis of the CMIS and MLP models, the two best models, confirms the close agreement between their estimates and the target solution viscosities. Outlier detection was performed using leverage statistics; the very small number of suspected outliers confirms the applicability of the developed models. The sensitivity analysis showed that temperature is the most influential parameter on the estimated HPAM solution viscosity. Finally, the results of this study should be useful to engineers and researchers working on polymer flooding projects in hydrocarbon reservoirs.