Predictive models for density correction factor of natural gas and comparison with standard methods

Two intelligent models that do not require the complete gas composition are presented to estimate the natural gas density correction factor using comprehensive datasets (nearly 60 000 instances) generated from the AGA8-DCM (Detail Characterization Method) standard: (1) the NGDC-ANN model (Natural Gas Density Calculator based on Artificial Neural Network) and (2) the AGA8-GCMD model (Gross Characterization Method Developed by applying a genetic algorithm technique). The suggested models employ only five input variables: specific gravity at base condition, operating temperature, operating pressure, and the molar fractions of CO2 and N2. Experimental datasets obtained in this work (68 instances) and from the literature (505 instances) are used to validate the developed models, showing very good agreement between experimental and estimated data. The simplicity, improved accuracy, and satisfactory results of the suggested models over a wide range of operating conditions show that they are excellent alternatives to the traditional standard methods. In particular, the NGDC-ANN model, besides being simple to use, shows the highest accuracy over a wide operating range in comparison with similar models.


Introduction
In recent years, significant attention has been paid to developing easy and accurate methods for measuring the mass flow rate of natural gas in the gas industry, especially in European and Asian countries. As an example, Compressed Natural Gas (CNG) is sold at the marketing level by mass, so errors in mass metering can lead to noticeable economic loss [1]. Thus, density metering along with volume flow metering, which is less costly, is essential in natural gas marketing. Devices such as multipath ultrasonic transit-time meters and conventional orifice plates are employed to measure the volume flow rate of natural gas [2]. Volumetric meters cannot account for changes in gas composition, pressure, and temperature. Moreover, direct measurement of natural gas density is difficult, as it needs highly skilled staff and costly instruments such as Coriolis density meters and gas chromatographs [3,4]. These instruments also have their own problems: the potential for erosion, sensitivity to pulsation and vibration close to the operating frequency, and the need for regular calibration [5][6][7][8].
The physical and thermodynamic properties of a natural gas mixture can be obtained quickly and with acceptable accuracy by employing an Equation of State (EOS). To estimate these properties in the gas industry, real-time natural gas data such as temperature, pressure, gas composition, and specific gravity are needed [9,10]. Because composition measurements are costly and may not be available, it is necessary to develop correlations based on properties that are measurable in real time. Various EOSs and correlations have been suggested for computing natural gas density over a wide range of temperature, pressure, and composition; some emphasize accuracy, such as GERG-2008 and AGA8-DCM (Detail Characterization Method), and others simplicity of calculation, such as NX-19 and AGA8-GCM (Gross Characterization Method) [4]. Dranchuk and Abou-Kassem [11] developed a gas density correlation employing 1500 data points. Londono et al. [12] reported simplified correlations for estimating the density of natural gas, while AlQuraishi and Shokir [3] reported a new equation for computing the density of hydrocarbon gases using the Alternating Conditional Expectations (ACE) algorithm. Farzaneh-Gord et al. [13] and Farzaneh-Gord and Rahbari [14] developed novel correlations for calculating the density of natural gas based on the AGA8 EOS.
AGA8-DCM is widely employed to predict the density or compressibility factor of natural gas with high accuracy [15,16]. However, it is complex and needs the temperature, pressure, and composition of the natural gas mixture. Pressure and temperature are readily measurable, whereas, as mentioned before, the composition must be measured with experimental facilities such as gas chromatographs. The composition also depends on the measurement site, since the compositions of crude oil differ from region to region, which can result in an erroneous density.
The complexity of the AGA8-DCM EOS makes it difficult to apply, especially for mixtures with a large number of components. On the other hand, the NX-19 and AGA8-GCM EOSs are simpler to implement and need less input information: temperature, pressure, specific gravity at base condition, and only the N2 and CO2 contents of the natural gas. With these EOSs, however, density and compressibility factor cannot be calculated as accurately, or over as wide a range of temperature and pressure, as with the AGA8-DCM EOS.
The goal of this work is to present a novel model and to modify an existing method for natural gas density calculation by implementing intelligent techniques. Such techniques are most efficient in cases that involve nonlinear and time-consuming mathematical modeling, and when there is no evident relation between the input and output of a system [17][18][19]. The intelligent techniques applied in this study are the Artificial Neural Network (ANN) and Genetic Algorithms (GAs), implemented in Matlab (8.1.0.604).
This work presents a new Natural Gas Density Calculator based on ANN (NGDC-ANN) that retains the high accuracy of the AGA8-DCM model for natural gas density prediction over a wide range of operating variables. Moreover, the natural gas data required for density estimation are reduced to those used by less accurate models such as AGA8-GCM and NX-19: specific gravity at base condition, operating temperature and pressure, and the molar fractions of CO2 and N2. Because they require fewer variables, the less accurate models are widely applied in software for estimating natural gas mass flow rate. Therefore, with the help of a Genetic Algorithm and by introducing a tuning coefficient, this work also aims to bring the prediction accuracy and operating range of the AGA8-GCM EOS close to those of the AGA8-DCM EOS.
The accuracy and effectiveness of the proposed models are tested against experimental measurements from this work and from the literature using statistical error measures: the coefficient of determination (R²), Mean Square Error (MSE), and Mean Absolute Error (MAE).
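The three error measures named above can be computed as in the following minimal sketch. The text does not spell out the formulas, so the usual definitions are assumed here (R² as one minus the ratio of residual to total sum of squares); the function name `regression_metrics` and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (R2, MSE, MAE) for paired reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))        # mean square error
    mae = float(np.mean(np.abs(residuals)))     # mean absolute error
    ss_res = float(np.sum(residuals ** 2))      # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                  # coefficient of determination
    return r2, mse, mae

# Illustrative values only; these are not data from the study.
r2, mse, mae = regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(f"R2={r2:.3f}  MSE={mse:.3f}  MAE={mae:.3f}")
```

Note that MSE is expressed in squared units of the predicted quantity while MAE keeps its units, so the two are not directly comparable in magnitude.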

Theory

Artificial Neural Network (ANN) model
An ANN is an information-processing paradigm that takes numeric inputs, performs computations on them, and outputs one or more numeric values without needing any prior assumptions about the relationships between them. ANNs are inspired by biological nervous systems such as the brain and consist of elements that receive inputs and generate a single output, where the output is a function of the inputs. An ANN has a large number of computational units connected in a massively parallel structure and does not require a mathematical formulation or the physical relationships of the problem at hand [20,21]. This property is a significant advantage of the flexible ANN model for predicting complicated systems. The ANN structure consists of several parallel-interconnected units called neurons arranged in one or more hidden layers. A neuron sums its inputs multiplied by their respective weights and then applies a transfer function to the sum. The connections of each neuron to other neurons form the next layer [22,23]. The pattern of interconnection between neurons is called the network architecture. Multi-Layer Feed Forward (MLFF) networks are the most common ANN architecture for static regression applications, consisting of one input layer, one output layer, and one or more intermediate (hidden) layers [24][25][26][27][28]. Figure 1 shows the MLFF network with an input layer of five neurons, an output layer of one neuron, and two hidden layers. An ANN is considered an adaptive system, i.e., its parameters are changed during training so that it learns to recognize the process under investigation.
An MLFF network can be represented as follows:

$$y_{jk} = F_k\left(\sum_{i} w_{ijk}\, y_{i(k-1)} + b_{jk}\right) \qquad (1)$$

where $y_{jk}$ is the output of neuron j in layer k, $b_{jk}$ is the bias weight for neuron j in layer k, and $w_{ijk}$ (the model-fitting parameters) are the connection weights, which are initialized randomly; the sum runs over the neurons i of layer k - 1. $F_k$ is the nonlinear activation (transfer) function, which is one of the main characteristic elements of an ANN; sigmoidal transfer functions are the most common type. According to equation (2), input and output values were normalized to the range 0-1 to obtain higher ANN performance and consistent results [29,30]:

$$X_i^{norm} = \frac{X_i - X_i^{min}}{X_i^{max} - X_i^{min}} \qquad (2)$$

where $X_i^{norm}$, $X_i$, $X_i^{max}$, and $X_i^{min}$ are the normalized, actual, maximum, and minimum values of variable $X_i$, respectively. In this study, the NGDC-ANN model is presented as a replacement for the AGA8-DCM EOS. Natural gas density (the output of this model) is estimated with fewer input variables. Table 1 presents the ranges of the input variables of the NGDC-ANN model, which equal the normal ranges of the input variables of the AGA8-DCM EOS.
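The min-max normalization to the range 0-1 described above, and its inverse for mapping network outputs back to physical values, can be sketched as follows. The function names and the temperature bounds (200 K and 400 K) are hypothetical placeholders, not the ranges of Table 1.

```python
import numpy as np

def minmax_normalize(x, x_min, x_max):
    """Map actual values onto [0, 1] using the variable's min and max."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)

def minmax_denormalize(x_norm, x_min, x_max):
    """Inverse mapping: recover actual values from normalized ones."""
    return np.asarray(x_norm, dtype=float) * (x_max - x_min) + x_min

# Hypothetical bounds chosen only for illustration.
T_MIN, T_MAX = 200.0, 400.0
temperatures = np.array([250.0, 300.0, 350.0])

t_norm = minmax_normalize(temperatures, T_MIN, T_MAX)
t_back = minmax_denormalize(t_norm, T_MIN, T_MAX)
print(t_norm, t_back)
```

The same bounds must be used at training and prediction time; otherwise the network receives inputs on a different scale from the one it was fitted on.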
As shown in Figure 1, the output of the MLFF ANN model is the dimensionless density correction factor C (the value estimated by AGA8-DCM is called C_DCM), which can be related to natural gas density as:

$$\rho = C\,\rho_b = C\,SG_b\,\rho_b^{air} \qquad (3)$$

where $\rho$ is the natural gas density at operating temperature (T) and pressure (P), and $\rho_b$, $\rho_b^{air}$, and $SG_b$ are the natural gas density, air density, and specific gravity at base temperature (T_b = 60 °F) and pressure (P_b = 14.73 psia), respectively.
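As a worked illustration of how a predicted correction factor would be turned into a density, the sketch below assumes that C is the ratio of operating density to base density and that the base density equals the specific gravity times the air density at base conditions, which is how the symbols defined above fit together. The function name and all numerical values are hypothetical, including the air density, which must be supplied by the user for the base conditions in use.

```python
def density_from_correction_factor(c, sg_base, rho_air_base):
    """Operating density from C, assuming rho = C * SG_b * rho_air_b.

    c            -- dimensionless density correction factor (assumed rho / rho_b)
    sg_base      -- specific gravity of the gas at base conditions
    rho_air_base -- air density at base conditions, in kg/m^3
    """
    rho_base = sg_base * rho_air_base   # gas density at base conditions
    return c * rho_base                 # gas density at operating conditions

# Illustrative inputs only; 1.225 kg/m^3 is a placeholder air density.
rho = density_from_correction_factor(c=50.0, sg_base=0.6, rho_air_base=1.225)
print(f"{rho:.2f} kg/m^3")
```

Multiplying the result by a measured volume flow rate at operating conditions would then give the mass flow rate.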

Genetic Algorithms (GAs)
The Genetic Algorithm (GA) is a search heuristic in computer science that was first introduced by J. Holland to solve optimization problems [31]. It is inspired by Darwin's theory, according to which species of organisms have evolved over a long period of time through natural selection while sharing a common ancestor [18,32]. GA belongs to the class of evolutionary algorithms, which use mechanisms such as inheritance, mutation, selection, and crossover (also called recombination) [33]. A GA starts from an initial population of randomly generated individuals and proceeds iteratively in a process resembling genome evolution. In each generation, the fitness of every individual in the population is evaluated, and multiple individuals are selected from the current population (based on their fitness) and modified to form a new population that is used in the next iteration of the algorithm. The process terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. By utilizing the GA technique, a tuning coefficient γ is introduced. This coefficient is a function of the input variables used for the AGA8-GCM EOS. With the help of this tuning coefficient, the modified EOS can be applied over a wider range of the AGA8-GCM input variables with higher accuracy, as expressed by equation (4), in which the unknown variables V_j are obtained by regression of data resulting from the solution of the AGA8-DCM and AGA8-GCM equations. The unknown variables are estimated using the GA technique with MSE as the fitness function.
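The generational loop described above (evaluate fitness, select, recombine, mutate, repeat) can be sketched as follows. This is not the Matlab configuration used in the study: binary tournament selection, Gaussian mutation, and single-individual elitism are swapped in for simplicity, the power-law form of the tuning coefficient in `gamma` is a hypothetical stand-in for equation (4), and the data are synthetic.

```python
import numpy as np

def gamma(v, t_r, p_r):
    """Hypothetical power-law tuning coefficient: v0 * t_r**v1 * p_r**v2."""
    return v[0] * t_r ** v[1] * p_r ** v[2]

def mse_fitness(v, t_r, p_r, c_gcm, c_dcm):
    """MSE between the reference C_DCM and the tuned product gamma * C_GCM."""
    return float(np.mean((c_dcm - gamma(v, t_r, p_r) * c_gcm) ** 2))

def run_ga(fitness, bounds, pop_size=30, generations=200,
           crossover_fraction=0.8, mutation_scale=0.05, seed=0):
    """Minimize `fitness` over box-bounded real vectors with a simple GA."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)              # best (lowest MSE) first
        pop, scores = pop[order], scores[order]
        history.append(scores[0])
        children = [pop[0].copy()]              # elitism: keep the best as-is
        while len(children) < pop_size:
            # Binary tournaments: pop is sorted, so the lower index wins.
            a, b = rng.integers(pop_size, size=2)
            c, d = rng.integers(pop_size, size=2)
            p1, p2 = pop[min(a, b)], pop[min(c, d)]
            if rng.random() < crossover_fraction:
                mask = rng.random(low.size) < 0.5   # scattered crossover
                child = np.where(mask, p1, p2)
            else:
                child = p1.copy()
            child = child + rng.normal(0.0, mutation_scale, low.size) * (high - low)
            children.append(np.clip(child, low, high))
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    best = int(np.argmin(scores))
    history.append(scores[best])
    return pop[best], history

# Synthetic data standing in for AGA8-DCM / AGA8-GCM results.
data_rng = np.random.default_rng(1)
t_r = data_rng.uniform(0.8, 1.4, 200)       # hypothetical scaled temperature
p_r = data_rng.uniform(0.1, 3.0, 200)       # hypothetical scaled pressure
c_gcm = data_rng.uniform(10.0, 300.0, 200)
c_dcm = gamma(np.array([1.02, -0.10, 0.05]), t_r, p_r) * c_gcm

best_v, history = run_ga(lambda v: mse_fitness(v, t_r, p_r, c_gcm, c_dcm),
                         bounds=[(0.5, 1.5), (-1.0, 1.0), (-1.0, 1.0)])
print(best_v, history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` can never get worse from one generation to the next.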
Materials and methods

Materials
The natural gas used in the experiments was supplied by the Gas Company of Isfahan Province (Isfahan, Iran).

Density correction factor measurement
The 68 experimental datasets contain the density correction factor of natural gas with the above composition, measured at various levels of the five model input variables T, P, SG_b, y_CO2, and y_N2 (X_i, i = 1-5). The values of X_i were selected within the ranges presented in Table 1. These experimental datasets (68 instances), along with other experimental datasets from the literature (505 instances) [34][35][36][37], were employed only to validate the suggested models.
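Results and discussion

Optimization of NGDC-ANN model
The optimization of the MLFF NGDC-ANN model was carried out in seven steps: (1) determining the architecture of the NGDC-ANN model, (2) determining the transfer functions, (3) determining the number of neurons in each hidden layer, N_Ln, (4) determining the back-propagation learning algorithm (BPLA), (5) estimating the initial values of weights and biases for equation (1), (6) updating the weights and biases to obtain optimum ANN performance, i.e., the minimum network error (MSE), and so complete the training process of the ANN, and (7) validating and testing the trained ANN. In the ANN model constructed with these seven steps, approximately 60 000 datasets generated with the AGA8-DCM EOS were applied according to Figure 2. Seventy percent of the datasets were randomly selected for the training process, and the rest were used for testing and validation. As mentioned previously, the experimental datasets (68 instances from this study and 505 instances from the literature [34][35][36][37]) were applied to investigate the performance of the NGDC-ANN model and were not employed in the construction of the suggested models.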
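The random 70% training split described above can be sketched as follows. The text only states that the remaining data were used for testing and validation, so the even division of the remainder is an assumption, as are the function name and the seed.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.70, seed=0):
    """Randomly split sample indices into training, validation, and test sets.

    The remainder after the training share is divided evenly between
    validation and test (an assumption; the exact shares are not stated).
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    n_val = (n_samples - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_indices(60000)
print(len(train), len(val), len(test))
```

The separately measured experimental points would be kept out of all three subsets, consistent with their use for independent validation only.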
According to the universal approximation theorem, an ANN model with a single hidden layer and a sufficiently large number of neurons can approximate any input-output relationship [38,39]. However, the results show that a single hidden layer cannot adequately decrease the MSE for the data of this research. Therefore, an MLFF network with one input layer, two hidden layers, and one output layer was selected as the optimum architecture of the NGDC-ANN model, as shown in Figure 1.
To select an appropriate transfer function, the tan-sigmoid (tansig) and log-sigmoid (logsig) transfer functions were applied at the two hidden layers. According to the highlighted row of Table 2, applying the tansig transfer function at both hidden layers and a linear transfer function (purelin) at the output layer results in estimates with the lowest MSE. The tansig, logsig, and purelin transfer functions are defined by the following equations:

$$\mathrm{tansig}(x) = \frac{2}{1 + e^{-2x}} - 1 \qquad (5)$$

$$\mathrm{logsig}(x) = \frac{1}{1 + e^{-x}} \qquad (6)$$

$$\mathrm{purelin}(x) = x \qquad (7)$$

As mentioned before, the adjustable number of neurons in each hidden layer, N_Ln, is one of the features of the ANN architecture. Note that there is no single method for selecting an appropriate N_Ln. Although the number of neurons can be determined by trial and error, in this study Kolmogorov's theorem was employed to determine the number of neurons in the hidden layers because of its faster response and higher accuracy. The theorem sets an upper limit for N_Ln: it should not exceed twice the number of neurons used in the input layer of the MLFF ANN. In fact, using more neurons in the hidden layers leads to overfitting and weak network predictions [20,40]. Given the five neurons used in the input layer of the NGDC-ANN model, the network performance was evaluated with two to ten neurons in each hidden layer to find the minimum MSE. Figure 3 compares the performance of MLFF NGDC-ANNs with different N_Ln under constant conditions: tansig transfer function for the hidden layers, purelin transfer function for the output layer, and trainlm as the BPLA. It can be inferred that the NGDC-ANN model with eight and five neurons in the first (L1) and second (L2) hidden layers, respectively, has the lowest MSE, indicating the best performance.
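The selected architecture (five inputs, hidden layers of eight and five tansig neurons, one purelin output) can be illustrated with a forward pass. The sketch below uses the standard definitions of the three transfer functions; the weights are random and untrained, so the output is not a meaningful prediction, and the helper names `init_network` and `forward` as well as the input values are hypothetical.

```python
import numpy as np

def tansig(x):
    """Tan-sigmoid transfer function (mathematically equal to tanh)."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def logsig(x):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-x))

def purelin(x):
    """Linear transfer function."""
    return x

def init_network(layer_sizes, seed=0):
    """Random weights and zero biases for each pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.5, (n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x, activations):
    """Each layer computes F(W @ y_prev + b) and feeds the next layer."""
    y = np.asarray(x, dtype=float)
    for (w, b), f in zip(params, activations):
        y = f(w @ y + b)
    return y

# 5 inputs -> 8 neurons -> 5 neurons -> 1 output.
params = init_network([5, 8, 5, 1])
x_norm = np.array([0.4, 0.3, 0.5, 0.02, 0.05])  # placeholder normalized inputs
c_norm = forward(params, x_norm, [tansig, tansig, purelin])
print(c_norm)
```

Training would then adjust the entries of `params` to minimize the MSE between the network output and the normalized reference values.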
Another feature that plays an important role in enhancing the learning rate of MLFF ANNs is the selection of an appropriate BPLA. The criteria for selecting a proper BPLA for the MLFF ANN were to minimize MSE and maximize R². The performance of the MLFF ANN was compared for eleven different BPLAs against these criteria (Table 3). The results indicate that the maximum learning rate and minimum MSE are achieved with the Levenberg-Marquardt (trainlm) learning algorithm. Thus, in the present work, the MLFF ANN with trainlm as the BPLA was taken as the optimized NGDC-ANN model for predicting the density correction factor of natural gas, according to the highlighted row of Table 3. The ability of the optimized NGDC-ANN model to predict C_DCM was investigated by comparing the results obtained from the AGA8-DCM EOS with the corresponding data predicted by the model for the three subsets (learning, validation, and test) in Figure 4. Based on the calculated R² (0.9995), it can be concluded that the NGDC-ANN model accurately describes C_DCM.
As shown in Figure 5, for all three subsets (learning, validation, and test), the best performance of the optimized NGDC-ANN model is observed at epoch 479 with an MSE of 5.715 × 10⁻⁵.

Investigation of NGDC-ANN model performance
To estimate natural gas properties such as compressibility factor, density, and C, the NGDC-ANN, AGA8-GCM, and NX-19 models need the same input variables (T, P, SG_b, y_CO2, y_N2). Therefore, the validity of these models, each applied within its particular operating conditions, was evaluated by comparison with experimental data from this work (68 instances) and Refs. [34][35][36][37] (505 instances) using the statistical measures R², MSE, and MAE presented in Table 4. Note that the experimental data used in Table 4 were not used in the construction of the NGDC-ANN model. The results show that, while keeping the same number of required input variables (T, P, SG_b, y_CO2, y_N2), the NGDC-ANN model gives the lowest error (MSE and MAE) of the compared models in predicting C within a wider range of operating conditions, indicating the superiority of this model. The NGDC-ANN model yields R² 0.996, MSE 102.0, and MAE 374.4 for 891 experimental data points. These outcomes indicate that the NGDC-ANN model is in satisfactory agreement with the experimental data.

AGA8-GCM EOS development
To improve the accuracy and extend the applicable operating range of the AGA8-GCM EOS for natural gas density prediction, a tuning coefficient was determined by the GA technique and multiplied by C_GCM (equation (4)). The coefficient was expressed as a power-law function of the input variables of the AGA8-GCM EOS, because this mathematical form produced the lowest MSE. The AGA8-GCM EOS developed by applying the tuning coefficient is called AGA8-GCMD. The unknown coefficients V_j in equation (4) were calculated using the GA technique with the MSE between C_DCM and γ_GCM C_GCM as the fitness function to be minimized; an MSE of 7.9 × 10⁻⁶ was obtained with the values in equation (8). The GA parameters applied in this technique were as follows: population type: double vector; population size: 30; creation function: constraint dependent; generations: 1000; scaling function: rank; selection function: stochastic uniform; crossover function: scattered; crossover fraction: 0.8; mutation function: constraint dependent; hybrid function: fminsearch; other GA parameters were left at their Matlab (8.1.0.604) defaults.
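The fitness function described above can be written out explicitly. The following is a sketch under the assumption that the MSE is taken over the N reference points generated with the AGA8-DCM EOS; N, the index i, and the vector V collecting the coefficients V_j are notation introduced here for illustration:

```latex
\mathrm{MSE}(V) = \frac{1}{N} \sum_{i=1}^{N}
  \left( C_{DCM,i} - \gamma_{GCM,i}(V)\, C_{GCM,i} \right)^{2}
```

The GA searches for the coefficient vector V that minimizes this quantity, and the hybrid function (fminsearch) then refines the best point found.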
It is obvious that the input variables required by the AGA8-GCMD model are the same as those of the NGDC-ANN, AGA8-GCM, and NX-19 models. The values of C predicted by these four models are compared in Table 4. For this comparison, various natural gases from this work and Refs. [34][35][36][37], at the operating conditions specific to each predictive model, were evaluated to estimate C using R², MSE, and MAE. The results indicate that, although applying the AGA8-GCM EOS leads to a higher R² and lower MSE and MAE, the AGA8-GCMD model is applicable within wider operating ranges of T (225-390 K) and P (up to 70 MPa) with acceptable values of R² 0.980, MSE 365.85, and MAE 4.575. For example, if the AGA8-GCM EOS is used over these wider ranges, it produces R² 0.702, MSE 4175.349, and MAE 10.388, which are not satisfactory values. These results show that the AGA8-GCMD model can predict C more precisely within wider operating temperature and pressure ranges than the AGA8-GCM and NX-19 EOSs. However, the NGDC-ANN model still has higher accuracy and a wider applicable operating range than the other models in Table 4.
The results indicate that the presented intelligent models (NGDC-ANN and AGA8-GCMD) have the following advantages over the previously introduced EOSs (AGA8-DCM, AGA8-GCM, and NX-19): (1) simplicity of use, (2) coverage of wider ranges of operating conditions, and (3) higher accuracy in estimating natural gas properties, especially at high pressure or low temperature, in other words at higher C.

Conclusion
Prediction of compressible fluid density plays a key role in the production, equipment design, processing, storage, marketing, and transportation operations of the natural gas industry. Several methods and devices have been used to measure gas mass flow rate; however, each has its pros and cons. Based on available artificial intelligence methods and comprehensive datasets (nearly 60 000 instances), the present work develops the NGDC-ANN and AGA8-GCMD models to predict natural gas density with simplicity, high accuracy, fewer required input variables, and a wider range of operating variables in comparison with other methods such as AGA8-DCM, AGA8-GCM, and NX-19. The models are evaluated by comparing experimental and predicted C data.
Using the MLFF NGDC-ANN model, the number of required input variables is reduced from 23 to 5, and C is predicted accurately (R², MSE, and MAE of 0.989, 396.25, and 1.999, respectively, for 871 experimental data points) over a range of operating variables as wide as that of the AGA8-DCM EOS.
By applying a tuning coefficient to C_GCM in the AGA8-GCMD model, the accuracy of gas density prediction over wider operating ranges of temperature (225-390 K) and pressure (up to 70 MPa) is improved (R², MSE, and MAE of 0.980, 365.85, and 4.575, respectively). However, in comparison with the AGA8-GCMD, AGA8-GCM, and NX-19 models, the results show that the NGDC-ANN model, besides being simple to use, has the highest prediction accuracy over a wide operating range.

Fig. 2. Data gathering and grouping to apply in intelligent models.

Fig. 3. Effect of different N_Ln in the two hidden layers (L1 and L2) on the performance of the MLFF NGDC-ANN model in predicting C_DCM.

Fig. 4. Comparison of predicted results by optimized NGDC-ANN model with (a) training set and (b) all data.

Table 1. Employed ranges of input variables of NGDC-ANN model.
Optimization of NGDC-ANN model. The optimization of the MLFF NGDC-ANN model was carried out in seven steps: (1) determining the architecture of the NGDC-ANN model, (2) determining the transfer functions, (3) determining N_Ln, (4) determining the BPLA, (5) estimating the initial values of weights and biases for

Table 2. Performance comparison of four NGDC-ANN models obtained by applying different transfer functions in the two hidden layers, using N_L1 = N_L2 = 5 and Levenberg-Marquardt back-propagation (trainlm) as the BPLA.

Table 3. Comparison of ANN performance with 11 different BPLAs in predicting C_DCM.
at different operational conditions specific to each predictive model were evaluated to estimate C by applying R², MSE and MAE. The results indicate that, although applying the AGA8-GCM EOS leads to a higher R² and lower