Prediction of engine NOx for virtual sensor using deep neural network and genetic algorithm

Nitrogen oxides (NOx) emitted by engines harm the natural environment and human health. Regulators have attempted to protect people from them, while car manufacturers have tried to build NOx-free vehicles. The formation of NOx emissions depends strongly on the engine operating conditions, and being able to predict NOx emissions would significantly help in reducing them. This study investigates an advanced method of predicting vehicle NOx emissions in pursuit of the sensorless engine. Sensors inside the engine are required to measure the operating condition. However, they can be removed or reduced in number if the sensed quantity, such as the engine NOx emissions, can be accurately predicted with a virtual model. This would reduce cost and overcome the sensor durability problem. To achieve this goal, researchers have numerically analyzed the relationship between emissions and engine operating conditions, and Deep Neural Networks (DNN) have recently been applied as a solution. However, the prediction accuracy was often unsatisfactory because hyperparameter optimization was either overlooked or conducted manually. Therefore, this study proposes a virtual NOx sensor model based on hyperparameter optimization. A Genetic Algorithm (GA) was adopted to search for a global optimum of the DNN hyperparameters. Epoch size and learning rate are employed as the design variables, and a user-defined function based on R-squared is adopted as the objective function of the GA. As a result, a more accurate and reliable virtual NOx sensor, with the possibility of a sensorless engine, could be developed and verified.


Introduction
Air pollution caused by greenhouse gases and automobile emissions has created a worldwide need for strict regulation [1,2]. Vehicles equipped with an Internal Combustion Engine (ICE) have difficulty complying with the regulations, and even eco-friendly cars such as Hybrid Electric Vehicles (HEV) that use an ICE cannot escape the problem. Among the emission gases, NOx causes environmental pollution such as acid rain, ground-level ozone, fine-particle formation, and smog, as well as critical diseases in humans [3]. Recent California Air Resources Board (CARB) regulations require that "Non-Methane Organic Gases (NMOG) + NOx" for all vehicles produced in 2019 be below 0.090 g/mi. Moreover, the allowance for "NMOG + NOx" will fall below 0.050 g/mi after 2025. The European Union is following this trend, and car manufacturers are doing their best to minimize NOx emissions from the internal combustion engine [4][5][6].
Apart from material improvements such as insulation coating [7], general NOx emission reduction techniques are divided into engine-out and tailpipe techniques. The former include, for example, injection strategy adjustment (pressure and timing) and Exhaust Gas Recirculation (EGR) applied to the combustion chamber of diesel engines [8,9]. The latter include Selective Catalytic Reduction (SCR) and Lean NOx Trap (LNT), which minimize airborne pollutants using filters, catalysts, and other post-processing methods [10][11][12]. To achieve the desired performance of these NOx reduction techniques, accurate sensing of NOx emissions is highly desirable.
Engine operating conditions are closely related to NOx emissions, and these conditions can be identified by checking the status (state variables) of the engine [3]. This means that if the engine operating conditions are adjusted through speed, torque, fuel injection, etc., the amount of NOx formed changes; optimizing NOx formation through engine control is therefore possible and highly desirable for healthy driving [13,14]. In previous studies, understanding the formation process was required to optimize NOx emissions, and research based on thermodynamics or Computational Fluid Dynamics (CFD) has been conducted, such as NOx emission prediction using chemical kinetics, skeletal mechanisms, 2D/3D models, and two-zone thermodynamic models [15][16][17][18][19]. These studies made it possible to identify which factors influence NOx formation and reduction. However, the relationship between the engine operating condition and emissions is not straightforward, and it has therefore been analyzed indirectly using statistical models, control-oriented models, or Artificial Neural Networks (ANN) [20][21][22][23][24][25]. Even so, the relationships remain complex, and the input parameters used in the equations were too idealized. Currently, Deep Neural Networks (DNN) are being considered for various engine analyses and adjustments. A DNN is an advanced ANN with multiple layers for increased accuracy. This makes it a very suitable tool for engine research, since it does not require parameters that are unrealistic for real applications, an understanding of the complex physical meaning, or the development of highly accurate equations [26][27][28]. Besides, unlike a map-based model, the various engine operating conditions swept during the engine experiment can be used as inputs of the DNN model. Accordingly, higher prediction accuracy across the various engine operating conditions can be expected after appropriate DNN training.
In DNNs, the quantity of training data and the input/output definitions are well known to affect problem-solving performance, and several studies have predicted NOx emissions using DNNs. However, despite the importance of the hyperparameter definitions, these values have often been overlooked or set manually (by intuition or trial and error) in previous research [29][30][31][32][33].
To address such problems, this study develops a DNN-based virtual NOx sensor model that adopts a Genetic Algorithm (GA) to determine the optimal hyperparameters: epoch size and learning rate. In the virtual sensor model, a user-defined objective function using R-squared is applied for the global optimization, and the necessary inputs for the virtual sensor model are presented. With a DNN combined with hyperparameter optimization, the prediction performance approaches that of the real sensors in the engine, such that the removal of physical sensors may soon become feasible. This would lower costs, prevent performance degradation over time, and improve sensor durability. The effectiveness of this study is demonstrated by comparing the accuracy of the virtual and real sensors. Figure 1 shows the overall process of this study to develop a virtual sensor model of engine NOx emissions using DNN and GA, where the fundamental data for DNN training were acquired by real engine experiments.
The rest of this paper is organized as follows. In Section 2, the experimental setup and environment for the data acquisition are introduced. Section 3 defines the virtual sensor model and calculation methods based on DNN. Section 4 explains the DNN hyperparameter optimization using GA, and Section 5 presents the simulation results and discussion for the novel virtual sensor. Finally, Section 6 concludes the study.

Experimental setup and environment
An engine experiment was conducted to obtain the engine state variables. The target engine was a 1.6-L four-cylinder engine with a displaced volume of 1592 cc. The bore and stroke of the engine were 77 mm and 85.8 mm, respectively, and the length of the connecting rod was 142 mm. The engine was equipped with a single-stage Variable Geometry Turbocharger (VGT) which contributes to combustion inside the cylinder to keep the flow constant. The specific engine specifications are listed in Table 1.
The configuration of the engine test system and the engine operating points are shown in Figure 2, where the engine was controlled by an AVL PUMA dynamometer [34]. The sample experimental min/max limits of the engine dataset are listed in Table 2.

Virtual sensor model for NOx prediction
To develop a virtual NOx sensor model, a supervised DNN was adopted. The data from the engine experiment were used as the DNN training inputs and outputs, and Python with TensorFlow was used to develop the NOx prediction model. To reduce the computational burden of the virtual NOx sensor in an embedded system, only two hidden layers were adopted. In addition, the number of nodes was reduced from 5 to 3 across the hidden layers, as shown in Figure 3, to limit complexity and avoid sudden information loss while passing through the hidden layers.
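To make the layer layout concrete, the following is a minimal pure-Python sketch of a forward pass through a network with two hidden layers of 5 and 3 nodes and a single NOx output. It is not the TensorFlow model used in the study. The input count of 18 is an assumption obtained by counting the state variables listed in the DNN model subsection, and the weight initialisation, the linear output node, and all function names are hypothetical.

```python
import math
import random

def elu(x, alpha=1.0):
    """ELU activation, used here for the hidden layers."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def init_layer(n_in, n_out, rng):
    """Return (weights, biases); small uniform weights are an arbitrary choice."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases, activation=None):
    """Fully connected layer: one weighted sum plus bias per output node."""
    out = []
    for w_row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(activation(z) if activation else z)
    return out

def build_network(n_inputs, seed=0):
    """Layer sizes: inputs -> 5 -> 3 -> 1 (two hidden layers, one output)."""
    rng = random.Random(seed)
    sizes = [n_inputs, 5, 3, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

def predict(network, x):
    """Hidden layers use ELU; the output node is linear (regression)."""
    *hidden, output = network
    for w, b in hidden:
        x = dense(x, w, b, activation=elu)
    w, b = output
    return dense(x, w, b)[0]

def count_params(network):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)

network = build_network(n_inputs=18)   # 18 inputs is an assumed count
nox = predict(network, [0.1] * 18)     # untrained, so the value is meaningless
```

With these sizes the network has only 117 trainable parameters, which illustrates why such a layout is light enough for an embedded controller.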

DNN model
The output of the sensor model was defined as the engine NOx emission. The inputs were the engine operating conditions measured in the experiment described in Section 2. These engine state variables were grouped into a dataset comprising 696 experimental cases, of which 60% were used for training, 20% for validation, and the remaining 20% for testing. Each case provided engine rpm, fuel injection quantity (main, pilot, total), EGR rate, boost pressure, fuel injection timing (main, pilot), injection pressure, accelerator pedal position, ambient temperature, humidity, ambient pressure, exhaust gas pressure, oil pressure, DPF temperature, coolant temperature, and intake manifold temperature. Dealing with many kinds of state variables can cause an overfitting problem in a DNN. To prevent this problem, dropout, batch normalization, a validation dataset, and L2 regularization were used. Among them, batch normalization helps to solve the DNN problem with both accuracy and speed, in spite of the large hyperparameters [35]. L2 regularization simplifies the model by penalizing the loss function [36] and is expressed as follows:
$$J(\theta) = \sum_{i} \left( y_i - h_\theta(x_i) \right)^2 + \lambda \sum_{j} \theta_j^2 \qquad (1)$$
where $y_i$, $h_\theta$, $x_i$, $\theta$, and $\lambda$ are the true value, activation function, input value, weight, and regularization rate of 0.01, respectively [36].
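The data split and the L2 penalty can be sketched in a few lines of plain Python. This is only an illustration: the shuffling seed, the rounding of the 60/20/20 split to whole cases, and the use of a sum-of-squared-errors data term are assumptions, and the function names are hypothetical.

```python
import random

def split_indices(n_cases, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle case indices and cut them into train / validation / test parts."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    n_train = int(n_cases * train_frac)
    n_val = int(n_cases * val_frac)
    # Whatever is left after train and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def l2_regularized_loss(y_true, y_pred, weights, lam=0.01):
    """Squared-error data term plus lam times the sum of squared weights."""
    data_term = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    penalty = lam * sum(w * w for w in weights)
    return data_term + penalty

train_idx, val_idx, test_idx = split_indices(696)
```

Under these rounding assumptions the 696 cases split into 417 training, 139 validation, and 140 test cases. The penalty term grows with the magnitude of the weights, so two models with the same prediction error are ranked in favour of the one with smaller weights.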

Hyperparameter in DNN
The hyperparameters used in the DNN are listed in Table 3.

Activation function
An activation function converts the input signals of individual neurons into output signals. It adds non-linearity to the neural network by deciding whether the weighted sum of a neuron's inputs should be activated or not. In this study, the nonlinear Exponential Linear Unit (ELU) is used as the activation function. The ELU is an improved version of the Rectified Linear Unit (ReLU). In addition to possessing all the advantages of ReLU, the ELU does not suffer from the dying ReLU problem. The basic equation of ELU is as follows:
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha \left( e^{x} - 1 \right), & x \le 0 \end{cases} \qquad (2)$$
where $\alpha$ is an ELU parameter. ELU has the advantage of learning considerably faster than other functions such as sigmoid and tanh. Moreover, because it uses the exponential function, negative input values can also be handled without a problem: the derivative there is nonzero [37].
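The point about the nonzero derivative can be checked numerically. The sketch below implements the standard ELU definition and its derivative next to the ReLU derivative; the default parameter value of 1.0 is an assumption, since the text does not state the value used.

```python
import math

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for positive inputs, alpha * exp(x) otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)

def relu_grad(x):
    """Derivative of ReLU: exactly zero for all non-positive inputs."""
    return 1.0 if x > 0 else 0.0

# For a negative pre-activation, ReLU passes no gradient while ELU still does.
x = -1.0
gradients = (relu_grad(x), elu_grad(x))
```

For a negative input the ReLU gradient is exactly zero, so a neuron stuck in that region stops learning, whereas the ELU gradient stays positive and the weights can still be updated.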

Adam optimizer
Adaptive moment estimation (Adam) is a first-order gradient-based optimizer, which has demonstrated good performance in DNN studies [38][39][40]. This optimizer combines the advantages of AdaGrad and RMSProp: it stores not only the exponential moving average of the gradient but also the exponential moving average of the squared gradient. The main advantage of the Adam optimizer is that the step size is not affected by gradient rescaling. Therefore, it is possible to descend stably during optimization, and the step size can be adapted by referring to the past gradient magnitudes. Equations (3) and (4) are the basic equations of the Adam optimizer [41]:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla_\theta J(\theta) \qquad (3)$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) \left( \nabla_\theta J(\theta) \right)^2 \qquad (4)$$
where $m_t$, $v_t$, $\beta_1$, $\beta_2$, and $J$ are the momentum, adaptive term, momentum decay rate of 0.9, adaptive term decay rate of 0.999, and cost function to be minimized, respectively.
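A scalar version of the update makes the two moving averages explicit. The sketch below follows the standard Adam formulation: the bias correction and the final parameter update are taken from that formulation rather than from the text above, and the quadratic test objective, its target value of 3, and the learning rate of 0.1 are purely hypothetical choices for demonstration.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t starts at 1."""
    m = beta1 * m + (1.0 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1.0 - beta2) * grad * grad   # moving average of its square
    m_hat = m / (1.0 - beta1 ** t)                # bias correction (standard Adam)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

def minimize_quadratic(steps=1000, lr=0.1):
    """Minimize the toy cost J(theta) = (theta - 3)^2 starting from theta = 0."""
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - 3.0)
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta

# First step from theta = 0: the gradient is -6, but the move is only about lr.
first_theta, _, _ = adam_step(0.0, -6.0, 0.0, 0.0, t=1, lr=0.1)
```

The first step illustrates the rescaling property: the gradient has magnitude 6, yet the parameter moves by roughly the learning rate, because the update divides the averaged gradient by the square root of the averaged squared gradient.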

Hyperparameter optimization using genetic algorithm
In previous research, hyperparameter optimization was conducted manually. However, the quality of supervised DNN learning depends strongly on the definition of the hyperparameters, so this study proposes using a GA to achieve high performance. As shown in Figure 4, a GA emulates the evolution of living things to obtain an optimal solution. Basically, the goal is to cross the data like genes, create and analyze mutations, and continue to produce offspring until they reach their optimum [42]. In this study, the optimization started with a random selection of learning rates and epoch sizes, from which a population of 10 offspring was created in the first generation. This population size of 10 is maintained over the generations. The best 40% of the population, all distinct, are chosen preferentially as elites for the next generation. Then, the elite offspring are crossed over with each other, followed by random mutations, where the crossover and mutation rates were each set to 30% of the 10 offspring. In short, an elite : crossover : mutation population ratio of 4:3:3 was used throughout the GA simulation so that elites, crossover, and mutation carry similar weight.
To select the best elite offspring, design variables and a fitness function had to be defined. The design variables were the learning rate and epoch size, as explained in Section 3.2. The fitness function $F_O$ of the GA, which is to be maximized, is defined by
$$F_O = \frac{1}{1 - R^2} \qquad (5)$$
where $R^2$, R-squared, is the coefficient of determination [43]. $R^2$ indicates how well the actual and predicted values are related, because it is the index for linear regression between the real and expected values. Accordingly, maximizing the fitness function, that is, driving $R^2$ toward 1, during the GA process (cf. Fig. 1) suits the purpose of the model development. The closer $R^2$ gets to 1, the larger $F_O$ becomes, so that even subtle changes in $R^2$ produce an evident increase in the fitness function.
The overall GA flow for optimizing the DNN hyperparameters is shown in Figure 5.
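The amplifying effect of this fitness function is easy to verify with a short calculation. The sketch below assumes the common definition of the coefficient of determination, one minus the ratio of residual to total sum of squares; the text does not spell out which definition was used, and the function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def fitness(r2):
    """GA fitness 1 / (1 - R^2); a perfect fit is mapped to infinity."""
    if r2 >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - r2)

# The two R^2 values reported in this study.
initial_fitness = fitness(0.9759)   # about 41.5
optimal_fitness = fitness(0.9909)   # about 109.9
```

An increase in R-squared of only 0.015, from 0.9759 to 0.9909, raises the fitness from about 41.5 to about 109.9, which is what makes small accuracy gains near 1 clearly visible to the selection step.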
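One generation of the 4:3:3 scheme can be sketched as follows. Several details are assumptions rather than the study's method: the crossover operator (learning rate from one elite, epoch size from another), mutation as a fresh random draw, the mutation bounds, and the synthetic `toy_fitness`, which stands in for training the DNN and scoring it. The initial ranges are the ones reported in the results section.

```python
import random

POP_SIZE, N_ELITE, N_CROSS = 10, 4, 3   # the remaining 3 slots are mutants (4:3:3)

# Hypothetical mutation bounds; the text does not state them.
LR_BOUNDS = (0.0005, 0.01)
EPOCH_BOUNDS = (1_000, 100_000)

def random_individual(rng, lr_bounds, epoch_bounds):
    """Draw one (learning_rate, epoch_size) pair uniformly within the bounds."""
    return (rng.uniform(*lr_bounds), rng.randint(*epoch_bounds))

def next_generation(population, fitness_fn, rng):
    """Build the next population: distinct elites, crossover children, mutants."""
    distinct = list(dict.fromkeys(population))   # drop duplicates, keep order
    elites = sorted(distinct, key=fitness_fn, reverse=True)[:N_ELITE]
    children = []
    for _ in range(N_CROSS):
        a, b = rng.sample(elites, 2)             # two different elite parents
        children.append((a[0], b[1]))            # assumed crossover operator
    n_mutants = POP_SIZE - len(elites) - len(children)
    mutants = [random_individual(rng, LR_BOUNDS, EPOCH_BOUNDS) for _ in range(n_mutants)]
    return elites + children + mutants

def toy_fitness(individual):
    """Synthetic stand-in for 'train the DNN and score it'; the peak is arbitrary."""
    lr, epochs = individual
    return -((lr - 0.005) / 0.005) ** 2 - ((epochs - 50_000) / 50_000) ** 2

rng = random.Random(0)
population = [random_individual(rng, (0.001, 0.005), (10_000, 40_000))
              for _ in range(POP_SIZE)]
new_population = next_generation(population, toy_fitness, rng)
```

Because the elites are copied unchanged, the best fitness can never decrease from one generation to the next. In a real run each fitness evaluation is a full DNN training, so scores would need to be cached rather than recomputed.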

Simulation results and discussion
The initial population of 10 DNNs was set randomly, with learning rates between 0.001 and 0.005 and epoch sizes between 10 000 and 40 000. The best fitness score with the initial condition was 41.5 ($R^2$ of 0.9759); however, replacing real sensors demands better prediction accuracy. The learning rates and epoch sizes evolve over the GA generations until the fitness function meets the convergence condition. A GA is a global search algorithm, but it is not easy to judge whether the answer it finds is the global optimum. This can, however, be assessed by checking the progress of the fitness function. The convergence condition in this problem was defined as no further increase over 10 generations, because if no better optimum is found over such a long run of generations, the answer found is likely to be the global optimum. Figure 6 shows $F_O$ over the GA generations, plotting the best elite result in each generation until the convergence condition is satisfied. The fitness score rises to its optimum in the 18th generation and this value persists until the 28th generation, where $R^2$ is 0.9909, very close to 1. Therefore, the GA search for the optimum was judged successful.
In the DNN training process, the training cost for the 28 GA generations was about 7 h and 39 min. The training cost depends strongly on the learning rate and epoch size, so the time taken for a single DNN training run varies from seconds to minutes. Figure 7 shows the performance of the 10 offspring when the GA meets the convergence condition, where the X-axis is the real NOx emission measured by experiment and the Y-axis is the value predicted by the virtual NOx sensor model. Basically, the mean squared error (MSE) loss in the DNN minimizes the error between the real and predicted values. In addition, the $R^2$-based objective function drives the data points to line up along the linear regression line. Note that a difference of 10% or more between the values occurs frequently even when $R^2$ exceeds 0.97. Therefore, a stricter $R^2$ is required for replacing the real sensor.
After the simulation, the sweet spot was around 0.007 (learning rate) and 70 000 (epoch size), and the optimum was found in Figure 7a, where the learning rate and epoch size were 0.006926 and 72 321, respectively ($R^2$ of 0.9909). Figures 7a-7g all showed good performance, from which it was deduced that the choices of learning rate and epoch size giving accurate prediction are not narrowly confined. However, too small an epoch size always shows poor performance. Large values of epoch size and learning rate usually gave good performance; however, too large a learning rate or epoch size would cause overshooting or overfitting, respectively. Table 4 shows the learning rates and epoch sizes once the GA convergence condition was met. In the table, (a)-(d) are the four distinct elites from the previous generation, and (e)-(g) are the crossover results produced from (a)-(d). Finally, (h)-(j) are the mutation results, which were found in a completely different region from the elites (a)-(d). Mutation introduces randomness into the chromosome. Therefore, the chromosome after mutation can perform very poorly, as in (j), where the slope of the regression function is much less than 1. Although mutation often yields such poor performance, it plays an important role in the GA in overcoming the local optimum problem. Figure 8 compares the virtual NOx sensor performance of the initial and optimal models, where the initial and optimal $R^2$ were 0.9759 and 0.9909, respectively.
In Figure 8a, the prediction accuracy is not guaranteed over the whole range, and deviations of more than 10% are frequently found. In contrast, Figure 8b shows uniform results overall. In particular, the fact that the points lie along the regression function in an almost straight line indicates that the prediction performs well enough to replace the actual sensor. As a result, it was found that the GA was able to derive the optimum DNN hyperparameters, improving performance while keeping the optimization of the virtual NOx sensor model practical and straightforward. When NOx formation in the engine is controlled, the engine operating strategy is based on the amount of NOx formed. The proposed virtual NOx sensor can perceive the effects of the various engine operating conditions with high sensing speed, so that finer engine control based on accurate prediction would be possible without real sensors.
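The convergence condition amounts to a patience counter on the best fitness per generation. The sketch below is one way to express it; the function name is hypothetical, and the history used in the example is synthetic, chosen only so that the last improvement falls in generation 18. It is not the measured curve of Figure 6.

```python
def converged_at(best_fitness_per_generation, patience=10):
    """Return the 1-based generation at which the best fitness has failed to
    increase for `patience` consecutive generations, or None if it never does."""
    best = float("-inf")
    stale = 0
    for generation, score in enumerate(best_fitness_per_generation, start=1):
        if score > best:
            best, stale = score, 0     # new optimum found, reset the counter
        else:
            stale += 1                 # no improvement in this generation
        if stale >= patience:
            return generation
    return None

# Synthetic history: strictly improving up to generation 18, then flat.
history = list(range(1, 19)) + [18] * 10
stop_generation = converged_at(history)
```

With the last improvement in generation 18 and a patience of 10, the search stops at generation 28, which matches the generation count reported above.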

Conclusion
In this study, a virtual NOx sensor model using a DNN and a GA was developed in Python with TensorFlow. The training data were obtained from real engine experiments. The initial DNN-based NOx model produced a poor $R^2$. However, after GA optimization, the optimum $R^2$ could be realized. To achieve this accuracy, the overlooked but influential hyperparameters, learning rate and epoch size, were selected as design variables, and the fitness function $1/(1 - R^2)$ was maximized until the convergence condition was met. As a result, the optimum was found when the learning rate and epoch size were 0.006926 and 72 321, respectively; hyperparameters that were previously inaccurate and manually tuned are now determined clearly and automatically by the GA, such that the quality can be guaranteed to a practical level. It should be noted that once this sensor model has been developed, the model and algorithm have the potential to be applied to other types of applications with some adjustments. Moreover, reducing the number of real NOx sensors would lower costs, prevent performance degradation over time, and improve system durability. In future work, the virtual NOx sensor will be adapted for effective LNT/SCR operation.