Regular Article
GPUaccelerated hydraulic simulations of largescale natural gas pipeline networks based on a twolevel parallel process
^{1}
Sichuan Energy Internet Research Institute, Tsinghua University, 610042 Chengdu, PR China
^{2}
School of Mechanical Engineering, Beijing Key Laboratory of Pipeline Critical Technology and Equipment for Deepwater Oil & Gas Development, Beijing Institute of Petrochemical Technology, 102617 Beijing, PR China
^{*} Corresponding author: wangp@bipt.edu.cn
Received:
10
March
2020
Accepted:
15
September
2020
The numerical simulation efficiency of largescale natural gas pipeline network is usually unsatisfactory. In this paper, Graphics Processing Unit (GPU)accelerated hydraulic simulations for largescale natural gas pipeline networks are presented. First, based on the Decoupled Implicit Method for Efficient Network Simulation (DIMENS) method, presented in our previous study, a novel twolevel parallel simulation process and the corresponding parallel numerical method for hydraulic simulations of natural gas pipeline networks are proposed. Then, the implementation of the twolevel parallel simulation in GPU is introduced in detail. Finally, some numerical experiments are provided to test the performance of the proposed method. The results show that the proposed method has notable speedup. For five largescale pipe networks, compared with the wellknown commercial simulation software SPS, the speedup ratio of the proposed method is up to 57.57 with comparable calculation accuracy. It is more inspiring that the proposed method has strong adaptability to the large pipeline networks, the larger the pipeline network is, the larger speedup ratio of the proposed method is. The speedup ratio of the GPU method approximately linearly depends on the total discrete points of the network.
© Y. Xiang et al., published by IFP Energies nouvelles, 2020
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Nomenclature
g : Gravitational acceleration, m/s^{2}
A : Crosssectional area, m^{2}
M : Number of pipelines in the pipeline network
N : Number of sections into which the pipelines are divided
N_{st } : Max number of reduced steps, N_{st} =
Greek symbols
θ : Inclination angle of the pipe, rad
Superscripts
Subscripts
i : Node number or pipeline number
1 Introduction
With the increasing demand for clean energy resources worldwide, natural gas plays a more important role than previously in the energy structure [1, 2]. Pipeline networks are the most common means of natural gas transportation. They are thus being constructed on a large scale in recent years [3]. The scale of natural gas pipeline networks is becoming increasingly larger and the topological structure is becoming increasingly more complex [4].
Prediction of the flow parameters in a natural gas pipeline network is important to managers who create the operating schedule and forecast potential risk [5, 6]. Hydraulic simulation is the most common way to analyze the flow in a gas pipeline network [7, 8]. As early as the 1950s, hydraulic simulation for gas pipelines began with some simplified mathematical models [9, 10]. Some terms in the governing equations of pipeline flow, such as the inertia term, the transient term or the friction term, are usually ignored in these models. In later years, some numerical methods were proposed to solve the nonsimplified mathematical model, such as the method of characteristics [11], the method of lines [12], and the finite difference method [13]. Among these methods, the implicit finite difference method is one of the most commonly used methods because of its high stability and accuracy.
However, because all the equations should be solved simultaneously at each time step, there is a large computational cost for the implicit finite difference method. To solve this issue, studies have been done to accelerate the computation. Luongo [14] proposed a novel linearization method for the partial differential equations in a gas pipeline, which can save 25% of the computational time. BehbahaniNejad and Shekari [15] presented a reducedorder modeling approach based on the implicit finite model for natural gas transient flow in pipelines, showing a 71.14% time saving compared with the direct simulation of the origin nonlinear model. To further improve the efficiency of the implicit finite difference method, an adaptive implicit finite difference method for natural gas pipeline transient flow was introduced [16]. The results show that the calculation speed of the slow transient simulation is accelerated and the computation accuracy of the fast transient simulation is high. What is more, the Decoupled Implicit Method for Efficient Network Simulation (DIMENS) method was proposed in our previous study [17, 18]. DIMENS is a rapid method for the hydraulic simulation of natural gas pipeline networks. In DIMENS, the pipelines in the networks are decoupled and solved independently. Comparison with the wellknown commercial simulation software SPS, the speedup ratio is 2.5. Although these methods improve the efficiency of the hydraulic simulation of gas pipeline networks, simulation efficiency is lacking.
In recent years, Graphic Processing Unit (GPU) acceleration has attracted considerable research interest in scientific computing. Many studies in various fields, e.g., fluid simulation [19–21], weather prediction [22, 23], seismic damage simulation [24], and sheet forming simulation [25], show great performance with GPU computing. GPU computing is the use of a GPU as a coprocessor to accelerate CPUs for generalpurpose scientific and engineering computing. Compared with a CPU, on a single PC, GPU uses thousands of smaller and more efficient cores for a massively parallel architecture aimed at handling multiple functions at the same time. It provides superior processing power, memory bandwidth and efficiency over its CPU counterparts.
Obviously, if GPU technology is introduced into the simulation of natural gas pipeline networks, the computational efficiency would be improved substantially. However, to the best of authors’ knowledge, it is hard to see any literature focusing on the GPUaccelerated simulations of natural gas pipeline networks due to the lack of suitable parallel algorithm. Fortunately, the DIMENS method in our previous work [17, 18] is a highly parallel algorithm. The core idea of this method is that the pipelines are first decoupled and then solved independently, thus the parallel solution process is very suitable for GPU computing. In other word, with the combination of proposed DIMENS method with GPU technology, the computational speed of natural gas pipeline networks simulation would be greatly accelerated. Therefore, to further improve the simulation efficiency of natural gas pipeline networks, our DIMENS method is extended to a twolevel parallel process, and then it is implemented on GPU. A novel twolevel parallel simulation process and the corresponding parallel numerical method for hydraulic simulation of natural gas pipeline networks are proposed. The proposed method can effectively improve the simulation efficiency of largescale natural gas pipeline network, and can be used to guide decisions regarding the design and operation of the pipeline network system.
The outline of this paper is the following. In Section 2, the mathematical model and the DIMENS method are briefly introduced and analyzed. In Section 3, the idea of the twolevel parallel process and the corresponding parallel numerical method for the hydraulic simulation of largescale networks are proposed. In Section 4, the implementation of the twolevel parallel process with GPUs is presented in detail. In Section 5, numerical experiments are presented to verify the accuracy and efficiency of the proposed GPUaccelerated method. The conclusions are summarized in Section 6.
2 Mathematical model and the DIMENS method
In this section, the mathematical model for a gas pipeline network and the DIMENS method are reviewed. For a brief and clear description, a network that includes only the pipeline, the supply and the demand, are studied. The mathematical model of complex networks, which also includes a compressor and a valve can be found in the study [17, 18].
2.1 Mathematical model
2.1.1 Pipeline model and discretization
According to our previous study, the hydraulic governing equations of a gas pipeline, incorporating the continuity equation and momentum equation, is written as a general formula [17, 18] as follows:(1)where , , .
The hydraulic equation is a nonlinear partial difference equation, and its direct solution is usually unstable and inefficient [7, 26]. Therefore, equation (1) is linearized as equation (2) based on the Taylor expansion [27] as follows:(2)where, the elements (i, j) of matrices G and S are and , respectively. The detailed expressions of G and S can be found in Appendix A. u_{i} is the ith corresponding component of U. , , , and are calculated using the results of the previous time step.
The implicit finite difference method is adopted to discretize equation (2) on a uniform pipe mesh, as shown in Figure 1. The space grid and time grid are divided horizontally and vertically, and Δx and Δt denote the space step and the time step, respectively.
Fig. 1 Schematic diagram of the uniform mesh. 
For the ith section in Figure 1, the following discretization is conducted [17, 18, 28, 29]:(3)(4)(5)(6)
By substituting equations (3)–(6) into equation (2) and rearranging the terms, the discretization is obtained as follows:(7)where , , , I is a 2 × 2 identity matrix.
2.1.2 Boundary conditions
The supply and demand are referred to as the gas source. The mathematical model of the gas source is shown in equations (8)–(9). They are also called the external boundary conditions of pipeline network:(8)(9)
At the junction node of the components, the laws of mass conservation and pressure balancing should be satisfied. The mathematical model of a junction node is shown in equations (10)–(11), which are also called internal boundary conditions, as follows:(10)(11)
Equations (7)–(11) are the hydraulic discretized equations of a natural gas pipeline network. An Equation Of State (EOS) which expresses the density in terms of pressure and temperature needs to be close to these equations. Several different EOSs, such as Peng–Robinson (PR) and Soave–Redlich–Kwong (SRK), are commonly used in natural gas industry [30]. Solving this system of hydraulic discretized equations, the pressure and mass flowrate in the pipe can be obtained.
2.2 DIMENS method
The DIMENS method is proposed based on the idea of divideandconquer in our previous study [17, 18]. In the DIMENS method, the gas pipeline network is divided into several pipelines and a set of junction nodes, then the equations of the pipeline and those of junction nodes are solved independently. In this way, the system of origin discretized equations is split into two parts: the system of discretized equations for the pipe in equation (7) and the system of boundary equations in equations (8)–(11). In this paper, we call them PDES and BES, respectively. There are as many PDESs in the system of discretized equations as there are pipelines in a network.
To describe the decoupling of the DIMENS method, a simple pipeline network of 9 nodes and 12 pipes is used as an example, as shown in Figure 2, where red circles represent nodes and black lines represent pipes. The blue points indicate the spatial discretized points of the pipe. Using the DIMENS method, the simple network is divided into 9 nodes and 12 pipes. Using this method, one set of original equations with 9 + 12N size is divided into one BES with 9 size and twelve block PDES with N size.
Fig. 2 Decoupling sketch of the simple network of 9 nodes and 12 pipes. 
The solution process of the PDES and the BES using the DIMENS method can be shown in Figure 3. More details about the DIMENS method can be found in the literature [17, 18].
Fig. 3 Flow diagram of the DIMENS method for pipeline networks. 
In the DIMENS method, the gas pipeline network is divided into several pipelines and a set of junction nodes, which can be simulated separately. The evidence in Figure 3 shows that the solution process of the PDESs in both step 1 and step 3 are separate. These analyses show that the DIMENS method is a highly parallel simulation method. However, the DIMENS method in our previous study [16, 17] is a single process. The advantages of the DIMENS method in simulation efficiency has not been fully utilized. Parallelization of the DIMENS method is an effective method to further improve the simulation efficiency in pipeline networks.
3 Twolevel parallel simulation process and numerical method
In order to implement our DIMENS method on the GPU, the DIMENS method is extended to a novel twolevel parallel simulation process and the corresponding parallel numerical method is introduced in detail in the following sections.
3.1 Twolevel parallel simulation process
Because of the lightweight GPU computing, the parallel granularity should be fine enough to maximize the GPU utilization. Therefore, the twolevel parallel simulation process is designed to be coarse and fine level processes. In the coarse level process, the task of network simulation is first divided into several solution tasks for the PDESs and the BES. In the fine level process, the solution of the PDES or the BES is then divided into parallel arithmetic operations to minimize computational effort.
3.1.1 Coarse level parallel process
In DIMENS, the solution process of the PDESs is separate; therefore, each PDES can be treated as a subtask. Because the junction nodes and gas sources are independent of the pipe, BES can be similarly treated as another subtask. That is, in the coarse level parallel process, the task of network simulation is divided into several solution tasks for PDESs and BES, the solution process of a PDES or a BES is considered a subtask. Figure 4 shows a sketch of the coarse level division of a simple network. The sketch shows a simple network divided into several pipelines and a set of junction nodes. The task of network simulation is divided into twelve solution tasks for the PDESs and one solving task for the BES.
Fig. 4 Sketch of the coarse level division of a simple network. 
3.1.2 Fine level process
There are usually many discretized points in one pipeline and many junction nodes in a network; therefore, the solving task for PDES or BES is heavy even when using GPU, which could result in low computational speed. To solve this problem, fine level patterns for division of the above tasks are proposed. In the fine level process, the solution task for the PDES or the BES is divided into parallel arithmetic operations, and each arithmetic operation is a subtask.
Solution task for the PDES.
The PDES solution is the main computational stack of DIMENS. Therefore, parallelizing the solution process of equation (12) would greatly improve the computation speed. In our previous study [17, 18], equation (12) was solved by the block TDMA method. The TDMA method is a Gaussianeliminationbased method, which is usually difficult to parallelize [31] on a GPU.
In this study, the Parallel Cyclic Reduction (PCR) method [32] is adopted to parallelize the PDES solution. The PCR method was initially designed to solve scalar tridiagonal equation systems by Hockney and Jesshope in 1981. In this study, it is extended to solve the vectorial equation system in equation (12). The subtask of solving the PDES is the reduction and solution step of PCR. First, the basic parallel process of PCR is briefly introduced, then the parallel solution process of the PDES solution is described in detail in Section 3.2.2.
Solution task for BES.
Because of the sparsity and irregularity of the BES, a modified Conjugate Gradient (CG) algorithm was adopted to solve it with high efficiency in our previous study [17, 18]. Because of the large number of matrix–matrix and matrix–vector multiplications in the CG method, it is readily parallelized by assigning a few steps of the arithmetic operations to each computing core. The subtasks in solving the BES are the arithmetic operations.
3.2 Parallel numerical method
3.2.1 Basic process of PCR
In PCR, the one system of original equations is reduced to a system of twice the number of reduced equations, which has only half the number of unknowns. The reduction procedure does not stop until each equation system has at most 2 unknowns. The equations of 2 unknowns are readily solved. Figure 5 shows the reduction pattern of the PCR method in a simple case 8 equations. Letters e′ and e″ represent the updated equations. Equations in a yellow rectangle form a system. The parallel process is introduced briefly in the following.
Step ①: The 1st reduction is performed by reducing the evenindexed and oddindexed unknowns, reducing the one system of 8unknown equations to two systems of 4unknown equations.
Step ②: The 2nd reduction is performed by the same operation, reducing the two systems of 4unknown equations to four systems of 2unknown equations.
Step ③: The four systems of 2unknown equations are readily solved. There are reduction and solution processes in the PCR method.
Fig. 5 Reduction pattern of PCR in the 8unknown case. 
The system of 8unknown equations is thus solved in 3 steps using the PCR method. In addition, the process for each step in PCR is parallel. Therefore, when the PCR method is extended to solve the PDES, i.e., equation (12), they can be solved by N + 1 logical computing cores in only steps. In this way, the computing task for the PDES can be lightweight and efficiently executed with GPU.
3.2.2 PCR method for PDES solution
According to the basic parallel process of PCR, the parallel solution process of the PDES solution can be divided into two steps: the reduction step and the solution step. These two steps are described in detail in the following.
Reduction step:
For the internal equation in each equation (13) system, i.e., 2^{k} < i < N + 1 − 2^{k}, k = 1, 2, …, N_{st} − 1, the equation coefficients are rewritten as follows:
For the first equation in each equation (13) system, i.e., i = 2^{k}, the equation coefficients are rewritten as follows:
For the last equation in each equation (13) system, i.e., i = N + 1 − 2^{k}, the equation coefficients are rewritten as follows:
The superscripts k − 1 and k represent the (k − 1)th and the kth reduced systems, respectively.
Solution step:
Equations (10) and (11) can be readily solved and X_{i} and are obtained as follows:(16)(17)where k = N_{st} − 1, .
Equation (13) is the reduction process, and equations (16)–(17) are the solution process. The three equations compose the computing task in the fine level process of the PDES solution. There are several steps of 2 × 2 matrix multiplication for each discretized point, which is a series of arithmetic operations. Therefore, the computing task is lightweight and equation (12) can be solved efficiently on GPU.
3.3 Flow diagram of the twolevel parallel simulation process
Now, we establish a twolevel model for the division of the computing task. Figure 6 shows a flow diagram of the twolevel parallel simulation process.
Fig. 6 Flow diagram of the twolevel parallel simulation process. 
4 Implementation of the twolevel parallel simulation process on GPU
In this section, the implementation of the twolevel parallel simulation process on GPU is described. In contrast with CPU programming, programming for GPU must consider the relation between the code and the hardware, especially thread mapping, memory organization and memory access. The implementation of these three parts are discussed below.
4.1 Thread mapping model
To manage and dispatch the GPU threads as required by the algorithm, i.e., establishing a correct thread mapping model, we need a developing tool such as OpenCL or the Compute Unified Device Architecture (CUDA) to aid in the GPU programming. Between the two tools, CUDA is the most widely used GPU programming environment based on C/C++ for GPUs produced by NVIDIA Corporation [33]. As shown in Figure 7, CUDA uses a twolevel thread mapping model, i.e., the grid level and the thread block level. Host and Device indicate CPU and GPU, respectively. Kernel represents the part that can be parallelized in the problem; it is launched by CPU and executed on GPU. Each Kernel has a grid structure, consisting of multiple thread Blocks, and each thread Block also consists of multiple threads with a grid structure.
Fig. 7 CUDA programming model. 
As shown in Figure 7, GPU is a doublelevel parallel simulation process, which perfectly integrates with the twolevel parallel simulation process proposed in this paper. In general, the coarselevel and finelevel parallel computing tasks would be mapped to grid and thread blocks, respectively. Thread blocks are executed concurrently on GPU, and threads in a same block are also executed concurrently. Therefore, there are two thread mapping models: the first model is the DIMENS method, which is the coarselevel thread mapping model; and the second model is for solving PDES and BES, which is the finelevel thread mapping model.
Since the pipes are decoupled by DIMENS, the coarselevel thread mapping model is readily constructed, as shown in Figure 8. In this model, each thread block is responsible for solving the PDES of the corresponding pipe.
Fig. 8 Thread mapping model of the coarselevel parallelism. 
In the finelevel thread mapping model used to solve the PDES, the computing tasks for each discrete point, i.e., equations (13), (16) and (17), are mapped to a single thread in the corresponding thread block. Figure 9 is a sketch of the thread mapping model.
Fig. 9 Thread mapping model of the finelevel parallelism for solving the PDES. 
With respect to solving the BES, the thread mapping model is developed for matrix–matrix and matrix–vector multiplications. Fortunately, CUDA provides abundant libraries that have highly optimized functions for linear algebra, such as cuSPARSE, cuSOLVER, and cuBLAS in CUDA Toolkit Documentation v7. 5. In this study, the cuSPARSE library is used to accelerate the CG method for solving the BES.
4.2 Memory organization
Memory organization is the way of distributing different data to different types of memory on the GPU. There are five primary types of memory on GPU, i.e., registers, shared memory, global memory, constant memory and texture memory. Figure 10 shows the memory structure on GPU, where the data stored in Register and Shared Memory are private for thread and thread block, respectively. Data stored in Global, Constant and Texture memory are public for all threads and thread blocks. Table 1 demonstrates the access latency of different types of memory.
Fig. 10 Sketch of the memory structure on GPU. 
Access performance of different types of memory.
Although registers and shared memory have excellent performance compared with the other types of memory, there are very limited resources for registers and shared memory on GPU. Therefore, the memory organization should be carefully designed such that those frequently used data can be stored in highspeed memory to increase the cache hit rate. Table 2 shows the memory organization for solving PDES on GPU.
Memory organization for solving PDES on GPU.
Flowrate (WNm^{3}/d) obtained by the three methods and their comparison.
Pressure (MPa) obtained by the three methods and their comparison.
Computing time of the three methods and their comparison (smallsized nature gas pipeline network).
Component data of the five large gas pipeline networks.
Comparison of computing time of the three methods in five large nature gas pipeline networks.
4.3 Memory access
In addition to an appropriate memory organization, coalesced access for global memory is also required to avoid repetitive access with low efficiency [27], i.e., data locality is very important. Some APIs provided by CUDA, e.g., cudaMallocPitch, have been developed to support coalesced access for data in global memory.
The performance of the above optimization is tested with different numbers of discrete points in the pipe, i.e., different size of PDES. Figure 11 shows a great improvement in computing performance using highspeed memory and coalesced memory accessing. All the numerical experiments conducted in this study are based on the hardware configuration introduced in Section 5. The data in Figure 11 show that in these four cases, the speedup ratio for using both optimization methods are 5.67x, 4.00x, 3.83x and 3.57x, respectively.
Fig. 11 Comparison of the computing performance solving PDES with and without memory access optimization. 
5 Case study
In this section, experiments are provided to test the performance of the proposed GPU simulation of a largescale gas pipeline network, short as GPU method. To illustrate the efficiency of GPU acceleration, the computing speed increase is compared with that of the DIMENS method which implemented on CPU by sequential FORTRAN program and with the widely used commercial software Stoner Pipeline Simulator (SPS) in these experiments. In this paper, a PC with E52640 V3 CPU and GTX 980 GPU is used for the experiments and the CUDA version is 7.5.
5.1 Smallsized nature gas pipeline network case
5.1.1 Description of pipeline network
First, to evaluate the accuracy of the proposed GPU method, a smallsized gas pipeline network is tested. Figure 12 shows the topological structure of a gas gathering pipeline network in a particular oil field [34]. The pipeline network comprises 73 pipes, 70 junction nodes and 52 boundary nodes. The structure data and parameters of pipes are presented in the Appendix Table B1. The boundary conditions and the components of the natural gas are listed in the Appendix Tables B2 and B3, respectively.
Fig. 12 Topological structure of an actual gas pipeline network. 
5.1.2 Comparison of numerical accuracy
The steadystate simulation results computed by SPS software, DIMENS and the proposed method are shown in Tables 3 and 4. The total simulation time is 120 h. The time step is Δt = 6 s and the spatial step Δx = 600 m, based on which the grid independent numerical solution can be obtained. It should be noted that: (1) the flowrate of the junction nodes represents the net value; (2) the max error represents the maximum value of the error between SPS software and GPU method and that between DIMENS and GPU method.
It can be seen that the maximum relative error of flowrate obtained by the three methods is 4.15% at the junction node 32, as shown in the Table 3; the maximum relative error of pressure obtained by the three methods is 2.88% at the end node of pipe 1, as shown in the Table 4. The maximum relative error is acceptable in numerical simulation and engineering. This comparison indicates good precision of the proposed GPU accelerated method.
5.1.3 Comparison of computing speed
The computing acceleration of the GPU method is compared with that of the SPS software and the DIMENS method with different spatial steps Δx. The results are shown in Table 5.
It can be seen that the SPS software is most timeconsuming. The computing time of the DIMENS method is about half of that of SPS software. For example, when the spatial step Δx = 400 m, the computing time of SPS software and DIMENS method are 5848.29 s and 2735.21 s, respectively. It means that the computational speed of DIMENS method is about 2 times faster than the SPS software, which is consistent with the conclusion of our previous study [11]. Additionally, the computing time of GPU method is much less than those of DIMENS method and SPS software. The computing time of GPU method is about 1/10 and 1/5 of those of the SPS software and DIMENS method. For example, when the spatial step Δx = 400 m, the computing time of the GPU method is 516.62 s, which is about 1/10 and 1/5 of 5848.29 s and 2735.21 s. It indicates that the computational efficiency of GPU method is about 10 and 5 times than those of SPS software and DIMENS method, respectively. What is more, the speedup ratio of GPU method vs. SPS increases from 5.73 to 12.63 as spatial step decreases from 600 m to 200 m, which means the speedup ratio increases as the computation scale increases. It indicates that the GPU method has strong adaptability to the simulation of the pipeline network.
5.2 Large scale natural gas pipeline network cases
In Section 5.1, the calculation accuracy and computing speed of the GPU method have been verified in the small scale pipeline networks. In this subsection, the calculation speed is further tested with large gas pipeline networks.
With the pipeline network shown in Figure 12 as the basic pipeline network unit, we have five pipeline networks, named No. 1 to No. 5, which contain different number of network units and therefore have various sizes. The topological graphs of these five gas pipeline networks are too large to demonstrate in this paper, thus only the component data are shown in Table 6. The main purpose of designing these five pipeline networks is to verify the applicability of the proposed method to the largescale pipe network. These five networks cases are all transient simulations cases. The simulation time is 1 day, the mesh spatial size and time step are Δx = 500 m and Δt = 30 s, respectively.
The computing time of the three methods in the five large pipeline networks is shown in Figure 13. It can be seen from Figure 13 that the computing time of SPS software is the largest one, while that of GPU method is the smallest one. It suggests that the simulation efficiency of the GPU method is high. Additionally, the computing time of the SPS software and DIMENS method linearly depends on the total discrete points, while the computing time of GPU method is almost stable. In other words, when the total discrete points increase, the computing time of the SPS software and DIMENS method linearly increases, but that of the GPU method almost keeps constant. Thus, the GPU method has high computational speed and strong adaptability to the large pipeline networks.
Fig. 13 Computing time of the three methods in the five large pipeline networks. 
The comparisons of computing time of the three methods in the five large pipeline networks are shown in Table 7. The speedup ratio of DIMENS vs. SPS is about 2 in those five large networks, which is consistent with the data in Table 5. It is to say that the computational speed of DIMENS method is also about 2 times faster than the SPS software for the large pipeline network. This proves that DIMENS method is very suitable for large pipe network, which is consistent with the conclusion of our previous study [17]. In contrast, the speedup ratio of GPU is notable. The minimum speedup ratio of GPU method vs. DIMENS and GPU method vs. SPS are respectively 7.93 and 17.24, which are much larger than 2. In the case of GPU method vs. SPS, the maximum speedup ratio is even up to 57.57. It further confirms that the computational speed of the GPU method is fast.
The speedup ratio of the GPU method in the five large pipeline networks is shown in Figure 14. It can be seen that, the speedup ratio of GPU method vs. SPS and GPU method vs. DIMENS increase as the total discrete points of pipeline network increase. The speedup ratio of the GPU method approximately linearly depends on the total discrete points. It implies that the larger the pipeline network is, the larger speedup ratio of the GPU method is. It further confirms that the GPU method has strong adaptability to the simulation of the pipeline network, especially for large pipeline network.
Fig. 14 Speedup ratio of the GPU method in the five large pipeline networks. 
6 Conclusion
To further improve the simulation efficiency of natural gas pipeline networks, a novel GPUaccelerated hydraulic simulation was study in this paper. First, based on the DIMENS method in our previous study [16, 17], a twolevel pattern for division of the computing tasks and the corresponding parallel numerical method were proposed. Then the thread mapping, memory organization and memory access of GPU were carefully designed, and the twolevel parallel process was implemented on the GPU. Last, one small and five large nature gas pipeline networks were presented to verify the accuracy and efficiency of the proposed GPU method. The following conclusions can be drawn:
The proposed GPU method has high accuracy. The accuracy of the result obtained by the GPU method is almost the same with the DIMENS method and the SPS software. In the small nature gas pipeline network case, the maximum relative error of results obtained by the above three methods is only 4.15%.
The proposed GPU method has notable speedup. In the five largescale pipe networks cases, compared with the wellknown commercial simulation software SPS, the speedup ratio of the proposed method is up to 57.57 with comparable calculation accuracy.
The proposed method has strong adaptability to the large pipeline networks. In the five largescale pipe networks cases, the larger the pipeline network is, the larger speedup ratio of the proposed method is. The speedup ratio of the GPU method approximately linearly depends on the total discrete points of the network.
Appendix A
The detailed expressions of G and S in equation (2) are as presented follows,(A1)(A2)
Appendix B
Structure data and parameters of pipes.
Structure data and boundary conditions of boundary nodes.
Main components of natural gas.
Acknowledgments
The study is supported by the National Natural Science Foundation of China (No. 51806018), the Project of Construction of Innovative Teams and Teacher Career Development for Universities and Colleges Under Beijing Municipality (No. IDHT20170507) and the Program of Great Wall Scholar (No. CIT&TCD20180313).
References
 Brentner K.S., Farassat F. (1998) Analytical comparison of the acoustic analogy and Kirchhoff formulation for moving surfaces, AIAA J. 360, 8, 1379–1386. https://doi.org/10.2514/2.558. [CrossRef] [Google Scholar]
 Bai H. (2020) Mechanism analysis, anticorrosion techniques and numerical modeling of corrosion in energy industry, Oil Gas Sci. Technol.  Rev IFP Energies nouvelles 75, 42. [CrossRef] [Google Scholar]
 Shi J., AlDurra A., Matraji I., AlWahedi K., AbouKhousa M. (2019) Application of Particle Swarm Optimization (PSO) algorithm for Black Powder (BP) source identification in gas pipeline network based on 1D model, Oil Gas Sci. Technol.  Rev. IFP Energies nouvelles 74, 47. [CrossRef] [Google Scholar]
 Su H., Zio E., Zhang J., Yang Z., Li X., Zhang Z. (2018) A systematic hybrid method for realtime prediction of system conditions in natural gas pipeline networks, J. Nat. Gas Sci. Eng. 57, 31–44. [CrossRef] [Google Scholar]
 Sanaye S., Mahmoudimehr J. (2012) Technical assessment of isothermal and nonisothermal modelings of natural gas pipeline operational conditions, Oil Gas Sci. Technol.  Rev. IFP Energies nouvelles 67, 3, 435–449. [CrossRef] [Google Scholar]
 Qiao W., Huang K., Azimi M., Han S. (2019) A novel hybrid prediction model for hourly gas consumption in supply side based on improved whale optimization algorithm and relevance vector machine, IEEE Access 7, 88218–88230. [CrossRef] [Google Scholar]
 Wang P., Yu B., Deng Y., Zhao Y. (2015) Comparison study on the accuracy and efficiency of the four forms of hydraulic equation of a natural gas pipeline based on linearized solution, J. Nat. Gas Sci. Eng. 22, 235–244. [CrossRef] [Google Scholar]
 Bagajewicz M., Valtinson G. (2014) Computation of natural gas pipeline hydraulics, Ind. Eng. Chem. Res. 53, 26, 10707–10720. [CrossRef] [Google Scholar]
 Streeter V.L., Wylie E.B. (1970) Natural gas pipeline transients, Soc. Pet. Eng. J. 10, 04, 357–364. [CrossRef] [Google Scholar]
 Osiadacz A.J. (1996) Different transient flow modelslimitations, advantages, and disadvantages, in: PSIG Annual Meeting, PSIG9606, San Francisco, California. [Google Scholar]
 Yow W. (1971) Analysis and control of transient flow in natural gas piping systems. http://hdl.handle.net/2027.42/8473. [Google Scholar]
 Osiadacz A. (1984) Simulation of transient gas flows in networks, Int. J. Numer. Meth. Fl. 4, 1, 13–24. [CrossRef] [Google Scholar]
 Wylie E.B., Stoner M.A., Streeter V.L. (1971) Network: System transient calculations by implicit method, Soc. Pet. Eng. J. 11, 04, 356–362. [CrossRef] [Google Scholar]
 Luongo C.A. (1986) An efficient program for transient flow simulation in natural gas pipelines, in: PSIG Annual Meeting, PSIG8605, New Orleans, Louisiana. [Google Scholar]
 BehbahaniNejad M., Shekari Y. (2008) Reduced order modeling of natural gas transient flow in pipelines, Int. J. Eng. Appl. Sci. 5, 7, 148–152. [Google Scholar]
 Wang P., Yu B., Han D., Li J., Sun D., Xiang Y., Wang L. (2018) Adaptive implicit finite difference method for natural gas pipeline transient flow, Oil Gas Sci. Technol.  Rev. IFP Energies nouvelles 73, 21. [CrossRef] [Google Scholar]
 Wang P., Yu B., Han D., Sun D., Xiang Y. (2018) Fast method for the hydraulic simulation of natural gas pipeline networks based on the divideandconquer approach, J. Nat. Gas Sci. Eng. 50, 55–63. [CrossRef] [Google Scholar]
 Wang P., Ao S., Yu B., Han D. (2019) An efficiently decoupled implicit method for complex natural gas pipeline network simulation, Energies 12, 8, 1516. [CrossRef] [Google Scholar]
 Thibault J.C. (2009) Implementation of a Cartesian grid incompressible NavierStokes solver on multiGPU desktop platforms using CUDA, Boise State University, Boise, Idaho. [Google Scholar]
 Lacasta A., MoralesHernández M., Murillo J., GarcíaNavarro P. (2014) An optimized GPU implementation of a 2D free surface simulation model on unstructured meshes, Adv. Eng. Softw. 78, 1–15. [CrossRef] [Google Scholar]
 Tölke J., Krafczyk M. (2008) TeraFLOP computing on a desktop PC with GPUs for 3D CFD, Int. J. Comput. Fluid. D. 22, 7, 443–456. [CrossRef] [Google Scholar]
 Michalakes J., Vachharajani M. (2008) GPU acceleration of numerical weather prediction, Parallel Process. Lett. 18, 04, 531–548. [CrossRef] [Google Scholar]
 Acuña M., Aoki T. (2009) Realtime tsunami simulation on multinode GPU cluster, in: ACM/IEEE Conference on Supercomputing. [Google Scholar]
 Lu X., Han B., Hori M., Xiong C., Xu Z. (2014) A coarsegrained parallel approach for seismic damage simulations of urban areas based on refined models and GPU/CPU cooperative computing, Adv. Eng. Softw. 70, 90–103. [CrossRef] [Google Scholar]
 Cai Y., Li G., Wang H., Zheng G., Lin S. (2012) Development of parallel explicit finite element sheet forming simulation system based on GPU architecture, Adv. Eng. Softw. 45, 1, 370–379. [CrossRef] [Google Scholar]
 Zheng J., Song F., Chen G. (2011) Development of RealPipeGas simulation software for gas pipeline network, Oil Gas Storage Transp. 30, 9, 659–662. (in Chinese). [Google Scholar]
 Luskin M. (1979) An approximation procedure for nonsymmetric, nonlinear hyperbolic systems with integral boundary conditions, Siam J. Numer. Anal. 16, 1, 145–164. [CrossRef] [Google Scholar]
 Kiuchi T. (1994) An implicit method for transient gas flows in pipe networks, Int. J. Heat. Fluid Fl. 15, 5, 378–383. [CrossRef] [Google Scholar]
 Abbaspour M., Chapman K.S. (2008) Nonisothermal transient flow in natural gas pipeline, J. Appl. Fluid Mech. 75, 3, 519–525. [Google Scholar]
 Zhang T., Li Y., Li Y., Sun S., Gao X. (2020) A selfadaptive deep learning algorithm for accelerating multicomponent flash calculation, Comput. Method. Appl. M. 369, 113207. [CrossRef] [Google Scholar]
 Li R., Saad Y. (2013) GPUaccelerated preconditioned iterative linear solvers, J. Supercomput. 63, 2, 443–466. [CrossRef] [Google Scholar]
 Hockney R.W., Jesshope C.R. (2019) Parallel computers 2: Architecture, programming and algorithms, CRC Press, Boca Raton, FL. [CrossRef] [Google Scholar]
 Harris M. (2013) How to access global memory efficiently in CUDA C/C++ kernels. http://devblogs.nvidia.com/parallelforall/howaccessglobalmemoryefficientlycudackernels. [Google Scholar]
 Yu B., Wang P., Wang L., Xiang Y. (2017) A simulation method for natural gas pipeline networks based on the divideandconquer concept, Oil Gas Storage Transp. 36, 1, 75–84. (in Chinese). [Google Scholar]
All Tables
Computing time of the three methods and their comparison (smallsized nature gas pipeline network).
Comparison of computing time of the three methods in five large nature gas pipeline networks.
All Figures
Fig. 1 Schematic diagram of the uniform mesh. 

In the text 
Fig. 2 Decoupling sketch of the simple network of 9 nodes and 12 pipes. 

In the text 
Fig. 3 Flow diagram of the DIMENS method for pipeline networks. 

In the text 
Fig. 4 Sketch of the coarse level division of a simple network. 

In the text 
Fig. 5 Reduction pattern of PCR in the 8unknown case. 

In the text 
Fig. 6 Flow diagram of the twolevel parallel simulation process. 

In the text 
Fig. 7 CUDA programming model. 

In the text 
Fig. 8 Thread mapping model of the coarselevel parallelism. 

In the text 
Fig. 9 Thread mapping model of the finelevel parallelism for solving the PDES. 

In the text 
Fig. 10 Sketch of the memory structure on GPU. 

In the text 
Fig. 11 Comparison of the computing performance solving PDES with and without memory access optimization. 

In the text 
Fig. 12 Topological structure of an actual gas pipeline network. 

In the text 
Fig. 13 Computing time of the three methods in the five large pipeline networks. 

In the text 
Fig. 14 Speedup ratio of the GPU method in the five large pipeline networks. 

In the text 