Table 1
Performance comparison between the reference Lagrange-remap solver and the Lagrange-flux solver, in MCUPs (million cell updates per second), on different machine configurations. Scalability (last column) is the speedup of the multithreaded vectorized version (16 cores, AVX) over the purely sequential baseline (1 core, no AVX). Tests are performed on fine meshes, so that kernel data resides in DRAM rather than in cache. The Lagrange-flux solver exhibits superior scalability because it has, by design, a higher arithmetic intensity.
Scheme | 1 core (MCUPs) | 1 core AVX (MCUPs) | 16 cores AVX (MCUPs) | Scalability |
---|---|---|---|---|
Lagrange-flux | 2.6 | 5.8 | 81.0 | 31.1 |
Reference | 2.5 | 3.8 | 37.0 | 14.8 |
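
The scalability column can be recomputed from the throughput figures, and the same ratios separate the gain from vectorization from the gain from multithreading. The following is a minimal sketch: the names `table` and `speedups` are illustrative and not taken from the solver code, and the inputs are the rounded values printed in Table 1, so the recomputed Lagrange-flux ratio (about 31.15) agrees with the reported 31.1 only up to rounding.

```python
# Throughput values in MCUPs, copied from Table 1 (rounded as published).
table = {
    "Lagrange-flux": {"seq": 2.6, "avx": 5.8, "avx16": 81.0},
    "Reference": {"seq": 2.5, "avx": 3.8, "avx16": 37.0},
}


def speedups(row):
    """Return (vectorization gain, multithreading gain, overall scalability)."""
    simd = row["avx"] / row["seq"]       # 1 core AVX vs. 1 core
    threads = row["avx16"] / row["avx"]  # 16 cores AVX vs. 1 core AVX
    total = row["avx16"] / row["seq"]    # last column of Table 1
    return simd, threads, total


for name, row in table.items():
    simd, threads, total = speedups(row)
    print(f"{name}: SIMD x{simd:.2f}, threads x{threads:.2f}, total x{total:.2f}")
```

Read this way, the Lagrange-flux solver gains more than the reference solver at both stages: a larger benefit from AVX on a single core, and a larger benefit from going to 16 cores.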