Abstract
The purpose of this report is to share our experience with parallelizing existing scientific
codes using profiling tools and the OpenMP application programming interface (API) for
multi-platform shared-memory parallel programming in C/C++ and Fortran.
Profiling is a very good tool for indicating which parts of a program to concentrate on
when parallelizing scientific codes. In general, it will be necessary to obtain more
detailed information on the relative importance of the different sub-blocks inside a
subroutine of interest. This information can be obtained with manually inserted timers
inside the subroutines. To get more reliable results, CPU timers should be used instead
of wall-clock timers.
The main goal was to parallelize the Simra CFD code as much as possible. Some initial work
on smaller, less complex programs undoubtedly led to better results for the Simra
code. At present, about 65% of the program has been parallelized. When using 16 cores
on the Njord supercomputer, the global speedup resulting from this work is between 2.2 and
2.3, depending on the problem size.
Client: SINTEF ICT Applied Mathematics