high overhead on blue gene #85
Description
Hpcrun seems to add a high overhead on Blue Gene. Master more than
doubles the run time of the OpenMP solve phase in AMG 2006. The
ompt-tr4 branch with the LLVM libomp runtime adds even more.
This is with AMG 2006 on mira/cetus at ANL: 8 nodes, 8 MPI ranks,
16 OpenMP threads, problem size (-r) 16,16,16. AMG was compiled with GNU,
flags '-g -O2', and run with WALLCLOCK at 8500 (118 samples/sec).
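As a quick sanity check on the sampling rate, here is a minimal sketch that assumes the WALLCLOCK period of 8500 is in microseconds (an assumption, though it is consistent with the 118 samples/sec figure quoted above):

```python
# Assumption: the WALLCLOCK period is given in microseconds.
period_us = 8500

# Samples per second per thread = one second / sampling period.
rate = 1_000_000 / period_us

print(f"{rate:.1f} samples/sec")
```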
AMG 2006 native, no toolkit.
wall clock time = 13.350482 seconds
wall clock time = 205.818907 seconds
wall clock time = 16.934752 seconds
Toolkit master, regular libgomp.
wall clock time = 31.799200 seconds
wall clock time = 241.473654 seconds
wall clock time = 43.120992 seconds
Branch ompt-tr4 with llvm libomp runtime and OMP_IDLE.
wall clock time = 35.795240 seconds
wall clock time = 247.430433 seconds
wall clock time = 72.394108 seconds
That's about 2.4x and 2.5x for phases 1 and 3 with master, and about
2.7x and 4.3x with ompt-tr4. Phase 2 slows down by only about 1.2x in
both cases.
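For reference, the slowdown factors can be recomputed from the wall clock times listed above. This is only a small sketch; it assumes the three times in each run are reported in phase order (1, 2, 3), and the names `native`, `runs`, and `slowdown` are just for illustration.

```python
# Wall clock times in seconds, copied from the runs above.
# Assumption: each list is in phase order (phase 1, 2, 3).
native = [13.350482, 205.818907, 16.934752]
runs = {
    "master": [31.799200, 241.473654, 43.120992],
    "ompt-tr4": [35.795240, 247.430433, 72.394108],
}


def slowdown(measured, baseline):
    """Return the per-phase ratio of measured time to baseline time."""
    return [m / b for m, b in zip(measured, baseline)]


ratios = {name: slowdown(times, native) for name, times in runs.items()}

for name, per_phase in ratios.items():
    print(name, " ".join(f"{x:.2f}x" for x in per_phase))
```

A ratio of 1.00x would mean no measurable overhead for that phase.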