
ee generator slow down in develop #252


Closed
cipriangal opened this issue Apr 2, 2019 · 7 comments

Comments

@cipriangal
Contributor

Environment: (where does this bug occur, have you tried other environments)

  • branch (master for latest released): develop/feature-add-SBSbunker
  • revision (HEAD for most recent): v2.1RC
  • OS or system: OSX, RHEL7
  • Special ROOT or Geant4 versions? R6.12, G4.10.4.p1

Steps to reproduce: (give a step by step account of how to trigger the bug)

  1. run macros/runexample.mac with the moller generator
  2. observe slow down compared to v2.0.8

Expected Result: (what do you expect when you execute the steps above)

Same computation time.

Actual Result: (what do you get when you execute the steps above)

3.5x slower simulation.

@wdconinc confirmed this:

I have been able to confirm the slowdown in develop vs master (v2.0.8), by a factor of about 3.5 (884 s vs about 240 s on the same macro and very similar geometry).

Somehow master picks Dormand-Prince while develop picks Runge-Kutta.

I think RK just requires 2x more field lookups, and field lookups are the slowest part of remoll (about 12.5% of total time).

Not sure why master picks Dormand-Prince when we ask for RK4.
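To make the "about 2x more field lookups" estimate concrete, here is a minimal sketch that counts right-hand-side evaluations (in tracking, each one is a field lookup) on a toy ODE. It assumes that the RK4 stepper estimates its error by step doubling (two half steps plus one full step), which is how I understand Geant4's G4ClassicalRK4 (through G4MagErrorStepper) to work. It also assumes Dormand-Prince 5(4) is an embedded pair with the first-same-as-last (FSAL) property. All function names are hypothetical, and the Dormand-Prince count comes from the tableau structure rather than from running it.

```python
import math


class CountingRHS:
    """Wrap dy/dt = f(t, y) and count evaluations (one per field lookup in tracking)."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, t, y):
        self.calls += 1
        return self.f(t, y)


def rk4_step(f, t, y, h, dydt0):
    """One classical RK4 step; the caller supplies dydt0 = f(t, y)."""
    k1 = dydt0
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def rk4_step_doubling(f, t, y, h):
    """RK4 with a step-doubling error estimate: two half steps vs. one full step."""
    dydt0 = f(t, y)  # shared by the first half step and the full step
    y_half = rk4_step(f, t, y, h / 2, dydt0)
    y_two_halves = rk4_step(f, t + h / 2, y_half, h / 2, f(t + h / 2, y_half))
    y_full = rk4_step(f, t, y, h, dydt0)
    return y_two_halves, abs(y_two_halves - y_full)


# Toy ODE dy/dt = -y standing in for the equation of motion in a field map.
rhs = CountingRHS(lambda t, y: -y)
y, err = rk4_step_doubling(rhs, 0.0, 1.0, 0.1)
rk4_evals = rhs.calls

# Dormand-Prince 5(4) has 7 stages; with FSAL the last stage is reused as the
# first stage of the next step, so an accepted step costs 6 new evaluations.
dp_evals = 7 - 1

print(f"RK4 + step doubling: {rk4_evals} evaluations per step")
print(f"Dormand-Prince 5(4): {dp_evals} evaluations per step")
print(f"ratio {rk4_evals / dp_evals:.2f}, |y - exp(-0.1)| = {abs(y - math.exp(-0.1)):.1e}")
```

Under these assumptions the ratio is 11/6, or about 1.8, which is in line with the "2x more field lookups" guess above.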
@wdconinc
Member

wdconinc commented Apr 3, 2019

tl;dr All slowdowns are due to geometry.

Test scenario with same geometry in master and develop

With the macro

/remoll/geometry/setfile geometry/mollerMother_test.gdml
/remoll/physlist/optical/disable
/remoll/physlist/register QGSP_BERT
/run/initialize
/remoll/addfield map_directory/blockyHybrid_rm_3.0.txt
/remoll/addfield map_directory/blockyUpstream_rm_1.1.txt
/remoll/evgen/set moller
/remoll/beamene 11 GeV
/remoll/kryptonite/set true
/remoll/seed 123456
/remoll/filename remollout.root
/run/beamOn 10000

and mollerMother_test.gdml containing

      <physvol>
      <file name="geometry/targetDaughter_test.gdml"/>
      <positionref ref="targetCenter"/>
      <rotationref ref="identity"/>
      </physvol>

      <physvol>
      <file name="geometry/detectorDaughter_test.gdml"/>
      <positionref ref="detectorCenter"/>
      <rotationref ref="identity"/>
      </physvol>

      <physvol>
      <file name="geometry/upstreamDaughter_test.gdml"/>
      <positionref ref="upstreamCenter"/>
      <rotationref ref="identity"/>
      </physvol>

      <physvol>
      <file name="geometry/hybridDaughter_test.gdml"/>
      <positionref ref="hybridCenter"/>
      <rotationref ref="identity"/>
      </physvol> 

where those subsystems are all their 'standard' versions (in master), I get consistent running times between develop and master (technically 2 revs before master, since I needed a common base for git bisect):

Good (master^2) ===
real    0m28.837s
user    0m44.183s
sys     0m0.494s
Bad (develop) ===
real    0m26.286s
user    0m42.158s
sys     0m0.534s

Clearly there is a remaining issue, though!

So, then, why do we get this for runexample.mac (with 10k events) in develop:

real    9m58.885s
user    18m39.772s
sys     0m1.816s

or about 20 times slower???
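The slowdown factors in this thread are ratios of the `real` (wall-clock) times that the shell's `time` reports in `XmY.YYYs` form. The small helper below is hypothetical and not part of remoll. It makes that arithmetic reproducible by converting those stamps to seconds and dividing by a baseline. Here the baseline is assumed to be the minimal-geometry master^2 run above.

```python
import re

# Matches the "real"/"user"/"sys" fields printed by bash's `time`, e.g. "9m58.885s".
TIME_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def to_seconds(stamp: str) -> float:
    """Convert a `time` field such as '9m58.885s' to seconds."""
    match = TIME_RE.match(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes) + float(seconds)


def slowdown(slow: str, fast: str) -> float:
    """Ratio of two wall-clock (real) times."""
    return to_seconds(slow) / to_seconds(fast)


baseline = "0m28.837s"  # minimal geometry, master^2 (assumed reference)
variants = {
    "runexample.mac (develop)": "9m58.885s",
    "+ hallDaughter_merged": "7m59.733s",
    "+ upstreamDaughter_merged": "3m10.299s",
}
for name, real in variants.items():
    print(f"{name:28s} {slowdown(real, baseline):5.1f}x")
```

With this baseline the factors come out at roughly 20.8, 16.6 and 6.6, consistent with the "about 20", "16 or so" and "6" quoted in the thread. Comparing `real` rather than `user` avoids mixing in the extra CPU time from multiple threads.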

Using runexample.mac with minimal geometry

When modifying runexample.mac (with 10k events) in develop to use the minimal geometry above, we get:

real    0m33.114s
user    0m49.102s
sys     0m0.622s

So, yeah, this is all geometry. What's in the default runexample.mac geometry that takes so long? Here is the relevant section:

    <physvol>
      <file name="target/subTargetRegion.gdml"/>
      <positionref ref="targetCenter"/>
    </physvol>

    <physvol>
      <file name="hall/hallDaughter_merged.gdml"/>
      <positionref ref="hallCenter"/>
    </physvol> 

 
    <physvol>
      <file name="upstream/upstreamDaughter_merged.gdml"/>
      <positionref ref="upstreamCenter"/>
    </physvol>

    <physvol>
      <file name="hybrid/hybridDaughter_merged.gdml"/>
      <positionref ref="hybridCenter"/>
    </physvol>    

We'll just look at replacing each one of those individually in the minimal version (and hope for no overlaps).

Impact of larger world

Instead of a long and skinny world, we need a bigger world to fit the hall etc.:

     <box lunit="mm" name="boxMother" x="200000" y="200000" z="200000"/>

but by itself this change does not do anything:

real    0m26.063s
user    0m41.205s
sys     0m0.459s

It does have an impact, though, once the hall is added (below).

Impact of subTargetRegion

real    0m28.641s
user    0m45.787s
sys     0m0.596s

No impact.

Impact of hallDaughter_merged

(this is an addition, not a replacement of a minimal version by a more complicated "merged" version)

real    7m59.733s
user    14m24.655s
sys     0m1.821s

Slowdown by a factor of 16 or so.

Impact of upstreamDaughter_merged

real    3m10.299s
user    6m0.994s
sys     0m0.915s

This slows things down by a factor of 6.

Impact of hybridDaughter_merged

real    0m28.226s
user    0m45.812s
sys     0m0.443s

No impact.

Impact of physics list

Just adding QGSP_BERT_HP gets us no meaningful slowdown for the minimal geometry above:

real    0m32.828s
user    0m45.939s
sys     0m0.500s

Hardly an impact.

Impact of stepper

Changing to

Hypothesis

I think in the end this may be a combination of a few things:

  • increase in world volume with actual physical volumes at larger distances (roof of the hall most notably)
  • since we have a magnetic field, tracking through that larger volume requires additional field lookups
  • we are doing field lookups in an inefficient way, calculating stuff we don't end up using
  • upstreamDaughter_merged introduces a lot of boolean solid geometry elements which are known to be slow (see most geometry-related comments in https://twiki.cern.ch/twiki/bin/view/Geant4/Geant4PerformanceTips)
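Why boolean solids cost more can be seen from how a subtraction solid answers even the simplest navigation query. The sketch below is a toy with hypothetical `Box` and `Subtraction` classes, loosely modeled on the idea behind Geant4's G4SubtractionSolid rather than on its code. All boxes are assumed axis-aligned and centred at the origin. Every `inside` query on "A minus B" has to consult both constituents, and nesting booleans adds another primitive test per level. As I understand it, distance queries are worse still, because the boolean has to iterate between the constituents' surfaces. This sketch does not model that.

```python
from dataclasses import dataclass

INSIDE, SURFACE, OUTSIDE = "inside", "surface", "outside"
calls = {"primitive": 0}  # counts primitive (box) evaluations


@dataclass
class Box:
    """Axis-aligned box centred at the origin, given by half-lengths."""
    dx: float
    dy: float
    dz: float
    tol: float = 1e-9

    def inside(self, p):
        calls["primitive"] += 1
        # Largest signed distance past a face: negative means strictly inside.
        d = max(abs(p[0]) - self.dx, abs(p[1]) - self.dy, abs(p[2]) - self.dz)
        if d > self.tol:
            return OUTSIDE
        if d < -self.tol:
            return INSIDE
        return SURFACE


@dataclass
class Subtraction:
    """Solid A minus solid B."""
    a: object
    b: object

    def inside(self, p):
        in_a = self.a.inside(p)
        if in_a == OUTSIDE:
            return OUTSIDE  # early exit: B need not be consulted
        in_b = self.b.inside(p)
        if in_b == INSIDE:
            return OUTSIDE  # point lies in the removed region
        if in_a == INSIDE and in_b == OUTSIDE:
            return INSIDE
        return SURFACE


# A shell made by subtracting two boxes, then a slot cut through it.
shell = Subtraction(Box(1000, 1000, 5000), Box(500, 500, 5000))
nested = Subtraction(shell, Box(1000, 100, 100))
point = (700.0, 500.0, 0.0)

calls["primitive"] = 0
print(shell.inside(point), calls["primitive"])
calls["primitive"] = 0
print(nested.inside(point), calls["primitive"])
```

A plain box answers this query with one primitive test. The shell needs two and the nested solid needs three, and the navigator asks such questions at every step.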

@cipriangal
Contributor Author

cipriangal commented Apr 3, 2019

I find this hard to believe... On the branch where I ran into this problem (vacuum_test), I was running a geometry file, merged in develop, and noticed the slowdown. It is not clear that epElastic is unaffected (since that one was running pretty quickly before anyway).

That is to say, I didn't make changes to the geometry between the two situations (except to remove the folder prefix: "geometry_sculpt/hybrid_*" >> "hybrid_**").

@wdconinc
Member

wdconinc commented Apr 3, 2019

All I can say is that I can modify a simple geometry that does 10k in 30 seconds to a more complicated geometry that does 10k in 10 minutes without changing any binary code.

@wdconinc
Member

wdconinc commented Apr 3, 2019

Don't you see a similar speedup when you comment out the hall and upstream daughters? Without those, the default runexample.mac (but with 10k events) runs in ~25s. I'm not saying I have a solution...

@cipriangal
Contributor Author

Some more debugging:

branch: vacuum_test
geometry: mollerMother_krypBeamline_3regions_He (this doesn't have a hall volume)
generator: moller
running with SD/disable_all; SD/enable 28

Ran 10k events for each of the following.

The geometry as is takes about 158s
The geometry without DS hybrid volume: 143s
The geometry without US volume: 81s

Removing physical volumes from the US file:
no coils: 148s
no US_coll1: 145s
no UScollunion_1: 178s
=> add names to the 3 shielding blocks: 151s
no boxUSShieldColl1_logic: 148s
no boxUSShieldColl2_logic: 152s
no boxUSPolyShield1_logic: 147s
no physical volumes except the “VacuumColl” logicUpstream: 180s
just the world logical volume (remove the logical volumes for everything else as well as the physical volumes): 181s
as the line above, plus replace the world “boxUpstream” (a subtraction of two boxes) with “boxUpstream_1” (the first box alone): 218s
^^^^^ This is stupid!!! .. how does this work ?!? ^^^^^

Recompile without the parallel world module in remoll:
run full geometry: 141s

Recompile with the parallel world module in remoll:
comment out all detector auxtype info in upstreamLogic: 144s
change Concrete to G4_CONCRETE: 148s
change the outside box from Polythene to Concrete: 144s

@wdconinc
Member

wdconinc commented Apr 4, 2019

I really wish Geant4 had better geometry profiling tools, like producing a top-10 list of the volumes where most time was spent... These studies are painful since they don't give prompt feedback.
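A crude version of such a top-10 report can be approximated from a stepping action. The sketch below shows only the bookkeeping, in Python with hypothetical names (`VolumeProfiler`, and volume names like `hallRoof`). It accumulates step counts and elapsed time per volume name and ranks volumes by time. In a real remoll build, the record call would presumably sit in a C++ `G4UserSteppingAction::UserSteppingAction(const G4Step*)`. The volume name would come from `step->GetPreStepPoint()->GetPhysicalVolume()->GetName()` and the time from a wall clock read between consecutive steps. Attributing wall time to steps this way is approximate and adds its own overhead.

```python
import time
from collections import defaultdict


class VolumeProfiler:
    """Accumulate per-volume step counts and elapsed time (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.steps = defaultdict(int)
        self.seconds = defaultdict(float)
        self._last = None

    def record_step(self, volume_name):
        """Charge the time elapsed since the previous call to `volume_name`."""
        now = self.clock()
        if self._last is not None:
            self.seconds[volume_name] += now - self._last
        self.steps[volume_name] += 1
        self._last = now

    def top(self, n=10):
        """Return (name, seconds, steps) for the n most expensive volumes."""
        ranked = sorted(self.seconds.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, self.steps[name]) for name, secs in ranked[:n]]


# Deterministic fake clock so the example is reproducible.
fake_times = iter([0.0, 0.5, 0.6, 2.6, 2.7])
prof = VolumeProfiler(clock=lambda: next(fake_times))
for vol in ["world", "hallRoof", "world", "upstreamColl", "world"]:
    prof.record_step(vol)

for name, secs, steps in prof.top(3):
    print(f"{name:14s} {secs:6.2f} s  {steps} steps")
```

Ranking by accumulated time rather than by step count matters here: a volume with few but expensive steps, for example one built from nested booleans, would otherwise be missed.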

@cipriangal
Contributor Author

Figured out that my particular slowdown was related to a shower max that was put in last May. Serves me right for merging without paying close attention to the changes.

@wdconinc: sorry for casting so many aspersions. The geometry was at fault, as you found, and I should have been more careful.

Closing the issue, as this is just something people have to take care of on their own.


2 participants