use producer consumer pattern for parsing and merging #262

Merged (19 commits, Sep 25, 2019)

Conversation

Lorilatschki

  • added setting to define parallel merge count
  • optimized OpenCover parsing
  • added timestamp to logging
  • made logging to console non-blocking

@danielpalme
Owner

I will review and merge this PR within the next 1-2 weeks.

@Lorilatschki
Author

I have added a producer which first reads the content of each file and adds it to a blocking queue. This can be controlled by 'NumberOfReportsReadOnMemoryInParallel'. A set of producers (the amount can be controlled by 'NumberOfReportsParsedInParallel') listens to this collection and creates ParserResults out of it. The parser results are added to a second blocking collection. A set of consumers (controlled by 'NumberOfReportsMergedInParallel') listens to that queue and merges the content of the previously created parser results. At the end, the parser results of the merge consumers are merged together into a final result.
Nevertheless, I have played around with these parameters using ~100 OpenCover XML files, each about ~100MB in size. The algorithm does not really scale as expected. I tested this on a PC with 20 logical cores and 64GB RAM. The coverage files are located on an SSD with ~500MB/s read and write speed! I did not find any configuration (numberOfReportsMergedInParallel, numberOfReportsParsedInParallel, numberOfReportsReadOnMemoryInParallel) that gets beyond 33 percent CPU load. There must be a bottleneck which we currently don't know about.
The file content has been fully loaded into memory, so this should not be a shared resource. Afterwards the parsers run in parallel, each having its own XDocument, so there should be no shared access between parsers. The finished results are also merged in separate tasks, which are likewise not a shared resource, since the number of merge tasks can be configured by a setting.
Maybe you have an idea where the bottleneck is. Meanwhile I will play around with some settings, dotTrace and dotMemory. I will let you know if I have any news.
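To make the described pipeline more concrete, here is a minimal sketch of such a producer/consumer setup with BlockingCollection<T>, assuming the three settings mentioned above. The class PipelineSketch and the Parse/Merge/ParserResult placeholders are illustrative only and do not reflect the actual CoverageReportParser implementation.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Minimal sketch of the described pipeline; Parse, Merge and ParserResult are
// placeholders, not the actual ReportGenerator types.
public static class PipelineSketch
{
    public static ParserResult Run(
        IEnumerable<string> reportFiles,
        int numberOfReportsReadOnMemoryInParallel,
        int numberOfReportsParsedInParallel,
        int numberOfReportsMergedInParallel)
    {
        // Bounded queue: the capacity limits how many raw reports are kept in memory.
        var rawReports = new BlockingCollection<string>(numberOfReportsReadOnMemoryInParallel);
        var parserResults = new BlockingCollection<ParserResult>();

        // Producer: reads each report file into memory and queues its content.
        var reader = Task.Run(() =>
        {
            foreach (var file in reportFiles)
            {
                rawReports.Add(System.IO.File.ReadAllText(file));
            }
            rawReports.CompleteAdding();
        });

        // N parsers: turn raw XML content into ParserResult instances.
        var parsers = Enumerable.Range(0, numberOfReportsParsedInParallel)
            .Select(_ => Task.Run(() =>
            {
                foreach (var content in rawReports.GetConsumingEnumerable())
                {
                    parserResults.Add(Parse(content));
                }
            }))
            .ToArray();

        // M merge consumers: each merges the results it sees into a partial result.
        var partialResults = new ParserResult[numberOfReportsMergedInParallel];
        var mergers = Enumerable.Range(0, numberOfReportsMergedInParallel)
            .Select(i => Task.Run(() =>
            {
                foreach (var result in parserResults.GetConsumingEnumerable())
                {
                    partialResults[i] = Merge(partialResults[i], result);
                }
            }))
            .ToArray();

        Task.WaitAll(parsers);
        parserResults.CompleteAdding();
        Task.WaitAll(mergers);
        reader.Wait();

        // Finally merge the partial results of all merge consumers into one result.
        return partialResults.Aggregate((a, b) => Merge(a, b));
    }

    // Placeholders for the real parsing and merging logic.
    private static ParserResult Parse(string xmlContent) => new ParserResult();
    private static ParserResult Merge(ParserResult a, ParserResult b) => a ?? b;
}

public class ParserResult { }
```

The bounded capacity of the first queue is what keeps the number of reports held in memory at 'NumberOfReportsReadOnMemoryInParallel'.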

@danielpalme
Owner

danielpalme commented Jul 15, 2019

I just started to review and merge your PR.

Some notes so far:

  • In Assembly.cs you are now using a ConcurrentBag. This was a ConcurrentBag in the past until I switched to a simple List. Reason: The ConcurrentBag caused a memory leak (see your issue: Usage of multiple cores during report generation #126)
  • PerformanceInfo.cs uses psapi.dll internally. This will probably not work on Linux/macOS. I'm not sure if ReportGenerator should force GC at all. Perhaps the memory usage is caused by the ConcurrentBag?! If we really need to force GC, we have to find a platform independent solution to calculate memory usage (a possible approach is sketched below).
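If forcing GC is kept, a platform independent way to read memory usage (instead of P/Invoking psapi.dll) could look roughly like this. MemoryPressure, CollectIfNeeded and the 4GB threshold are illustrative assumptions; only GC.GetTotalMemory and Process.WorkingSet64 are existing APIs.

```csharp
using System;
using System.Diagnostics;

// Sketch of a cross-platform memory check; the 4 GB threshold is only an example.
public static class MemoryPressure
{
    private const long ThresholdBytes = 4L * 1024 * 1024 * 1024;

    public static void CollectIfNeeded()
    {
        // Managed heap size as tracked by the GC (works on Windows, Linux and macOS).
        long managedBytes = GC.GetTotalMemory(forceFullCollection: false);

        // Physical memory used by the whole process, without any P/Invoke into psapi.dll.
        long workingSetBytes = Process.GetCurrentProcess().WorkingSet64;

        if (managedBytes > ThresholdBytes || workingSetBytes > ThresholdBytes)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
}
```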

@Lorilatschki
Author

Lorilatschki commented Jul 16, 2019 via email

@danielpalme
Owner

Ok, I reverted those changes.

Now I'm looking at CoverageReportParser.
First you read the whole file into memory via File.ReadAllText(reportFile) and then you use XDocument.Parse.

I'm not sure, but I guess that will consume quite a lot of extra memory compared to XDocument.Load.
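For comparison, the two variants look roughly like this (a simplified sketch, not the actual CoverageReportParser code):

```csharp
using System.IO;
using System.Xml.Linq;

public static class XDocumentLoading
{
    // Variant currently in the PR: the whole file is materialized as a string first,
    // so the text temporarily exists twice (string + XDocument) while parsing.
    public static XDocument LoadViaParse(string reportFile)
        => XDocument.Parse(File.ReadAllText(reportFile));

    // Variant using XDocument.Load: the file is streamed into the XDocument directly,
    // avoiding the intermediate string.
    public static XDocument LoadDirectly(string reportFile)
        => XDocument.Load(reportFile);
}
```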

@Lorilatschki
Author

This was just an attempt to get the CPU workload to 100%. But whatever we tried, we were not able to get it over 33%. Even loading all the content into memory, creating separate XDocuments out of it and parsing them does not lead to a scaling algorithm. You can also revert this change, as it does not seem to be the bottleneck during file parsing.
We are still analyzing with the help of dotTrace and dotMemory.

@Lorilatschki
Author

Lorilatschki commented Jul 22, 2019

First analyzing results:

  • reverted GC.Collect(), File.ReadAllText() and the performance message class in b9836c8, 89920d8, 0be3ec8
  • because the parsing is done in parallel, the garbage collection should run in parallel as well (in the traces we have seen that almost 25% of the overall time was consumed by GC wait). Therefore we added an app settings file for .NET Core in f777c80 to allow multi-threaded garbage collection by the runtime (see the config sketch after this list)
  • local measurements show that the parsing is now much faster. I have started some coverage builds on our more powerful build agents. I will post the previous and new execution times tomorrow.
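For reference, enabling server/concurrent garbage collection for a .NET Core app is typically done with a runtime configuration along these lines (a sketch of the standard mechanism; the exact file added in f777c80 may differ):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.Concurrent": true
    }
  }
}
```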

@danielpalme
Owner

Let me know when you have first results. I'm interested in the numbers!
Also let me know once you have finished your changes, I will then merge into the master branch.

@Lorilatschki
Author

Lorilatschki commented Jul 24, 2019

First figures:
machine settings (local PC): 64GB RAM, Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz, Samsung SSD 850 PRO 512GB
1502 files of OpenCover output, each about ~100MB

  • non parallel run with v4.0.11.0: ~8800 seconds
  • NumberOfReportsParsedInParallel=20, NumberOfReportsMergedInParallel=2: ~3702 seconds
  • NumberOfReportsParsedInParallel=10, NumberOfReportsMergedInParallel=2: ~3514 seconds

Since the Parallel.ForEach inside the parser scales automatically, the optimal settings could differ slightly depending on the physical hardware the generator is executed on. As the figures above show, there is a peak beyond which the algorithm does not scale anymore, because the hard disk seems to be stressed too much. Maybe I need to introduce the Parallel.ForEach restriction again, in order to scale only by the number of ParserResult producers (see the sketch below). I will continue playing around with the settings.
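Restricting the inner Parallel.ForEach would look roughly like this. ProcessModules and maxInnerParallelism are hypothetical names used only for illustration; ParallelOptions.MaxDegreeOfParallelism is the standard mechanism.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class RestrictedParallelism
{
    // Sketch: limit the inner Parallel.ForEach so scaling is controlled only by
    // the number of ParserResult producers. 'maxInnerParallelism' is a
    // hypothetical setting, not an existing ReportGenerator option.
    public static void ProcessModules(IEnumerable<string> modules, int maxInnerParallelism)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxInnerParallelism
        };

        Parallel.ForEach(modules, options, module =>
        {
            // Per-module processing would happen here.
        });
    }
}
```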

The figures from our build agents are not yet available. I will post them by Friday. We are still looking for performance improvements inside the OpenCoverParser.

For now we are finished with this pull request. It would be great if you could review it, merge it to master and provide a new NuGet package.

Attached you will find a dotTrace measurement: ReportGenerator.zip.
One hotspot, for instance, is the method OpenCoverParser.ProcessClass(...). We are looking for improvements inside this method as well.

@Lorilatschki
Author

Lorilatschki commented Jul 24, 2019

Update 1: Found some improvements in the OpenCover parser itself: 342f329

  • NumberOfReportsParsedInParallel=10, NumberOfReportsMergedInParallel=2: ~2858 seconds

@Lorilatschki
Author

Update 2: Enabled historic file parsing to run in parallel: fabbe42

@Lorilatschki
Author

Update 3: Found another easy performance improvement inside the history parser: 7838be3

@danielpalme
Owner

I just merged your changes into the performance branch.

If you plan to make further changes, you should first merge the performance branch into your master branch.

I'm currently planning to include your changes in version 4.3 of ReportGenerator.
I want to publish this release in September, once .NET Core 3.0 has been released.

@Lorilatschki
Author

Thanks for that. The first runs have passed so far!
For ~1500 OpenCover files, each ~100MB, it now takes ~1h with settings of 10 parsed in parallel and 2 merged in parallel.
👍

@Lorilatschki Lorilatschki reopened this Sep 25, 2019
@danielpalme danielpalme merged commit 7838be3 into danielpalme:master Sep 25, 2019
@nikosatwork

Hi @danielpalme @LymanZerga - thanks for your work!

Regarding the removal of LoggerFactory.Configure and innerFactory in this PR, it seems there is no other way to extend the ILoggerFactory and ILogger anymore. Have I missed something? Thanks!

https://github.com/danielpalme/ReportGenerator/pull/262/files#diff-b0d795d33adcd6a6f927cdacc789352bL47

@danielpalme
Owner

@nikosatwork
Correct. I'm not sure if that was intended.

@Lorilatschki
Author

Lorilatschki commented Oct 14, 2019 via email

@danielpalme
Owner

Done (see 1f42d7b)

@nikosatwork

Thanks. I guess this is going to v4.3.1?
