Description
Bug Report
I am writing what I would think is a fairly simple use of AngleSharp[.Css]: extracting an HTML table of COVID-19 cases etc. by country. The headers [or other cells] can contain HTML <br> elements. INode.Text() [an extension] and INode.TextContent drop the <br>, returning values like “TotalCases”. My implementation parses the 3000ish cells in 4.6 seconds. Using string GetInnerText(this IElement element) from AngleSharp.Css’s ElementExtensions takes over 8 minutes, making it unusable.
I assume you must implement CSS’s display:none and visibility:hidden. I do not require that functionality, just as I do not require an implementation of JavaScript. If GetInnerText() cannot be sped up, a reasonable solution would be to use something like my code together with your implementation of HTML entities such as © etc.
The attached project’s interesting code is in AngleSharpCssSpeedFault.cs.
AngleSharpCssSpeedFault.zip
The last method InnerText(IElement) has a #if to switch between the two implementations of InnerText().
Prerequisites
Run the attached solution.
Description
see above
Steps to Reproduce
- Run the solution
- Change the #if in the last method InnerText()
- Run the solution again.
Possible Solution
Use my InnerText(), but add expansion of all HTML & entities, as that is missing.
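To illustrate the kind of lightweight traversal meant here, the following is a minimal sketch. It is not the code from the attached project: the class FastInnerText and the method InnerTextLite are hypothetical names made up for this example. It walks the child nodes, copies text nodes as they are, and emits a newline for each <br>. It deliberately ignores CSS (display:none, visibility:hidden) and does no whitespace normalization, so it is not equivalent to GetInnerText().

```csharp
using System;
using System.Text;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;

// Hypothetical helper; not part of AngleSharp or AngleSharp.Css.
public static class FastInnerText
{
    // Returns the text below `node`, turning every <br> into '\n'.
    // No CSS is consulted, so hidden elements are included as well.
    public static string InnerTextLite(this INode node)
    {
        var sb = new StringBuilder();
        Append(node, sb);
        return sb.ToString();
    }

    private static void Append(INode node, StringBuilder sb)
    {
        foreach (var child in node.ChildNodes)
        {
            if (child.NodeType == NodeType.Text)
            {
                // Text nodes contribute their character data unchanged.
                sb.Append(child.TextContent);
            }
            else if (child is IElement element)
            {
                if (element.LocalName == "br")
                {
                    sb.Append('\n');      // keep the line break
                }
                else
                {
                    Append(element, sb);  // recurse into nested elements
                }
            }
            // Comments and other node types are skipped.
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var parser = new HtmlParser();
        var document = parser.ParseDocument(
            "<table><tr><th>Total<br>Cases</th></tr></table>");
        var header = document.QuerySelector("th");

        // "Total" and "Cases" are printed on separate lines.
        Console.WriteLine(header.InnerTextLite());
    }
}
```

Because this sketch does a single pass over the subtree and never computes styles, its cost should grow with the number of nodes in the cell only. Whether that accounts for the whole timing difference reported above is an assumption, not something measured here.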