Exponential values get deleted after parsing #79

spassarop · 2021-08-29T21:21:31Z

Bug Report

Prerequisites

Can you reproduce the problem in a MWE?
Are you running the latest version of AngleSharp?
Did you check the FAQs to see if that helps you?
Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

When parsing exponential values, the exponent letter (e or E) is internally considered as start of the dimension unit and results in no value. My tests used margin property to reproduce the issue.

Steps to Reproduce

Create a declaration containing a margin-left (could be any margin side) with a value of 1E5px which is one of the many possible valid values. I just used the first test on src\AngleSharp.Css.Tests\Declarations\CssMarginProperty.cs and changed it to my test values. It could be:
```
var snippet = "margin-left: 1E5px";
```
In my project which consumes AngleSharp.Css I used CssParser.ParseStyleSheet() and inspecting the result, the property was missing. This would skip the next two steps.

Parse the snippet with ParseDeclaration.

var property = ParseDeclaration(snippet);

Check that property value is empty with property.HasValue or property.Value.

Expected behavior: Value gets parsed as exponential, like in its input form.

Actual behavior: Value is empty.

Environment details: Windows 10, .NET Core 2.1, AngleSharp.Css 0.16.1, AngleSharp 0.16.0.

Possible Solution

I debugged that specific case. In src\AngleSharp.Css\Parser\CssTokenizer.cs, particularly on NumberRest() method, the logic processes digits until it encounters something else. With 1E5pt, it's the letter E:

var current = GetNext();

while (true)
{
    if (current.IsDigit())
    {
        StringBuffer.Append(current);
    }
    else if (current.IsNameStart())
    {
        StringBuffer.Append(current);
        return Dimension();
    }
    else if (IsValidEscape(current))
    {
        current = GetNext();
        StringBuffer.Append(ConsumeEscape(current));
        return Dimension();
    }
    else
    {
        break;
    }

    current = GetNext();
}

It would be OK if it hits the break and in the posterior switch it detects the letter to continue with the exponential, executing NumberExponential(). But with E, current.IsNameStart() returns true and proceeds returning the dimension which will be invalid. I checked IsNameStart() on AngleSharp:

public static Boolean IsNameStart(this Char c) => c.IsNonAscii() || c.IsUppercaseAscii() || c.IsLowercaseAscii() || c == Symbols.Underscore;

This will always be true for e or E and the exponential logic will never execute. I checked NumberFraction() in case the library compels you to use an exponential format as \d\.\d+[eE][+-]\d+, however, it happens the same.

My suggestion is to not only check name start but also for these two special letters. I know that would conflict with the em dimension for example, that would require looking forward one position to find a digit or not in my opinion.

As last comment, it also happens that small numbers like 0.00001 get transformed into exponentials like 1E-05 (this output format led me to try with 1E5 as input). I really prefer to preserve the input format as I use this on OWASP AntiSamy .NET which is part of OWASP AntiSamy community project and this issue came up in the Java project. I later checked that it happened too on .NET with AngleSharp.Css. My question here is: should I report this as a separate feature request, bug, or it's already known as a "won't fix"?

Many thanks beforehand.

The text was updated successfully, but these errors were encountered:

FlorianRappl · 2021-08-29T23:03:05Z

Yes, you are right that the scientific notation is right now not supported. It would be great if we could support it - and indeed, it would require a forward-lookup, which should not be a problem.

Regarding the output: I think this should be handled in a separate issue. Ideally, the stringification of the units tries to find an original representation and uses this. Then it should fall back to the "lengthy" (i.e., non-scientific) representation imho.

spassarop · 2021-08-30T23:11:16Z

Great, then I'll create a separate bug. Both are equally important to me as user need them to use exponentials correctly and to expect the same values when reading the output. Independently of the fix complexity assuming it can be done.

FlorianRappl · 2021-11-04T23:29:56Z

Available in devel / preview version (upcoming 0.16.2).

spassarop · 2021-11-14T14:57:13Z

Thanks for providing a fix! What I meant though with this:

Expected behavior: Value gets parsed as exponential, like in its input form.

Was to maintain the exponential form. Of course, having the exponential value converted is better than having it removed but my concern was different. Is this possible to implement?

FlorianRappl · 2021-11-14T15:36:10Z

Is this possible to implement?

Unfortunately no. That information is lost in the process.

spassarop added the bug label Aug 29, 2021

FlorianRappl added declaration parser labels Nov 4, 2021

FlorianRappl added a commit that referenced this issue Nov 4, 2021

Fixed exponential handling #79

5294c40

FlorianRappl closed this as completed Nov 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Exponential values get deleted after parsing #79

Exponential values get deleted after parsing #79

spassarop commented Aug 29, 2021

FlorianRappl commented Aug 29, 2021

Uh oh!

spassarop commented Aug 30, 2021

Uh oh!

FlorianRappl commented Nov 4, 2021

Uh oh!

spassarop commented Nov 14, 2021

Uh oh!

FlorianRappl commented Nov 14, 2021

Uh oh!

Uh oh!

Exponential values get deleted after parsing #79

Exponential values get deleted after parsing #79

Comments

spassarop commented Aug 29, 2021

Bug Report

Prerequisites

Description

Steps to Reproduce

Possible Solution

FlorianRappl commented Aug 29, 2021

Uh oh!

spassarop commented Aug 30, 2021

Uh oh!

FlorianRappl commented Nov 4, 2021

Uh oh!

spassarop commented Nov 14, 2021

Uh oh!

FlorianRappl commented Nov 14, 2021

Uh oh!