Skip to content

Exponential values get deleted after parsing #79

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
5 tasks done
spassarop opened this issue Aug 29, 2021 · 5 comments
Closed
5 tasks done

Exponential values get deleted after parsing #79

spassarop opened this issue Aug 29, 2021 · 5 comments

Comments

@spassarop
Copy link

Bug Report

Prerequisites

  • Can you reproduce the problem in a MWE?
  • Are you running the latest version of AngleSharp?
  • Did you check the FAQs to see if that helps you?
  • Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

When parsing exponential values, the exponent letter (e or E) is internally considered as start of the dimension unit and results in no value. My tests used margin property to reproduce the issue.

Steps to Reproduce

  1. Create a declaration containing a margin-left (could be any margin side) with a value of 1E5px which is one of the many possible valid values. I just used the first test on src\AngleSharp.Css.Tests\Declarations\CssMarginProperty.cs and changed it to my test values. It could be:
    var snippet = "margin-left: 1E5px";
    In my project which consumes AngleSharp.Css I used CssParser.ParseStyleSheet() and inspecting the result, the property was missing. This would skip the next two steps.
  2. Parse the snippet with ParseDeclaration.
    var property = ParseDeclaration(snippet);
  3. Check that property value is empty with property.HasValue or property.Value.

Expected behavior: Value gets parsed as exponential, like in its input form.

Actual behavior: Value is empty.

Environment details: Windows 10, .NET Core 2.1, AngleSharp.Css 0.16.1, AngleSharp 0.16.0.

Possible Solution

I debugged that specific case. In src\AngleSharp.Css\Parser\CssTokenizer.cs, particularly on NumberRest() method, the logic processes digits until it encounters something else. With 1E5pt, it's the letter E:

var current = GetNext();

while (true)
{
    if (current.IsDigit())
    {
        StringBuffer.Append(current);
    }
    else if (current.IsNameStart())
    {
        StringBuffer.Append(current);
        return Dimension();
    }
    else if (IsValidEscape(current))
    {
        current = GetNext();
        StringBuffer.Append(ConsumeEscape(current));
        return Dimension();
    }
    else
    {
        break;
    }

    current = GetNext();
}

It would be OK if it hits the break and in the posterior switch it detects the letter to continue with the exponential, executing NumberExponential(). But with E, current.IsNameStart() returns true and proceeds returning the dimension which will be invalid. I checked IsNameStart() on AngleSharp:

public static Boolean IsNameStart(this Char c) => c.IsNonAscii() || c.IsUppercaseAscii() || c.IsLowercaseAscii() || c == Symbols.Underscore;

This will always be true for e or E and the exponential logic will never execute. I checked NumberFraction() in case the library compels you to use an exponential format as \d\.\d+[eE][+-]\d+, however, it happens the same.

My suggestion is to not only check name start but also for these two special letters. I know that would conflict with the em dimension for example, that would require looking forward one position to find a digit or not in my opinion.

As last comment, it also happens that small numbers like 0.00001 get transformed into exponentials like 1E-05 (this output format led me to try with 1E5 as input). I really prefer to preserve the input format as I use this on OWASP AntiSamy .NET which is part of OWASP AntiSamy community project and this issue came up in the Java project. I later checked that it happened too on .NET with AngleSharp.Css. My question here is: should I report this as a separate feature request, bug, or it's already known as a "won't fix"?

Many thanks beforehand.

@spassarop spassarop added the bug label Aug 29, 2021
@FlorianRappl
Copy link
Contributor

Yes, you are right that the scientific notation is right now not supported. It would be great if we could support it - and indeed, it would require a forward-lookup, which should not be a problem.

Regarding the output: I think this should be handled in a separate issue. Ideally, the stringification of the units tries to find an original representation and uses this. Then it should fall back to the "lengthy" (i.e., non-scientific) representation imho.

@spassarop
Copy link
Author

Great, then I'll create a separate bug. Both are equally important to me as user need them to use exponentials correctly and to expect the same values when reading the output. Independently of the fix complexity assuming it can be done.

@FlorianRappl
Copy link
Contributor

Available in devel / preview version (upcoming 0.16.2).

@spassarop
Copy link
Author

Thanks for providing a fix! What I meant though with this:

Expected behavior: Value gets parsed as exponential, like in its input form.

Was to maintain the exponential form. Of course, having the exponential value converted is better than having it removed but my concern was different. Is this possible to implement?

@FlorianRappl
Copy link
Contributor

Is this possible to implement?

Unfortunately no. That information is lost in the process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants