Skip to content

<regex>: regex_traits::transform_primary should yield primary sort keys appropriate for the imbued locale #5444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

muellerj2
Copy link
Contributor

@muellerj2 muellerj2 commented Apr 26, 2025

Fixes #5435. Fixes #5291.

The actual work is done in two new functions __std_regex_transform_primary_char/wchar_t, which are basically 1:1 copies of _Strxfrm() and _Wcsxfrm() but pass different flags to __crtLCMapStringA/W. I also took the liberty to correct the SAL annotations.

__crtLCMapStringA/W are declared in awint.hpp which includes yvals.h. I'm uncertain if this is the best approach, but I undefined _ENFORCE_ONLY_CORE_HEADERS so that awint.hpp can be included.

transform_primary has to check the types of the collate facets using RTTI, so I made the function always returns an empty string when dynamic RTTI is disabled/_CPPRTTI is undefined. The implementation itself is heavily based on collate::do_transform (including the change in #5431). It also needs access to the internals of collate, so I made _Regex_traits a friend of it.

There is a behavior change for the C locale: As I explained in more detail in #5435, the traits requirement in [re.req]/20 is actually misleading, since it is wrong for precisely one locale: the C locale (or the POSIX locale, see the collation order definition here: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02_06). Since the equivalence classes are derived from POSIX and the definition of regex_traits::transform_primary also alludes to "primary sort keys" which indirectly reference terminology from the POSIX standard (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02), I think we should do as POSIX says: "A" should not match [[=a=]].

This has consequences:

  • When I implemented <regex>: Properly parse and match collating symbols and equivalences #5392, I assumed [re.req]/20, so I didn't add any character translation using translate and translate_nocase when parsing equivalences. Now we have to add such logic in _Parser::_Do_ex_class2 to handle potentially case-sensitive sort keys when case-insensitive regexes are used (else "A" would even fail to match [[=A=]]).
  • A number of test cases (some of my own making) failed, because they all assumed that lower and upper case characters are equivalent in the C locale.

Since matching and parsing of equivalences no longer go through collate::transform, related tests no longer have to be skipped under IDL mismatch.

@muellerj2 muellerj2 requested a review from a team as a code owner April 26, 2025 18:40
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Apr 26, 2025
@StephanTLavavej StephanTLavavej self-assigned this Apr 27, 2025
@StephanTLavavej StephanTLavavej added bug Something isn't working LWG Library Working Group issue regex meow is a substring of homeowner labels Apr 27, 2025
@StephanTLavavej
Copy link
Member

Thanks! 😻 I pushed some fixes, the most significant being INT_MAX, please double-check.

⚠️ Note to self: I'll need to perform MSVC-internal changes to add the new file regex.cpp, and I will likely need to push changes to deal with /clr:pure.

@StephanTLavavej StephanTLavavej removed their assignment May 7, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews May 7, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews May 9, 2025
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej added a commit to StephanTLavavej/STL that referenced this pull request May 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working LWG Library Working Group issue regex meow is a substring of homeowner
Projects
Status: Merging
4 participants