`<regex>`: `regex_traits::transform_primary` should yield primary sort keys appropriate for the imbued locale #5444
Open: muellerj2 wants to merge 13 commits into `microsoft:main` from `muellerj2:regex-transform_primary`
+289 −31
muellerj2 commented May 1, 2025
StephanTLavavej approved these changes May 7, 2025
Thanks! 😻 I pushed some fixes, the most significant being

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
StephanTLavavej added a commit to StephanTLavavej/STL that referenced this pull request May 9, 2025
Fixes #5435. Fixes #5291.
The actual work is done in two new functions, `__std_regex_transform_primary_char/wchar_t`, which are basically 1:1 copies of `_Strxfrm()` and `_Wcsxfrm()` but pass different flags to `__crtLCMapStringA/W`. I also took the liberty of correcting the SAL annotations.

`__crtLCMapStringA/W` are declared in `awint.hpp`, which includes `yvals.h`. I'm uncertain whether this is the best approach, but I undefined `_ENFORCE_ONLY_CORE_HEADERS` so that `awint.hpp` can be included.

`transform_primary` has to check the types of the collate facets using RTTI, so I made the function always return an empty string when dynamic RTTI is disabled (i.e., `_CPPRTTI` is undefined). The implementation itself is heavily based on `collate::do_transform` (including the change in #5431). It also needs access to the internals of `collate`, so I made `_Regex_traits` a friend of it.

There is a behavior change for the C locale. As I explained in more detail in #5435, the traits requirement in [re.req]/20, which asks that primary sort keys ignore character case, is actually misleading, since it is wrong for precisely one locale: the C locale (or the POSIX locale; see the collation order definition here: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02_06). Since the equivalence classes are derived from POSIX, and the definition of `regex_traits::transform_primary` also alludes to "primary sort keys", which indirectly references terminology from the POSIX standard (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02), I think we should do as POSIX says: "A" should not match `[[=a=]]`.

This has consequences:
- In #5392 (`<regex>`: Properly parse and match collating symbols and equivalences), I assumed [re.req]/20, so I didn't add any character translation using `translate` and `translate_nocase` when parsing equivalences. Now we have to add such logic in `_Parser::_Do_ex_class2` to handle potentially case-sensitive sort keys when case-insensitive regexes are used (otherwise, "A" would fail to match even `[[=A=]]`).
- Since matching and parsing of equivalences no longer go through `collate::transform`, related tests no longer have to be skipped under IDL mismatch.