Commit 3f722a9

Unicode categories breaking change (#20605)
1 parent ea49f32 commit 3f722a9

3 files changed (+66, -0 lines)

docs/core/compatibility/3.1-5.0.md

Lines changed: 5 additions & 0 deletions
@@ -271,9 +271,14 @@ If you're migrating from version 3.1 of .NET Core, ASP.NET Core, or EF Core to v

## Globalization

- [Unicode category changed for some Latin-1 characters](#unicode-category-changed-for-some-latin-1-characters)
- [StringInfo and TextElementEnumerator are now UAX29-compliant](#stringinfo-and-textelementenumerator-are-now-uax29-compliant)
- [Globalization APIs use ICU libraries on Windows](#globalization-apis-use-icu-libraries-on-windows)

[!INCLUDE [unicode-categories-for-latin1-chars](../../../includes/core-changes/globalization/5.0/unicode-categories-for-latin1-chars.md)]

***

[!INCLUDE [uax29-compliant-grapheme-enumeration](../../../includes/core-changes/globalization/5.0/uax29-compliant-grapheme-enumeration.md)]

***

docs/core/compatibility/globalization.md

Lines changed: 5 additions & 0 deletions
@@ -9,12 +9,17 @@ The following breaking changes are documented on this page:

| Breaking change | Version introduced |
| - | :-: |
| [Unicode category changed for some Latin-1 characters](#unicode-category-changed-for-some-latin-1-characters) | 5.0 |
| [Globalization APIs use ICU libraries on Windows](#globalization-apis-use-icu-libraries-on-windows) | 5.0 |
| [StringInfo and TextElementEnumerator are now UAX29-compliant](#stringinfo-and-textelementenumerator-are-now-uax29-compliant) | 5.0 |
| ["C" locale maps to the invariant locale](#c-locale-maps-to-the-invariant-locale) | 3.0 |

## .NET 5.0

[!INCLUDE [unicode-categories-for-latin1-chars](../../../includes/core-changes/globalization/5.0/unicode-categories-for-latin1-chars.md)]

***

[!INCLUDE [icu-globalization-api](../../../includes/core-changes/globalization/5.0/icu-globalization-api.md)]

***
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
### Unicode category changed for some Latin-1 characters

<xref:System.Char> methods now return the correct Unicode category for characters in the Latin-1 range. The categories now match the Unicode standard.

#### Change description

In previous .NET versions, <xref:System.Char> methods used a fixed list of Unicode categories for characters in the Latin-1 range. However, the Unicode standard has changed the categories of some of these characters since those APIs were implemented, creating a discrepancy. There was also a discrepancy between the <xref:System.Char> APIs and the <xref:System.Globalization.CharUnicodeInfo> APIs, which do follow the Unicode standard. In .NET 5.0 and later versions, <xref:System.Char> methods use and return the Unicode category that matches the Unicode standard for all characters.

The following table shows the characters whose Unicode categories have changed in .NET 5.0:

| Character | Unicode category<br>in previous .NET versions | Unicode category<br>in .NET 5.0 and later versions |
|:------------:|:---------------------------------------------:|:--------------------------------------------------:|
| § (\u00a7) | `OtherSymbol` | `OtherPunctuation` |
| ª (\u00aa) | `LowercaseLetter` | `OtherLetter` |
| SHY (\u00ad) | `DashPunctuation` | `Format` |
| ¶ (\u00b6) | `OtherSymbol` | `OtherPunctuation` |
| º (\u00ba) | `LowercaseLetter` | `OtherLetter` |
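
Because the new categories come from the Unicode Character Database, they can be cross-checked outside .NET. The following Python sketch is not .NET code: it assumes only the standard-library `unicodedata` module, and `GENERAL_CATEGORY_NAMES` and `unicode_category` are hypothetical names introduced here to translate two-letter general-category codes into the `UnicodeCategory` member names used in the table.

```python
import unicodedata

# Hypothetical mapping from the two-letter Unicode general-category codes
# relevant to this table to the .NET UnicodeCategory member names.
GENERAL_CATEGORY_NAMES = {
    "Ll": "LowercaseLetter",
    "Lo": "OtherLetter",
    "Pd": "DashPunctuation",
    "Po": "OtherPunctuation",
    "So": "OtherSymbol",
    "Cf": "Format",
}

# The five Latin-1 characters whose category changed in .NET 5.0.
CHANGED_CHARS = ["\u00a7", "\u00aa", "\u00ad", "\u00b6", "\u00ba"]


def unicode_category(ch: str) -> str:
    """Return a .NET-style category name for ch, using Python's bundled Unicode data."""
    code = unicodedata.category(ch)
    # Fall back to the raw two-letter code for categories not in the mapping.
    return GENERAL_CATEGORY_NAMES.get(code, code)


if __name__ == "__main__":
    for ch in CHANGED_CHARS:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}: {unicode_category(ch)}")
```

On a current Python 3 release, each character reports the category shown in the .NET 5.0 column of the table.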

#### Version introduced

.NET 5.0 RC1

#### Recommended action

If you have code that gets the Unicode category of a character by using the <xref:System.Char> class and assumes that the category never changes, you might need to update it.

#### Reason for change

This change was made so that the categories returned by the <xref:System.Char> type are consistent with both the Unicode standard and the <xref:System.Globalization.CharUnicodeInfo> type.

#### Category

- Core .NET libraries
- Globalization

#### Affected APIs

- <xref:System.Char.GetUnicodeCategory%2A?displayProperty=fullName>
- <xref:System.Char.IsLetter%2A?displayProperty=fullName>
- <xref:System.Char.IsPunctuation%2A?displayProperty=fullName>
- <xref:System.Char.IsSymbol%2A?displayProperty=fullName>
- <xref:System.Char.IsLower%2A?displayProperty=fullName>

Additionally, any type that depends on <xref:System.Char> to obtain the Unicode category of a character, such as <xref:System.Text.RegularExpressions.Regex>, is affected by this change.
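
The affected `Char` predicates are each defined in terms of the Unicode category, so a category change can flip their results. The following Python sketch is a hypothetical model, not the .NET implementation: it assumes `IsLetter` is true for any letter category, `IsLower` only for `LowercaseLetter`, `IsPunctuation` for any punctuation category, and `IsSymbol` for any symbol category. It then reports which predicates answer differently under the old and new categories from the table. `PREDICATES`, `CATEGORY_CHANGES`, and `changed_predicates` are illustrative names only.

```python
# Hypothetical model of the affected System.Char predicates, expressed over
# .NET UnicodeCategory member names. This is not the .NET implementation.
LETTERS = {"UppercaseLetter", "LowercaseLetter", "TitlecaseLetter",
           "ModifierLetter", "OtherLetter"}
PUNCTUATION = {"ConnectorPunctuation", "DashPunctuation", "OpenPunctuation",
               "ClosePunctuation", "InitialQuotePunctuation",
               "FinalQuotePunctuation", "OtherPunctuation"}
SYMBOLS = {"MathSymbol", "CurrencySymbol", "ModifierSymbol", "OtherSymbol"}

PREDICATES = {
    "IsLetter": lambda cat: cat in LETTERS,
    "IsLower": lambda cat: cat == "LowercaseLetter",
    "IsPunctuation": lambda cat: cat in PUNCTUATION,
    "IsSymbol": lambda cat: cat in SYMBOLS,
}

# (previous category, .NET 5.0 category) pairs copied from the table.
CATEGORY_CHANGES = {
    "\u00a7": ("OtherSymbol", "OtherPunctuation"),
    "\u00aa": ("LowercaseLetter", "OtherLetter"),
    "\u00ad": ("DashPunctuation", "Format"),
    "\u00b6": ("OtherSymbol", "OtherPunctuation"),
    "\u00ba": ("LowercaseLetter", "OtherLetter"),
}


def changed_predicates(ch):
    """Return, sorted by name, the predicates whose result differs between ch's old and new category."""
    old, new = CATEGORY_CHANGES[ch]
    return sorted(name for name, pred in PREDICATES.items() if pred(old) != pred(new))


if __name__ == "__main__":
    for ch in CATEGORY_CHANGES:
        print(f"U+{ord(ch):04X}: {', '.join(changed_predicates(ch))}")
```

In this model, § and ¶ switch from symbol to punctuation, ª and º stop being lowercase letters, and the soft hyphen stops being punctuation. `IsLetter` returns the same result for these five characters, because `LowercaseLetter` and `OtherLetter` are both letter categories, while `GetUnicodeCategory` reports a new value for every one of them.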

<!--

#### Affected APIs

- `Overload:System.Char.GetUnicodeCategory`
- `Overload:System.Char.IsLetter`
- `Overload:System.Char.IsPunctuation`
- `Overload:System.Char.IsSymbol`
- `Overload:System.Char.IsLower`

-->
