| 1 | +### Unicode category changed for some Latin-1 characters |
| 2 | + |
| 3 | +<xref:System.Char> methods now return the correct Unicode category for characters in the Latin-1 range. The category matches that of the Unicode standard. |
| 4 | + |
| 5 | +#### Change description |
| 6 | + |
| 7 | +In previous .NET versions, <xref:System.Char> methods used a fixed list of Unicode categories for characters in the Latin-1 range. The Unicode standard has since changed the categories of some of these characters, so the values returned by <xref:System.Char> no longer matched the standard. They also differed from the values returned by <xref:System.Globalization.CharUnicodeInfo> APIs, which follow the Unicode standard. In .NET 5.0 and later versions, <xref:System.Char> methods return the category defined by the Unicode standard for all characters. |
| 8 | + |
| 9 | +The following table shows the characters whose Unicode categories have changed in .NET 5.0: |
| 10 | + |
| 11 | +| Character | Unicode category<br>in previous .NET versions | Unicode category<br>in .NET 5.0 and later versions | |
| 12 | +|:------------:|:---------------------------------------------:|:--------------------------------------------------:| |
| 13 | +| § (\u00a7) | `OtherSymbol` | `OtherPunctuation` | |
| 14 | +| ª (\u00aa) | `LowercaseLetter` | `OtherLetter` | |
| 15 | +| SHY (\u00ad) | `DashPunctuation` | `Format` | |
| 16 | +| ¶ (\u00b6) | `OtherSymbol` | `OtherPunctuation` | |
| 17 | +| º (\u00ba) | `LowercaseLetter` | `OtherLetter` | |
| 18 | + |
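To see which categories a given runtime reports for the characters in the table, a minimal sketch along the following lines may help. The class name `Latin1CategoryCheck` is only illustrative; the calls are `Char.GetUnicodeCategory` and `CharUnicodeInfo.GetUnicodeCategory`.

```csharp
using System;
using System.Globalization;

class Latin1CategoryCheck
{
    static void Main()
    {
        // The five Latin-1 characters listed in the table above.
        char[] affected = { '\u00a7', '\u00aa', '\u00ad', '\u00b6', '\u00ba' };

        foreach (char c in affected)
        {
            // Category as reported by System.Char.
            UnicodeCategory fromChar = char.GetUnicodeCategory(c);

            // Category as reported by CharUnicodeInfo, which follows the Unicode standard.
            UnicodeCategory fromInfo = CharUnicodeInfo.GetUnicodeCategory(c);

            Console.WriteLine($"U+{(int)c:X4}  Char: {fromChar,-18} CharUnicodeInfo: {fromInfo}");
        }
    }
}
```

According to the change description, the two columns should agree on .NET 5.0 and later versions and can differ on earlier versions.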
| 19 | +#### Version introduced |
| 20 | + |
| 21 | +.NET 5.0 RC1 |
| 22 | + |
| 23 | +#### Recommended action |
| 24 | + |
| 25 | +If your code gets the Unicode category of a character by using the <xref:System.Char> class and assumes that the category never changes, review it against the preceding table and update any logic that depends on the previous categories. |
| 26 | + |
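One possible way to make such code independent of the runtime version is to compare against the specific characters explicitly instead of relying on their category. The following is a sketch only; `LegacyLatin1Checks`, `IsSymbolLegacy`, and `IsLowerLegacy` are hypothetical helper names, not part of .NET.

```csharp
using System;

static class LegacyLatin1Checks
{
    // Hypothetical helper: treats § (U+00A7) and ¶ (U+00B6) as symbols on every
    // runtime version, then defers to char.IsSymbol for all other characters.
    public static bool IsSymbolLegacy(char c) =>
        c == '\u00a7' || c == '\u00b6' || char.IsSymbol(c);

    // Hypothetical helper: treats ª (U+00AA) and º (U+00BA) as lowercase on every
    // runtime version, then defers to char.IsLower for all other characters.
    public static bool IsLowerLegacy(char c) =>
        c == '\u00aa' || c == '\u00ba' || char.IsLower(c);
}

class Program
{
    static void Main()
    {
        // The explicit comparisons make these results the same before and after .NET 5.0.
        Console.WriteLine(LegacyLatin1Checks.IsSymbolLegacy('\u00a7'));
        Console.WriteLine(LegacyLatin1Checks.IsLowerLegacy('\u00aa'));
    }
}
```

Whether to preserve the previous behavior this way or to adopt the new categories depends on what the calling code actually needs.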
| 27 | +#### Reason for change |
| 28 | + |
| 29 | +This change was made so that the categories returned by the <xref:System.Char> type are consistent with both the Unicode standard and the <xref:System.Globalization.CharUnicodeInfo> type. |
| 30 | + |
| 31 | +#### Category |
| 32 | + |
| 33 | +- Core .NET libraries |
| 34 | +- Globalization |
| 35 | + |
| 36 | +#### Affected APIs |
| 37 | + |
| 38 | +- <xref:System.Char.GetUnicodeCategory%2A?displayProperty=fullName> |
| 39 | +- <xref:System.Char.IsLetter%2A?displayProperty=fullName> |
| 40 | +- <xref:System.Char.IsPunctuation%2A?displayProperty=fullName> |
| 41 | +- <xref:System.Char.IsSymbol%2A?displayProperty=fullName> |
| 42 | +- <xref:System.Char.IsLower%2A?displayProperty=fullName> |
| 43 | + |
| 44 | +Any class that depends on <xref:System.Char> to obtain the Unicode character category, such as <xref:System.Text.RegularExpressions.Regex>, is also affected by this change. |
| 45 | + |
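As an illustration of the indirect effect on regular expressions, the following sketch tests the section sign against the Unicode category classes `\p{Po}` (OtherPunctuation) and `\p{So}` (OtherSymbol). The class name `RegexCategoryCheck` is only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexCategoryCheck
{
    static void Main()
    {
        string section = "\u00a7"; // § SECTION SIGN

        // Prints whether § matches the OtherPunctuation (Po) category class.
        Console.WriteLine(Regex.IsMatch(section, @"\p{Po}"));

        // Prints whether § matches the OtherSymbol (So) category class.
        Console.WriteLine(Regex.IsMatch(section, @"\p{So}"));
    }
}
```

If the regex engine uses the categories shown in the table, the `\p{Po}` pattern matches on .NET 5.0 and later versions, and the `\p{So}` pattern matches on earlier versions.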
| 46 | +<!-- |
| 47 | + |
| 48 | +#### Affected APIs |
| 49 | + |
| 50 | +- `Overload:System.Char.GetUnicodeCategory` |
| 51 | +- `Overload:System.Char.IsLetter` |
| 52 | +- `Overload:System.Char.IsPunctuation` |
| 53 | +- `Overload:System.Char.IsSymbol` |
| 54 | +- `Overload:System.Char.IsLower` |
| 55 | + |
| 56 | +--> |