Skip to content

docs: fix spelling grammar and missing words in clr-code-guide.md #112222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 6, 2025
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 13 additions & 13 deletions docs/coding-guidelines/clr-code-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,7 @@ Rules can either be imposed by invariants or team policy.

"Team Policy" rules are rules we've established as "good practices" – for example, the rule that every function must declare a contract. While a missing contract here or there isn't a shipstopper, violating these rules is still heavily frowned upon and you should expect a bug filed against you unless you can supply a very compelling reason why your code needs an exemption.

Team policy rules are not necessarily less important than invariants. For example, the rule to use [safemath.h][safemath.h] rather that coding your own integer overflow check is a policy rule. But because it deals with security, we'd probably treat it as higher priority than a very obscure (non-security) related bug.
Team policy rules are not necessarily less important than invariants. For example, the rule to use [safemath.h][safemath.h] rather than coding your own integer overflow check is a policy rule. But because it deals with security, we'd probably treat it as higher priority than a very obscure (non-security) related bug.

[safemath.h]: https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/safemath.h

Expand Down Expand Up @@ -185,7 +185,7 @@ Here's how to fix our buggy code fragment.
GCPROTECT_END();
}

Notice the addition of the line GCPROTECT_BEGIN(a). GCPROTECT_BEGIN is a macro whose argument is any OBJECTREF-typed storage location (it has to be an expression that can you can legally apply the address-of (&) operator to.) GCPROTECT_BEGIN tells the GC two things:
Notice the addition of the line GCPROTECT_BEGIN(a). GCPROTECT_BEGIN is a macro whose argument is any OBJECTREF-typed storage location (it has to be an expression that you can legally apply the address-of (&) operator to.) GCPROTECT_BEGIN tells the GC two things:

- The GC is not to discard any object referred to by the reference stored in local "a".
- If the GC moves the object referred to by "a", it must update "a" to point to the new location.
Expand All @@ -196,7 +196,7 @@ Note that we didn't similarly protect 'b" because the caller has no use for "b"

Having said that, no one should complain if you play it safe and GCPROTECT "b" as well. You never know when someone might add code later that makes the protection necessary.

Every GCPROTECT_BEGIN must have a matching GCPROTECT_END, which terminates the protected status of "a". As an additional safeguard, GCPROTECT_END overwrites "a" with garbage so that any attempt to use "a" afterward will fault. GCPROTECT_BEGIN introduces a new C scoping level that GCPROTECT_END closes, so if you use one without the other, you'll probably experience severe build errors.
Every GCPROTECT_BEGIN must have a matching GCPROTECT_END, which terminates the protected status of "a". As an additional safeguard, GCPROTECT_END overwrites "a" with garbage so that any attempt to use "a" afterward will fail. GCPROTECT_BEGIN introduces a new C scoping level that GCPROTECT_END closes, so if you use one without the other, you'll probably experience severe build errors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verb fault means the action of reporting exception/fault in computer


### <a name="2.1.4"></a>2.1.4 Don't do nonlocal returns from within GCPROTECT blocks

Expand Down Expand Up @@ -250,7 +250,7 @@ The following code fragment shows how handles are used. In practice, of course,
{
MethodTable *pMT = g_pObjectClass->GetMethodTable();

//Another way is to use handles. Handles would be
// Another way is to use handles. Handles would be
// wasteful for a case this simple but are useful
// if you need to protect something for the long
// term.
Expand All @@ -270,7 +270,7 @@ The following code fragment shows how handles are used. In practice, of course,
There are actually several flavors of handles. This section lists the most common ones. ([objecthandle.h][objecthandle.h] contains the complete list.)

- **HNDTYPE_STRONG**: This is the default and acts like a normal reference. Created by calling CreateHandle(OBJECTREF).
- **HNDTYPE_WEAK_LONG**: Tracks an object as long as one strong reference to its exists but does not itself prevent the object from being GC'd. Created by calling CreateWeakHandle(OBJECTREF).
- **HNDTYPE_WEAK_LONG**: Tracks an object as long as one strong reference to it exists but does not itself prevent the object from being GC'd. Created by calling CreateWeakHandle(OBJECTREF).
- **HNDTYPE_PINNED**: Pinned handles are strong handles which have the added property that they prevent an object from moving during a garbage collection cycle. This is useful when passing a pointer to object innards out of the runtime while GC may be enabled.

NOTE: PINNING AN OBJECT IS EXPENSIVE AS IT PREVENTS THE GC FROM ACHIEVING OPTIMAL PACKING OF OBJECTS DURING EPHEMERAL COLLECTIONS. THIS TYPE OF HANDLE SHOULD BE USED SPARINGLY!
Expand Down Expand Up @@ -298,7 +298,7 @@ Now, while the compiler can generate any valid code for this, it's very likely i
push [A] ;push argument to DoSomething
call DoSomething

This is supposed to be work correctly in every case, according to the semantics of GCPROTECT. However, suppose just after the "push" instruction, another thread gets the time-slice, starts a GC and moves the object A. The local variable A will be correctly updated – but the copy of A which we just pushed as an argument to DoSomething() will not. Hence, DoSomething() will receive a pointer to the old location and crash. Clearly, preemptive GC alone will not suffice for the CLR.
This is supposed to work correctly in every case, according to the semantics of GCPROTECT. However, suppose just after the "push" instruction, another thread gets the time-slice, starts a GC and moves the object A. The local variable A will be correctly updated – but the copy of A which we just pushed as an argument to DoSomething() will not. Hence, DoSomething() will receive a pointer to the old location and crash. Clearly, preemptive GC alone will not suffice for the CLR.

How about the alternative: cooperative GC? With cooperative GC, the above problem doesn't occur and GCPROTECT works as intended. Unfortunately, the CLR has to interop with legacy unmanaged code as well. Suppose a managed app calls out to the Win32 MessageBox api which waits for the user to click a button before returning. Until the user does this, all managed threads in the same process are blocked from GC. Not good.

Expand Down Expand Up @@ -447,7 +447,7 @@ Why do we use GC_NOTRIGGERS rather than GC_FORBID? Because forcing every functio

**Note:** There is no GC_FORBID keyword defined for contracts but you can simulate it by combining GC_NOTRIGGER and MODE_COOPERATIVE.

**Important:** The notrigger thread state is implemented as a counter rather than boolean. This is unfortunate as this should not be necessary and exposes us to nasty ref-counting style bugs. What is important that contracts intentionally do not support unscoped trigger/notrigger transitions. That is, a GC_NOTRIGGER inside a contract will **increment** the thread's notrigger count on entry to the function but on exit, **it will not decrement the count , instead it will restore the count from a saved value.** Thus, any _net_ changes in the trigger state caused within the body of the function will be wiped out. This is good unless your function was designed to make a net change to the trigger state. If you have such a need, you'll just have to work around it somehow because we actively discourage such things in the first place. Ideally, we'd love to replace that counter with a Boolean at sometime.
**Important:** The notrigger thread state is implemented as a counter rather than boolean. This is unfortunate as this should not be necessary and exposes us to nasty ref-counting style bugs. What is important is that contracts intentionally do not support unscoped trigger/notrigger transitions. That is, a GC_NOTRIGGER inside a contract will **increment** the thread's notrigger count on entry to the function but on exit, **it will not decrement the count , instead it will restore the count from a saved value.** Thus, any _net_ changes in the trigger state caused within the body of the function will be wiped out. This is good unless your function was designed to make a net change to the trigger state. If you have such a need, you'll just have to work around it somehow because we actively discourage such things in the first place. Ideally, we'd love to replace that counter with a Boolean at sometime.

#### <a name="2.1.10.1"></a>2.1.10.1 GC_NOTRIGGER/TRIGGERSGC on a scope

Expand All @@ -467,13 +467,13 @@ One difference between the standalone TRIGGERSGC and the contract GC_TRIGGERS: t

### <a name="2.2.1"></a>2.2.1 What are holders and why are they important?

The CLR team has coined the name **holder** to refer to the infrastructure that encapsulates the common grunt work of writing robust **backout code**. **Backout code** is code that deallocate resources or restore CLR data structure consistency when we abort an operation due to an error or an asynchronous event. Oftentimes, the same backout code will execute in non-error paths for resources allocated for use of a single scope, but error-time backout is still needed even for longer lived resources.
The CLR team has coined the name **holder** to refer to the infrastructure that encapsulates the common grunt work of writing robust **backout code**. **Backout code** is code that deallocates resources or restores CLR data structure consistency when we abort an operation due to an error or an asynchronous event. Oftentimes, the same backout code will execute in non-error paths for resources allocated for use of a single scope, but error-time backout is still needed even for longer lived resources.

Way back in V1, error paths were _ad-hoc._ Typically, they flowed through "fail:" labels where the backout code was accumulated.

Due to the no-compromise robustness requirements that the CLR Hosting model (with SQL Server as the initial customer) imposed on us in the .NET Framework v2 release, we have since become much more formal about backout. One reason is that we like to write backout that will execute if you leave the scope because of an exception. We also want to centralize policy regarding exceptions occurring inside backout. Finally, we want an infrastructure that will discourage developer errors from introducing backout bugs in the first place.

Thus, we have centralized cleanup around C++ destructor technology. Instead of declaring a HANDLE, you declare a HandleHolder. The holder wraps a HANDLE and its destructor closes the handle no matter how control leaves the scope. We have already implemented standard holders for common resources (arrays, memory allocated with C++ new, Win32 handles and locks.) The Holder mechanism is extensible so you can add new types of holders as you need them.
Thus, we have centralized cleanup around C++ destructor technology. Instead of declaring a HANDLE, you declare a HandleHolder. The holder wraps a HANDLE and its destructor closes the handle no matter how control leaves the scope. We have already implemented standard holders for common resources (arrays, memory allocated with C++ new, Win32 handles, and locks.) The Holder mechanism is extensible so you can add new types of holders as you need them.

### <a name="2.2.2"></a>2.2.2 An example of holder usage

Expand Down Expand Up @@ -707,24 +707,24 @@ If you wish to set the OOM state for a scope rather than a function, use the FAU
#### <a name="2.3.2.3"></a>2.3.2.3 Remember...

- Do not use INJECT_FAULT to indicate the possibility of non-OOM errors such as entries not existing in a hash table or a COM object not supporting an interface. INJECT_FAULT indicates OOM errors and no other type.
- Be very suspicious if your INJECT_FAULT() argument is anything other than throwing an OOM exception or returning E_OUTOFMEMORY. OOM errors must distinguishable from other types of errors so if you're merely returning NULL without indicating the type of error, you'd better be a simple memory allocator or some other function that will never fail for any reason other than an OOM.
- THROWS and INJECT_FAULT correlate strongly but are independent. A NOTHROW/INJECT_FAULT combo might indicate a function that returns HRESULTs including E_OUTOFMEMORY. A THROWS/FORBID_FAULT however indicate a function that can throw an exception but not an OutOfMemoryException. While theoretically possible, such a contract is probably a bug.
- Be very suspicious if your INJECT_FAULT() argument is anything other than throwing an OOM exception or returning E_OUTOFMEMORY. OOM errors must be distinguishable from other types of errors so if you're merely returning NULL without indicating the type of error, you'd better be a simple memory allocator or some other function that will never fail for any reason other than an OOM.
- THROWS and INJECT_FAULT correlate strongly but are independent. A NOTHROW/INJECT_FAULT combo might indicate a function that returns HRESULTs including E_OUTOFMEMORY. A THROWS/FORBID_FAULT however indicates a function that can throw an exception but not an OutOfMemoryException. While theoretically possible, such a contract is probably a bug.

## <a name="2.4"></a>2.4 Are you using SString and/or the safe string manipulation functions?

The native C implementation of strings as raw char* buffers is a well-known breeding ground for buffer overflow bugs. While acknowledging that there's still a ton of legacy char*'s in the code, new code and new data structures should use the SString class rather than raw C strings whenever possible.

### <a name="2.4.1"></a>2.4.1 SString

SString is the abstraction to use for unmanaged strings in CLR code. It is important that as much code is possible uses the SString abstraction rather than raw character arrays, because of the danger of buffer overrun related to direct manipulation of arrays. Code which does not use SString must be manually reviewed for the possibility of buffer overrun or corruption during every security review.
SString is the abstraction to use for unmanaged strings in CLR code. It is important that as much code as possible uses the SString abstraction rather than raw character arrays, because of the danger of buffer overrun related to direct manipulation of arrays. Code which does not use SString must be manually reviewed for the possibility of buffer overrun or corruption during every security review.

This section will provide an overview for SString. For specific details on methods and use, see the file [src\inc\sstring.h][sstring.h]. SString has been in use in our codebase for quite a few years now so examples of its use should be easy to find.

[sstring.h]: https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/sstring.h

An SString object represents a Unicode string. It has its own buffer which it internally manages. The string buffer is typically not referenced directly by user code; instead the string is manipulated indirectly by methods defined on SString. Ultimately there are several ways to get at the raw string buffer if such functionality is needed to interface to existing APIs. But these should be used only when absolutely necessary.

When SStrings are used as local variables, they are typically used via the StackSString type, which uses a bit of stack space as a preallocated buffer for optimization purposes. When SStrings are use in structures, the SString type may be used directly (if it is likely that the string will be empty), or through the InlineSString template, which allows an arbitrary amount of preallocated space to be declared inline in the structure with the SString. Since InlineSStrings and StackSStrings are subtypes of SString, they have the same API, and can be passed wherever an SString is required.
When SStrings are used as local variables, they are typically used via the StackSString type, which uses a bit of stack space as a preallocated buffer for optimization purposes. When SStrings are used in structures, the SString type may be used directly (if it is likely that the string will be empty), or through the InlineSString template, which allows an arbitrary amount of preallocated space to be declared inline in the structure with the SString. Since InlineSStrings and StackSStrings are subtypes of SString, they have the same API, and can be passed wherever an SString is required.

As parameters, SStrings should always be declared as reference parameters. Similarly, SStrings as return value should also use a "by reference" style.

Expand Down