Merge the remaining groups of runtime tests

Related to: https://github.com/dotnet/runtime/issues/54512

As of 7/6/2022, about 1/3 of the total runtime test set has been converted to use merged wrappers and to run most tests in-proc, which improves Helix perf. This issue is about fanning out the effort to the various runtime feature teams, who will follow up by converting the remaining runtime tests.

Original conversions:
- [x] JIT/Methodical
- [x] JIT/HardwareIntrinsics

According to ILTransform, the analysis tool I created for the conversion, the biggest remaining buckets of non-converted tests are in the following subtrees:

- [x] [JIT/Regression (1312 tests)](https://github.com/dotnet/runtime/pull/83895)
- [x] [JIT/jit64 (828 tests)](https://github.com/dotnet/runtime/pull/83151)
- [x] [JIT/CodeGenBringUpTests (641 tests)](https://github.com/dotnet/runtime/pull/85847)
- [x] [JIT/Directed (541 tests)](https://github.com/dotnet/runtime/pull/83256)
- [x] [JIT/IL_Conformance (378 tests)](https://github.com/dotnet/runtime/pull/80597)
- [x] [baseservices/threading (231 tests)](https://github.com/dotnet/runtime/pull/83143)

By merging these subtrees we should be able to reach a conversion rate of about 70%, after which we can focus on the remaining smaller test groups.

Leading JIT test directories (updated 03/08/2023):
- [ ] JIT/BBT (1 test)
- [ ] JIT/CheckProjects (1 test)
- [x] [JIT/Generics](https://github.com/dotnet/runtime/pull/85849) (222 tests)
- [ ] JIT/Intrinsics (31 tests)
- [ ] JIT/Math (2 tests)
- [x] [JIT/opt](https://github.com/dotnet/runtime/pull/85850) (304 tests)
- [x] [JIT/Performance](https://github.com/dotnet/runtime/pull/85851) (100 tests)
- [ ] JIT/PGO (5 tests)
- [ ] JIT/RyuJIT (1 test)
- [x] [JIT/SIMD](https://github.com/dotnet/runtime/pull/85852) (116 tests)
- [ ] JIT/Stress (5 tests)

Merged test groups impact day-to-day work, and directories such as JIT/Regression are actively modified, so we also need to address usability issues.

Highest priority - can cause tests to be skipped or impact the team's ability to disable failing tests:
- [ ] #84182
- [ ] Standard xunit behavior is to only look at publicly accessible methods for tests ([Fact] and friends).  This can lead to tests silently being ignored.  Make this an error.
  - [x] xunit.analyzers can check the enclosing type of a test method.  Enabled by https://github.com/dotnet/runtime/pull/83806.
  - [ ] A custom check will be needed for the methods themselves.
  - [ ] We should probably also reject tests with `Main` methods, because they will fail to build when `BuildAsStandalone` is enabled and the build tries to add another entry point.
- [ ] We need to test and document how to include, exclude, and disable tests: xunit attributes, SkipOnCoreClr, issues.targets. GC/JIT stress and related modes are particularly difficult. See and update https://github.com/dotnet/runtime/blob/main/docs/workflow/ci/disabling-tests.md.  This covers one-off command-line executions as well as disabling failing tests in the repo.  Ask DavidWr to review once written.
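
The custom check for non-public test methods mentioned above could start out as something like the following. This is only a rough sketch: it scans C# source text with a regular expression rather than using a real analyzer, and the names `TEST_METHOD` and `find_problem_tests` are made up for illustration. It flags `[Fact]`/`[Theory]` methods that are not `public`, and test methods named `Main`.

```python
import re

# Heuristic pattern (an assumption, not a real C# parser): a [Fact] or
# [Theory] attribute, any further attributes, then modifiers, a return
# type, and the method name.
TEST_METHOD = re.compile(
    r"\[(?:Fact|Theory)\b[^\]]*\]\s*"
    r"(?:\[[^\]]*\]\s*)*"
    r"(?P<modifiers>(?:(?:public|private|protected|internal|static|unsafe|async)\s+)*)"
    r"(?P<return>[\w<>\[\],.?]+)\s+"
    r"(?P<name>\w+)\s*\("
)


def find_problem_tests(source: str) -> list[tuple[str, str]]:
    """Return (method name, reason) pairs for suspicious test methods."""
    problems = []
    for match in TEST_METHOD.finditer(source):
        modifiers = match.group("modifiers").split()
        name = match.group("name")
        if "public" not in modifiers:
            # xunit would silently skip this method.
            problems.append((name, "test method is not public"))
        if name == "Main":
            # A second entry point can break BuildAsStandalone builds.
            problems.append((name, "test method is named Main"))
    return problems


SAMPLE = """
public class Tests
{
    [Fact]
    public static void Visible() { }

    [Fact]
    static void Hidden() { }

    [Fact]
    public static int Main() { return 100; }
}
"""

for name, reason in find_problem_tests(SAMPLE):
    print(f"{name}: {reason}")
```

A real implementation would more likely live in a Roslyn analyzer or in the wrapper generator itself, where it can see symbols instead of text.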

High priority - impact day-to-day work or debugging issues with test merging:
- [x] [Write the generated Main methods to disk](https://github.com/dotnet/runtime/pull/83444).  There is a csproj property for this.  Ideally the filename printed in C# error messages would match, but that might be difficult.
- [x] Fix https://github.com/dotnet/runtime/issues/81984
- [ ] The logic for ilasm roundtripping testing doesn't work correctly. It is implemented in the scripting layer and just does an ildasm/ilasm on the entry exe, which means that test merging ends up just doing the roundtrip in the new wrappers and not the original tests. The logic should probably be changed to "for x in <any .net assembly in the folder, possibly recursively>, roundtrip x".
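
The proposed "for x in any .NET assembly in the folder, roundtrip x" logic could look roughly like the sketch below. The names `find_assemblies` and `roundtrip_all` are hypothetical, and the actual ildasm/ilasm invocation is deliberately left to a caller-supplied `roundtrip` callback rather than guessed at here. Selecting files by `.dll`/`.exe` extension is a simplification, since native binaries share those extensions and would need to be filtered out.

```python
from pathlib import Path
from typing import Callable

# Assumption: candidate assemblies are identified by extension only.
ASSEMBLY_SUFFIXES = {".dll", ".exe"}


def find_assemblies(folder: Path, recursive: bool = True) -> list[Path]:
    """List candidate assemblies in a test output folder."""
    pattern = "**/*" if recursive else "*"
    return sorted(
        path
        for path in folder.glob(pattern)
        if path.is_file() and path.suffix.lower() in ASSEMBLY_SUFFIXES
    )


def roundtrip_all(
    folder: Path,
    roundtrip: Callable[[Path], bool],
    recursive: bool = True,
) -> list[Path]:
    """Run `roundtrip` on every candidate and return the ones that failed.

    `roundtrip` is a placeholder for the ildasm/ilasm step; it should
    return True when the roundtrip of that assembly succeeded.
    """
    failed = []
    for assembly in find_assemblies(folder, recursive):
        if not roundtrip(assembly):
            failed.append(assembly)
    return failed
```

With this shape, the merged wrapper is just one of the assemblies visited, and the original test assemblies next to it are roundtripped as well.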

Lower priority - these can (somewhat) be worked around, but most of the workarounds aren't sustainable (e.g., poor documentation means all testing issues will come to Mark/Tomas/Jeremy):
- [x] Can we leave behind Main methods in tests (in addition to [Fact] methods) to allow for extra parameterization (e.g., `test --version`)?  [ANSWER: sort of - You can keep a `Main` method, but you then need to set `ReferenceXUnitWrapperGenerator` to `false` or BuildAsStandalone will break. `Main` will need a `[Fact]` or `[Theory]`, and arguments (if any) will need to match across the [xxxData] and csproj property for BuildAsStandalone=true/false to be consistent.]
- [ ] Write broader documentation on merged test groups.
- [ ] The wrapper doesn’t fail when tests fail.  When fixing this, keep test failures distinct from failures of the test harness and similar infrastructure.
- [ ] The wrapper prints a “display name” that isn’t suitable for feeding back in to disable the test.
- [ ] Can we measure the impact?

This isn't a specific work item but a change we all need to make: **We need to aggressively disable tests that take down the runtime (and hide other tests).**  We should consider having JIT assertions throw managed exceptions, but these can be caught.  Ideally we want monitoring (either within the wrapper - with the cost of extra code - or external to it).

<details>
<summary>Partitioning of subtrees under JIT/Regression</summary>

This was originally intended for tracking conversion progress and is useful for understanding the set of tests there.  However, the entire batch was converted at once, so it won't be used for tracking progress.

- clr-x64-JIT, CLR-x86-EJIT, Dev14, v4 (21 tests total)
- Dev11 (34 tests total)
- VS-ia64-JIT
  - VS-ia64-JIT/M00 (46 tests total)
  - VS-ia64-JIT/V1.2-Beta1, VS-ia64-JIT/V1.2-M01, VS-ia64-JIT/V2.0-Beta2, VS-ia64-JIT/V2.0-RTM (39 tests total)
  - VS-ia64-JIT/V1.2-M02 (35 tests total)
- JitBlue
  - JitBlue/CoreFX*, JitBlue/DevDiv_1*, JitBlue/DevDiv_2*, JitBlue/DevDiv_3* (40 tests total)
  - JitBlue/DevDiv_4*, JitBlue/DevDiv_5*, JitBlue/DevDiv_6*, JitBlue/DevDiv_7*, JitBlue/DevDiv_8*, JitBlue/DevDiv_9* (50 tests total)
  - JitBlue/GitHub_&lt;less than 10000&gt; (33 tests total)
  - JitBlue/GitHub_&lt;10000 through 15999&gt; (45 tests total)
  - JitBlue/GitHub_&lt;16000 through 19999&gt; (57 tests total)
  - JitBlue/GitHub_&lt;20000 through 24999&gt; (46 tests total)
  - JitBlue/GitHub_&lt;25000 and above&gt;, JitBlue/GitHub_CoreRT_2073 (24 tests total)
  - JitBlue/Runtime_&lt;less than 50000&gt; (45 tests total)
  - JitBlue/Runtime_&lt;50000 and above&gt;, WPF_3226 (48 tests total)
- CLR-x86-JIT
  - CLR-x86-JIT/dev10, CLR-x86-JIT/dev11, CLR-x86-JIT/V1-M09, CLR-x86-JIT/V1-M10 (51 tests total)
  - CLR-x86-JIT/V1-M13-RTM, CLR-x86-JIT/V1-M14-SP1, CLR-x86-JIT/V1-M15-SP2 (45 tests total)
  - CLR-x86-JIT/V1-QFE, CLR-x86-JIT/V1.1-M1-Beta1, CLR-x86-JIT/V1.2-M01 (55 tests total)
  - CLR-x86-JIT/V1.2-M02, CLR-x86-JIT/V2.0-Beta2, CLR-x86-JIT/V2.0-RTM (31 tests total)
  - CLR-x86-JIT/v2.1, CLR-x86-JIT/v2.2 (29 tests total)
  - CLR-x86-JIT/V1-M09.5-PDC/b&lt;less than 15000&gt; (47 tests total)
  - CLR-x86-JIT/V1-M09.5-PDC/b&lt;15000 through 25999&gt; (48 tests total)
  - CLR-x86-JIT/V1-M09.5-PDC/b&lt;26000 through 29999&gt; (42 tests total)
  - CLR-x86-JIT/V1-M09.5-PDC/b&lt;30000 and above&gt; (32 tests total)
  - CLR-x86-JIT/V1-M11-Beta1/b&lt;less than 41000&gt; (44 tests total)
  - CLR-x86-JIT/V1-M11-Beta1/b&lt;41000 through 44999&gt; (52 tests total)
  - CLR-x86-JIT/V1-M11-Beta1/b&lt;45000 and above&gt; (42 tests total)
  - CLR-x86-JIT/V1-M12-Beta2/b&lt;less than 37000&gt; (47 tests total)
  - CLR-x86-JIT/V1-M12-Beta2/b&lt;37000 through 52999&gt; (47 tests total)
  - CLR-x86-JIT/V1-M12-Beta2/b&lt;53000 through 58999&gt; (41 tests total)
  - CLR-x86-JIT/V1-M12-Beta2/b&lt;59000 through 64999&gt; (47 tests total)
  - CLR-x86-JIT/V1-M12-Beta2/b&lt;65000 through 71999&gt; (50 tests total)
  - CLR-x86-JIT/V1-M12-Beta2/b&lt;72000 through 79999&gt; (44 tests total)
  - CLR-x86-JIT/V1-M12-Beta2/b&lt;80000 and above&gt; (20 tests total)
</details>

Thanks

Tomas

Merge the remaining groups of runtime tests #71732
