Skip to content

Emit cmp (extended register) on ARM64 to simplify cast-then-compare expressions #112411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 28, 2025

Conversation

snickolls-arm
Copy link
Contributor

Working towards #68028

Matching on GT_EQ/NE/... nodes containing GT_CAST.

Example:

<==
uxtb w1, w0
cmp w2, w1
==>
cmp w2, w1, UXTB

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 11, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 11, 2025
@snickolls-arm
Copy link
Contributor Author

@a74nh @kunalspathak

Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

{
return true;
}
castOp->ClearContained();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClearContained

so what happens if the cast containment is not cleared? I am trying to understand how it works currently for this scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some assertions hit related to register allocation if we don't clear it here. The problem is with chains such as:

ADD(
  op1,
  CAST(
    LCL_VAR
  )
)

Usually the LCL_VAR node will be contained by the CAST node, this allows a later optimization using LDRB/H to perform the cast with load. The register will be allocated to the CAST node which is then used by the ADD. If we contain the CAST and the LCL_VAR spills, then both the CAST and LCL_VAR nodes won't have a register allocated when we try to generate the ADD. If I understand correctly, it's not possible to tell whether a LCL_VAR will be spilled until after register allocation, but you can force the casted node into a register and contain the CAST node instead.

In most cases the casted node is a register value and the containment from CAST -> LCL_VAR had no effect, so this will result in removing the cast instruction. However, if the casted node is spilled then you would see LDR used (no cast), and the ADD node will use the cast extension, which is the behavior I'm preserving here. I've moved these containment settings from ContainCheckBinary to here because they were causing side effects with the comparison nodes.

This problem also applies to the comparison nodes, which is why I clear the RegOptional flag on some nodes.

@kunalspathak
Copy link
Member

looks like there is a test failure . Can you please double check if it is caused with this PR?

@snickolls-arm
Copy link
Contributor Author

looks like there is a test failure . Can you please double check if it is caused with this PR?

It looks like this failed on "Send to Helix", is this a network error? I can't see any failing build or test output.

@jakobbotsch
Copy link
Member

It looks like this failed on "Send to Helix", is this a network error? I can't see any failing build or test output.

You can see the test output by going to Tests -> System.Runtime.Intrinsics.Tests Work item -> Artifacts -> Console Logs. That takes you to this link: https://helixr1107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-112411-merge-30e75b278b4d4c6eae/System.Runtime.Intrinsics.Tests/1/console.cb1ccbdb.log?helixlogtype=result

In this case the failure looks like known failure #111922.

@@ -453,10 +455,34 @@ bool Lowering::IsContainableUnaryOrBinaryOp(GenTree* parentNode, GenTree* childN

if (parentNode->OperIs(GT_CMP))
{
if (IsInvariantInRange(childNode, parentNode))
castOp->ClearContained();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks odd to me that a function called IsContainableUnaryOrBinaryOp mutates random trees. Can this be done by the caller instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed this

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants