gh-132732: Automatically constant evaluate pure operations #132733

Open
wants to merge 44 commits into base: main

Fidget-Spinner
Member

@Fidget-Spinner Fidget-Spinner commented Apr 19, 2025

@python-cla-bot

python-cla-bot bot commented Apr 19, 2025

All commit authors signed the Contributor License Agreement.

Member

@brandtbucher brandtbucher left a comment

This is really neat!

Other than two opcodes I found that shouldn't be marked pure, I just have one thought:

Rather than rewriting the bodies like this to use the symbol-manipulating functions (which seems error-prone), would we be able to just use stackrefs to do this?

For example, _BINARY_OP_ADD_INT is defined like this:

PyObject *left_o = PyStackRef_AsPyObjectBorrow(left);
PyObject *right_o = PyStackRef_AsPyObjectBorrow(right);
// ...
res = PyStackRef_FromPyObjectSteal(res_o);

Rather than rewriting uses of these functions, could it be easier to just do something like this, since we're guaranteed not to escape?

if (sym_is_const(ctx, stack_pointer[-2]) && sym_is_const(ctx, stack_pointer[-1])) {
    // Generated code to turn constant symbols into stackrefs:
    _PyStackRef left = PyStackRef_FromPyObjectBorrow(sym_get_const(ctx, stack_pointer[-2]));
    _PyStackRef right = PyStackRef_FromPyObjectBorrow(sym_get_const(ctx, stack_pointer[-1]));
    _PyStackRef res;
    // Now the actual body, same as it appears in executor_cases.c.h:
    PyObject *left_o = PyStackRef_AsPyObjectBorrow(left);
    PyObject *right_o = PyStackRef_AsPyObjectBorrow(right);
    // ...
    res = PyStackRef_FromPyObjectSteal(res_o);
    // Generated code to turn stackrefs into constant symbols:
    stack_pointer[-1] = sym_new_const(ctx, PyStackRef_AsPyObjectSteal(res));
}

I'm not too familiar with the design of the cases generator though, so maybe this is way harder or something. Either way, I'm excited to see this get in!
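
As a rough illustration of the idea above (not the PR's actual code), the "check that every input is a constant, then run the unmodified body" step can be modelled in a few lines of Python. `Sym`, `new_const`, `UNKNOWN`, and `fold_binary` are hypothetical names invented for this sketch; the real optimizer operates on C symbols through `sym_is_const`, `sym_get_const`, and `sym_new_const`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Sym:
    """Hypothetical stand-in for an optimizer symbol."""
    const: Any = None
    is_const: bool = False


def new_const(value: Any) -> Sym:
    """Build a symbol that is known to hold `value`."""
    return Sym(const=value, is_const=True)


# A symbol about which nothing is known.
UNKNOWN = Sym()


def fold_binary(stack: list[Sym], body: Callable[[Any, Any], Any]) -> None:
    """Pop two symbols and push the result of a pure binary operation.

    `body` plays the role of the instruction body: it only runs when
    both inputs are constants, so it never sees a symbolic value.
    """
    left, right = stack[-2], stack[-1]
    if left.is_const and right.is_const:
        result = new_const(body(left.const, right.const))
    else:
        result = UNKNOWN
    del stack[-2:]
    stack.append(result)
```

Calling `fold_binary` on a stack holding the constants 2 and 3 with an addition body leaves a single constant symbol for 5; if either input is not constant, the result is an unknown symbol and the body is never executed.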

@Fidget-Spinner
Member Author

> Rather than rewriting uses of these functions, could it be easier to just do something like this, since we're guaranteed not to escape?

Seems feasible. I could try to rewrite all occurrences of the variable with a stackref-producing const one. Let me try that.

@Fidget-Spinner
Member Author

I've verified locally that test_capi.test_opt has no refleaks apart from #132731, which is pre-existing.

@markshannon
Member

There's a lot going on in this PR, probably too much for one PR.

Could we start with a PR to fix up the pure annotations so that they are on the correct instructions and maybe add the pure_guard annotation that Brandt suggested?

@markshannon
Member

Could we have the default code generator generate a function for the body of the pure instruction and then call that from the three interpreters?

@brandtbucher
Member

> Could we have the default code generator generate a function for the body of the pure instruction and then call that from the three interpreters?

Hm, I think I’d prefer not to. Sounds like it could hurt performance, especially for the JIT (where things can’t inline).

@brandtbucher
Member

I think a good progression would be:

  • Implement the pure attribute, and the optimizer changes. Remove the pure attributes where they don’t belong (so nothing breaks) and leave the existing ones as proof that the implementation works. (This PR)
  • Audit the existing non-pure bytecodes and add pure where it makes sense. (Follow-up PR)
  • Implement the pure_guard attribute, and annotate any bytecodes that can use it. (Follow-up PR)

@Fidget-Spinner
Member Author

> > Could we have the default code generator generate a function for the body of the pure instruction and then call that from the three interpreters?
>
> Hm, I think I’d prefer not to. Sounds like it could hurt performance, especially for the JIT (where things can’t inline).

I thought about this, and I think we can inline if we autogenerate a header file and include it directly. But then we're at the mercy of the compiler, in both the normal interpreter and the JIT, deciding whether or not to inline the body, which I truly do not want.

@Fidget-Spinner
Member Author

@brandtbucher @markshannon what can I do to get this PR moving?

@tomasr8 if you'd like to review, here's a summary of the PR:

  1. If a bytecode operation is pure (no side effects), we can mark it as pure in bytecodes.c.
  2. In the optimizer, we automatically generate the body that evaluates the symbolic constants by copy-pasting the bytecodes.c definition into the optimizer's C code. We first check that the inputs are constants.
  3. All changes to the cases generator are for the second point.
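
The three points above can be sketched as a toy generator, written here in Python because the cases generator itself is Python. This is only a minimal sketch under stated assumptions: `Uop` and `emit_optimizer_case` are hypothetical names, the emitted text is simplified, and the body string is a placeholder rather than a real bytecodes.c definition. Only `sym_is_const` is taken from the discussion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Uop:
    """Hypothetical, simplified description of one micro-op."""
    name: str
    inputs: list[str]
    body: str          # C source of the instruction body, as plain text
    pure: bool = False


def emit_optimizer_case(uop: Uop) -> str:
    """Return illustrative C text for the optimizer case of `uop`.

    Pure uops get their body pasted in behind a guard that checks
    every input is a constant; other uops get no constant evaluation.
    """
    if not uop.pure:
        return f"case {uop.name}: /* no constant evaluation */ break;"
    guard = " && ".join(f"sym_is_const(ctx, {name})" for name in uop.inputs)
    lines = [f"case {uop.name}: {{", f"    if ({guard}) {{"]
    # Paste the body unchanged, indented to sit inside the guard.
    lines += [f"        {line}" for line in uop.body.splitlines()]
    lines += ["    }", "    break;", "}"]
    return "\n".join(lines)
```

The design point this mirrors is that the body is written once, in bytecodes.c, and the generator reuses it, so the optimizer cannot drift out of sync with the interpreter.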

@tomasr8
Member

tomasr8 commented May 8, 2025

Thanks for the ping! I actually wanted to try/review this PR, I was just very busy this week with work :/ I'll have a look this weekend :)

Member

@tomasr8 tomasr8 left a comment

Only had time to skim the PR, I'll do a more thorough review this weekend :)

@bedevere-app bedevere-app bot requested a review from markshannon May 28, 2025 14:04

@Fidget-Spinner
Member Author

@markshannon do you have any other comments? I think this has gone through enough rounds of review by you, Brandt, and Tomas, which I'm thankful for.

@Fidget-Spinner
Member Author

@markshannon if there are no objections, I'm going to merge this on Friday. There are 3 reviews and 1 approval, and this PR has been up for 2 months. I think I've given everyone sufficient time to review, and I'm thinking that any further issues can be addressed in future PRs.

@@ -129,6 +140,31 @@ def __init__(self, out: CWriter, labels: dict[str, Label]):
         }
         self.out = out
         self.labels = labels
+        self.is_abstract = False
+
+    def emit_to_with_replacement(
Member

I don't think we need any new code here.
All the new code should go in optimizer_generator.py

@@ -157,7 +193,7 @@ def deopt_if(
         lparen = next(tkn_iter)
         assert lparen.kind == "LPAREN"
         first_tkn = tkn_iter.peek()
-        emit_to(self.out, tkn_iter, "RPAREN")
+        self.emit_to_with_replacement(self.out, tkn_iter, "RPAREN", uop, storage, inst)
Member

Why is this necessary?

Member Author

emit_to doesn't call the replacement functions. This does.

Member

Why are there function calls that need replacing within a DEOPT_IF?

Member Author

I'm using replacement functions to replace all references to a variable with a replacement. For example, all references to the output out (declared as -- out in the stack effect) get replaced with out_stackref.

So a reference like

if (PyStackRef_IsNull(out)) {....}

becomes

if (PyStackRef_IsNull(out_stackref)) {....}
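
A minimal sketch of this kind of renaming, assuming a plain regular-expression pass over the C text: the actual generator works on a token stream rather than raw strings, and `replace_identifier` is a hypothetical helper invented for illustration.

```python
from __future__ import annotations

import re


def replace_identifier(code: str, replacements: dict[str, str]) -> str:
    """Rename whole identifiers in `code` according to `replacements`.

    Word boundaries keep `out` from matching inside longer names
    such as `output` or `out_stackref`.
    """
    if not replacements:
        return code
    names = "|".join(re.escape(name) for name in replacements)
    pattern = re.compile(rf"\b({names})\b")
    # A single pass, so replacement text is never rewritten again.
    return pattern.sub(lambda match: replacements[match.group(1)], code)
```

Applied to the condition above with the mapping `{"out": "out_stackref"}`, the helper rewrites only the standalone `out` and leaves `PyStackRef_IsNull` untouched.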

@markshannon
Member

It seems like the issue of escaping is causing problems here.
For the code generator, "escaping" means "able to run the GC", which shouldn't happen in the abstract interpreter.
So either we are not correctly marking functions as non-escaping, or we are calling functions that do escape (which we shouldn't).

In the example you give, _PyLong_Multiply(sym_get_const(x), sym_get_const(y)), neither _PyLong_Multiply nor sym_get_const escapes. They just need to be added to the whitelist.
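
To make the allowlist idea concrete, here is a small sketch of an escape check: scan a body for call expressions and report every callee that is not listed as non-escaping. `escaping_calls`, the regular expression, and the keyword set are assumptions made for this illustration; the real analysis lives in the cases generator and is more involved. Whether a particular function truly belongs on the list is a separate question that the check itself cannot answer.

```python
from __future__ import annotations

import re

# Matches an identifier immediately followed by an opening parenthesis.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")

# C keywords that look like calls syntactically but are not.
C_KEYWORDS = frozenset({"if", "while", "for", "switch", "return", "sizeof"})


def escaping_calls(body: str, allowlist: set[str]) -> list[str]:
    """Return the sorted names called in `body` that are not allowlisted."""
    called = set(CALL.findall(body))
    return sorted(called - C_KEYWORDS - set(allowlist))
```

With only `sym_get_const` allowlisted, the example expression is reported as calling one potentially escaping function, `_PyLong_Multiply`; once that name is added to the list as well, the report is empty.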

@Fidget-Spinner
Member Author

I've added the required functions to the allowlist, so I removed the is_abstract workaround.

@Fidget-Spinner
Member Author

> It seems like the issue of escaping is causing problems here. For the code generator, "escaping" means "able to run the GC", which shouldn't happen in the abstract interpreter. So either we are not correctly marking functions as non-escaping, or we are calling functions that do escape (which we shouldn't).
>
> In the example you give, _PyLong_Multiply(sym_get_const(x), sym_get_const(y)), neither _PyLong_Multiply nor sym_get_const escapes. They just need to be added to the whitelist.

@markshannon it seems this is wrong: _PyLong_Multiply/Add/Subtract can trigger the GC, and the currently failing tests are evidence of that. The problem is with the SIGCHECK macro in longobject.c. https://github.com/python/cpython/blob/main/Objects/longobject.c#L114

@Fidget-Spinner
Member Author

I have a fix in a separate PR for the longobject GC issues.

@markshannon
Member

I think we only specialize for, and are interested in compact ints (or tagged ints in the future), so maybe replace _PyLong_Add with _PyCompactLong_Add?
It would help with much the same issue I'm having with excessive escapes in TOS caching.

@Fidget-Spinner
Member Author

For now, I'm avoiding changing the int operations in this PR. I will add back constant evaluation for them in the future once we fix this in either bytecodes.c or the long object.
