[WIP] Add new multithreaded TwoQubitPeepholeOptimization pass #13419

Draft
wants to merge 37 commits into
base: main
from

Conversation

mtreinish
Member

@mtreinish mtreinish commented Nov 10, 2024

Summary

This commit adds a new transpiler pass for physical optimization,
TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks,
ConsolidateBlocks, and UnitarySynthesis in the optimization stage for
a default pass manager setup. The pass logically works the same way:
it analyzes the DAG to get a list of 2q runs, calculates the matrix
of each run, and then synthesizes the matrix and substitutes it in place.
The distinction is that this pass does all of this in a single pass and
also parallelizes the matrix calculation and synthesis steps, because
there is no data dependency between runs.
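
To illustrate why the per-run work can be parallelized, here is a minimal pure-Python sketch. It is not the pass's Rust implementation: the `run_matrix`, `process_runs`, and `toy_synthesize` names are hypothetical, a "run" is modeled as a plain list of 4x4 gate matrices, and the synthesizer is a stand-in callback. In CPython, threads will not speed up pure-Python arithmetic; the point is only that each run is processed independently and the results are collected in order.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n=4):
    """Return the n x n identity matrix with complex entries."""
    return [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]


def run_matrix(run):
    """Multiply the gate matrices of one run; later gates act on the left."""
    total = identity()
    for gate in run:
        total = matmul(gate, total)
    return total


def process_runs(runs, synthesize):
    """Compute and synthesize every run independently, preserving run order."""

    def work(run):
        # Each task only reads its own run, so no locking is needed.
        return synthesize(run_matrix(run))

    with ThreadPoolExecutor() as pool:
        return list(pool.map(work, runs))


# Toy data: CZ is diagonal, and two CZ gates in a row cancel to the identity.
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]


def toy_synthesize(matrix):
    """Hypothetical synthesizer: only reports whether the run is trivial."""
    return "drop" if matrix == identity() else "resynthesize"


results = process_runs([[CZ, CZ], [CZ]], toy_synthesize)
print(results)
```

Substituting the results back into the circuit is the only step that has to happen serially, which is why it is left outside `process_runs` in this sketch.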

This new pass is not meant to fully replace the Collect2qBlocks,
ConsolidateBlocks, or UnitarySynthesis passes as those also run in
contexts where we don't have a physical circuit. This is meant instead
to replace their usage in the optimization stage only. Accordingly this
new pass also changes the logic on how we select the synthesis to use
and when to make a substitution. Previously this logic was primarily done
via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if
the number of basis gates needed based on the Weyl chamber coordinates
was less than the number of 2q gates in the block (see #11659 for
discussion on this). Since this new pass skips the explicit
consolidation stage we go ahead and try all the available synthesizers

Right now this commit has a number of limitations. The largest are:

  • Only supports the target
  • It doesn't support the XX decomposer because it's not in Rust (the TwoQubitBasisDecomposer and TwoQubitControlledUDecomposer are used)

This pass doesn't support using the unitary synthesis plugin interface, since
it's optimized to use Qiskit's built-in two qubit synthesis routines written in
Rust. The existing combination of ConsolidateBlocks and UnitarySynthesis
should be used instead if the plugin interface is necessary.

Details and comments

Fixes #12007
Fixes #11659

TODO:

@mtreinish mtreinish added the performance, Changelog: New Feature, Rust, and mod: transpiler labels Nov 10, 2024
@mtreinish mtreinish added this to the 2.0.0 milestone Nov 10, 2024
@coveralls

coveralls commented Nov 10, 2024

Pull Request Test Coverage Report for Build 14525825123

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 410 of 441 (92.97%) changed or added relevant lines in 11 files are covered.
  • 393 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.07%) to 88.3%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| crates/accelerate/src/two_qubit_decompose.rs | 13 | 14 | 92.86% |
| crates/accelerate/src/unitary_synthesis.rs | 2 | 3 | 66.67% |
| crates/accelerate/src/two_qubit_peephole.rs | 359 | 388 | 92.53% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/qasm2/src/lex.rs | 2 | 93.23% |
| crates/accelerate/src/unitary_synthesis.rs | 31 | 94.69% |
| crates/accelerate/src/basis/basis_translator/mod.rs | 37 | 89.48% |
| crates/circuit/src/dag_circuit.rs | 323 | 86.88% |

Totals:

  • Change from base Build 14525370807: 0.07%
  • Covered Lines: 74214
  • Relevant Lines: 84048

💛 - Coveralls

This commit adds a new transpiler pass for physical optimization,
TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks,
ConsolidateBlocks, and UnitarySynthesis in the optimization stage for
a default pass manager setup. The pass logically works the same way:
it analyzes the DAG to get a list of 2q runs, calculates the matrix
of each run, and then synthesizes the matrix and substitutes it in place.
The distinction is that this pass does all of this in a single pass and
also parallelizes the matrix calculation and synthesis steps, because
there is no data dependency between runs.

This new pass is not meant to fully replace the Collect2qBlocks,
ConsolidateBlocks, or UnitarySynthesis passes as those also run in
contexts where we don't have a physical circuit. This is meant instead
to replace their usage in the optimization stage only. Accordingly this
new pass also changes the logic on how we select the synthesis to use
and when to make a substitution. Previously this logic was primarily done
via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if
the number of basis gates needed based on the Weyl chamber coordinates
was less than the number of 2q gates in the block (see Qiskit#11659 for
discussion on this). Since this new pass skips the explicit
consolidation stage we go ahead and try all the available synthesizers

Right now this commit has a number of limitations. The largest are:

- Only supports the target
- It doesn't support any synthesizers besides the TwoQubitBasisDecomposer,
  because it's the only one in Rust currently.

For plugin handling I left the logic as running the three pass series,
but I'm not sure this is the behavior we want. We could say keep the
synthesis plugins for `UnitarySynthesis` only and then rely on our
built-in methods for physical optimization only. But this also seems less
than ideal because the plugin mechanism is how we support synthesizing
to custom basis gates, and also more advanced approximate synthesis
methods. Both of those are things we need to do as part of the synthesis
here.

Additionally, this is currently missing tests and documentation and while
running it manually "works" as in it returns a circuit that looks valid,
I've not done any validation yet. This also likely will need several
rounds of performance optimization and tuning. At this point this is
just a rough proof of concept and will need a lot of refinement along with
larger changes to Qiskit's Rust code before this is ready to merge.

Fixes Qiskit#12007
Fixes Qiskit#11659

Since Qiskit#13139 merged we have another two qubit decomposer available to
run in Rust, the TwoQubitControlledUDecomposer. This commit updates the
new TwoQubitPeepholeOptimization to call this decomposer if the target
supports appropriate 2q gates.

Clippy is correctly warning that the size difference between the two
decomposer types in the TwoQubitDecomposer enum is large.
TwoQubitBasisDecomposer is 1640 bytes and TwoQubitControlledUDecomposer
is only 24 bytes. This means each ControlledU element wastes more than
1600 bytes. However, in this case that is acceptable in order to
avoid a layer of pointer indirection, as these are stored temporarily
in a Vec inside a thread to decompose a unitary. A trait would be a more
natural way to define a common interface between all the two qubit
decomposers, but since we keep them instantiated for each edge in a Vec
they need to be sized, and doing something like
`Box<dyn TwoQubitDecomposer>` (assuming a trait `TwoQubitDecomposer`
instead of an enum) to get around this would have additional runtime
overhead. This is also considering that TwoQubitControlledUDecomposer
is far less likely to be used in practice, as it only works with targets
that have RZZ, RXX, RYY, or RZX gates on an edge, which is less common.

Also don't run scoring more than needed.
@ShellyGarion
Member

Copying here the comment from @t-imamichi, #13568 (comment),
and my reply, #13568 (comment):

I think this closes #13428. How about adding a test case of consecutive RZZ (RXX, and RYY) gates?

We should make sure that, once PR #13568 and this PR are merged, we can efficiently transpile circuits into a basis with fractional RZZ gates.

@mtreinish
Member Author

I added support for using the ControlledUDecomposer to the new pass back in early December with this commit: 746758f. Looking at that now with fresh eyes, though, I need to check that the gate is continuous in the target; right now it only looks at the supported gate types.

@ShellyGarion ShellyGarion self-assigned this Feb 3, 2025
The priority for the two qubit peephole pass should be decreasing the 2q
gate count. The error rate heuristic should only matter if the 2q counts
are the same. This commit flips the heuristic to first check the 2q gate
count so the first priority is reducing the 2q gate count.
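
A minimal sketch of that ordering, assuming each synthesis candidate can be summarized by a 2q gate count and an estimated error. The `Candidate` class, `score`, and `pick_best` are hypothetical names, not the pass's actual data structures, and the rule of keeping the original unless a candidate is strictly better is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one synthesis result."""

    name: str
    two_qubit_gates: int
    estimated_error: float


def score(candidate):
    # Tuples compare element by element, so the 2q gate count is decisive
    # and the error estimate only breaks ties between equal counts.
    return (candidate.two_qubit_gates, candidate.estimated_error)


def pick_best(original, candidates):
    """Return the lowest-scoring candidate, or the original if nothing beats it."""
    best = min(candidates, key=score, default=original)
    return best if score(best) < score(original) else original


original = Candidate("original", two_qubit_gates=3, estimated_error=0.010)
candidates = [
    Candidate("basis", two_qubit_gates=2, estimated_error=0.020),
    Candidate("controlled_u", two_qubit_gates=2, estimated_error=0.015),
    Candidate("low_error", two_qubit_gates=3, estimated_error=0.001),
]
print(pick_best(original, candidates).name)
```

With the error estimate as the first tuple element instead, the three-gate `low_error` candidate would win here, which is the behavior the commit above moves away from.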
@@ -0,0 +1,102 @@
# This code is part of Qiskit.
#
# (C) Copyright IBM 2017, 2024.
Contributor

Suggested change
- # (C) Copyright IBM 2017, 2024.
+ # (C) Copyright IBM 2017, 2025.

Seems like the copyright years have not often been updated in source files. But since this is a new file, and a new year...

Member Author

I did write the pass originally in 2024. I've been working on this on and off since November.

@@ -0,0 +1,533 @@
// This code is part of Qiskit.
//
// (C) Copyright IBM 2024
Contributor

Suggested change
- // (C) Copyright IBM 2024
+ // (C) Copyright IBM 2025

@mtreinish mtreinish modified the milestones: 2.0.0, 2.1.0 Feb 21, 2025
@ShellyGarion
Copy link
Member

Copying here this #13428 (comment): is this example solved by this PR?

One case that has not been solved yet is the following:

  • the circuit has one CZ gate and one RZZ gate
  • the target has basis gates containing 'cz' and 'rzz'.

In this case, the circuit would not get consolidated and re-synthesized into a single RZZ gate.
I'm not sure that I have a quick fix for this, but hope that it would be solved in #13419.

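        from qiskit import QuantumCircuit
        from qiskit.transpiler.passes import ConsolidateBlocks
        from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

        basis_gates = ["rzz", "rx", "rz", "cz"]
        qc = QuantumCircuit(2)
        qc.rzz(0.1, 0, 1)
        qc.cz(0, 1)
        # Run ConsolidateBlocks on its own, then the full level-2 preset pipeline.
        consolidate_pass = ConsolidateBlocks(basis_gates=basis_gates)
        res = consolidate_pass(qc)
        pm = generate_preset_pass_manager(optimization_level=2, basis_gates=basis_gates)
        tqc = pm.run(qc)
        print(res)
        print(tqc)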

outputs:


q_0: ─■─────────■─
      │ZZ(0.1)  │ 
q_1: ─■─────────■─
                  
                  
q_0: ─■─────────■─
      │ZZ(0.1)  │ 
q_1: ─■─────────■─

@ShellyGarion
Copy link
Member

ShellyGarion commented Mar 11, 2025

Is it possible to try to synthesize both U and U^(-1) to check which one produces a better synthesis?
We may be able to reduce some 1-qubit gates this way (as one can always invert the gates in the synthesized circuit).
See the following example:

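        from qiskit import QuantumCircuit
        from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

        qc = QuantumCircuit(2)
        qc.rzz(0.1, 0, 1)
        qc.rzz(0.2, 0, 1)
        pm = generate_preset_pass_manager(optimization_level=2, basis_gates=["rzz", "rx", "rz"])
        tqc = pm.run(qc)
        print(tqc)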

produces the circuit:

global phase: π
                                  
q_0: ──────────■──────────────────
     ┌───────┐ │ZZ(-0.3) ┌───────┐
q_1: ┤ Rx(π) ├─■─────────┤ Rx(π) ├
     └───────┘           └───────┘

while its inverse:

        tqc = pm.run(qc.inverse())
        print(tqc)

produces the circuit:

q_0: ─■─────────
      │ZZ(-0.3) 
q_1: ─■─────────
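
The idea can be sketched in plain Python without Qiskit. Everything here is a toy under stated assumptions: gates are `(name, angle, qubits)` tuples, `toy_synthesize_rzz` is a hypothetical synthesizer written only to mimic the two outputs shown above (it wraps positive angles in Rx(π) gates), and circuit cost is just the gate count.

```python
import math


def invert_circuit(gates):
    """Invert a sequence of rotation gates: reverse the order, negate each angle."""
    return [(name, -angle, qubits) for name, angle, qubits in reversed(gates)]


def toy_synthesize_rzz(theta):
    """Hypothetical synthesizer for RZZ(theta) that only emits non-positive RZZ angles."""
    if theta <= 0:
        return [("rzz", theta, (0, 1))]
    # Conjugating by X on one qubit flips the sign of the ZZ rotation angle.
    return [("rx", math.pi, (1,)), ("rzz", -theta, (0, 1)), ("rx", math.pi, (1,))]


def synthesize_best_of_both(theta, synthesize):
    """Synthesize U directly and via U^-1, then keep the shorter circuit."""
    direct = synthesize(theta)
    via_inverse = invert_circuit(synthesize(-theta))
    # min() keeps the first argument on ties, so the direct result is preferred.
    return min(direct, via_inverse, key=len)


best = synthesize_best_of_both(0.3, toy_synthesize_rzz)
print(best)
```

A real implementation would compare candidates with the pass's own scoring rather than `len`, and inverting the target costs one extra synthesis call per run.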

This commit removes the unitary synthesis plugin mechanism from the
pass. Supporting it was a layer violation, since the pass logic
doesn't actually support using the plugin interface. If plugin
interface usage is desired, it is easier and clearer to handle that
in the pass manager construction rather than have this pass internally
build a pass manager and execute other passes to emulate behavior it
doesn't have.

There were two issues identified by the testing which required fixing
and adjusting the tests based on limitations in the pass. The first
issue was that the parameters for the target gate were not handled
correctly: when using the Controlled U decomposer we were not passing the
computed parameter value to the output circuit, and instead the
ParameterExpression from the target was being used. The second was that
fixed-angle controlled gates (not supercontrolled), which are
normally intended for the XX decomposer, were incorrectly being passed to
the TwoQubitBasisDecomposer, which can't work with them. This was
resulting in invalid circuit outputs. The TwoQubitBasisDecomposer is now
correctly filtered to only run with supercontrolled gates. The tests were
adjusted for this limitation because they were mostly copied from the
tests for UnitarySynthesis, which supports the XX decomposer.
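
As a rough sketch of that filter, assume the Weyl chamber coordinates `(a, b, c)` of a fixed 2q gate are already known. The classification below follows the usual convention (supercontrolled: `a = π/4` and `c = 0`; controlled: `b = 0` and `c = 0`); the function names and returned labels are hypothetical and do not mirror the Rust code.

```python
import math

ABS_TOL = 1e-9  # absolute tolerance needed when comparing against zero


def is_supercontrolled(a, b, c):
    """True for gates such as CX or iSWAP: a = pi/4 and c = 0."""
    return math.isclose(a, math.pi / 4, abs_tol=ABS_TOL) and math.isclose(c, 0.0, abs_tol=ABS_TOL)


def is_controlled(a, b, c):
    """True for gates locally equivalent to a controlled rotation: b = c = 0."""
    return math.isclose(b, 0.0, abs_tol=ABS_TOL) and math.isclose(c, 0.0, abs_tol=ABS_TOL)


def decomposer_for_fixed_gate(a, b, c):
    """Pick a decomposer label for a fixed-angle 2q basis gate."""
    if is_supercontrolled(a, b, c):
        return "TwoQubitBasisDecomposer"
    if is_controlled(a, b, c):
        # Meant for the XX decomposer, which this pass does not support.
        return "unsupported (XX decomposer only)"
    return "unsupported"


cx = (math.pi / 4, 0.0, 0.0)          # CX / CZ
iswap = (math.pi / 4, math.pi / 4, 0.0)
rzx_third = (math.pi / 6, 0.0, 0.0)   # RZX(pi/3): controlled, not supercontrolled

for label, coords in [("cx", cx), ("iswap", iswap), ("rzx(pi/3)", rzx_third)]:
    print(label, "->", decomposer_for_fixed_gate(*coords))
```

CX is both controlled and supercontrolled, so checking the supercontrolled case first keeps it on the TwoQubitBasisDecomposer path.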
UGate,
ZGate,
RYYGate,
RZZGate,
Member

@t-imamichi t-imamichi Apr 18, 2025

According to the error log, RZZGate is not used.

Suggested change
- RZZGate,

@ShellyGarion
Copy link
Member

I've suggested some further tests here: mtreinish#31

The typing for some of the new methods on the DAGCircuitBuilder was a
bit too strict and required the caller to do more work than was
necessary. This commit loosens the typing to make it a bit more
ergonomic and straightforward to use. It also more closely matches the
DAGCircuit methods the builder struct is mirroring. Right now the only
signature difference is that qubits and clbits are wrapped in an Option,
while on DAGCircuit they are not. This commit doesn't change this difference,
although there really isn't a reason to make this distinction and both
methods could have the same signature.

This commit updates some of the pass's logic and fixes some bugs that
were found during development. It also adds a couple of new test cases, one
of which is failing. There still seem to be bugs in the case where we need to
use a reverse edge in the target.
Labels
Changelog: New Feature, mod: transpiler, performance, Rust
Projects
None yet
5 participants