This repository was archived by the owner on May 22, 2023. It is now read-only.

[Pass] Canonicalizing Bindings #233

Merged
merged 14 commits on Sep 8, 2022

Conversation

slyubomirsky
Collaborator

@slyubomirsky commented Aug 24, 2022

It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

This pass takes chains of definitions and replaces references to the later definitions with the original one. It works by calling LookupBinding at each var use site and replacing the var with its definition whenever that definition is itself another var. (Note: this required updating BlockBuilder to also update its binding map for MatchShape nodes; not doing so was arguably a bug.) Additionally, MatchShape bindings whose LHS and RHS are guaranteed to match at compile time are canonicalized into ordinary VarBindings.

Example:

y = x
z = y
w = z
o = w
p = o

Will be replaced with

y = x
z = x
w = x
o = x
p = x

Later, unused bindings can be removed by another pass. Eventually, we can consider other kinds of identity operations like multiplying by 1 or adding 0 in this canonicalization, but this pass does not handle it for now because such operations may affect aliasing (if they allocate new tensors). We can revisit that issue once our position on aliasing is clear.
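The chain collapsing above can be sketched in a few lines of Python. This is a minimal model of the idea only, not the actual TVM implementation; `canonicalize_bindings` and the binding representation are illustrative:

```python
# Minimal model: process bindings in order; whenever a var is bound to
# another var, redirect it to that var's canonical (earliest) definition.

def canonicalize_bindings(bindings):
    """bindings: ordered (var, value) pairs; values are var names or constants."""
    canonical = {}  # var name -> its canonical definition
    result = []
    for var, value in bindings:
        # If the value is a var with a known canonical form, use that instead.
        value = canonical.get(value, value)
        if isinstance(value, str):  # the value is itself a var: record the alias
            canonical[var] = value
        result.append((var, value))
    return result

chain = [("y", "x"), ("z", "y"), ("w", "z"), ("o", "w"), ("p", "o")]
# Every right-hand side collapses to the original definition "x".
assert canonicalize_bindings(chain) == [
    ("y", "x"), ("z", "x"), ("w", "x"), ("o", "x"), ("p", "x")
]
```

Note that the now-unused aliases stay in place, matching the division of labor with a later dead-code-elimination pass.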

Some tricky parts of this pass:

  1. Var bindings can escape DataflowBlocks, so if a Var is assigned to a DataflowVar, we must take care never to replace a Var with a DataflowVar.
  2. ExprMutator's default behavior is to update type annotations based on the checked_type_ of the body. This can result in erasing user-provided type annotations, which we should respect.
  3. We face a similar issue for the shape_ field, with the added complexity that the shape_ for some operators can feature calls to PackedFuncs, so we will have to ensure variable replacements are performed within those.
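Tricky part 1 can be illustrated with a small sketch. `Var` and `DataflowVar` here are stand-ins for the TVM classes, and `lookup_canonical` is a hypothetical helper, not the pass's real interface:

```python
# Sketch of the DataflowVar escape rule: a DataflowVar is only visible inside
# its DataflowBlock, so it must never be substituted for an ordinary Var,
# whose uses may escape the block.

class Var:
    def __init__(self, name):
        self.name = name

class DataflowVar(Var):
    pass

def lookup_canonical(var, defn):
    """Return the definition to substitute for `var`, or None to keep `var`.

    defn: the expression `var` is bound to (here, another Var or None).
    """
    if not isinstance(defn, Var):
        return None  # only var-to-var chains are collapsed
    # Replacing an ordinary Var with a DataflowVar would let the DataflowVar
    # escape its DataflowBlock, so we must not do it.
    if isinstance(defn, DataflowVar) and not isinstance(var, DataflowVar):
        return None
    return defn

x = Var("x")
lv = DataflowVar("lv")
gv = Var("gv")
assert lookup_canonical(DataflowVar("t"), lv) is lv  # fine within the block
assert lookup_canonical(gv, lv) is None              # would escape: keep gv
assert lookup_canonical(gv, x) is x                  # an ordinary Var is safe
```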

I would greatly appreciate review or proposals for improvements, especially for test cases that may be problematic. I think removing the unused bindings should be handled by other passes. I imagine this as a pass that could be useful when combined with others. (If we don't feel it's useful, we can also drop the PR and revisit down the line).

@sunggg
Collaborator

sunggg commented Aug 25, 2022

Thank you for the great work!
I haven't reviewed the code yet (will do!), but have a general comment I'd like to mention.
Can we design this pass with future extension toward these Relay passes in mind? It would be great if we could reuse the same basic structure.
https://github.com/apache/tvm/blob/main/src/relay/transforms/canonicalize_ops.cc
https://github.com/apache/tvm/blob/main/src/relay/transforms/canonicalize_cast.cc

@slyubomirsky
Collaborator Author

Openness to extension is certainly a good idea. What might that entail? Naming this CanonicalizeBindings so as to open the door to other canonicalization passes in the future?

Collaborator

@LeshengJin left a comment


Thanks for the great work! 🎉
I love the comments in your code. They helped me a lot in understanding this PR; I really learned a lot.

I have two small questions for you:)

@sunggg
Collaborator

sunggg commented Aug 25, 2022

That's certainly one option, but it may result in many passes that each do a different kind of canonicalization.
Alternatively, we can implement a group of benign canonicalizations (ones that are always beneficial to apply, like your binding pass) inside a single pass and just call it the canonicalization pass.
Personally, I prefer the latter option since it is easier to configure from a pipeline perspective.

Contributor

@psrivas2 left a comment


Thank you for implementing this important optimization!

I like that this pass composes well with dead code elimination - nice application of "separation of concerns" here :)

I have a few high-level comments:

  1. It seems we can further break down this optimization into two independent canonicalization patterns:
  • Identity folding: replace y = x; z = y with y = x; z = x.
  • match_shape folding: replace y = match_shape(x, shape_expr) with y = x if x.shape is equivalent to shape_expr.
  These two patterns compose well and can be applied greedily until a fixed point to achieve the same result. We can add more such patterns in the future.
  2. Cast ops: y: Object = x needs special handling because of implicit casting. If we had explicit casting, this would be a cast op and would need no special handling. Since the decision on that is still pending, I just wanted to highlight this case; implicit casting might complicate future optimizations too.
  3. Usually we want to apply canonicalization patterns greedily until a fixed point, as with the two patterns above. This keeps the system easily extensible.
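One possible shape for this pattern-based structure, sketched under assumed interfaces (the function signatures and binding representation are hypothetical, not the actual pass): every pattern maps a binding to a possibly-rewritten binding, and a driver reapplies all patterns greedily until none fires before moving on.

```python
# Sketch of a pattern-driven canonicalizer. identity_fold is one pattern;
# more patterns could simply be appended to the list passed to the driver.

def identity_fold(binding, canonical):
    """Replace `var = alias` with `var = canonical_def(alias)` when known."""
    var, value = binding
    if value in canonical:
        return (var, canonical[value])
    return binding

def apply_patterns(bindings, patterns):
    canonical = {}  # var name -> canonical definition
    out = []
    for binding in bindings:
        changed = True
        while changed:  # greedy: retry every pattern until a fixed point
            changed = False
            for pattern in patterns:
                new_binding = pattern(binding, canonical)
                if new_binding != binding:
                    binding, changed = new_binding, True
        var, value = binding
        if isinstance(value, str):  # record var-to-var aliases
            canonical[var] = canonical.get(value, value)
        out.append(binding)
    return out

chain = [("y", "x"), ("z", "y"), ("w", "z")]
assert apply_patterns(chain, [identity_fold]) == [
    ("y", "x"), ("z", "x"), ("w", "x")
]
```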

def main(x: Tensor((m, n), _)):
    y = x
    # trivial check
    z = R.match_shape(x, (m, n))
Contributor


I think match_shape can have an independent canonicalization pattern. R.match_shape(a, shape_expr) can be trivially removed when shape_expr is equivalent to a.shape_, because we then know the check is always true.
This way we can have two simple canonicalization patterns, identity and match_shape, which can be composed and applied greedily to get the same result.
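A sketch of this trivial-check criterion (the helper name is hypothetical, and real equivalence checking of shape expressions with symbolic dims is richer than the tuple comparison used here):

```python
# y = match_shape(x, shape_expr) folds to y = x only when the match is
# provable at compile time; otherwise the runtime check must be kept.

def match_shape_is_trivial(value_shape, pattern):
    """value_shape: statically known shape of the value, or None if unknown.
    pattern: the shape pattern, as a tuple of dims or symbolic dim names."""
    if value_shape is None:
        return False  # unknown at compile time: keep the runtime check
    return tuple(value_shape) == tuple(pattern)

assert match_shape_is_trivial(("m", "n"), ("m", "n"))      # fold to y = x
assert not match_shape_is_trivial(None, ("m", "n"))        # keep runtime check
assert not match_shape_is_trivial(("m", "n"), ("n", "m"))  # not provably equal
```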

Collaborator Author


I like the idea of structuring this pass in terms of different patterns instead of handling everything together, but I'm having a little trouble seeing how these patterns would be represented in the code. Could you please elaborate on how you think that should be implemented?

def main(x: Tensor) -> Object:
    y = x
    # z will be treated as object type even though it's a tensor
    z: Object = y
Contributor


This might be a good motivation to have explicit cast ops. This is not an identity operation, but a cast operation (Tensor -> Object) and hence we need special handling for this case.

Collaborator Author

@slyubomirsky Aug 26, 2022


I would not call this a cast per se, because Object is a supertype of Tensor; a strongly typed language like Java would not require an explicit cast when assigning a variable of one type to a variable of a supertype. Do you think the pass requires any additional logic for cases where the RHS and LHS have different types? (Edit: I suppose it can be called an implicit cast, and it is implicit precisely because there is no specific expression associated with it.)

Collaborator

@sunggg left a comment


Thanks for the great contribution!
My high-level feedback is here: #233 (comment)
LGTM. Just a few nits.

@slyubomirsky
Collaborator Author

@psrivas2 I like your idea about having multiple canonicalization patterns and applying them to fixpoint. Could you elaborate on how that might be structured in the code? What would be the interface for a canonicalization pattern?

@slyubomirsky
Collaborator Author

I could imagine a fixpoint approach that does not make use of a union-find. Namely, for every var, look up its binding site. If the binding site is another var, replace that var with the parent (unless the checked_type_ or shape_ differ). Repeat until fixpoint. In principle, we could check for other canonicalization patterns when we do the variable lookup. Is that what you had in mind, @psrivas2?

@psrivas2
Contributor

I could imagine a fixpoint approach that does not make use of a union-find. Namely, for every var, look up its binding site. If the binding site is another var, replace that var with the parent (unless the checked_type_ or shape_ differ). Repeat until fixpoint. In principle, we could check for other canonicalization patterns when we do the variable lookup. Is that what you had in mind, @psrivas2?

Yes, that is what I had in mind. It would also be nice (your call whether to do it in this PR or leave it to follow-up PRs) if the canonicalizer could do both: apply patterns and also constant fold aggressively. Constant folding can produce new canonicalization opportunities.

@slyubomirsky
Collaborator Author

slyubomirsky commented Aug 29, 2022

I will try the fixpoint approach. I will also see if I can use constant folding inside this pass as well.

@slyubomirsky
Collaborator Author

@psrivas2 Thank you for suggesting another approach--using LookupBinding without a union-find vastly simplified the code. Interestingly, I found that I did not even need to run multiple iterations of the pass to achieve fixpoint. Can you (or others) think of any cases where multiple passes are needed?

@slyubomirsky changed the title from [Pass] Canonicalization (coalescing variable definitions) to [Pass] Canonicalizing Bindings on Aug 30, 2022
Comment on lines 57 to 85
// Unlike the default visitor, preserve the checked_type_.
// We may need to change the shape field in case there are substitutions
// that need to be performed within the shape computation.
Expr new_value = this->VisitExpr(binding->value);
Var new_var = this->VisitVarDef(binding->var);

auto emit = [this](VarBinding b) {
  if (this->builder_->CurrentBlockIsDataFlow() && !b->var.as<DataflowVarNode>()) {
    this->builder_->EmitOutput(b);
  } else {
    this->builder_->Emit(b);
  }
};

if (new_var.same_as(binding->var) && new_value.same_as(binding->value)) {
  emit(GetRef<VarBinding>(binding));
  return;
}

// We don't look at the new value's shape or checked type; we only consider
// whether any substitutions were performed within the original var's shape_.
Var temp = WithShapeAndType(new_var, new_var->shape_, new_var->checked_type_);
if (!temp.same_as(new_var)) {
  new_var = temp;
  this->var_remap_[binding->var->vid] = new_var;
}

// Unlike the default visitor, we do not permit the var's checked_type_ to change.
emit(VarBinding(new_var, new_value));
Contributor

@psrivas2 Aug 31, 2022


Is my understanding correct that this block of code is not needed if we allow type refinement? Then z: Object = x could be refined to z: Tensor = x. So do we still need this?

Collaborator Author


I am under the impression that we should respect user annotations when they appear, which is why I've done it this way.

Contributor

@psrivas2 Sep 6, 2022


I went back and checked the notes from the August 16, 2022 meeting. It seems this topic is still under discussion. It would be great if we can reach consensus on it soon.

Comment on lines 88 to 125
void VisitBinding_(const MatchShapeNode* binding) override {
  // For match shape, we need to be cleverer and allow the shape_ to change
  // due to possible substitutions.
  Expr new_value = this->VisitExpr(binding->value);
  Expr new_pattern = this->VisitExpr(ShapeExpr(binding->pattern));

  Var new_var;
  if (binding->var.defined()) {
    Optional<Expr> new_shape;
    if (new_value->checked_type_.defined() && new_value->checked_type_.as<DynTensorTypeNode>()) {
      new_shape = new_pattern;
    }
    // VisitVarDef visits the var's shape_ field and may perform variable
    // substitutions, so we should use that shape_ if it's defined.
    new_var = this->VisitVarDef(binding->var);
    if (new_var->shape_.defined()) {
      new_shape = Downcast<Expr>(new_var->shape_);
    }

    // Do not permit the type to change.
    Var temp = WithShapeAndType(new_var, new_shape, binding->var->checked_type_);
    if (!temp.same_as(new_var)) {
      new_var = temp;
      this->var_remap_[binding->var->vid] = new_var;
    }
  }

  // Re-emit the old binding if nothing changes.
  if (new_value.same_as(binding->value) && new_pattern.same_as(binding->pattern)) {
    if (!binding->var.defined() || (binding->var.defined() && new_var.same_as(binding->var))) {
      builder_->EmitMatchShape(GetRef<MatchShape>(binding));
      return;
    }
  }

  builder_->EmitMatchShape(
      MatchShape(new_value, Downcast<ShapeExpr>(new_pattern)->values, new_var));
}
Contributor


When is this required? None of the tests check for this.

Collaborator Author


The tests do fail if you leave this out. This is because visiting shape_ can cause variables inside the shape_ field to change. This comes up in the test cases with relax.add, since the shape_ for that is a PackedFunc call that uses variables in the program.

@psrivas2
Contributor

using LookupBinding without a union-find vastly simplified the code.

Indeed! Thank you for making the changes. Left some comments in the code.

Can you (or others) think of any cases where multiple passes are needed

Good point. For a single pattern, a single pass should be fine.

In the case of multiple patterns, it depends on how the patterns are applied. While visiting bindings in RPO, if we apply all patterns to the value of a binding until none of them are applicable, and only then move on to the next binding, a single pass does seem to be enough.
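The single-pass claim can be checked with a toy model (names and representation are illustrative, not the actual pass): if the canonical map records the already-resolved target when an alias is bound, rather than the raw alias, later lookups are fully resolved and one in-order pass reaches the fixed point.

```python
# Toy model of the fixpoint question: "compress" records the resolved target
# when binding an alias; without it, each pass only shortens chains by one
# link, so multiple passes would be needed.

def one_pass(bindings, compress):
    canonical, out = {}, []
    for var, value in bindings:
        resolved = canonical.get(value, value)
        out.append((var, resolved))
        canonical[var] = resolved if compress else value
    return out

chain = [("y", "x"), ("z", "y"), ("w", "z")]
assert one_pass(chain, compress=True) == [("y", "x"), ("z", "x"), ("w", "x")]
# Without compression, w resolves only one hop (to y) after a single pass.
assert one_pass(chain, compress=False) == [("y", "x"), ("z", "x"), ("w", "y")]
```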

masahi added a commit to masahi/relax that referenced this pull request Jan 21, 2023
commit 5bf9c8acf12dfba9865ac9f8480341298131dec4
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 17 16:10:16 2023 +0900

    clean up

commit 5506d92ed9a4c48c63f192ddcb576c9665d4ad5b
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 17 15:39:39 2023 +0900

    link and run compiled cutlass code, result correct

commit 81d39f84ebb1a7bcfe5c2fa9f97ce2130f932dbb
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 17 15:13:41 2023 +0900

    compile generated cutlass code

commit c2a68e14575c2711497347d5fc93d15b88c6c79b
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 17 07:47:31 2023 +0900

    codegen working

commit ba26344f85ebe43f88852c8c18b754bf03df1ce1
Author: Masahiro Masuda <[email protected]>
Date:   Mon Jan 16 19:41:47 2023 +0900

    wip

commit ed3ac6d632a4798e411573f30d1a090bc05a96fc
Author: Masahiro Masuda <[email protected]>
Date:   Mon Jan 16 17:53:10 2023 +0900

    wip

commit 47e09e54a0d405a14a602d7a6d31c49399c5662f
Author: Masahiro Masuda <[email protected]>
Date:   Mon Jan 16 17:32:58 2023 +0900

    wip

commit b9e5df768b188de3dda1ef0d0f3db3fd592535d9
Author: Masahiro Masuda <[email protected]>
Date:   Mon Jan 16 17:25:37 2023 +0900

    copy codegen_c base function

commit fe20e653ecf548f07432f06cd17395b554e6faa5
Author: Masahiro Masuda <[email protected]>
Date:   Sat Jan 14 08:43:57 2023 +0900

    add cutlass stub

commit 990eec78b58ca259bc067bb32e4020f28d88b7c8
Author: Masahiro Masuda <[email protected]>
Date:   Sat Jan 14 08:18:57 2023 +0900

    updated cutlass revision

commit 591a8f1ba62d9f8e923f2dcc1702e7e7590e92e2
Author: Masahiro Masuda <[email protected]>
Date:   Sat Jan 14 08:02:01 2023 +0900

    conv2d + relu DNNL offload works

commit 1365402079626eab5bf99bad96dbfa4abd750175
Author: Masahiro Masuda <[email protected]>
Date:   Fri Jan 13 16:35:49 2023 +0900

    starting DNNL codegen

commit 4a72e7810b0df31a4fb13856b5b6320ced4e978e
Author: Masahiro Masuda <[email protected]>
Date:   Thu Jan 12 14:02:19 2023 +0900

    clean up

commit 61cc55e94123f3064e0d1200c70f33b4a537c4ad
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 10 16:26:31 2023 +0900

    pattern based partitioning working

commit 2433733c5458302cbe05e534d6c99bec13fb6d36
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 10 08:30:20 2023 +0900

    add conv2d match & run test

commit 360429440acb7068fdfd982d597523ebe032eb20
Author: Ruihang Lai <[email protected]>
Date:   Mon Jan 9 17:20:05 2023 -0500

    [Op][O2e] Indexing and datatype operators (#338)

commit e45bdb73824d120bb3b848d4fdaa54f88211b509
Author: Tianqi Chen <[email protected]>
Date:   Mon Jan 9 14:59:26 2023 -0500

    [VM] Supporting "compiled" exec mode. (#331)

    * [VM] Supporting "compiled" exec mode.

    This PR adds support of "compiled" mode to the VM.
    The compiled mode translate the relax function into TIR function
    and drive it through the TIR function.

    It is different from the micro AOT codegen, which generate TIR code
    that targets the micro C runtime environment and useful for resource
    limited settings with smaller set of features. Both leverages the
    low-level TIR build that is also shared with TensorIR.

    The current implementation targets full TVM (VM) runtime, that
    comes with PackedFunc, object, tuple, closure and all kinds of rich structure
    support. This also mean that we can leverage the full runtime support
    to handle things like allocation, dynamic shape, easy plugins and python
    interaction, which are not available in more limited runtime.

    The user directly use the same API to load the generated code regardless
    of compiled mode or bytecode. And just need to change one line

    ```python
    ex = relax.vm.build(mod, target, exec_mode="compiled")
    ```

    Most of the codegen features are lifted before the codegen phase,
    so the overall implementation would be around 500 loc for each exec mode
    and can be further cut down with future introduction of PrimValue.

    The simplicity is thanks to the TVM runtime archiecture that allows us
    to compose things together in objects. The only difference is how
    the PackedFunc of high-level driving is being provided.
    In the case of bytecode it is normal interpretation and in the
    case of compiled mode it is TIR.

    It is a complete implementation Unit-testcases are added. All codegen
    build tests are updated to include two exec_modes and have passed locally.
    The only exception that we skipped some special packedfunc handling(printing)
    because can be further simplified after we introduce PrimValue.

    Co-authored-by: Junru Shao <[email protected]>

    * Address review comments

    Co-authored-by: Junru Shao <[email protected]>

commit 32c2bf74eda5ff9cb958e6d54a29c324d53f2869
Author: Ruihang Lai <[email protected]>
Date:   Mon Jan 9 13:45:14 2023 -0500

    [Op][O2d] Manipulation operators (#337)

    As tracked by #332, this PR is the O2d milestone of the high-level operator introduction plan.

    This PR introduces a few manipulation operators:
    * broadcast_to
    * concat
    * expand_dims
    * flatten
    * permute_dims
    * reshape
    * split
    * squeeze
    These operators are all well-tested.

commit b39d11a37c899a1625ecee0ffdacc5ef5444365f
Author: Ruihang Lai <[email protected]>
Date:   Mon Jan 9 10:57:19 2023 -0500

    [O2h] Neural network and linear algebra operators (#343)

commit 1d6d897ec223cc07768e0382c3e21a196ffdfac8
Author: Ruihang Lai <[email protected]>
Date:   Sun Jan 8 20:21:50 2023 -0500

    [O2g] Convolution, pooling and image operators (#341)

commit 95f784ece1d61676b88b5455be3dab5e3ddbc75a
Author: Ruihang Lai <[email protected]>
Date:   Sun Jan 8 16:53:10 2023 -0500

    [Op][O2f] Set and searching operators (#339)

commit be1c32d817bbbbd56329378d6d929dce79ecb0f8
Author: Siyuan Feng <[email protected]>
Date:   Mon Jan 9 03:38:20 2023 +0800

    simple fix jupyter error reporting (#345)

commit da11e4bf373349ce4142949099e29d11655aa88b
Author: Siyuan Feng <[email protected]>
Date:   Sun Jan 8 23:09:22 2023 +0800

    [TVMScript] Symbolic shape computing (#342)

commit 80808fbf9a02480abf337b8a5edffe34c963feec
Author: Ruihang Lai <[email protected]>
Date:   Sat Jan 7 18:31:00 2023 -0500

    [Op][O2c] Creation operators (#336)

commit 5efc8f7224f83766875e74669e139ec82119a504
Author: Ruihang Lai <[email protected]>
Date:   Sat Jan 7 11:14:23 2023 -0500

    [TIR] Create Layout with specified axis dtype (apache/tvm#13663) (#340)

commit ae71be06c8252c211642abb9d5b3e4583bdb6f6a
Author: Ruihang Lai <[email protected]>
Date:   Fri Jan 6 16:41:18 2023 -0500

    [Op][O2b] Statistical operators (#334)

commit 8220df74e339cdb6dab38a803b80edc3cd6b92e2
Author: Ruihang Lai <[email protected]>
Date:   Thu Jan 5 18:31:48 2023 -0500

    [Op][O1][O2a] Utility, arithmetic and comparison operators (#333)

    As tracked by #332, this PR is the kickoff part of high-level operator introduction in Relax.

    This PR is about the milestone O1 and O2a. Specifically, this PR
    * introduces some of common utility functions that the registration and StructInfo inference of each operator will often use.
    * introduces unary arithmetic operators: cos, log, negative, sigmoid, sin, sqrt, tanh.
    * refactors and introduces binary arithmetic operators: add, divide, floor_divide, multiply, subtract.
    * introduces binary comparative operators: equal, greater, greater_equal, less, less_equal, not_equal.

    These operators are well tested from three perspective:
    P1. the op getter can get correct op by name
    P2. their StructInfo inference result are as expected under all kinds of cases
    P3. Relax TVMScript parser can parse the scripts with the op inside

    For operators in O2a, most operators share almost the same StructInfo inference logic. Therefore, for tests in P2, in each category, not every op is tested in every case. For each case, it is good to have only part of op in this category tested. This is intended not to make overlarge testing file.

commit f1cab0a05f05829c4c35e2a7e613bd69f2a17fae
Author: Siyuan Feng <[email protected]>
Date:   Thu Jan 5 20:43:28 2023 +0800

    [TVMScript] Ensure consistent struct info between assign lhs and rhs with sinfo annotation (#328)

    * [TVMScript] Ensure consistent struct info between assign lhs and rhs with sinfo annotation

    * fix

    * fix

commit dc7072efe290d7e8c69d8e216311510981fc82e1
Author: Tianqi Chen <[email protected]>
Date:   Wed Jan 4 10:13:08 2023 -0500

    [REFACTOR] Hide VM Impl, Improve execution logic. (#326)

    * [REFACTOR] Hide VM Impl, Improve execution logic.

    This PR refactors VM by hiding most of the VM implementations
    and improve the overall execution logic.

    - Unifies PackedFunc and Closure Table.
    - Update Closure mechanism to no longer depend on string.
    - Update VMMemoryLower to VMBuiltinLower to incorporate more VM intrinsic lowering,
      move some of the codegen intrinsic to this phase.
    - Allow directly pass in function index as VM instruction.

    * Address comment

commit 2449d8c205f0b6e2c346132695b56039b07e9a10
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Tue Jan 3 22:04:16 2023 -0500

    [IR][ASTPrinter] Tweaks to AST printer's handling of struct info (#330)

commit 2d352807090ba1b7e898fbdcb83d6d9427c762cf
Author: Siyuan Feng <[email protected]>
Date:   Tue Jan 3 23:20:47 2023 +0800

    [TVMScript] Enforce `I.DeclareFunc` to have function signature (#329)

commit dcae50e836a0c2999f52d96a372fc7de584951f4
Author: Tianqi Chen <[email protected]>
Date:   Mon Jan 2 15:21:49 2023 -0500

    [BACKEND] Refactor and introduce full match-cast support. (#324)

    * [BACKEND] Refactor and introduce full match-cast support.

    This PR refactors VMShapeLower to introduce full match-cast support
    that enables nested tuples, type checks at argument boundary
    and symbolic shape computation.

    Along the way we also refactors cleans up some of vm codegen logic
    and adding unit-tests for different stages.

    * address comments

commit a36920bf672d22e1d31e1e6f81d0447fd7a55806
Author: Siyuan Feng <[email protected]>
Date:   Mon Jan 2 23:31:04 2023 +0800

    [TVMScript] Fix empty TupleStructInfo (#327)

commit 80710a826bda66532eeda978668ed157b471b186
Author: Tianqi Chen <[email protected]>
Date:   Fri Dec 30 15:57:50 2022 -0500

    [CONTAINER] Hash/Equal/JSON support for ShapeTuple (#325)

    This PR add hash/equal/json support for shape tuple.

commit 343a1e7e2174612031c70ba8547577c7d21839e4
Author: Tianqi Chen <[email protected]>
Date:   Thu Dec 29 18:33:17 2022 -0500

    [REFACTOR] StructInfo M3: MatchShape=>MatchCast (#323)

    * Introduce match cast, and code changes along

    * add match_cast parser support (#9)

    * Match cast support for VMShapeLower CanonicalizeBinding

    * Remove `match_shape` (#12)

    * Refactor ExprVisitor/Mutator to consider Expr in StructInfo.

    Co-authored-by: Siyuan Feng <[email protected]>

commit e332285559d61db1c5033b8d50cd9d4af6c6b6f4
Author: Tianqi Chen <[email protected]>
Date:   Thu Dec 29 01:28:09 2022 -0500

    [REFACTOR] StructInfo M2: Cleanups on legacy shape related items  (#320)

    * [REFACTOR] Remove shape function

    * [WIP] Remove shape_, runtime_dep shape

    * Remove shape_ pass Compile

    * Remove RuntimeDepShape (#11)

    * BlockBuilder: remove CanProveShapeEqual, consolidate binding emit to EmitNormalize

    * Remove DimType, make get_shape_of API different from op.shape_of

    Changes the init importing to direct import so the VSCode nagivator
    can directly jump to the defintion point.

    * Apply suggestions from code review

    Co-authored-by: Ruihang Lai <[email protected]>

    * Clarify cases where struct info can be determinstically derived

    * Fix remaining testcases

    * Remove InferShape/Type per comment.

    Co-authored-by: Siyuan Feng <[email protected]>
    Co-authored-by: Ruihang Lai <[email protected]>

commit edadf247551f526188c0a08b3812ffc0a1f9d8bd
Author: Ruihang Lai <[email protected]>
Date:   Fri Dec 23 14:46:07 2022 -0500

    [Analysis] Optionally check structure info in well-formedness check (#321)

    With the introduction of structure info in #314, the well-formedness check will report malformed whenever an Expr doesn’t have defined structure info.

    However, when writing tests for well-formedness check and normalizer, usually we will manually construct the Exprs, which means their structure info are not defined most of the time. As a consequence, the well-formedness check will always complain “the Expr xxx doesn’t have structure info populated.” Therefore, when the checker fails to complain about the original reason of malformed, which means the checker is not working, the tests will still pass and we won’t be able to realize there is something wrong with the checker.

    Thus, in this PR we add an optional flag to the well-formedness check. In well-formedness tests, we will turn off the structure info check so that the original reason of being malformed will be revealed correctly.

    ---

    This PR also cleans up the DiagnosticContext parameter in the WellFormed API - the diag_ctx has been unused since the merge of #99.

commit d548459a1736378398ab773dce413d90d49376cf
Author: Ruihang Lai <[email protected]>
Date:   Fri Dec 23 07:33:25 2022 -0500

    [Op] Enforce int64 output shape in CallTIR (#322)

commit 10a87a455bbb84b0a0d20b22bd31784b9f4b9774
Author: Chaosfan <[email protected]>
Date:   Fri Dec 23 08:03:48 2022 +0800

    [Bugfix] Handle function name properly in Relax TVMScript printer (#317)

    * remove relax_func_name_ and change logic

    * well_formed check for globalvar and gsymbol consistency

    * revise the logic in well_formed and update test

    * Remove `global_symbol` in test_function_attr.py

    * Update docs

    Co-authored-by: Ruihang Lai <[email protected]>

commit 29aebb9d24cbf52ab21fd98996633534301ef34d
Author: Tianqi Chen <[email protected]>
Date:   Wed Dec 21 20:21:57 2022 -0500

    [REFACTOR] M1: Change parser/printer to only depend on struct info (#319)

    * [REFACTOR] StructInfo M1: Parser/printer/Var/Function to only depend on struct info field

    * Update src/relax/backend/vm/vm_shape_lower.cc

    Co-authored-by: Ruihang Lai <[email protected]>

    * Address comments

    * Allow function to have default value

    Co-authored-by: Siyuan Feng <[email protected]>
    Co-authored-by: Ruihang Lai <[email protected]>

commit e6173430f491c1d88d2ab77ce0ab43a8c602df30
Author: Tianqi Chen <[email protected]>
Date:   Wed Dec 21 00:42:29 2022 -0500

    [REFACTOR][ARCH] Introduce StructInfo M0 (#314)

    * [IR] Introduce StructInfo

    * StructInfoFunctor and Analysis Support

    * [TVMScript] Parse type/shape annotation with StructInfo

    * remove runtime type assign

    * Remove type/shape during parsing (#2)

    * Normalizer prep: simple checks and legacy function renaming.

    * Struct info deduction in BlockBuilder.

    * Two TODOs

    * StructInfo Normalizer Fixes (#3)

    * StructInfo AST Fix

    * Fix Extern Func Deduction and shape mutator.

    * Update VoidStructInfo & globalvar (#4)

    * Fix passes and proper sinfo propagation.

    * Refactor EraseToWellDefined to Enable Remapping

    * [WIP] First stab at symbolic param tracking

    * Update EraseToWellDefined to support symbolic shape return (#5)

    * fix R.shape with ndim (#6)

    * Remove update shape/type

    * Address review comment, AnnotateTypeShape=>AnnotateStructInfo

    * Update include/tvm/script/ir_builder/relax/frame.h

    Co-authored-by: Ruihang Lai <[email protected]>

    * Address comments

    * Update printer to use structinfo (#7)

    * Update Error mechanism to prep for obj loc based reporting

    * Symbolic shape aware function call return value derivation.

    The main flow works as follows:
    - Match and populate shape_var_map and var_map by visit each pair of
      param and call arguments.
    - Call EraseToWellDefined to map the ret parameter to new result.

    * [ANALYSIS] Refactor well-form to only look at struct info.

    * Update comments according to reviews.

    * Update include/tvm/relax/struct_info.h

    Co-authored-by: Ruihang Lai <[email protected]>

    Co-authored-by: Siyuan Feng <[email protected]>
    Co-authored-by: Tianqi Chen <tqchen>
    Co-authored-by: Ruihang Lai <[email protected]>

commit 151701740fac3a53b35799a82c85d86f91b720ee
Author: Tianqi Chen <[email protected]>
Date:   Fri Dec 16 17:48:26 2022 -0500

    Update relay_translator.py

commit ad0f3179a84b3bc167f91c3eb082cb996b1d04e2
Author: Ruihang Lai <[email protected]>
Date:   Fri Dec 16 17:37:00 2022 -0500

    [Translator] Remove global symbol and follow-up fix for #262 (#316)

    This PR removes the `global_symbol` linkage added by Relay Translator. It also fixes unaddressed comments of #262.

    All tests can pass locally and I believe it is safe to merge this PR directly.

commit 850deded1201001d833ac65991fb1a4c6509cb1b
Author: Ruihang Lai <[email protected]>
Date:   Fri Dec 16 16:19:48 2022 -0500

    [Translator] Support translating op calls with Tuple input (#262)

    Previously, when a Relay function contains a Call which directly uses Tuples as arguments (the example below),
    ```
    %25 = (%23, %24) /* ty=(Tensor[(1, 160), float32], Tensor[(1, 160), float32]) */;
    %26 = concatenate(%25, axis=-1) /* ty=Tensor[(1, 320), float32] */;
    ```
    our Relay-translator is unable to generate corresponding CallTIR, because the translator always assumes a argument of a Call is mapped to a single tensor (see the code snippet below: the translator directly passes the Relax variable `new_args[-1]` to function `te_tensors`, which translate a Var to a single tensor).
    https://github.com/tlc-pack/relax/blob/60e9a01cdfdd013945790fc03d5abad29b8a7c0b/python/tvm/relax/testing/relay_translator.py#L124
    https://github.com/tlc-pack/relax/blob/60e9a01cdfdd013945790fc03d5abad29b8a7c0b/src/relax/ir/emit_te.h#L56-L61

    But in fact, the Relax variable may correspond to a Tuple of tensors, which wasn’t taken into consideration before. Such a case can lead to an error in `TETensor` when creating tensors.

    Therefore, this PR fixes the issue by examining the Relax variable before the tensor creation for Relay Call arguments. If an argument has a Tuple shape and TupleType, we break the tuple Variable down, emit a TupleGetItem for each field, and create a tensor for each field.

commit 54a0ff551adb90937073675b4fb3d5439b814398
Author: Siyuan Feng <[email protected]>
Date:   Fri Dec 16 21:02:13 2022 +0800

    Remove relax parser_v1 (#313)

commit b363dd48aced8fb939880db8cf595ed65b7ecc77
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Wed Dec 14 22:51:38 2022 -0500

    [Debugging][Arch] Expose `shape_` fields for `TupleGetItem` and `If` nodes, fix AST printer accordingly (#311)

    * Make the shape of If and TupleGetItem nodes accessible in Python

    * Remove order-dependency from AST printer tests

    * Trailing whitespace

commit 4bb01fe4eccdd59614cc264838a389b21dd40388
Author: Yuchen Jin <[email protected]>
Date:   Wed Dec 14 08:11:47 2022 -0800

    [IR] Dedicated Relax Call, Constant, Tuple, TupleGetItem, If (#306)

    * relax.Constant.

    * Add callnode;

    * Tuple, tuplegetitem, If

    * mypy.

    * lint

    * rebase & fix printer.

    * rebase & remove virtual_device_

    * address comments & leave todos.

    * address comments.

    * address comments.

    * tuple index.

    * type anno.

commit 4cda8a5881fd4cd2473258b35244fc4129b6110c
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Wed Dec 14 09:09:03 2022 -0500

    [BlockBuilder][Refactor] Normalize nested `SeqExpr`s (#310)

    Co-authored-by: Ruihang Lai <[email protected]>

commit 5aab150f322526c1a7bfe6cea0f4d7a7543a7f46
Author: Ruihang Lai <[email protected]>
Date:   Tue Dec 13 17:06:06 2022 -0500

    [ExprMutator] No prologue in VisitWithNewScope when input is SeqExpr (#305)

commit 0bf1f1b784f19298117e36016a2e522f58c143fc
Author: Tianqi Chen <[email protected]>
Date:   Tue Dec 13 15:27:05 2022 -0500

    [REFACTOR] Refactor BlockBuilder (#308)

commit 28d598b6a7c55f95f8f9c2ccd5c860ba5451232d
Author: Siyuan Feng <[email protected]>
Date:   Sun Dec 11 01:28:56 2022 +0800

    [Normalizer] Combine Nearby Blocks in SeqExprs (#298)

commit e152c50e368454afab75425fcb0863b1c328bf4c
Author: Tianqi Chen <[email protected]>
Date:   Thu Dec 8 19:33:18 2022 -0500

    [ARCH] Add VisitBinding second-level dispatcher in Expr type. (#301)

commit fed6b8fc88b824ec68260417793447dbe524c4c3
Author: Yuchen Jin <[email protected]>
Date:   Wed Dec 7 16:55:40 2022 -0800

    [Linkage] Cleanup global_symbol attachment and linkage. (#300)

    * Cleanup global_symbol attachment and linkage.

    * lint

    * Add global_symbol to the main function in translation.

commit e0907d4fd03af1731310647d3d0547bdff2cfaf6
Author: Tianqi Chen <[email protected]>
Date:   Tue Dec 6 21:35:20 2022 -0500

    [ARCH] Introduce NestedMsg to robustly handle nested-tuple analysis (#295)

commit 2eb99975dc1b40b83db7dcbb96b748503dcb3319
Author: Siyuan Feng <[email protected]>
Date:   Mon Dec 5 21:57:21 2022 +0800

    [TVMScript] Update script printer to enable roundtrip tests (#291)

commit f8ab9890e14c2533c401969ebf11dd591beff592
Author: Hongyi Jin <[email protected]>
Date:   Sun Nov 27 09:59:26 2022 -0500

    [RUNTIME] Correctly handling export_module when exporting modules of different type (#13489)

commit 9009840e654a9900009f7776a19e26f29b1e3f85
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Fri Dec 2 18:33:50 2022 -0500

    [Debugging] Support PackedFuncType in the AST Printer (#289)

commit bda0e42f05eaba657c40a850486e55c39924f3bf
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Fri Dec 2 18:31:39 2022 -0500

    [IR][Bugfix] Improvements to the normalizer and well-formed checker (#288)

commit d5fe87b21546995c7a88905bd04b4e944d28a0f4
Author: Yong Wu <[email protected]>
Date:   Thu Dec 1 20:00:38 2022 -0800

    Enforce i64 index in ShapeExpr (#281)

commit 9c9eb5585501a5da0f25ca38d7d3ac8269b6714c
Author: Yuchen Jin <[email protected]>
Date:   Thu Dec 1 11:00:47 2022 -0800

    [Parser] Register memory operators to new parser. (#279)

commit 28c3f68cc51d2c22936c5496debcb8c2de54040b
Author: Yong Wu <[email protected]>
Date:   Thu Dec 1 08:55:31 2022 -0800

    [TVMScript] enable the closure test (#280)

    * [TVMScript] enable the closure tests.

commit eb9d531b2565cdd000f46e5ecae2c45b9f589abe
Author: Yuchen Jin <[email protected]>
Date:   Thu Dec 1 05:47:05 2022 -0800

    [Normalizer] Enforce all Expr have checked_type_ invariance after normalization. (#287)

commit 43f81ddf4afc2f4fdb214c9f994e844f53126cdb
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Mon Nov 21 19:25:43 2022 -0500

    [Debugging][Bugfix] Debug printer improvements: Print `shape_` and `checked_type_` for all nodes and handle non-binding `MatchShape`s (#261)

    The initial AST printer only included the `shape_` and `checked_type_` fields for variables because of the potential for infinite recursion (`shape_` nodes can contain other expressions, which in turn have `shape_` nodes). This PR cuts off the potential recursion to allow for printing these fields for all Relax expressions, which should be more useful for debugging.

    This PR also fixes a bug: The AST printer previously did not handle `MatchShape` bindings that did not bind a new variable.

commit 304048c33956dddb5027fec26541d57f903d8ca2
Author: YuchenJin <[email protected]>
Date:   Thu Nov 17 17:02:11 2022 -0800

    Fix after rebase, and reorganize the TVMScript folder structure.

    Co-authored-by: Junru Shao <[email protected]>
    Co-authored-by: Siyuan Feng <[email protected]>

commit e7277460f0a2c7c980be9323cdf7919dc38153e2
Author: Siyuan Feng <[email protected]>
Date:   Thu Nov 17 00:31:32 2022 +0800

    [TVMScript] Switch to the new parser (#276)

    * [TVMScript] Support cross-function call for relax function

    This PR adds support for cross-function calls between Relax functions, by declaring a function signature (i.e., an empty function that contains params and return type/shape but no body).

    However, the PR runs into an issue with block_builder shape deduction, which does not use the function `ret_shape` to infer the shape of GlobalVar Calls.

commit 7152175762613130e3ba647c77cc9818312a5b06
Author: Yuchen Jin <[email protected]>
Date:   Sat Nov 5 16:45:33 2022 -0500

    [CI] Enable Mypy type checking for Relax; Fix typing errors to pass Mypy checking. (#270)

commit 6f8f6da505b835345d7709d06bdfd8dddce7e85b
Author: Lesheng Jin <[email protected]>
Date:   Thu Nov 3 08:16:35 2022 -0700

    Introduce memory primitives (#255)

    Introduce the memory primitives, including `relax.memory.{alloc_storage, alloc_tensor, kill_storage, kill_tensor}`.

commit 48b7c158cc01532f9019a2e615f2d94766a9464c
Author: Siyuan Feng <[email protected]>
Date:   Thu Oct 20 08:30:47 2022 +0800

    [TVMScript] Update Type Annotation Behavior of the Parser (#269)

    This commit changes the behavior of the parser to allow type annotations, as suggested by the community.
    The current behavior:
    - Use the more refined type/shape between user annotated and deduced type/shape.
    The updated behavior:
    - Always use user annotations
    - Only check whether the type/shape is valid.

commit 5c3079bb6e1e4eeb4dc2d9b740facb2686c67519
Author: sung <[email protected]>
Date:   Mon Oct 17 19:07:01 2022 -0700

    Reenable autotvm silencer; fix e2e_auto_tir.py; fix lint.

    Co-authored-by: YuchenJin <[email protected]>

commit 85b81292626ab6f23caf2b61095a6f957b61b21c
Author: sung <[email protected]>
Date:   Mon Oct 17 18:09:34 2022 -0700

    Recover: [Bugfix] Couple of bug fixes to run TVM-gen code together with BYOC (#249)

commit c46ae8566582f1fcd8fcda1479943d3abb95b3b0
Author: sung <[email protected]>
Date:   Mon Oct 17 17:16:01 2022 -0700

    Recover: [Pass] Separate ApplyHistoryBest from tuning passes (#226)

commit 83bc7cb144643d5823bf06220186528923835667
Author: Junru Shao <[email protected]>
Date:   Sun Oct 16 22:52:56 2022 -0700

    Enable Hexagon tests

commit f9f4f7904ec5468a725b2ba924a619a7c5ed4e43
Author: Junru Shao <[email protected]>
Date:   Sat Oct 15 15:25:56 2022 -0700

    Recover dropped commits

    [TVMScript] B4: If branch support (#263)
    B8: Local Function Support  (#258)
    [TVMScript] B3: Type annotation checks (#256)
    [TVMScript][Parser] B1: Dataflow block (#252)
    [TVMScript] B2: match shape support (#251)
    [TVMScript] B6/B7: Symbolic shape and var shadowing  (#245)
    [TVMScript] B5: Support relax op (#244)
    [TVMScript] B0: Call_tir support (#243)
    enhance parser error reporting (#242)
    [TVMScript] A1: Relax Parser infra (#240)
    update ci image versions. (#241)
    [TVMScript] B2-4: TIR IRBuilder (#239)
    [TVMScript] A0: Relax IRBuilder infra (#235)
    [TVMScript] B5-6: TIR IRBuilder (#231)
    [TVMScript] B1: IRBuilder (#228)
    [TVMScript] New Parser: Part C (#218)
    [TVMScript] New Parser: Part A (#221)
    [TVMScript] New Parser: Part B (#217)

    Not recovered:
    [Pass] Separate ApplyHistoryBest from tuning passes (#226)
    [Bugfix] Couple of bug fixes to run TVM-gen code together with BYOC (#249)

    co-authored-by: Yuchen Jin <[email protected]>
    co-authored-by: Siyuan Feng <[email protected]>
    co-authored-by: Ruihang Lai <[email protected]>

commit 65a53034bc0bee9877a1bdf363c2eadcde35f226
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Thu Oct 13 23:06:55 2022 -0400

    [Op][Debugging] Add `assert` operator (#260)

    It was brought up that Relay lacks an assert operator, so we may as well have one in Relax for debugging. One issue is that we can't name it "`assert`" because Python will treat it as a syntax error to have it as a field name for the "`relax`" module, i.e., `relax.assert` is a syntax error. Thus the op is named "`assert_op`," which is not ideal but serves its purpose.

commit 71d96e6c0a314936fa49fd7bc1ea79069027ab12
Author: Yuchen Jin <[email protected]>
Date:   Wed Oct 12 05:07:33 2022 -0700

    [Pass] Support Function and If in Normalize pass. (#268)

    * Support Function and If in Normalize pass.

    * Use structural equality for expr_memo_.

    * Change back to pointer equality for expr_memo_; Add more tests.

    * rebase.

commit 312a344cdeec66b1330a80d34ca78556fb338e7c
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Tue Oct 11 18:25:29 2022 -0400

    [Analysis] Expose analyses related to vars in Python (#265)

    Previously, analyses to gather up all variables, free variables, bound variables, all global variables, and all global variables that are called had been implemented in C++ but had not been exposed in Python or tested. This PR exposes these analyses and adds tests for them.

    Two further changes:
    * The analyses previously ignored variables bound in `MatchShape` nodes; these are now treated as bindings too.
    * `rec_global_vars` is renamed `called_global_vars`, since the analysis itself does not check recursion.
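    As a rough, self-contained sketch of what the bound/free-variable analyses compute (plain Python over a toy binding list; none of these names are the actual Relax API):

```python
def bound_and_free_vars(bindings, body_vars):
    """Toy model: `bindings` is a list of (var, vars_used_in_rhs) pairs.
    A var is bound if some binding defines it; it is free if it is used
    (in a binding RHS or in the body) without being bound."""
    bound = {v for v, _ in bindings}
    used = set(body_vars)
    for _, rhs_vars in bindings:
        used.update(rhs_vars)
    free = used - bound
    return bound, free

# y = f(x); z = g(y); body = z  =>  x is free, y and z are bound
bound, free = bound_and_free_vars([("y", ["x"]), ("z", ["y"])], ["z"])
```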

commit 132702be7e7ed0256045d7a405e532c3d5beef6d
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Mon Oct 10 18:19:38 2022 -0400

    [Expr] Allow annotating return shape on function nodes (#253)

    This PR adds a `ret_shape` field for specifying the shape of the function's return value. At present, we will not use this information, but by adding it into the AST, we will be able to parse the return shape and use it in the future. Parser V1 in this PR will just always list the `ret_shape` as `RuntimeDepShape`.

commit 7276c9e2ee13a4754775491ca36a7aae2d55b827
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Sat Sep 24 00:11:45 2022 -0400

    [Bugfix][VM] Properly convert tensor inputs in `save_function` (#257)

    It was observed that closures saved using `save_function` would crash when used over RPC with the `time_evaluator`, whereas using `set_input` and `invoke_stateful` worked as normal. While I am not entirely sure why these failures happened over RPC only in `time_evaluator` (but not in other RPC trials), it became clear that `set_input` performs a conversion of input tensor values in `SetInputTensorWithIndex`, while `save_function` was not doing this. Adding this conversion fixed the observed bug.

commit 7183c7ffbe896dd9b5f5742b62afe9c821dae682
Author: Josh Fromm <[email protected]>
Date:   Wed Sep 21 17:07:08 2022 -0700

    [Call TIR] Fix bug when invoking call_tir with scalar values. (#254)

    This small PR changes a check in the tvmscript parser to support empty shape tuples which are used to represent scalars. I added a scalar addition test to make sure it works properly.

commit 605ba8d1548efb90980f9b18ea94f1d53f9ec3ec
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Wed Sep 14 17:27:03 2022 -0400

    [Bugfix][Op] Register attributes for unique and print (#248)

    Attempting to use `dump_ast` on functions containing the operators `relax.unique` and `relax.print` previously crashed due to being unable to query their attributes' keys. It turned out that this was a problem with the operator attributes: They had not been registered on the Python side, so Python representation treated them as opaque TVM objects. This PR corrects this mistake.

commit f4525dd8a3e61f572b50107555cef4b469c971f4
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Wed Sep 14 17:24:40 2022 -0400

    [VM][Benchmarking] Add option for saving e2e results as CSV file (#247)

    This PR makes some small additions to the end-to-end AutoTIR script, namely eliminating a bug (it was incorrectly using the stateful API) and adding an option to save the test results as a CSV file for benchmarking purposes (the data can then be separately analyzed as needed).

    These changes also required a small extension to the save_function method in the VM, namely allowing it to take keyword arguments.

commit f1ee4b6cd2c3ee0596cef6f5b7ff7e715fb4ae0d
Author: Ruihang Lai <[email protected]>
Date:   Wed Sep 14 17:23:29 2022 -0400

    [BugFix] Enable emit global MatchShape (#246)

    Fix an incorrect check which disables emitting global MatchShape outside a dataflow block and mistakenly enables emitting dataflow MatchShape outside a dataflow block.

commit 0a7a0a9daf5f1a2fa06ee6cd6169a28d397821fa
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Thu Sep 8 09:49:05 2022 -0400

    [Pass] Canonicalizing Bindings (#233)

    It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

    This pass will take chains of definitions and replace references to later definitions with references to the original one. It works by checking `LookupBinding` for each var use-site and replacing the var with its definition if the definition was another var. (Note: this required updating `BlockBuilder` to also update its binding map for `MatchShape` nodes; that was arguably a bug.) Additionally, `MatchShape` bindings where the LHS and the RHS are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.
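    The chain-collapsing step can be sketched in a few lines of plain Python (a toy model, not the actual pass; bindings map a var name to its definition and are assumed to appear in definition order):

```python
def canonicalize_bindings(bindings):
    """Collapse chains of variable aliases: if a definition is itself a
    var that already has a canonical definition, use that instead."""
    canon = {}
    for var, defn in bindings:
        canon[var] = canon.get(defn, defn)
    return canon

# y = x; z = y; w = z; o = w; p = o  all collapse to x
result = canonicalize_bindings(
    [("y", "x"), ("z", "y"), ("w", "z"), ("o", "w"), ("p", "o")]
)
```

    Unused bindings left behind by this rewrite can then be removed by a separate dead-code pass.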

commit 7a6f91f7d4077eebf926aa1f19281404494b9362
Author: Prakalp Srivastava <[email protected]>
Date:   Thu Sep 1 07:02:57 2022 -0400

    [Hexagon] Use uploaded path to load module. (#238)

    * Fixes a bug to use the uploaded file remote path for loading the module
    remotely.

    * Modifies the task_python_hexagon.sh script to only run passing tests
    on device. This is used by Jenkins CI.

commit e50290140c204ae091e335b797a07f2f6567a163
Author: Lesheng Jin <[email protected]>
Date:   Thu Aug 18 21:51:35 2022 -0700

    [Pass] New Python ExprVisitor/ExprMutator! (#190)

    Add decorators `visitor` and `mutator` to help users create `ExprVisitor` and `ExprMutator` in Python. Users can customize visit/rewrite/post-order-rewrite function in Python.  `PyExprVisitor` and `PyExprMutator` lists the functions users can customize.
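    The post-order rewriting idea can be sketched without TVM as follows (a toy model; the class and method names are illustrative, not the `PyExprMutator` API):

```python
class ToyExprMutator:
    """Visit expressions post-order, rewriting children before parents.
    Expressions are nested tuples such as ("add", lhs, rhs) or leaves."""
    def visit(self, expr):
        if isinstance(expr, tuple):
            op, *args = expr
            expr = (op, *(self.visit(a) for a in args))
        return self.rewrite(expr)

    def rewrite(self, expr):
        return expr  # identity by default; subclasses override

class FoldAddZero(ToyExprMutator):
    def rewrite(self, expr):
        if isinstance(expr, tuple) and expr[0] == "add" and expr[2] == 0:
            return expr[1]  # x + 0 -> x
        return expr

folded = FoldAddZero().visit(("add", ("add", 5, 0), 0))
```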

commit 7313855476cc522bf3e8bdbe7a60b82cd725fe4c
Author: Ruihang Lai <[email protected]>
Date:   Thu Aug 18 15:20:06 2022 -0400

    [BugFix] Expose `relax.expr.Constant` to `relax.Constant` (#230)

commit cdfd4e939f2d1e88c560a05d83ddf2f7afe70304
Author: Siyuan Feng <[email protected]>
Date:   Thu Aug 18 02:25:13 2022 +0800

    [FIX] Fix windows build issue when allocating a dynamic array (#219)

    In the current codebase, kNumArgs is a runtime-dependent variable (i.e. its value depends on the input shape of Array).

    Allocating arrays with runtime values is not allowed during building on Windows (I'm surprised it can be compiled on Linux and macOS)

commit 887762cd97686ae23a61609ca9ffc8d6a2c5178b
Author: Yong Wu <[email protected]>
Date:   Mon Aug 15 08:00:31 2022 +0800

    Update with rebase

commit 5a23346bc437043b48866411e39dfcf066edda59
Author: Yuchen Jin <[email protected]>
Date:   Sun Aug 14 14:44:12 2022 -0700

    [Bugfix][VM] Fix var binding to a ConstantNode; Force VM if.cond register to take an NDArray instead of POD. (#216)

    Fix the bug in #212. The cause of this bug is that VM codegen did not handle binding a ConstantNode to a variable (`x = relax.const([1, 2])`) and saving the constant NDArray to a register. Previously the codegen only handled ConstantNodes appearing as CallNode arguments. Now it is fixed and a unit test is added.

    Fix the bug in https://github.com/tlc-pack/relax/issues/214#issuecomment-1211411432, which was caused by the VM simply reading the condition register of the If instruction and expecting it to be a POD int or bool. https://github.com/tlc-pack/relax/commit/811e877c289fa52f55886c8a3e8dce10ed84915f adds a `LoadScalarInt` function, similar to the Relay VM's, to check that the If.cond register stores an NDArray and cast it to int64. Since we haven't introduced PrimValue and PrimType (which represent POD values like int and bool) to the Relax language yet, let's enforce `If->cond` to be a Tensor (NDArray at runtime).

commit 6c9d403503297a0d0e28318bafcba9fc9c99ae42
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Fri Aug 12 13:53:28 2022 -0400

    [VM][UX] Allow for saving closures to avoid extra dictionary lookups in timing trials (#208)

    This PR implements a function that allows for saving a `PackedFunc` in the VM's module that just calls an existing function with a specific set of arguments to address #179 and #178. The main use of this is for timing, to avoid some overhead in looking up functions.

commit e172b40af31dc3384adbcf6e7b0bce7f31ce41ea
Author: Jiawei Liu <[email protected]>
Date:   Thu Aug 11 19:55:57 2022 -0500

    [Pass][UX] Statement rewriter for DataflowBlock (#210)

    - Implements a few APIs to quickly perform statement-level mutation: `add`/`remove_unused`/`remove_all_unused`/`replace_all_uses`.
    - Implemented `remove_all_unused` to remove dead statements inside `DataflowBlock` cc: @psrivas2
    - Address minor issues (unnecessary headers and bad docstrings) in https://github.com/tlc-pack/relax/pull/163
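    A minimal sketch of the `remove_all_unused` idea (dead-binding elimination over a toy binding list; not the actual API):

```python
def remove_all_unused(bindings, output_vars):
    """Keep a binding (var, vars_used_in_rhs) only if its var is
    transitively used by the block's outputs; walk backwards so
    liveness propagates through chains of bindings."""
    live = set(output_vars)
    kept = []
    for var, uses in reversed(bindings):
        if var in live:
            kept.append((var, uses))
            live.update(uses)
    return list(reversed(kept))

# c is never used by the output b, so its binding is removed
kept = remove_all_unused(
    [("a", ["x"]), ("b", ["a"]), ("c", ["x"])], ["b"]
)
```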

commit 37791e0a5d4a495365fd647f2cecbed16f3a3785
Author: Jiawei Liu <[email protected]>
Date:   Thu Aug 11 13:50:56 2022 -0500

    Clean warning messages by Clang and Pylint (#215)

    * refact: clean clang warning in relax

    * refact: fix pylint

    * fix cpplint and clangd suggestions

    * fix: no cpplint on virtual-override

commit 0b00715dc634aa7f091e942a54a29ee9c802ccf9
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Wed Aug 10 11:47:37 2022 -0400

    [VM][UX] Implement stateful API (#207)

    This PR implements the stateful API discussed in https://github.com/tlc-pack/relax/issues/179. It ensures that if you use `set_input` to set inputs, you must use `invoke_stateful` to run the function (otherwise failing) and must obtain the results using `get_output`. It handles nested tuple returns.
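    The enforced calling protocol can be modeled in a few lines (a toy stand-in for the VM, not the actual implementation):

```python
class StatefulVM:
    """Toy model of the stateful protocol: once inputs are set, the
    function must be run via invoke_stateful and the result read via
    get_output."""
    def __init__(self, fn):
        self._fn = fn
        self._inputs = None
        self._output = None

    def set_input(self, *args):
        self._inputs = args

    def invoke_stateful(self):
        if self._inputs is None:
            raise RuntimeError("call set_input before invoke_stateful")
        self._output = self._fn(*self._inputs)

    def get_output(self):
        if self._output is None:
            raise RuntimeError("call invoke_stateful before get_output")
        return self._output

vm = StatefulVM(lambda a, b: a + b)
vm.set_input(2, 3)
vm.invoke_stateful()
result = vm.get_output()
```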

commit ed7b77e040654582d1ab1b9535ebbc4da77da243
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Tue Aug 9 17:07:52 2022 -0400

    [Op][Debugging] Add a print operator (#201)

    * Attempt at adding a print operator

    * Fix the registration

    * Actually use the format string

    * Improve test

    * Fix comment placement

    * Improve the docstring for relax_print

    * Handle tuples too

    * Formatting :(

    * Correct commit message

    * Match attr name across Python and C++

    * Make print variadic

commit a9bd3053c1106d1926fce1dc5787fc8be27f3985
Author: Sunghyun Park <[email protected]>
Date:   Fri Aug 5 11:45:03 2022 -0400

    [Pass] Implement legacy lowering pass that leverages relay op strategy (#189)

    This PR implements Relax Op lowering that leverages existing Relay Op Strategy (legacy).
    Since ops like conv2d and matmul are independent of both Relay and Relax, this pass assumes that we can always find Relay op equivalents for such Relax ops and uses their info to leverage the Relay op strategy.

commit 1a1bcf75d97b2e7e4f758b6cd08bd747b222ef36
Author: Sunghyun Park <[email protected]>
Date:   Thu Aug 4 17:56:17 2022 -0400

    [Pass] Introduce metaschedule as a tuning pass (#188)

    This PR delivers MetaSchedule tuning as a set of tuning passes.
    We can either tune at the IRModule level with relax.transform.MetaScheduleTuneIRMod or tune at the PrimFunc level with relax.transform.MetaScheduleTuneTIR.

commit 7144654633477ea0d2bff300ba753dc8bfdeae4d
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Thu Aug 4 14:34:10 2022 -0400

    [Example][UX] Make the RPC timeout configurable in the `e2e_auto_tir` example (#186)

    Running the e2e_auto_tir example over RPC can run into issues due to timeouts because some models can take a long time to run on some machines. This PR makes the RPC timeout configurable to more easily address these issues.

commit 81e565e5df90cfe12d22deb7b26845ea3aa13526
Author: Tianqi Chen <[email protected]>
Date:   Wed Aug 3 19:38:21 2022 -0400

    Fix BlockBuilder Scope Recovery in Misuse (#199)

    This happens in interactive use cases. When a function scope exit
    triggers an error, we need to recover BlockBuilder.current
    properly so users can try again.

commit 21b1e7dc35dc838214cd4b6f26fbc31492323b02
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Wed Aug 3 19:09:21 2022 -0400

    [Testing][AST] Add a simple AST printer for debugging (#198)

    * Add ast printer

    * Print seq expr body

    * Match annotation field names to real AST

    * Handle call attrs and func ret types

    * Add more advanced test cases

commit 89f55c8167a80b4b9c8751309b5db648fb4db047
Author: Jiawei Liu <[email protected]>
Date:   Wed Aug 3 09:59:47 2022 -0500

    [UX] Adopt changes from tvm-main and render code with IPython.display (#192)

    Render code with IPython.display.HTML if possible to fix the ansi-escape 24-bit rendering issue in Colab.

commit 0b52b558eb14b3f113a4b543c8f0a824baaa58bc
Author: Jiawei Liu <[email protected]>
Date:   Mon Aug 1 11:59:24 2022 -0500

    Dataflow Pattern Lang: Core Matching Features (#163)

    The structure is similar to the Relay's pattern matcher (https://github.com/apache/tvm/pull/5231). The main difference is that those pattern types are adopted to be relax-compatible. Relay pattern types, some less used patterns (IfPattern) and df-topological patterns (DominatorPattern) are ignored (some of them will be brought later).

    The implementation splits patterns into two parts:
    - **Match an Expression**: match an expression syntactically (`MatchExprPattern`, i.e., `DFPatternMatcher`);
    - **Match a Graph**: match a graph (cross multiple `VarBinding`) topologically (`MatchGraphPattern`);

commit 74371634e9a011e63650b734aba20546b016c524
Author: Jiawei Liu <[email protected]>
Date:   Tue Jul 26 20:06:25 2022 -0500

    [UX] Highlight TVMScript with Pygments (#185)

commit 15e54ef215950944ffd74858c12c30aabcb0dcce
Author: Siyuan Feng <[email protected]>
Date:   Sat Jul 23 11:22:13 2022 +0800

    [Pass] Enhance BindParams to take numpy dict as input (#184)

commit cf2e3b97110c805597059c5ba8303a653417e080
Author: Steven S. Lyubomirsky <[email protected]>
Date:   Mon Jul 18 21:45:21 2022 -0400

    [Bugfix][VM] Ensure set_input works over RPC by not returning an array of argument names (#183)

    Currently, attempting to use the VM's `set_input` method will fail over RPC because `set_input` calls `get_func_param_names`, which returns an array of parameter names. RPC does not support sending arrays. This PR corrects this issue by instead having `set_input` query the function arity and then query the argument names one by one, which is the approach taken by the Relay VM (accordingly, the names for the functions used to do this, `get_function_arity` and `get_function_param_name`, are taken from the Relay VM).

    This PR also adds a unit test over RPC on localhost.

commit b0e57dbc0862499c3f2a7d91858354c41fcf5e95
Author: Yong Wu <[email protected]>
Date:   Fri Jul 15 11:50:29 2022 -0700

    Fix after rebase

commit 3494b7a47bf0f7c3219538b2e9064b825cf3258c
Author: Sunghyun Park <[email protected]>
Date:   Mon Jul 18 00:38:41 2022 -0400

    [Pass Infra] Tuning API serialization and database support (#168)

    * refactor tuning API to support serialization of Choice, Knob, Trace

    * Implement tuning api JSON database

    * Add comments

    * fix pylint

    * fix cpplint

    * reflect feedback

    * add minor comment for the future work

commit 777549a6037cc97b698f53ed629cf65c33ae7eca
Author: Siyuan Feng <[email protected]>
Date:   Mon Jul 18 00:05:14 2022 +0800

    [Fix] fix windows build issue (#182)

    TVM_DEFINE_NOTNULLABLE_OBJECT_REF_METHODS is needed when we have a default-like constructor (e.g., `Span span = Span()`).

commit b81e6a9838f92ba412a0bd4951a46cc61a43a22d
Author: Siyuan Feng <[email protected]>
Date:   Mon Jul 18 00:04:03 2022 +0800

    fix print twice issue (#181)

commit d4cc79ed664bbe34a4d9dab2923cd5a7a7c5b52c
Author: Lesheng Jin <[email protected]>
Date:   Thu Jul 14 09:15:44 2022 -0700

    [Pass] Python ExprMutatorBase/ExprMutator (#172)

    - Rewrite ExprFunctor in Python. New ExprMutatorBase and ExprMutator in Python.
    - Implement demo passes: RewriteFMA and FuseFMA with Python ExprMutator.
    - Expose some functions to ffi in block_builder.py

commit 01cdc4d43258b1fb9dcc630f05f38f792e3bc513
Author: Prakalp Srivastava <[email protected]>
Date:   Tue Jul 12 19:25:51 2022 -0400

    [VM] Deprecate API to save/load executable to file (#176)

    Executable `save_to_file` and `load_exec_from_file` API was used to
    save/load just the executable to/from file. This was confusing as it did
    not export the TensorIR kernels in the Relax Module, thus leading to
    bugs such as https://github.com/tlc-pack/relax/issues/175.
    Moreover, the API was only used in some tests, and was not useful for end
    users.

    Deprecating this API to have a single uniform way of
    serializing/deserializing TVM IRModule using `export_library` and
    `tvm.runtime.load_module` API.

commit 74b3d67e8ae74aed3446a5ae5a05b8f5586e2c3b
Author: Yuchen Jin <[email protected]>
Date:   Fri Jul 1 09:31:30 2022 -0700

    [Refactor] Generic dispatching for `IsBaseOf`; Simplify Type/Expr initializations; `relax` -> `R` in printer; Disallow local function in VMCodegen (#171)

    - Generic dispatching for `IsBaseOf`: `IsBaseOf` used a bunch of if-else branches to check the subtype relation between a base type and a derived type; now it uses a generic TypeFunctor to dispatch on the base class to do the check.
    - Simplify Type/Expr initializations: We had to write `RuntimeDepShape(Span())`, `ObjectType(Span())` to initialize several Types and Exprs; this is due to the `TVM_DEFINE_OBJECT_REF_METHODS` macro, which sets the constructor with `= default`. By changing to `TVM_DEFINE_NOTNULLABLE_OBJECT_REF_METHODS`, we can now just write `RuntimeDepShape()` without specifying an empty span.
    - `relax` -> `R` in printer: Change to print `R` rather than `relax` in TVMScript as the default behavior. This is consistent with our test cases and TIR convention: using `T` as shorthand.
    - Disallow generating code for local function in VMCodegen: these local functions should have been lifted in the lambda lifting pass before codegen.

commit 8fdc3ba3eae0d1ffc535e240be251aaae5546eb8
Author: Prakalp Srivastava <[email protected]>
Date:   Thu Jun 30 15:14:40 2022 -0700

    [Parser] Enable R.parser.pretty_print to print TIR PrimFunc (#174)

    This way we can have a uniform API to print IRModule, TensorIR
    function and Relax functions.

commit ed0414540c9fbc063aa727cfc71bdee51a4bafdd
Author: Prakalp Srivastava <[email protected]>
Date:   Wed Jun 29 08:20:17 2022 -0700

    Update tests to use `set_input` for rpc calls. (#173)

    Fix relax-hexagon tests to use set_input api, which is the correct way to invoke a function over RPC.

commit 1f962bda7a79d13fee1a4f9f4ad3ddde4f5467b2
Author: Sunghyun Park <[email protected]>
Date:   Tue Jun 28 20:49:33 2022 -0400

    [BYOC][PASS] Prototype implementation of modular compilation w/ TensorRT (#164)

    This PR delivers the prototype of the followings:
    - Relax BYOC JSON codegen
    - Relax BYOC TensorRT codegen
    - Extension in Relax VM to support external modules
    - `RunCodegen` pass: run codegen for the annotated relax functions
       - Annotation (dispatch decision) will be done by earlier passes  e.g., greedy heuristic, Collage
       - The generated runtime module and Codegen itself should be tvm object
    - Misc minor code improvement for other passes

commit f25fe0c80670272582db3aa791901c7fa49fc59e
Author: Prakalp Srivastava <[email protected]>
Date:   Tue Jun 28 12:47:07 2022 -0700

    Run static/dynamic models over Hexagon using Relax VM RPC (#167)

    * Move Relax VM builtins to src/runtime.

    * This fixes a bug we encountered while loading the module for Hexagon.
    Since it was building the minimal runtime, it was missing the definition
    of the Relax VM builtins.

    * Mark Hexagon module as DSO exportable.

    * Load Relax VM Executable over RPC

    * Support allocation for shape heap on device

    Co-authored-by: Yuchen Jin <[email protected]>

commit 25174be634b5e04f0468b48bd477f22b17e75f84
Author: Prakalp Srivastava <[email protected]>
Date:   Fri Jun 24 13:33:04 2022 -0700

    [CI] Enable Hexagon CI in Jenkins. (#169)

    Running all Hexagon tests in the simulator is very slow, so we only run
    the Relax-related Hexagon tests in `test_relax_integration.py`.
    This test file is empty right now and will be
    populated as we push Relax-Hexagon related changes.

commit 225aecdb5d7d33f2af048f3aef9c9a6ac758f4fd
Author: Yuchen Jin <[email protected]>
Date:   Thu Jun 23 09:47:30 2022 -0700

    [VM] Add set_input interface; Fix e2e tuning script. (#166)

    * Add set_input interface.

    * Address comments.

commit 29a707cbd9be6e02dd8a3cd1961cfb53057eb51b
Author: Lesheng Jin <[email protected]>
Date:   Thu Jun 16 09:07:45 2022 -0700

    WellFormed Instrument (#165)

    * add conftest for test/python/relax

    * [Wellformed Check]: allow TupleType as Function parameters

    * move WellFormedInstrument to relax.ir.instrument

    * add header

commit b4c3c4bb65b09db7c9b3ec114d6680d14f306d37
Author: Yong Wu <[email protected]>
Date:   Sat Jun 11 23:26:17 2022 -0700

    Update after rebase

commit 3c0e3c0ee08c78b17cc1ba0429727c199737403a
Author: Yuchen Jin <[email protected]>
Date:   Sat Jun 11 18:42:29 2022 -0700

    [Relay translator] Allow replacing default topi function with user-provided TIR PrimFunc. (#159)

    * Add replace_op_with_tir to translator.

    * came up with a better name

    * better doc.

commit f250f93eed886dc2c3a1cb1f8a4ab2077c57080e
Author: Yong Wu <[email protected]>
Date:   Sat Jun 11 15:20:21 2022 -0700

    [Pass] Lambda Lifting (#99)

commit b55fd31d4e11373b30a93f88412a3d6e2d21d3c1
Author: Siyuan Feng <[email protected]>
Date:   Tue Jun 7 10:07:17 2022 +0800

    [E2E] End-to-End tuning e2e_script (#153)

    Co-authored-by: Ruihang Lai <[email protected]>
    Co-authored-by: Hongyi Jin <[email protected]>

commit d3f94e73ec7b9c9ac7b3675f962e9030e55fa603
Author: Prakalp Srivastava <[email protected]>
Date:   Thu Jun 2 08:19:18 2022 -0700

    Fix shape lowering pass bug for non i64 dims. (#152)

    Prior to this change, the VM Shape Lowering pass did not cast integer values
    to the shape heap dtype (i64), which resulted in incorrect values when read
    from the heap later. This PR adds a cast to i64 for such values.
    It also adds a well-formed check to ensure shape dimensions are of
    integer types.
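To see why the missing cast matters, here is a minimal plain-Python sketch (not TVM code) of the bug class: if a 32-bit dimension value is written into an int64-typed heap slot without widening, reading the full 8 bytes back picks up neighbouring bytes and yields a corrupted dimension.

```python
import struct

# Hypothetical illustration: a shape heap that stores every dimension as
# int64. Writing a value as i32 fills only half the slot; the adjacent
# value's bytes leak into the int64 read.
heap = bytearray(8)
struct.pack_into("<i", heap, 0, 1024)   # only 4 bytes written for the dim
struct.pack_into("<i", heap, 4, 7)      # a neighbouring value fills the rest
bad = struct.unpack_from("<q", heap, 0)[0]   # read the dim back as int64

# Casting/widening to i64 before the write gives the correct read-back.
good = struct.unpack_from("<q", struct.pack("<q", 1024), 0)[0]

print(bad, good)  # bad is corrupted (not 1024), good == 1024
```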

commit 9cf777f48069d598eda276be0b9aabaf301acf0f
Author: Yong Wu <[email protected]>
Date:   Wed Jun 1 17:52:40 2022 -0700

    [Parser] Add FuncType support (#154)

    * [Parser] Add FuncType support

    * Address comments

commit f99121d506df45870cd026e052f5b3c41d4bd982
Author: Sunghyun Park <[email protected]>
Date:   Wed Jun 1 09:01:40 2022 -0700

    [PASS] Remove Unused Functions in IRModule (#151)

commit a718e9f9e073ca0ea1790562254c09aaa863eaa4
Author: Sunghyun Park <[email protected]>
Date:   Tue May 31 15:15:28 2022 -0700

    [Pass Infra] Tuning Pass API (#144)

commit a485b7bdb45f8379daa45e8c923a47fd6871cbdf
Author: Tianqi Chen <[email protected]>
Date:   Sun May 29 12:51:07 2022 -0400

    [REFACTOR] Move TIR op kind analysis to relax as it is relax oriented (#155)

    This also keeps TIR mostly independent from the higher-level IR.

commit abd20bdc9b87aa53e0c27e8c5c3fc195be5e8c91
Author: Siyuan Feng <[email protected]>
Date:   Sun May 29 23:31:05 2022 +0800

    add test cases for FuseTIR (#156)

commit de42ec3d5ae0f0304060460764619a5a16995a33
Author: Siyuan Feng <[email protected]>
Date:   Thu May 26 22:14:51 2022 +0800

    [Pass] Relax Transform FuseTIR (#150)

    * [Pass] Relax Transform FuseTIR

    Co-authored-by: Hongyi Jin <[email protected]>
    Co-authored-by: Ruihang Lai <[email protected]>

commit 153d0cc8f2d39b23e63fcd6feaf9755a0eaf8c28
Author: Yuchen Jin <[email protected]>
Date:   Wed May 25 15:44:59 2022 -0700

    [Mutator] Separate unnormalized-form and normal-form mutators (#148)

commit dfa42c09a3087605e805526ab7db7b49d6752ca5
Author: Prakalp Srivastava <[email protected]>
Date:   Fri May 20 16:30:18 2022 -0700

    Print/parse tir cast/max operations in Relax shape (#149)

    tir.cast and tir.max are commonly used operators in shape expressions in
    Relax. These two operators often show up when importing Relay modules
    with `Any` dims into Relax modules.

commit c7186fd44ad5865d84ac61fc2981a15c8af9be4c
Author: Prakalp Srivastava <[email protected]>
Date:   Thu May 19 18:29:12 2022 -0700

    Add support to import relay models with Any dim. (#146)

    Converts Relay `Any` dimensions to symbolic dims in Relax.

commit ef9cf6baba1c2f7215746459ad5a9193df6572c9
Author: Yuchen Jin <[email protected]>
Date:   Tue May 17 07:55:56 2022 -0700

    Refactor shape lowering pass and Blockbuilder. (#145)

commit 230def2284c21eaff520e58fa96a80313b6a7c8f
Author: Yong Wu <[email protected]>
Date:   Fri May 13 14:30:05 2022 -0700

    Support Closure (#140)

commit 0e998988aabdeb8d913e2889eb5a9d72bee35ca2
Author: Lesheng Jin <[email protected]>
Date:   Thu May 12 17:13:15 2022 -0700

    [Analysis] IRModule well-formed check (#142)

commit 1bd4e685ffcc0c4b677af47ecc8609dbfacdfd9d
Author: Yong Wu <[email protected]>
Date:   Wed May 11 09:31:13 2022 -0700

    Change after rebase

commit d0ad35b375449c7e067a1edada7502557a03dd26
Author: Siyuan Feng <[email protected]>
Date:   Tue May 10 08:44:22 2022 +0800

    FuseOps for relax (#141)

    Co-authored-by: Ruihang Lai <[email protected]>
    Co-authored-by: Hongyi Jin <[email protected]>

commit ae7b5b79c40498203842b6c9193e91bcc1937bea
Author: Prakalp Srivastava <[email protected]>
Date:   Wed May 4 20:52:16 2022 -0700

    Add `relax.unique` operator in Relax. (#135)

    * Add Unique operator in Relax.

    This adds the functionality to register a packed function implementation of
    any operator using the `FCallPacked` attribute. The Relax operator is then
    lowered to a call to the registered packed function during codegen.
    For example, in this change `relax.unique` is lowered to the
    `relax.run.unique` packed function, which uses `torch.unique` under the
    hood.

    * Add support for integer constants in Relax VM.

    This adds serialization, deserialization, and print support for
    integer constants.
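The attribute-based lowering described above can be sketched as a toy registry in plain Python. The names `FCALL_PACKED`, `register_packed_impl`, and `lower_call` are illustrative only; this is not the actual TVM registration API.

```python
# Toy sketch: map an operator name to the name of a packed function that
# implements it, and rewrite annotated ops into packed-function calls.
FCALL_PACKED = {}  # op name -> packed function name

def register_packed_impl(op_name, packed_name):
    """Register a packed-function implementation for an operator."""
    FCALL_PACKED[op_name] = packed_name

def lower_call(op_name):
    """During 'codegen', rewrite an annotated op into a packed-function call."""
    if op_name in FCALL_PACKED:
        return ("call_packed", FCALL_PACKED[op_name])
    return ("call_op", op_name)  # no packed impl registered: leave as-is

register_packed_impl("relax.unique", "relax.run.unique")
print(lower_call("relax.unique"))  # -> ('call_packed', 'relax.run.unique')
print(lower_call("relax.add"))     # -> ('call_op', 'relax.add')
```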

commit 1ca18611ae59ab4d1667066ed9921690d2a5611c
Author: Siyuan Feng <[email protected]>
Date:   Tue May 3 09:34:55 2022 +0800

    Add ShapeType to ShapeExpr.checked_type during construction (#139)

commit 6481d533ed259a080dede704f7443c4a2221a842
Author: Sunghyun Park <[email protected]>
Date:   Mon May 2 16:26:08 2022 -0700

    Introduce Relax function attribute and drop name field in Relax function (#136)

commit d735ebd719d89c804691b29ee0d881c785384fc6
Author: Yuchen Jin <[email protected]>
Date:   Sat Apr 30 18:45:14 2022 -0700

    [BlockBuilder] Sub function call shape deduction: constant shape case. (#137)

commit 10f8e56cbcb27beb373075e3c6e3a9728ffb5eb2
Author: Yuchen Jin <[email protected]>
Date:   Thu Apr 28 16:59:38 2022 -0700

    [AST][Type] Introduce ObjectType; Infer the type of call_packed by type_args; Refactor InferType/InferShape. (#132)

commit 7e2038a8b662659dd6ba2e2a86bedbc6c3891bfa
Author: Yuchen Jin <[email protected]>
Date:   Mon Apr 25 17:20:19 2022 -0700

    [AST][BlockBuilder] Normalize relax.Function; Refactor BlockBuilder to take optional input IRModule. (#133)

commit f1eca6d74365c6b0665b64c86ececce86fd76df3
Author: Prakalp Srivastava <[email protected]>
Date:   Sun Apr 24 07:09:11 2022 -0700

    [Printer][Parser] Modify Tensor annotation printing and parsing. (#128)

commit 296876eaf1246ea7948c69d2111cfea2ca51ca0c
Author: Lesheng Jin <[email protected]>
Date:   Fri Apr 22 08:05:13 2022 -0700

    [Pass] Python pass decorator and ExprFunctor (#126)

    * Relax ExprFunctor in Python

    * fix the register bug

    * Expr_functor in relax

    * function/dataflowblock Pass in python

    * testcases

    * reformat

    * fix Tensor annotation()

    * add return type hint

    * type hint

    * new test

    * fix typo

    * remove memo

commit 5199a206cc86cee9e43b0c8ddddf704acdc4b513
Author: Ruihang Lai <[email protected]>
Date:   Thu Apr 21 22:20:33 2022 +0800

    [Relax][MS] Task extraction with proper weights (#129)

    * [Relax][MS] Task extraction with proper weights (hzfengsy#32)

    * Add a unit test

    * Update the deduplication mapping / Update the unit test

    * Update test for DummyDB reusing

    * Remove unnecessary args

    * Remove unused import

commit badee2add6700f12671d3223e43875ca050f537a
Author: Sunghyun Park <[email protected]>
Date:   Wed Apr 20 17:09:37 2022 -0700

    [Relay Translator] Use OpStrategy for lowering (#130)

    * [Relay Translator] Use OpStrategy for lowering

    * Reflect feedback and fix lint issue

    * Consider contexts for PassContext, Target, .. for both pass application and lowering

commit 4454563d240c547fb762cec770502b1e09b195f0
Author: Prakalp Srivastava <[email protected]>
Date:   Wed Apr 13 21:00:54 2022 -0700

    Deprecate `[]` in favor `()` in Tensor annotation. (#123)

commit fab2d95697f7eecce90cb0ba12db2457caf4f2e3
Author: Yong Wu <[email protected]>
Date:   Tue Apr 12 21:15:38 2022 -0700

    Add tune_relax to integrate with task scheduler (#127)

commit 39bab0d25f3e5bb48adf52534f2318149047f617
Author: Yong Wu <[email protected]>
Date:   Tue Apr 12 16:22:33 2022 -0700

    Update autotir integration after rebase

commit caae30f06d237c3aebd00290802122bbfdb2ae26
Author: Yuchen Jin <[email protected]>
Date:   Tue Apr 12 08:23:32 2022 -0700

    [VM] Support sub function call and recursion. (#125)

    * Sub function call and recursion.

    * Address comment.

commit e7c7c15972f6aa29f30a167a794db17f74a6bdeb
Author: Ruihang Lai <[email protected]>
Date:   Tue Apr 12 14:18:32 2022 +0800

    [VM] Copy constant tensors to device (#124)

    * [VM] Copy constants to device (Hzfengsy#24)

    * [VM] Copy constants to device

    * Add unit tests

    * Specify shape and dtype for constant TE tensors in EmitTE

commit ef0a3e689b3896fd30a392d094beaa8d68b6de07
Author: Lesheng Jin <[email protected]>
Date:   Wed Apr 6 11:59:33 2022 -0700

    DataflowBlockPass (#114)

    * add DataflowBlockPass

    * update fma_rewrite

    * drop the skip function

    * update test_fma_rewrite with DataflowBlockPass

    * fix the format

    * fix name

    * rewrite test in tvm script

    * add non-dataflow Vars check

    * add fail testcases

    * module->IRModule

    * add docstring to DataflowBlockNode

    * remove unused pattern

    * Transform Pass->DataflowBlock Pass

    * rename global var to global scope var

    * remove print stmt

    * reformat tests

    * add docstring to DataflowBlockMutator

    * fix filename

    * minor fix

commit 2607f3b9112197045e773b0fc7ceb9ae57e844f8
Author: Yuchen Jin <[email protected]>
Date:   Mon Apr 4 19:59:30 2022 -0700

    Remove type annotation from Var. (#121)

commit 969ffb4302f35344524ef36e74325c0d5e427b76
Author: Prakalp Srivastava <[email protected]>
Date:   Mon Apr 4 08:33:43 2022 -0700

    Add a new Expr to represent runtime dependent shapes. (#117)

    This can be used to represent runtime-dependent shapes such as the output of the `unique` operator. Having an explicit runtime-dependent shape expression helps distinguish the following two cases in the AST: (1) the shape has not been deduced (`shape_ = nullptr`), and (2) the shape is runtime dependent. Previously both cases were mapped to `shape_ = nullptr`.
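A minimal sketch of the distinction, using illustrative names rather than the actual Relax classes:

```python
# Illustrative only: a sentinel class separates "shape known only at
# runtime" from "shape not deduced yet", which would otherwise both be None.
class RuntimeDepShape:
    """Marks an expression whose shape depends on runtime values."""

def deduce_shape(op, input_shape):
    if op == "relu":      # shape-preserving: deducible at compile time
        return input_shape
    if op == "unique":    # output length depends on the tensor's contents
        return RuntimeDepShape()
    return None           # shape has not been deduced (yet)

assert deduce_shape("relu", (4, 4)) == (4, 4)
assert isinstance(deduce_shape("unique", (16,)), RuntimeDepShape)
assert deduce_shape("custom_op", (2,)) is None
```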

commit 1e2a11f6326c9b3fd3807bbe5d97e4a20ce9dadd
Author: Hongyi Jin <[email protected]>
Date:   Sun Apr 3 00:42:38 2022 +0800

    [PASS] Fold constant & Bind Params (#113)

    * fold constant and bind params

    * fix test

    * format

    * format

    * format

    * address comments

    * format

    * address comment

    * address comment

    * format

    * fix type bug

commit d441f1d0f2104b51287f9f29d9ec9f0e87f4b9d9
Author: Tianqi Chen <[email protected]>
Date:   Sat Apr 2 00:00:19 2022 -0400

    Temporary remove function type deduction in normalizer. (#119)

    * Temporary remove function type deduction in normalizer.

    This PR temporarily removes the function type deduction in the normalizer
    to unblock some of the followup passes that need to check function
    type equality.

    Functions' checked_type_ are left as nullptr for now.
    We should follow up to add function type deduction from annotations.

    * revert the normalizer skip for now

    * comment out parser assert for now

commit 159f599248e3c6faf969198d4e7cf03c4f3f6c70
Author: Yuchen Jin <[email protected]>
Date:   Fri Apr 1 09:18:33 2022 -0700

    [BlockBuilder] Deduce and fill shape/type for Expr in Normalize. (#116)

commit 96c8bbc53286a0ca90ddcb92346156f23ab9efe3
Author: Yuchen Jin <[email protected]>
Date:   Wed Mar 30 11:46:50 2022 -0700

    [CI] Enable GPU tests; Add AutoTIR cuda test. (#115)

    * Add gpu ci.

    * Update autotir gpu test.

commit 1e5c2dac7b01f73c7e3e1a8b092eb0f2b6cc5e28
Author: Tianqi Chen <[email protected]>
Date:   Mon Mar 28 19:12:59 2022 -0400

    [FIX] Fix structure equal hash for MatchShape (#112)

    The pattern field of the match shape can define variables;
    as a result, we need to add DefEqual and Hash here.

    Added a regression testcase.

    Lesson: we would benefit from more testcases
    with check_save_roundtrip checks (like this one) for more Relax examples.

    Additional change:
    - Redirected the TVMScript printer to be able to print Relax fragments, useful for debugging.

commit 8e466be1d1fa65b9df119e0563ef58c38e8562f2
Author: Siyuan Feng <[email protected]>
Date:   Tue Mar 29 01:30:07 2022 +0800

    introduce blockbuilder call_te (#110)

commit 6ff1614ac3c9e63ea5b615a072a1d26a197b58f9
Author: Siyuan Feng <[email protected]>
Date:   Sun Mar 27 00:02:53 2022 +0800

    [FIX] fix structural_equal_hash (#107)

    * fix structural_equal_hash

    (cherry picked from commit e7e962634999739a32129378f61cc95f58335447)

    * address comment & pass the ci

commit 31ed53c92192c74a3f55009e718b8ae0527ce078
Author: Yuchen Jin <[email protected]>
Date:   Fri Mar 25 10:49:00 2022 -0700

    [Bugfix] Fix call_tir parsing bug (#109)

    * Fix call_tir parsing bug.

    * update.

commit 3c7ff5a272d4b004b9b86b79e0f10c33635cea05
Author: Yuchen Jin <[email protected]>
Date:   Thu Mar 24 19:50:27 2022 -0700

    [VM] Fix hardcoded device type in memory lowering (#106)

    * Add is_device field to attr.

    * Update.

    * Address comment.

    * update.

    * Update.

commit 6bcdcf8d02809dbbafbbd9515ea7ada17bb00077
Author: Ruihang Lai <[email protected]>
Date:   Thu Mar 24 23:04:11 2022 +0800

    [VM] Initialize VM through packed function (#101)

commit cfc779e732933eb43cb0bca6448c51fac51dc39f
Author: Yong Wu <[email protected]>
Date:   Tue Mar 22 19:44:37 2022 -0700

    Fix after rebase

commit c368324831d378033d9b0f6621f3ee3b366624e6
Author: Lesheng Jin <[email protected]>
Date:   Tue Mar 22 18:51:40 2022 -0700

    Improve printer for DynTensorType and ShapeExpr (#97)

    * improve Printer for DynTensorType & ShapeExpr

    * add testcases

commit a861f2eeadc3ded5a98aa2947a6b17f077e29dc2
Author: Ruihang Lai <[email protected]>
Date:   Tue Mar 22 23:16:33 2022 +0800

    [VM][Refactor] Move VM files to TVM runtime directory (#98)

commit d96806093e9ff50aaf4d46a89d1003f87385bf7e
Author: Tianqi Chen <[email protected]>
Date:   Mon Mar 21 12:03:59 2022 -0400

    [VM] Refactor and improve vm. (#96)

    * [VM] Refactor and improve vm.

    - Have a separate function for RunInstCall.
    - Cache func_index lookup by table to avoid repetitive lookup by str.
    - Move the PackedFunc call arg stack to Frame to increase locality and avoid re-allocation in repetitive calls.
    - Make the frame stack hold unique_ptr to avoid frame re-allocation and copying during frame.resize.
    - Pass…
masahi added a commit to masahi/relax that referenced this pull request Jan 21, 2023
commit 86f1cc147255da43569d331997591fb9994229fe
Author: Masahiro Masuda <[email protected]>
Date:   Sat Jan 21 15:22:36 2023 +0900

    properly handle binding order when remapping tuple output

commit dc6f3184fa8a8a3e36cebf31109ef3b5e755152b
Author: Masahiro Masuda <[email protected]>
Date:   Sat Jan 21 05:23:47 2023 +0900

    Improve merging algorithm following MergeCompilerRegion

commit cf8eefbf705f0e6ba248b0cc68b52200129bdc3b
Author: Masahiro Masuda <[email protected]>
Date:   Fri Jan 20 19:51:44 2023 +0900

    more update from upstream

commit 50c2b8195ed85aff83437abec801a019dcb767e6
Author: Masahiro Masuda <[email protected]>
Date:   Fri Jan 20 19:18:13 2023 +0900

    remove WrapCompositeFunction

commit ff9de42e07f7e2ff0735afbeb2c9baa3243914e8
Author: Masahiro Masuda <[email protected]>
Date:   Fri Jan 20 19:17:23 2023 +0900

    fix

commit a5f2203ceae4c2037c3667b6e02495ba065b75a4
Author: Masahiro Masuda <[email protected]>
Date:   Fri Jan 20 14:14:08 2023 +0900

    update from upstream

commit 7cc344f0aface006835c0426d56aad2a62e4ef18
Author: Masahiro Masuda <[email protected]>
Date:   Thu Jan 19 19:54:58 2023 +0900

    Add FuseCompositeFunctions pass

commit 2348c8e4387730a3a3cf15eb61c0e193984dd42f
Author: Masahiro Masuda <[email protected]>
Date:   Thu Jan 19 19:23:58 2023 +0900

    clean std::vector<JSONGraphNodeEntry>

commit e61b5daa13e93e21d2a8647042d6df6bc87adea2
Author: Masahiro Masuda <[email protected]>
Date:   Thu Jan 19 19:09:45 2023 +0900

    update

commit 7abb4bbec9f472624047ed5182740b9103829cb6
Author: Masahiro Masuda <[email protected]>
Date:   Thu Jan 19 08:01:04 2023 +0900

    simplify RunCodegen interface

commit 111c5512dd57d732c60a3205a2d948e3b9d4a1c0
Author: Masahiro Masuda <[email protected]>
Date:   Wed Jan 18 20:20:18 2023 +0900

    compile all functions at once in cutlass to support caching

commit 93c5bbe0acbd3c172ea2e0e3664d2c9bae2c810f
Author: Masahiro Masuda <[email protected]>
Date:   Wed Jan 18 19:34:56 2023 +0900

    send all extern functions to codegen in one go

commit 119dfdc6bf73c44a2f63a31eb15e402e1ab60eb7
Author: Masahiro Masuda <[email protected]>
Date:   Wed Jan 18 19:15:49 2023 +0900

    refactor RunCodegen

commit d4defec923ba808a054bac9381a6e599fe10cc89
Author: Masahiro Masuda <[email protected]>
Date:   Wed Jan 18 17:29:14 2023 +0900

    extract attributes from relax func

commit 4e5ef524883afbe242a0061092b2ef2e3de6b87c
Author: Masahiro Masuda <[email protected]>
Date:   Wed Jan 18 16:09:04 2023 +0900

    introduce contrib.cutlass.tune_relax_function to get annotations

commit 8fe0c4611605908f051dd28262a1ca98dd369e3e
Author: Masahiro Masuda <[email protected]>
Date:   Wed Jan 18 15:34:42 2023 +0900

    thread through compile options in RunCodegen

commit 6ea5190ee8a08218beaf5fd1b0d0c7773afd8788
Author: Masahiro Masuda <[email protected]>
Date:   Wed Jan 18 12:59:08 2023 +0900

    Add WrapCompositeFunction pass

commit 9fa6b44a7df2d515ee529820d2d322e858cdeae0
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 17 22:03:03 2023 +0900

    properly handle multiple patterns (not tested)

commit 37715e03879eac8523a3b2e331dde40f4eb22e71
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 17 21:52:04 2023 +0900

    attach name to pattern

commit 20e5ac0900656cb7769343291b17d84cb99dd87c
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 17 19:31:57 2023 +0900

    clean up

commit c0070146895d102ac2ee910afa56229077974828
Author: Masahiro Masuda <[email protected]>
Date:   Tue Jan 17 17:01:40 2023 +0900

    (Rebase) Squashed commit of the following:

    commit 5bf9c8acf12dfba9865ac9f8480341298131dec4
    Author: Masahiro Masuda <[email protected]>
    Date:   Tue Jan 17 16:10:16 2023 +0900

        clean up

    commit 5506d92ed9a4c48c63f192ddcb576c9665d4ad5b
    Author: Masahiro Masuda <[email protected]>
    Date:   Tue Jan 17 15:39:39 2023 +0900

        link and run compiled cutlass code, result correct

    commit 81d39f84ebb1a7bcfe5c2fa9f97ce2130f932dbb
    Author: Masahiro Masuda <[email protected]>
    Date:   Tue Jan 17 15:13:41 2023 +0900

        compile generated cutlass code

    commit c2a68e14575c2711497347d5fc93d15b88c6c79b
    Author: Masahiro Masuda <[email protected]>
    Date:   Tue Jan 17 07:47:31 2023 +0900

        codegen working

    commit ba26344f85ebe43f88852c8c18b754bf03df1ce1
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon Jan 16 19:41:47 2023 +0900

        wip

    commit ed3ac6d632a4798e411573f30d1a090bc05a96fc
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon Jan 16 17:53:10 2023 +0900

        wip

    commit 47e09e54a0d405a14a602d7a6d31c49399c5662f
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon Jan 16 17:32:58 2023 +0900

        wip

    commit b9e5df768b188de3dda1ef0d0f3db3fd592535d9
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon Jan 16 17:25:37 2023 +0900

        copy codegen_c base function

    commit fe20e653ecf548f07432f06cd17395b554e6faa5
    Author: Masahiro Masuda <[email protected]>
    Date:   Sat Jan 14 08:43:57 2023 +0900

        add cutlass stub

    commit 990eec78b58ca259bc067bb32e4020f28d88b7c8
    Author: Masahiro Masuda <[email protected]>
    Date:   Sat Jan 14 08:18:57 2023 +0900

        updated cutlass revision

    commit 591a8f1ba62d9f8e923f2dcc1702e7e7590e92e2
    Author: Masahiro Masuda <[email protected]>
    Date:   Sat Jan 14 08:02:01 2023 +0900

        conv2d + relu DNNL offload works

    commit 1365402079626eab5bf99bad96dbfa4abd750175
    Author: Masahiro Masuda <[email protected]>
    Date:   Fri Jan 13 16:35:49 2023 +0900

        starting DNNL codegen

    commit 4a72e7810b0df31a4fb13856b5b6320ced4e978e
    Author: Masahiro Masuda <[email protected]>
    Date:   Thu Jan 12 14:02:19 2023 +0900

        clean up

    commit 61cc55e94123f3064e0d1200c70f33b4a537c4ad
    Author: Masahiro Masuda <[email protected]>
    Date:   Tue Jan 10 16:26:31 2023 +0900

        pattern based partitioning working

    commit 2433733c5458302cbe05e534d6c99bec13fb6d36
    Author: Masahiro Masuda <[email protected]>
    Date:   Tue Jan 10 08:30:20 2023 +0900

        add conv2d match & run test

    commit 360429440acb7068fdfd982d597523ebe032eb20
    Author: Ruihang Lai <[email protected]>
    Date:   Mon Jan 9 17:20:05 2023 -0500

        [Op][O2e] Indexing and datatype operators (#338)

    commit e45bdb73824d120bb3b848d4fdaa54f88211b509
    Author: Tianqi Chen <[email protected]>
    Date:   Mon Jan 9 14:59:26 2023 -0500

        [VM] Supporting "compiled" exec mode. (#331)

        * [VM] Supporting "compiled" exec mode.

        This PR adds support for a "compiled" mode to the VM.
        The compiled mode translates the relax function into a TIR function
        and drives execution through that TIR function.

        It is different from the micro AOT codegen, which generates TIR code
        that targets the micro C runtime environment and is useful for
        resource-limited settings with a smaller set of features. Both leverage
        the low-level TIR build that is also shared with TensorIR.

        The current implementation targets the full TVM (VM) runtime, which
        comes with PackedFunc, object, tuple, closure and all kinds of rich
        structure support. This also means that we can leverage the full runtime
        support to handle things like allocation, dynamic shape, easy plugins
        and Python interaction, which are not available in more limited runtimes.

        Users load the generated code with the same API regardless
        of compiled mode or bytecode, and just need to change one line:

        ```python
        ex = relax.vm.build(mod, target, exec_mode="compiled")
        ```

        Most of the codegen features are lifted before the codegen phase,
        so the overall implementation would be around 500 loc for each exec mode
        and can be further cut down with future introduction of PrimValue.

        The simplicity is thanks to the TVM runtime architecture that allows us
        to compose things together as objects. The only difference is how
        the high-level driving PackedFunc is provided:
        in the case of bytecode it is normal interpretation, and in the
        case of compiled mode it is TIR.

        It is a complete implementation; unit testcases are added. All codegen
        build tests are updated to include both exec_modes and have passed locally.
        The only exception is that we skipped some special PackedFunc handling
        (printing) because it can be further simplified after we introduce PrimValue.

        Co-authored-by: Junru Shao <[email protected]>

        * Address review comments

        Co-authored-by: Junru Shao <[email protected]>

    commit 32c2bf74eda5ff9cb958e6d54a29c324d53f2869
    Author: Ruihang Lai <[email protected]>
    Date:   Mon Jan 9 13:45:14 2023 -0500

        [Op][O2d] Manipulation operators (#337)

        As tracked by #332, this PR is the O2d milestone of the high-level operator introduction plan.

        This PR introduces a few manipulation operators:
        * broadcast_to
        * concat
        * expand_dims
        * flatten
        * permute_dims
        * reshape
        * split
        * squeeze
        These operators are all well-tested.

    commit b39d11a37c899a1625ecee0ffdacc5ef5444365f
    Author: Ruihang Lai <[email protected]>
    Date:   Mon Jan 9 10:57:19 2023 -0500

        [O2h] Neural network and linear algebra operators (#343)

    commit 1d6d897ec223cc07768e0382c3e21a196ffdfac8
    Author: Ruihang Lai <[email protected]>
    Date:   Sun Jan 8 20:21:50 2023 -0500

        [O2g] Convolution, pooling and image operators (#341)

    commit 95f784ece1d61676b88b5455be3dab5e3ddbc75a
    Author: Ruihang Lai <[email protected]>
    Date:   Sun Jan 8 16:53:10 2023 -0500

        [Op][O2f] Set and searching operators (#339)

    commit be1c32d817bbbbd56329378d6d929dce79ecb0f8
    Author: Siyuan Feng <[email protected]>
    Date:   Mon Jan 9 03:38:20 2023 +0800

        simple fix jupyter error reporting (#345)

    commit da11e4bf373349ce4142949099e29d11655aa88b
    Author: Siyuan Feng <[email protected]>
    Date:   Sun Jan 8 23:09:22 2023 +0800

        [TVMScript] Symbolic shape computing (#342)

    commit 80808fbf9a02480abf337b8a5edffe34c963feec
    Author: Ruihang Lai <[email protected]>
    Date:   Sat Jan 7 18:31:00 2023 -0500

        [Op][O2c] Creation operators (#336)

    commit 5efc8f7224f83766875e74669e139ec82119a504
    Author: Ruihang Lai <[email protected]>
    Date:   Sat Jan 7 11:14:23 2023 -0500

        [TIR] Create Layout with specified axis dtype (apache/tvm#13663) (#340)

    commit ae71be06c8252c211642abb9d5b3e4583bdb6f6a
    Author: Ruihang Lai <[email protected]>
    Date:   Fri Jan 6 16:41:18 2023 -0500

        [Op][O2b] Statistical operators (#334)

    commit 8220df74e339cdb6dab38a803b80edc3cd6b92e2
    Author: Ruihang Lai <[email protected]>
    Date:   Thu Jan 5 18:31:48 2023 -0500

        [Op][O1][O2a] Utility, arithmetic and comparison operators (#333)

        As tracked by #332, this PR is the kickoff part of high-level operator introduction in Relax.

        This PR is about the milestone O1 and O2a. Specifically, this PR
        * introduces some common utility functions that the registration and StructInfo inference of each operator will often use.
        * introduces unary arithmetic operators: cos, log, negative, sigmoid, sin, sqrt, tanh.
        * refactors and introduces binary arithmetic operators: add, divide, floor_divide, multiply, subtract.
        * introduces binary comparative operators: equal, greater, greater_equal, less, less_equal, not_equal.

        These operators are well tested from three perspectives:
        P1. the op getter can get the correct op by name
        P2. their StructInfo inference results are as expected under all kinds of cases
        P3. the Relax TVMScript parser can parse scripts with the op inside

        For operators in O2a, most operators share almost the same StructInfo inference logic. Therefore, for tests in P2, not every op in each category is tested in every case; for each case, it is enough to test only part of the ops in that category. This is intended to avoid an overlarge testing file.

    commit f1cab0a05f05829c4c35e2a7e613bd69f2a17fae
    Author: Siyuan Feng <[email protected]>
    Date:   Thu Jan 5 20:43:28 2023 +0800

        [TVMScript] Ensure consistent struct info between assign lhs and rhs with sinfo annotation (#328)

        * [TVMScript] Ensure consistent struct info between assign lhs and rhs with sinfo annotation

        * fix

        * fix

    commit dc7072efe290d7e8c69d8e216311510981fc82e1
    Author: Tianqi Chen <[email protected]>
    Date:   Wed Jan 4 10:13:08 2023 -0500

        [REFACTOR] Hide VM Impl, Improve execution logic. (#326)

        * [REFACTOR] Hide VM Impl, Improve execution logic.

        This PR refactors the VM by hiding most of the VM implementation
        and improving the overall execution logic.

        - Unifies PackedFunc and Closure Table.
        - Update Closure mechanism to no longer depend on string.
        - Update VMMemoryLower to VMBuiltinLower to incorporate more VM intrinsic lowering,
          move some of the codegen intrinsic to this phase.
        - Allow directly pass in function index as VM instruction.

        * Address comment

    commit 2449d8c205f0b6e2c346132695b56039b07e9a10
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Tue Jan 3 22:04:16 2023 -0500

        [IR][ASTPrinter] Tweaks to AST printer's handling of struct info (#330)

    commit 2d352807090ba1b7e898fbdcb83d6d9427c762cf
    Author: Siyuan Feng <[email protected]>
    Date:   Tue Jan 3 23:20:47 2023 +0800

        [TVMScript] Enforce `I.DeclareFunc` to have function signature (#329)

    commit dcae50e836a0c2999f52d96a372fc7de584951f4
    Author: Tianqi Chen <[email protected]>
    Date:   Mon Jan 2 15:21:49 2023 -0500

        [BACKEND] Refactor and introduce full match-cast support. (#324)

        * [BACKEND] Refactor and introduce full match-cast support.

        This PR refactors VMShapeLower to introduce full match-cast support
        that enables nested tuples, type checks at argument boundaries,
        and symbolic shape computation.

        Along the way we also refactor and clean up some of the VM codegen logic
        and add unit tests for different stages.

        * address comments

    commit a36920bf672d22e1d31e1e6f81d0447fd7a55806
    Author: Siyuan Feng <[email protected]>
    Date:   Mon Jan 2 23:31:04 2023 +0800

        [TVMScript] Fix empty TupleStructInfo (#327)

    commit 80710a826bda66532eeda978668ed157b471b186
    Author: Tianqi Chen <[email protected]>
    Date:   Fri Dec 30 15:57:50 2022 -0500

        [CONTAINER] Hash/Equal/JSON support for ShapeTuple (#325)

        This PR adds hash/equal/JSON support for ShapeTuple.

    commit 343a1e7e2174612031c70ba8547577c7d21839e4
    Author: Tianqi Chen <[email protected]>
    Date:   Thu Dec 29 18:33:17 2022 -0500

        [REFACTOR] StructInfo M3: MatchShape=>MatchCast (#323)

        * Introduce match cast, and code changes along

        * add match_cast parser support (#9)

        * Match cast support for VMShapeLower CanonicalizeBinding

        * Remove `match_shape` (#12)

        * Refactor ExprVisitor/Mutator to consider Expr in StructInfo.

        Co-authored-by: Siyuan Feng <[email protected]>

    commit e332285559d61db1c5033b8d50cd9d4af6c6b6f4
    Author: Tianqi Chen <[email protected]>
    Date:   Thu Dec 29 01:28:09 2022 -0500

        [REFACTOR] StructInfo M2: Cleanups on legacy shape related items  (#320)

        * [REFACTOR] Remove shape function

        * [WIP] Remove shape_, runtime_dep shape

        * Remove shape_ pass Compile

        * Remove RuntimeDepShape (#11)

        * BlockBuilder: remove CanProveShapeEqual, consolidate binding emit to EmitNormalize

        * Remove DimType, make get_shape_of API different from op.shape_of

        Changes the init importing to direct import so the VSCode navigator
        can directly jump to the definition point.

        * Apply suggestions from code review

        Co-authored-by: Ruihang Lai <[email protected]>

        * Clarify cases where struct info can be deterministically derived

        * Fix remaining testcases

        * Remove InferShape/Type per comment.

        Co-authored-by: Siyuan Feng <[email protected]>
        Co-authored-by: Ruihang Lai <[email protected]>

    commit edadf247551f526188c0a08b3812ffc0a1f9d8bd
    Author: Ruihang Lai <[email protected]>
    Date:   Fri Dec 23 14:46:07 2022 -0500

        [Analysis] Optionally check structure info in well-formedness check (#321)

        With the introduction of structure info in #314, the well-formedness check will report malformed whenever an Expr doesn’t have defined structure info.

        However, when writing tests for the well-formedness check and the normalizer, we usually construct the Exprs manually, which means their structure info is undefined most of the time. As a consequence, the well-formedness check will always complain “the Expr xxx doesn’t have structure info populated.” So when the checker fails to complain about the original reason for being malformed (i.e., the checker is not working), the tests will still pass and we won’t be able to realize there is something wrong with the checker.

        Thus, in this PR we add an optional flag to the well-formedness check. In well-formedness tests, we turn off the structure info check so that the original reason for being malformed is revealed correctly.

        ---

        This PR also cleans up the DiagnosticContext parameter in the WellFormed API - the diag_ctx has been unused since the merge of #99.

    commit d548459a1736378398ab773dce413d90d49376cf
    Author: Ruihang Lai <[email protected]>
    Date:   Fri Dec 23 07:33:25 2022 -0500

        [Op] Enforce int64 output shape in CallTIR (#322)

    commit 10a87a455bbb84b0a0d20b22bd31784b9f4b9774
    Author: Chaosfan <[email protected]>
    Date:   Fri Dec 23 08:03:48 2022 +0800

        [Bugfix] Handle function name properly in Relax TVMScript printer (#317)

        * remove relax_func_name_ and change logic

        * well_formed check for globalvar and gsymbol consistency

        * revise the logic in well_formed and update test

        * Remove `global_symbol` in test_function_attr.py

        * Update docs

        Co-authored-by: Ruihang Lai <[email protected]>

    commit 29aebb9d24cbf52ab21fd98996633534301ef34d
    Author: Tianqi Chen <[email protected]>
    Date:   Wed Dec 21 20:21:57 2022 -0500

        [REFACTOR] M1: Change parser/printer to only depend on struct info (#319)

        * [REFACTOR] StructInfo M1: Parser/printer/Var/Function to only depend on struct info field

        * Update src/relax/backend/vm/vm_shape_lower.cc

        Co-authored-by: Ruihang Lai <[email protected]>

        * Address comments

        * Allow function to have default value

        Co-authored-by: Siyuan Feng <[email protected]>
        Co-authored-by: Ruihang Lai <[email protected]>

    commit e6173430f491c1d88d2ab77ce0ab43a8c602df30
    Author: Tianqi Chen <[email protected]>
    Date:   Wed Dec 21 00:42:29 2022 -0500

        [REFACTOR][ARCH] Introduce StructInfo M0 (#314)

        * [IR] Introduce StructInfo

        * StructInfoFunctor and Analysis Support

        * [TVMScript] Parse type/shape annotation with StructInfo

        * remove runtime type assign

        * Remove type/shape during parsing (#2)

        * Normalizer prep: simple checks and legacy function renaming.

        * Struct info deduction in BlockBuilder.

        * Two TODOs

        * StructInfo Normalizer Fixes (#3)

        * StructInfo AST Fix

        * Fix Extern Func Deduction and shape mutator.

        * Update VoidStructInfo & globalvar (#4)

        * Fix passes and proper sinfo propagation.

        * Refactor EraseToWellDefined to Enable Remapping

        * [WIP] First stab at symbolic param tracking

        * Update EraseToWellDefined to support symbolic shape return (#5)

        * fix R.shape with ndim (#6)

        * Remove update shape/type

        * Address review comment, AnnotateTypeShape=>AnnotateStructInfo

        * Update include/tvm/script/ir_builder/relax/frame.h

        Co-authored-by: Ruihang Lai <[email protected]>

        * Address comments

        * Update printer to use structinfo (#7)

        * Update Error mechanism to prep for obj loc based reporting

        * Symbolic shape aware function call return value derivation.

        The main flow works as follows:
        - Match and populate shape_var_map and var_map by visit each pair of
          param and call arguments.
        - Call EraseToWellDefined to map the ret parameter to new result.

        * [ANALYSIS] Refactor well-form to only look at struct info.

        * Update comments according to reviews.

        * Update include/tvm/relax/struct_info.h

        Co-authored-by: Ruihang Lai <[email protected]>

        Co-authored-by: Siyuan Feng <[email protected]>
        Co-authored-by: Tianqi Chen <tqchen>
        Co-authored-by: Ruihang Lai <[email protected]>

    commit 151701740fac3a53b35799a82c85d86f91b720ee
    Author: Tianqi Chen <[email protected]>
    Date:   Fri Dec 16 17:48:26 2022 -0500

        Update relay_translator.py

    commit ad0f3179a84b3bc167f91c3eb082cb996b1d04e2
    Author: Ruihang Lai <[email protected]>
    Date:   Fri Dec 16 17:37:00 2022 -0500

        [Translator] Remove global symbol and follow-up fix for #262 (#316)

        This PR removes the `global_symbol` linkage added by Relay Translator. It also fixes unaddressed comments of #262.

        All tests can pass locally and I believe it is safe to merge this PR directly.

    commit 850deded1201001d833ac65991fb1a4c6509cb1b
    Author: Ruihang Lai <[email protected]>
    Date:   Fri Dec 16 16:19:48 2022 -0500

        [Translator] Support translating op calls with Tuple input (#262)

        Previously, when a Relay function contains a Call which directly uses Tuples as arguments (the example below),
        ```
        %25 = (%23, %24) /* ty=(Tensor[(1, 160), float32], Tensor[(1, 160), float32]) */;
        %26 = concatenate(%25, axis=-1) /* ty=Tensor[(1, 320), float32] */;
        ```
        our Relay translator is unable to generate the corresponding CallTIR, because the translator always assumes an argument of a Call maps to a single tensor (see the code snippet below: the translator directly passes the Relax variable `new_args[-1]` to the function `te_tensors`, which translates a Var to a single tensor).
        https://github.com/tlc-pack/relax/blob/60e9a01cdfdd013945790fc03d5abad29b8a7c0b/python/tvm/relax/testing/relay_translator.py#L124
        https://github.com/tlc-pack/relax/blob/60e9a01cdfdd013945790fc03d5abad29b8a7c0b/src/relax/ir/emit_te.h#L56-L61

        But in fact, the Relax variable may correspond to a Tuple of tensors, which wasn't taken into consideration before. Such cases can lead to errors in `TETensor` when creating tensors.

        Therefore, this PR fixes the issue by examining the Relax variable before the tensor creation for Relay Call arguments. If an argument has a Tuple shape and TupleType, we break down the tuple variable, emit a TupleGetItem for each field, and create a tensor for each field.
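The per-field breakdown described in this commit can be sketched as a toy Python model (helper names here are hypothetical, not the actual relay_translator code): when a call argument is tuple-typed, emit one per-field access instead of assuming the argument maps to a single tensor.

```python
# Toy model of the fix: flatten tuple-typed call arguments into one
# (var, field_index) entry per field; plain tensor args pass through.
def flatten_call_args(args, num_tuple_fields):
    """num_tuple_fields(a) returns the field count for tuple args, else None."""
    flat = []
    for a in args:
        n = num_tuple_fields(a)
        if n is None:
            flat.append(a)                          # single-tensor argument
        else:
            flat.extend((a, i) for i in range(n))   # one entry per tuple field
    return flat

fields = {"t": 2}  # hypothetical: "t" is a 2-field tuple variable
print(flatten_call_args(["x", "t"], lambda a: fields.get(a)))
# → ['x', ('t', 0), ('t', 1)]
```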

    commit 54a0ff551adb90937073675b4fb3d5439b814398
    Author: Siyuan Feng <[email protected]>
    Date:   Fri Dec 16 21:02:13 2022 +0800

        Remove relax parser_v1 (#313)

    commit b363dd48aced8fb939880db8cf595ed65b7ecc77
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Wed Dec 14 22:51:38 2022 -0500

        [Debugging][Arch] Expose `shape_` fields for `TupleGetItem` and `If` nodes, fix AST printer accordingly (#311)

        * Make the shape of If and TupleGetItem nodes accessible in Python

        * Remove order-dependency from AST printer tests

        * Trailing whitespace

    commit 4bb01fe4eccdd59614cc264838a389b21dd40388
    Author: Yuchen Jin <[email protected]>
    Date:   Wed Dec 14 08:11:47 2022 -0800

        [IR] Dedicated Relax Call, Constant, Tuple, TupleGetItem, If (#306)

        * relax.Constant.

        * Add callnode;

        * Tuple, tuplegetitem, If

        * mypy.

        * lint

        * rebase & fix printer.

        * rebase & remove virtual_device_

        * address comments & leave todos.

        * address comments.

        * address comments.

        * tuple index.

        * type anno.

    commit 4cda8a5881fd4cd2473258b35244fc4129b6110c
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Wed Dec 14 09:09:03 2022 -0500

        [BlockBuilder][Refactor] Normalize nested `SeqExpr`s (#310)

        Co-authored-by: Ruihang Lai <[email protected]>

    commit 5aab150f322526c1a7bfe6cea0f4d7a7543a7f46
    Author: Ruihang Lai <[email protected]>
    Date:   Tue Dec 13 17:06:06 2022 -0500

        [ExprMutator] No prologue in VisitWithNewScope when input is SeqExpr (#305)

    commit 0bf1f1b784f19298117e36016a2e522f58c143fc
    Author: Tianqi Chen <[email protected]>
    Date:   Tue Dec 13 15:27:05 2022 -0500

        [REFACTOR] Refactor BlockBuilder (#308)

    commit 28d598b6a7c55f95f8f9c2ccd5c860ba5451232d
    Author: Siyuan Feng <[email protected]>
    Date:   Sun Dec 11 01:28:56 2022 +0800

        [Normalizer] Combine Nearby Blocks in SeqExprs (#298)

    commit e152c50e368454afab75425fcb0863b1c328bf4c
    Author: Tianqi Chen <[email protected]>
    Date:   Thu Dec 8 19:33:18 2022 -0500

        [ARCH] Add VisitBinding second-level dispatcher in Expr type. (#301)

    commit fed6b8fc88b824ec68260417793447dbe524c4c3
    Author: Yuchen Jin <[email protected]>
    Date:   Wed Dec 7 16:55:40 2022 -0800

        [Linkage] Cleanup global_symbol attachment and linkage. (#300)

        * Cleanup global_symbol attachment and linkage.

        * lint

        * Add global_symbol to the main function in translation.

    commit e0907d4fd03af1731310647d3d0547bdff2cfaf6
    Author: Tianqi Chen <[email protected]>
    Date:   Tue Dec 6 21:35:20 2022 -0500

        [ARCH] Introduce NestedMsg to robustly handle nested-tuple analysis (#295)

    commit 2eb99975dc1b40b83db7dcbb96b748503dcb3319
    Author: Siyuan Feng <[email protected]>
    Date:   Mon Dec 5 21:57:21 2022 +0800

        [TVMScript] Update script printer to enable roundtrip tests (#291)

    commit f8ab9890e14c2533c401969ebf11dd591beff592
    Author: Hongyi Jin <[email protected]>
    Date:   Sun Nov 27 09:59:26 2022 -0500

        [RUNTIME] Correctly handling export_module when exporting modules of different type (#13489)

    commit 9009840e654a9900009f7776a19e26f29b1e3f85
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Fri Dec 2 18:33:50 2022 -0500

        [Debugging] Support PackedFuncType in the AST Printer (#289)

    commit bda0e42f05eaba657c40a850486e55c39924f3bf
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Fri Dec 2 18:31:39 2022 -0500

        [IR][Bugfix] Improvements to the normalizer and well-formed checker (#288)

    commit d5fe87b21546995c7a88905bd04b4e944d28a0f4
    Author: Yong Wu <[email protected]>
    Date:   Thu Dec 1 20:00:38 2022 -0800

        Enforce i64 index in ShapeExpr (#281)

    commit 9c9eb5585501a5da0f25ca38d7d3ac8269b6714c
    Author: Yuchen Jin <[email protected]>
    Date:   Thu Dec 1 11:00:47 2022 -0800

        [Parser] Register memory operators to new parser. (#279)

    commit 28c3f68cc51d2c22936c5496debcb8c2de54040b
    Author: Yong Wu <[email protected]>
    Date:   Thu Dec 1 08:55:31 2022 -0800

        [TVMScript] enable the closure test (#280)

        * [TVMScript] enable the closure tests.

    commit eb9d531b2565cdd000f46e5ecae2c45b9f589abe
    Author: Yuchen Jin <[email protected]>
    Date:   Thu Dec 1 05:47:05 2022 -0800

        [Normalizer] Enforce all Expr have checked_type_ invariance after normalization. (#287)

    commit 43f81ddf4afc2f4fdb214c9f994e844f53126cdb
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Mon Nov 21 19:25:43 2022 -0500

        [Debugging][Bugfix] Debug printer improvements: Print `shape_` and `checked_type_` for all nodes and handle non-binding `MatchShape`s (#261)

        The initial AST printer only included the `shape_` and `checked_type_` fields for variables because of the potential for infinite recursion (`shape_` nodes can contain other expressions, which in turn have `shape_` nodes). This PR cuts off the potential recursion to allow for printing these fields for all Relax expressions, which should be more useful for debugging.

        This PR also fixes a bug: The AST printer previously did not handle `MatchShape` bindings that did not bind a new variable.

    commit 304048c33956dddb5027fec26541d57f903d8ca2
    Author: YuchenJin <[email protected]>
    Date:   Thu Nov 17 17:02:11 2022 -0800

        Fix after rebase, and reorganize the TVMScript folder structure.

        Co-authored-by: Junru Shao <[email protected]>
        Co-authored-by: Siyuan Feng <[email protected]>

    commit e7277460f0a2c7c980be9323cdf7919dc38153e2
    Author: Siyuan Feng <[email protected]>
    Date:   Thu Nov 17 00:31:32 2022 +0800

        [TVMScript] Switch to the new parser (#276)

        * [TVMScript] Support cross-function call for relax function

        This PR adds support for cross-function calls for Relax functions, by declaring a function signature (i.e., an empty function that contains params and return type/shape but no body).

        However, this PR runs into an issue with block_builder shape deduction, which does not use the function's `ret_shape` to infer the shape of GlobalVar calls.

    commit 7152175762613130e3ba647c77cc9818312a5b06
    Author: Yuchen Jin <[email protected]>
    Date:   Sat Nov 5 16:45:33 2022 -0500

        [CI] Enable Mypy type checking for Relax; Fix typing errors to pass Mypy checking. (#270)

    commit 6f8f6da505b835345d7709d06bdfd8dddce7e85b
    Author: Lesheng Jin <[email protected]>
    Date:   Thu Nov 3 08:16:35 2022 -0700

        Introduce memory primitives (#255)

        Introduce the memory primitives, including `relax.memory.{alloc_storage, alloc_tensor, kill_storage, kill_tensor}`.

    commit 48b7c158cc01532f9019a2e615f2d94766a9464c
    Author: Siyuan Feng <[email protected]>
    Date:   Thu Oct 20 08:30:47 2022 +0800

        [TVMScript] Update Type Annotation Behavior of the Parser (#269)

        This commit changes the behavior of the parser to allow type annotations, as suggested by the community.
        The current behavior:
        - Use the more refined type/shape between the user annotation and the deduced type/shape.
        The updated behavior:
        - Always use the user annotation.
        - Only check that the type/shape is valid.
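The updated policy can be illustrated with a toy resolver (names here are hypothetical, not the actual parser internals): the user's annotation is kept verbatim and only validated against the deduced type/shape.

```python
# Toy illustration of the "always use user annotations" policy: the
# deduced type/shape is used only as a fallback and a validity check.
def resolve_annotation(user, deduced, compatible):
    if user is None:
        return deduced   # nothing annotated: fall back to deduction
    if deduced is not None and not compatible(user, deduced):
        raise TypeError("annotation is inconsistent with the deduced type/shape")
    return user          # always prefer the user's annotation

# The annotation wins even if the deduction is more refined:
print(resolve_annotation("Object", "Tensor((2,), f32)", lambda u, d: True))
# → Object
```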

    commit 5c3079bb6e1e4eeb4dc2d9b740facb2686c67519
    Author: sung <[email protected]>
    Date:   Mon Oct 17 19:07:01 2022 -0700

        Reenable autotvm silencer; fix e2e_auto_tir.py; fix lint.

        Co-authored-by: YuchenJin <[email protected]>

    commit 85b81292626ab6f23caf2b61095a6f957b61b21c
    Author: sung <[email protected]>
    Date:   Mon Oct 17 18:09:34 2022 -0700

        Recover: [Bugfix] Couple of bug fixes to run TVM-gen code together with BYOC (#249)

    commit c46ae8566582f1fcd8fcda1479943d3abb95b3b0
    Author: sung <[email protected]>
    Date:   Mon Oct 17 17:16:01 2022 -0700

        Recover: [Pass] Separate ApplyHistoryBest from tuning passes (#226)

    commit 83bc7cb144643d5823bf06220186528923835667
    Author: Junru Shao <[email protected]>
    Date:   Sun Oct 16 22:52:56 2022 -0700

        Enable Hexagon tests

    commit f9f4f7904ec5468a725b2ba924a619a7c5ed4e43
    Author: Junru Shao <[email protected]>
    Date:   Sat Oct 15 15:25:56 2022 -0700

        Recover dropped commits

        [TVMScript] B4: If branch support (#263)
        B8: Local Function Support  (#258)
        [TVMScript] B3: Type annotation checks (#256)
        [TVMScript][Parser] B1: Dataflow block (#252)
        [TVMScript] B2: match shape support (#251)
        [TVMScript] B6/B7: Symbolic shape and var shadowing  (#245)
        [TVMScript] B5: Support relax op (#244)
        [TVMScript] B0: Call_tir support (#243)
        enhance parser error reporting (#242)
        [TVMScript] A1: Relax Parser infra (#240)
        update ci image versions. (#241)
        [TVMScript] B2-4: TIR IRBuilder (#239)
        [TVMScript] A0: Relax IRBuilder infra (#235)
        [TVMScript] B5-6: TIR IRBuilder (#231)
        [TVMScript] B1: IRBuilder (#228)
        [TVMScript] New Parser: Part C (#218)
        [TVMScript] New Parser: Part A (#221)
        [TVMScript] New Parser: Part B (#217)

        Not recovered:
        [Pass] Separate ApplyHistoryBest from tuning passes (#226)
        [Bugfix] Couple of bug fixes to run TVM-gen code together with BYOC (#249)

        co-authored-by: Yuchen Jin <[email protected]>
        co-authored-by: Siyuan Feng <[email protected]>
        co-authored-by: Ruihang Lai <[email protected]>

    commit 65a53034bc0bee9877a1bdf363c2eadcde35f226
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Thu Oct 13 23:06:55 2022 -0400

        [Op][Debugging] Add `assert` operator (#260)

        It was brought up that Relay lacks an assert operator, so we may as well have one in Relax for debugging. One issue is that we can't name it "`assert`" because Python will treat it as a syntax error to have it as a field name for the "`relax`" module, i.e., `relax.assert` is a syntax error. Thus the op is named "`assert_op`," which is not ideal but serves its purpose.

    commit 71d96e6c0a314936fa49fd7bc1ea79069027ab12
    Author: Yuchen Jin <[email protected]>
    Date:   Wed Oct 12 05:07:33 2022 -0700

        [Pass] Support Function and If in Normalize pass. (#268)

        * Support Function and If in Normalize pass.

        * Use structural equality for expr_memo_.

        * Change back to pointer equality for expr_memo_; Add more tests.

        * rebase.

    commit 312a344cdeec66b1330a80d34ca78556fb338e7c
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Tue Oct 11 18:25:29 2022 -0400

        [Analysis] Expose analyses related to vars in Python (#265)

        Previously, analyses to gather up all variables, free variables, bound variables, all global variables, and all global variables that are called had been implemented in C++ but had not been exposed in Python or tested. This PR exposes these analyses and adds tests for them.

        Two further changes:
        * The analyses previously ignored variables bound in `MatchShape` nodes; these are now treated as bindings too.
        * `rec_global_vars` is renamed `called_global_vars`, since the analysis itself does not check recursion.

    commit 132702be7e7ed0256045d7a405e532c3d5beef6d
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Mon Oct 10 18:19:38 2022 -0400

        [Expr] Allow annotating return shape on function nodes (#253)

        This PR adds a `ret_shape` field for specifying the shape of the function's return value. At present, we will not use this information, but by adding it into the AST, we will be able to parse the return shape and use it in the future. Parser V1 in this PR will just always list the `ret_shape` as `RuntimeDepShape`.

    commit 7276c9e2ee13a4754775491ca36a7aae2d55b827
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Sat Sep 24 00:11:45 2022 -0400

        [Bugfix][VM] Properly convert tensor inputs in `save_function` (#257)

        It was observed that closures saved using `save_function` would crash when used over RPC with the `time_evaluator`, whereas using `set_input` and `invoke_stateful` worked as normal. While I am not entirely sure why these failures happened over RPC only in `time_evaluator` (but not in other RPC trials), it became clear that `set_input` performs a conversion of input tensor values in `SetInputTensorWithIndex`, while `save_function` was not doing this. Adding this conversion fixed the observed bug.

    commit 7183c7ffbe896dd9b5f5742b62afe9c821dae682
    Author: Josh Fromm <[email protected]>
    Date:   Wed Sep 21 17:07:08 2022 -0700

        [Call TIR] Fix bug when invoking call_tir with scalar values. (#254)

        This small PR changes a check in the tvmscript parser to support empty shape tuples which are used to represent scalars. I added a scalar addition test to make sure it works properly.

    commit 605ba8d1548efb90980f9b18ea94f1d53f9ec3ec
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Wed Sep 14 17:27:03 2022 -0400

        [Bugfix][Op] Register attributes for unique and print (#248)

        Attempting to use `dump_ast` on functions containing the operators `relax.unique` and `relax.print` previously crashed due to being unable to query their attributes' keys. It turned out that this was a problem with the operator attributes: They had not been registered on the Python side, so Python representation treated them as opaque TVM objects. This PR corrects this mistake.

    commit f4525dd8a3e61f572b50107555cef4b469c971f4
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Wed Sep 14 17:24:40 2022 -0400

        [VM][Benchmarking] Add option for saving e2e results as CSV file (#247)

        This PR makes some small additions to the end-to-end AutoTIR script, namely eliminating a bug (it was incorrectly using the stateful API) and adding an option to save the test results as a CSV file for benchmarking purposes (the data can then be separately analyzed as needed).

        These changes also required a small extension to the save_function method in the VM, namely allowing it to take keyword arguments.

    commit f1ee4b6cd2c3ee0596cef6f5b7ff7e715fb4ae0d
    Author: Ruihang Lai <[email protected]>
    Date:   Wed Sep 14 17:23:29 2022 -0400

        [BugFix] Enable emit global MatchShape (#246)

        Fix an incorrect check which disables emitting global MatchShape outside a dataflow block and mistakenly enables emitting dataflow MatchShape outside a dataflow block.

    commit 0a7a0a9daf5f1a2fa06ee6cd6169a28d397821fa
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Thu Sep 8 09:49:05 2022 -0400

        [Pass] Canonicalizing Bindings (#233)

        It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

        This pass will take chains of definitions and replace references to later definitions to the original one. It works by checking `LookupBinding` for each var use-site and replacing the var with its definition if the definition was another var. (Note: This required updating `BlockBuilder` to also update its binding map for `MatchShape` nodes; that was arguably a bug.) Additionally, `MatchShape` bindings where the `LHS` and the `RHS` are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.
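The chain-collapsing behavior can be sketched as a toy Python model (illustrative only; the real pass is a C++ mutator that consults `LookupBinding` at each var use-site):

```python
# Toy model of the canonicalization: map each var to the root of its
# definition chain, so later references collapse to the original binding.
def canonicalize(bindings):
    roots = {}  # var name -> canonical (earliest) definition
    out = []
    for lhs, rhs in bindings:
        root = roots.get(rhs, rhs)   # one lookup suffices: roots stores roots
        roots[lhs] = root
        out.append((lhs, root))
    return out

chain = [("y", "x"), ("z", "y"), ("w", "z"), ("o", "w"), ("p", "o")]
print(canonicalize(chain))
# → [('y', 'x'), ('z', 'x'), ('w', 'x'), ('o', 'x'), ('p', 'x')]
```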

    commit 7a6f91f7d4077eebf926aa1f19281404494b9362
    Author: Prakalp Srivastava <[email protected]>
    Date:   Thu Sep 1 07:02:57 2022 -0400

        [Hexagon] Use uploaded path to load module. (#238)

        * Fixes a bug to use the uploaded file's remote path for loading the module
        remotely.

        * Modifies the task_python_hexagon.sh script to only run passing tests
        on device. This is used by Jenkins CI.

    commit e50290140c204ae091e335b797a07f2f6567a163
    Author: Lesheng Jin <[email protected]>
    Date:   Thu Aug 18 21:51:35 2022 -0700

        [Pass] New Python ExprVisitor/ExprMutator! (#190)

        Add decorators `visitor` and `mutator` to help users create `ExprVisitor` and `ExprMutator` in Python. Users can customize visit/rewrite/post-order-rewrite function in Python.  `PyExprVisitor` and `PyExprMutator` lists the functions users can customize.

    commit 7313855476cc522bf3e8bdbe7a60b82cd725fe4c
    Author: Ruihang Lai <[email protected]>
    Date:   Thu Aug 18 15:20:06 2022 -0400

        [BugFix] Expose `relax.expr.Constant` to `relax.Constant` (#230)

    commit cdfd4e939f2d1e88c560a05d83ddf2f7afe70304
    Author: Siyuan Feng <[email protected]>
    Date:   Thu Aug 18 02:25:13 2022 +0800

        [FIX] Fix windows build issue when allocating a dynamic array (#219)

        In the current codebase, kNumArgs is a runtime-dependent variable (i.e., its value depends on the input shape of the Array).

        Allocating arrays with runtime values is not allowed when building on Windows (I'm surprised it compiles on Linux and macOS).

    commit 887762cd97686ae23a61609ca9ffc8d6a2c5178b
    Author: Yong Wu <[email protected]>
    Date:   Mon Aug 15 08:00:31 2022 +0800

        Update with rebase

    commit 5a23346bc437043b48866411e39dfcf066edda59
    Author: Yuchen Jin <[email protected]>
    Date:   Sun Aug 14 14:44:12 2022 -0700

        [Bugfix][VM] Fix var binding to a ConstantNode; Force VM if.cond register to take an NDArray instead of POD. (#216)

        Fix the bug in #212. The cause of this bug is that VM codegen did not handle binding a ConstantNode to a variable (`x = relax.const([1, 2])`) and saving the constant NDArray to a register. Previously the codegen only handled the case where a ConstantNode appears as a CallNode argument. Now it's fixed and a unit test is added.

        Fix the bug in https://github.com/tlc-pack/relax/issues/214#issuecomment-1211411432. The issue was caused by the VM simply reading the condition register of the If instruction and expecting it to be a POD int or bool. https://github.com/tlc-pack/relax/commit/811e877c289fa52f55886c8a3e8dce10ed84915f adds a `LoadScalarInt` function, similar to the Relay VM's, to check that the If.cond register stores an NDArray and cast it to int64. Since we haven't introduced PrimValue and PrimType (which represent POD values like int and bool) to the Relax language yet, let's enforce `If->cond` to be a Tensor (NDArray at runtime).
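The runtime check described in this commit can be sketched in Python with NumPy standing in for NDArray (a toy analogue, not the actual C++ `LoadScalarInt`): the condition register must hold a single-element tensor, whose value is cast to a 64-bit int.

```python
import numpy as np

# Toy analogue of LoadScalarInt: accept only a scalar (1-element) tensor
# as the If condition and cast its value to int64.
def load_scalar_int(reg):
    arr = np.asarray(reg)
    if arr.size != 1:
        raise ValueError("If.cond must be a scalar tensor")
    return int(arr.reshape(()).astype(np.int64))

print(load_scalar_int(np.array(True)))
# → 1
```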

    commit 6c9d403503297a0d0e28318bafcba9fc9c99ae42
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Fri Aug 12 13:53:28 2022 -0400

        [VM][UX] Allow for saving closures to avoid extra dictionary lookups in timing trials (#208)

        This PR implements a function that allows for saving a `PackedFunc` in the VM's module that just calls an existing function with a specific set of arguments to address #179 and #178. The main use of this is for timing, to avoid some overhead in looking up functions.

    commit e172b40af31dc3384adbcf6e7b0bce7f31ce41ea
    Author: Jiawei Liu <[email protected]>
    Date:   Thu Aug 11 19:55:57 2022 -0500

        [Pass][UX] Statement rewriter for DataflowBlock (#210)

        - Implements a few APIs to quickly perform statement-level mutation: `add`/`remove_unused`/`remove_all_unused`/`replace_all_uses`.
        - Implemented `remove_all_unused` to remove dead statements inside `DataflowBlock` cc: @psrivas2
        - Address minor issues (unnecessary headers and bad docstrings) in https://github.com/tlc-pack/relax/pull/163

    commit 37791e0a5d4a495365fd647f2cecbed16f3a3785
    Author: Jiawei Liu <[email protected]>
    Date:   Thu Aug 11 13:50:56 2022 -0500

        Clean warning messages by Clang and Pylint (#215)

        * refact: clean clang warning in relax

        * refact: fix pylint

        * fix cpplint and clangd suggestions

        * fix: no cpplint on virtual-override

    commit 0b00715dc634aa7f091e942a54a29ee9c802ccf9
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Wed Aug 10 11:47:37 2022 -0400

        [VM][UX] Implement stateful API (#207)

        This PR implements the stateful API discussed in https://github.com/tlc-pack/relax/issues/179. It ensures that if you use `set_input` to set inputs, you must use `invoke_stateful` to run the function (otherwise failing) and must obtain the results using `get_output`. It handles nested tuple returns.

    commit ed7b77e040654582d1ab1b9535ebbc4da77da243
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Tue Aug 9 17:07:52 2022 -0400

        [Op][Debugging] Add a print operator (#201)

        * Attempt at adding a print operator

        * Fix the registration

        * Actually use the format string

        * Improve test

        * Fix comment placement

        * Improve the docstring for relax_print

        * Handle tuples too

        * Formatting :(

        * Correct commit message

        * Match attr name across Python and C++

        * Make print variadic

    commit a9bd3053c1106d1926fce1dc5787fc8be27f3985
    Author: Sunghyun Park <[email protected]>
    Date:   Fri Aug 5 11:45:03 2022 -0400

        [Pass] Implement legacy lowering pass that leverages relay op strategy (#189)

        This PR implements Relax Op lowering that leverages existing Relay Op Strategy (legacy).
        As ops like conv2d and matmul are relay- and relax-independent, this pass assumes that we can always find Relay op equivalents for such Relax ops and uses their info to leverage the Relay op strategy.

    commit 1a1bcf75d97b2e7e4f758b6cd08bd747b222ef36
    Author: Sunghyun Park <[email protected]>
    Date:   Thu Aug 4 17:56:17 2022 -0400

        [Pass] Introduce metaschedule as a tuning pass (#188)

        This PR delivers MetaSchedule tuning as a tuning pass.
        We can either tune at the IRModule level with `relax.transform.MetaScheduleTuneIRMod` or at the PrimFunc level with `relax.transform.MetaScheduleTuneTIR`.

    commit 7144654633477ea0d2bff300ba753dc8bfdeae4d
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Thu Aug 4 14:34:10 2022 -0400

        [Example][UX] Make the RPC timeout configurable in the `e2e_auto_tir` example (#186)

        Running the e2e_auto_tir example over RPC can run into issues due to timeouts because some models can take a long time to run on some machines. This PR makes the RPC timeout configurable to more easily address these issues.

    commit 81e565e5df90cfe12d22deb7b26845ea3aa13526
    Author: Tianqi Chen <[email protected]>
    Date:   Wed Aug 3 19:38:21 2022 -0400

        Fix BlockBuilder Scope Recovery in Misuse (#199)

        This happens in interactive use cases. When a function scope
        exit triggers an error, we need to recover
        BlockBuilder.current properly so users can try again.

    commit 21b1e7dc35dc838214cd4b6f26fbc31492323b02
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Wed Aug 3 19:09:21 2022 -0400

        [Testing][AST] Add a simple AST printer for debugging (#198)

        * Add ast printer

        * Print seq expr body

        * Match annotation field names to real AST

        * Handle call attrs and func ret types

        * Add more advanced test cases

    commit 89f55c8167a80b4b9c8751309b5db648fb4db047
    Author: Jiawei Liu <[email protected]>
    Date:   Wed Aug 3 09:59:47 2022 -0500

        [UX] Adopt changes from tvm-main and render code with IPython.display (#192)

        Render code with IPython.display.HTML if possible to fix the ansi-escape 24-bit rendering issue in Colab.

    commit 0b52b558eb14b3f113a4b543c8f0a824baaa58bc
    Author: Jiawei Liu <[email protected]>
    Date:   Mon Aug 1 11:59:24 2022 -0500

        Dataflow Pattern Lang: Core Matching Features (#163)

        The structure is similar to Relay's pattern matcher (https://github.com/apache/tvm/pull/5231). The main difference is that the pattern types are adapted to be Relax-compatible. Relay-specific pattern types, some less-used patterns (IfPattern), and dataflow-topological patterns (DominatorPattern) are omitted for now (some of them will be brought back later).

        The implementation splits patterns into two parts:
        - **Match an Expression**: match an expression syntactically (`MatchExprPattern`, i.e., `DFPatternMatcher`);
        - **Match a Graph**: match a graph (cross multiple `VarBinding`) topologically (`MatchGraphPattern`);

    commit 74371634e9a011e63650b734aba20546b016c524
    Author: Jiawei Liu <[email protected]>
    Date:   Tue Jul 26 20:06:25 2022 -0500

        [UX] Highlight TVMScript with Pygments (#185)

    commit 15e54ef215950944ffd74858c12c30aabcb0dcce
    Author: Siyuan Feng <[email protected]>
    Date:   Sat Jul 23 11:22:13 2022 +0800

        [Pass] Enhance BindParams to take numpy dict as input (#184)

    commit cf2e3b97110c805597059c5ba8303a653417e080
    Author: Steven S. Lyubomirsky <[email protected]>
    Date:   Mon Jul 18 21:45:21 2022 -0400

        [Bugfix][VM] Ensure set_input works over RPC by not returning an array of argument names (#183)

        Currently, attempting to use the VM's `set_input` method will fail over RPC because `set_input` calls `get_func_param_names`, which returns an array of parameter names. RPC does not support sending arrays. This PR corrects this issue by instead having `set_input` query the function arity and then query the argument names one by one, which is the approach taken by the Relay VM (accordingly, the names for the functions used to do this, `get_function_arity` and `get_function_param_name`, are taken from the Relay VM).

        This PR also adds a unit test over RPC on localhost.
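The arity-then-names pattern described in this commit can be sketched as follows (a minimal model; the two callables stand in for the `get_function_arity` and `get_function_param_name` PackedFuncs):

```python
# Sketch of the RPC-safe pattern: rather than returning an array of names
# (arrays cannot be sent over RPC), query the function arity first, then
# fetch each parameter name individually.
def get_param_names(get_function_arity, get_function_param_name, func_name):
    arity = get_function_arity(func_name)
    return [get_function_param_name(func_name, i) for i in range(arity)]

# Hypothetical stand-in for the remote module's parameter table:
params = {"main": ["data", "weight"]}
print(get_param_names(lambda f: len(params[f]),
                      lambda f, i: params[f][i], "main"))
# → ['data', 'weight']
```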

    commit b0e57dbc0862499c3f2a7d91858354c41fcf5e95
    Author: Yong Wu <[email protected]>
    Date:   Fri Jul 15 11:50:29 2022 -0700

        Fix after rebase

    commit 3494b7a47bf0f7c3219538b2e9064b825cf3258c
    Author: Sunghyun Park <[email protected]>
    Date:   Mon Jul 18 00:38:41 2022 -0400

        [Pass Infra] Tuning API serialization and database support (#168)

        * refactor tuning API to support serialization of Choice, Knob, Trace

        * Implement tuning api JSON database

        * Add comments

        * fix pylint

        * fix cpplint

        * reflect feedback

        * add minor comment for the future work

    commit 777549a6037cc97b698f53ed629cf65c33ae7eca
    Author: Siyuan Feng <[email protected]>
    Date:   Mon Jul 18 00:05:14 2022 +0800

        [Fix] fix windows build issue (#182)

        `TVM_DEFINE_NOTNULLABLE_OBJECT_REF_METHODS` is needed when we have a default-argument-style constructor parameter (e.g. `Span span = Span()`)

    commit b81e6a9838f92ba412a0bd4951a46cc61a43a22d
    Author: Siyuan Feng <[email protected]>
    Date:   Mon Jul 18 00:04:03 2022 +0800

        fix print twice issue (#181)

    commit d4cc79ed664bbe34a4d9dab2923cd5a7a7c5b52c
    Author: Lesheng Jin <[email protected]>
    Date:   Thu Jul 14 09:15:44 2022 -0700

        [Pass] Python ExprMutatorBase/ExprMutator (#172)

        - Rewrite ExprFunctor in Python. New ExprMutatorBase and ExprMutator in Python.
        - Implement demo passes: RewriteFMA and FuseFMA with Python ExprMutator.
        - Expose some functions to ffi in block_builder.py

    commit 01cdc4d43258b1fb9dcc630f05f38f792e3bc513
    Author: Prakalp Srivastava <[email protected]>
    Date:   Tue Jul 12 19:25:51 2022 -0400

        [VM] Deprecate API to save/load executable to file (#176)

        Executable `save_to_file` and `load_exec_from_file` API was used to
        save/load just the executable to/from file. This was confusing as it did
        not export the TensorIR kernels in the Relax Module, thus leading to
        bugs such as https://github.com/tlc-pack/relax/issues/175.
        Moreover, the API was only used in some tests and is not useful for end
        users.

        We deprecate this API so that there is a single uniform way of
        serializing/deserializing a TVM IRModule: the `export_library` and
        `tvm.runtime.load_module` APIs.

    commit 74b3d67e8ae74aed3446a5ae5a05b8f5586e2c3b
    Author: Yuchen Jin <[email protected]>
    Date:   Fri Jul 1 09:31:30 2022 -0700

        [Refactor] Generic dispatching for `IsBaseOf`; Simplify Type/Expr initializations; `relax` -> `R` in printer; Disallow local function in VMCodegen (#171)

        - Generic dispatching for `IsBaseOf`: `IsBaseOf` used a chain of if-else checks to determine the subtype relation between the base type and the derived type; it now uses a generic TypeFunctor to dispatch on the base class to do the check.
        - Simplify Type/Expr initializations: We had to write `RuntimeDepShape(Span())`, `ObjectType(Span())` to initialize several Types and Exprs, because the `TVM_DEFINE_OBJECT_REF_METHODS` macro sets the constructor with `= default`. By changing to `TVM_DEFINE_NOTNULLABLE_OBJECT_REF_METHODS`, we can now just write `RuntimeDepShape()` without specifying an empty span.
        - `relax` -> `R` in printer: Change to print `R` rather than `relax` in TVMScript as the default behavior. This is consistent with our test cases and TIR convention: using `T` as shorthand.
        - Disallow generating code for local function in VMCodegen: these local functions should have been lifted in the lambda lifting pass before codegen.
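        The functor-style dispatch described in the first bullet can be sketched in Python. The type hierarchy below is a toy, hypothetical one (not Relax's real types); the point is replacing an if-else chain with dispatch keyed on the class of the base type.

```python
# Toy sketch of functor-style dispatch for a subtype check.
# The hierarchy here is illustrative, not Relax's actual type system.

class Type: ...
class ObjectType(Type): ...
class TensorType(ObjectType): ...
class ShapeType(ObjectType): ...

class IsBaseOfChecker:
    def check(self, base, derived):
        # Dispatch on the class of `base` instead of chaining if-else tests.
        handler = getattr(self, "visit_" + type(base).__name__, self.default)
        return handler(base, derived)

    def visit_ObjectType(self, base, derived):
        # In this toy hierarchy, every object type is a subtype of ObjectType.
        return isinstance(derived, ObjectType)

    def visit_TensorType(self, base, derived):
        return isinstance(derived, TensorType)

    def default(self, base, derived):
        # Fallback: only an exact type match counts.
        return type(base) is type(derived)

checker = IsBaseOfChecker()
print(checker.check(ObjectType(), TensorType()))  # True
print(checker.check(TensorType(), ShapeType()))   # False
```

Adding a new type then means adding one `visit_*` handler rather than growing a central if-else chain.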

    commit 8fdc3ba3eae0d1ffc535e240be251aaae5546eb8
    Author: Prakalp Srivastava <[email protected]>
    Date:   Thu Jun 30 15:14:40 2022 -0700

        [Parser] Enable R.parser.pretty_print to print TIR PrimFunc (#174)

        This way we can have a uniform API to print IRModules, TensorIR
        functions, and Relax functions.

    commit ed0414540c9fbc063aa727cfc71bdee51a4bafdd
    Author: Prakalp Srivastava <[email protected]>
    Date:   Wed Jun 29 08:20:17 2022 -0700

        Update tests to use `set_input` for rpc calls. (#173)

        Fix relax-hexagon tests to use set_input api, which is the correct way to invoke a function over RPC.

    commit 1f962bda7a79d13fee1a4f9f4ad3ddde4f5467b2
    Author: Sunghyun Park <[email protected]>
    Date:   Tue Jun 28 20:49:33 2022 -0400

        [BYOC][PASS] Prototype implementation of modular compilation w/ TensorRT (#164)

        This PR delivers a prototype of the following:
        - Relax BYOC JSON codegen
        - Relax BYOC TensorRT codegen
        - Extension in Relax VM to support external modules
        - `RunCodegen` pass: run codegen for the annotated relax functions
           - Annotation (dispatch decision) will be done by earlier passes  e.g., greedy heuristic, Collage
           - The generated runtime module and Codegen itself should be tvm object
        - Misc minor code improvement for other passes

    commit f25fe0c80670272582db3aa791901c7fa49fc59e
    Author: Prakalp Srivastava <[email protected]>
    Date:   Tue Jun 28 12:47:07 2022 -0700

        Run static/dynamic models over Hexagon using Relax VM RPC (#167)

        * Move Relax VM builtins to src/runtime.

        * This fixes a bug we encountered while loading the module for Hexagon:
        since it was building the minimal runtime, it was missing the
        definitions of the Relax VM builtins.

        * Mark Hexagon module as DSO exportable.

        * Load Relax VM Executable over RPC

        * Support allocation for shape heap on device

        Co-authored-by: Yuchen Jin <[email protected]>

    commit 25174be634b5e04f0468b48bd477f22b17e75f84
    Author: Prakalp Srivastava <[email protected]>
    Date:   Fri Jun 24 13:33:04 2022 -0700

        [CI] Enable Hexagon CI in Jenkins. (#169)

        Running all Hexagon tests in the simulator is very slow, so we only run
        the Relax-related Hexagon tests in `test_relax_integration.py`.
        This test file is empty right now and will be populated as we push
        Relax-Hexagon-related changes.

    commit 225aecdb5d7d33f2af048f3aef9c9a6ac758f4fd
    Author: Yuchen Jin <[email protected]>
    Date:   Thu Jun 23 09:47:30 2022 -0700

        [VM] Add set_input interface; Fix e2e tuning script. (#166)

        * Add set_input interface.

        * Address comments.

    commit 29a707cbd9be6e02dd8a3cd1961cfb53057eb51b
    Author: Lesheng Jin <[email protected]>
    Date:   Thu Jun 16 09:07:45 2022 -0700

        WellFormed Instrument (#165)

        * add conftest for test/python/relax

        * [Wellformed Check]: allow TupleType as Function parameters

        * move WellFormedInstrument to relax.ir.instrument

        * add header

    commit b4c3c4bb65b09db7c9b3ec114d6680d14f306d37
    Author: Yong Wu <[email protected]>
    Date:   Sat Jun 11 23:26:17 2022 -0700

        Update after rebase

    commit 3c0e3c0ee08c78b17cc1ba0429727c199737403a
    Author: Yuchen Jin <[email protected]>
    Date:   Sat Jun 11 18:42:29 2022 -0700

        [Relay translator] Allow replacing default topi function with user-provided TIR PrimFunc. (#159)

        * Add replace_op_with_tir to translator.

        * came up with a better name

        * better doc.

    commit f250f93eed886dc2c3a1cb1f8a4ab2077c57080e
    Author: Yong Wu <[email protected]>
    Date:   Sat Jun 11 15:20:21 2022 -0700

        [Pass] Lambda Lifting (#99)

    commit b55fd31d4e11373b30a93f88412a3d6e2d21d3c1
    Author: Siyuan Feng <[email protected]>
    Date:   Tue Jun 7 10:07:17 2022 +0800

        [E2E] End-to-End tuning e2e_script (#153)

        Co-authored-by: Ruihang Lai <[email protected]>
        Co-authored-by: Hongyi Jin <[email protected]>

    commit d3f94e73ec7b9c9ac7b3675f962e9030e55fa603
    Author: Prakalp Srivastava <[email protected]>
    Date:   Thu Jun 2 08:19:18 2022 -0700

        Fix shape lowering pass bug for non i64 dims. (#152)

        Prior to this change, the VM Shape Lowering pass did not cast integer
        values to the shape heap dtype (i64), which resulted in incorrect
        values when they were later read from the heap. This PR adds a cast to
        i64 for such values.
        This also adds well-formed check to ensure shape dimensions are of
        integer types.

    commit 9cf777f48069d598eda276be0b9aabaf301acf0f
    Author: Yong Wu <[email protected]>
    Date:   Wed Jun 1 17:52:40 2022 -0700

        [Parser] Add FuncType support (#154)

        * [Parser] Add FuncType support

        * Address comments

    commit f99121d506df45870cd026e052f5b3c41d4bd982
    Author: Sunghyun Park <[email protected]>
    Date:   Wed Jun 1 09:01:40 2022 -0700

        [PASS] Remove Unused Functions in IRModule (#151)

    commit a718e9f9e073ca0ea1790562254c09aaa863eaa4
    Author: Sunghyun Park <[email protected]>
    Date:   Tue May 31 15:15:28 2022 -0700

        [Pass Infra] Tuning Pass API (#144)

    commit a485b7bdb45f8379daa45e8c923a47fd6871cbdf
    Author: Tianqi Chen <[email protected]>
    Date:   Sun May 29 12:51:07 2022 -0400

        [REFACTOR] Move TIR op kind analysis to relax as it is relax oriented (#155)

        This also keeps TIR mostly independent of higher-level IRs.

    commit abd20bdc9b87aa53e0c27e8c5c3fc195be5e8c91
    Author: Siyuan Feng <[email protected]>
    Date:   Sun May 29 23:31:05 2022 +0800

        add test cases for FuseTIR (#156)

    commit de42ec3d5ae0f0304060460764619a5a16995a33
    Author: Siyuan Feng <[email protected]>
    Date:   Thu May 26 22:14:51 2022 +0800

        [Pass] Relax Transform FuseTIR (#150)

        * [Pass] Relax Transform FuseTIR

        Co-authored-by: Hongyi Jin <[email protected]>
        Co-authored-by: Ruihang Lai <[email protected]>

    commit 153d0cc8f2d39b23e63fcd6feaf9755a0eaf8c28
    Author: Yuchen Jin <[email protected]>
    Date:   Wed May 25 15:44:59 2022 -0700

        [Mutator] Separate unnormalized-form and normal-form mutators (#148)

    commit dfa42c09a3087605e805526ab7db7b49d6752ca5
    Author: Prakalp Srivastava <[email protected]>
    Date:   Fri May 20 16:30:18 2022 -0700

        Print/parse tir cast/max operations in Relax shape (#149)

        tir.cast and tir.max are commonly used operators in shape expressions
        in Relax. These two operators often show up when importing a Relay
        module with `Any` dims into a Relax module.

    commit c7186fd44ad5865d84ac61fc2981a15c8af9be4c
    Author: Prakalp Srivastava <[email protected]>
    Date:   Thu May 19 18:29:12 2022 -0700

        Add support to import relay models with Any dim. (#146)

        Converts Relay Any dimension to symbolic dim in Relax.

    commit ef9cf6baba1c2f7215746459ad5a9193df6572c9
    Author: Yuchen Jin <[email protected]>
    Date:   Tue May 17 07:55:56 2022 -0700

        Refactor shape lowering pass and Blockbuilder. (#145)

    commit 230def2284c21eaff520e58fa96a80313b6a7c8f
    Author: Yong Wu <[email protected]>
    Date:   Fri May 13 14:30:05 2022 -0700

        Support Closure (#140)

    commit 0e998988aabdeb8d913e2889eb5a9d72bee35ca2
    Author: Lesheng Jin <[email protected]>
    Date:   Thu May 12 17:13:15 2022 -0700

        [Analysis] IRModule well-formed check (#142)

    commit 1bd4e685ffcc0c4b677af47ecc8609dbfacdfd9d
    Author: Yong Wu <[email protected]>
    Date:   Wed May 11 09:31:13 2022 -0700

        Change after rebase

    commit d0ad35b375449c7e067a1edada7502557a03dd26
    Author: Siyuan Feng <[email protected]>
    Date:   Tue May 10 08:44:22 2022 +0800

        FuseOps for relax (#141)

        Co-authored-by: Ruihang Lai <[email protected]>
        Co-authored-by: Hongyi Jin <[email protected]>

    commit ae7b5b79c40498203842b6c9193e91bcc1937bea
    Author: Prakalp Srivastava <[email protected]>
    Date:   Wed May 4 20:52:16 2022 -0700

        Add `relax.unique` operator in Relax. (#135)

        * Add Unique operator in Relax.

        This adds the functionality to register a packed function
        implementation of any operator using the `FCallPacked` attribute. The
        Relax operator is lowered to a call to the registered packed function
        during codegen. For example, in this change `relax.unique` is lowered
        to the `relax.run.unique` packed function, which uses `torch.unique`
        under the hood.

        * Add support for integer constants in Relax VM.

        This adds serialization, deserialization, and print support for
        integer constants.
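        The register-and-lower flow described above can be sketched with a plain Python registry. This is a hedged illustration of the idea only: the dictionaries, function names, and the list-based "unique" (standing in for `torch.unique`) are all hypothetical, not TVM's actual data structures.

```python
# Sketch of the FCallPacked idea: an operator registry maps an op name to a
# packed-function name, and "codegen" rewrites op calls into calls to that
# packed function. All names here are illustrative.

PACKED_IMPLS = {}   # op name -> packed function name
PACKED_FUNCS = {}   # packed function name -> Python callable (stand-in)

def register_call_packed(op_name, packed_name, fn):
    PACKED_IMPLS[op_name] = packed_name
    PACKED_FUNCS[packed_name] = fn

def lower_call(op_name, *args):
    # If the op has a registered packed implementation, call that instead.
    packed_name = PACKED_IMPLS.get(op_name)
    if packed_name is None:
        raise NotImplementedError(f"no packed lowering registered for {op_name}")
    return PACKED_FUNCS[packed_name](*args)

# A toy "unique" backed by plain Python rather than torch.unique.
register_call_packed("relax.unique", "relax.run.unique",
                     lambda xs: sorted(set(xs)))

print(lower_call("relax.unique", [3, 1, 2, 3, 1]))  # [1, 2, 3]
```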

    commit 1ca18611ae59ab4d1667066ed9921690d2a5611c
    Author: Siyuan Feng <[email protected]>
    Date:   Tue May 3 09:34:55 2022 +0800

        Add ShapeType to ShapeExpr.checked_type during construction (#139)

    commit 6481d533ed259a080dede704f7443c4a2221a842
    Author: Sunghyun Park <[email protected]>
    Date:   Mon May 2 16:26:08 2022 -0700

        Introduce Relax function attribute and drop name field in Relax function (#136)

    commit d735ebd719d89c804691b29ee0d881c785384fc6
    Author: Yuchen Jin <[email protected]>
    Date:   Sat Apr 30 18:45:14 2022 -0700

        [BlockBuilder] Sub function call shape deduction: constant shape case. (#137)

    commit 10f8e56cbcb27beb373075e3c6e3a9728ffb5eb2
    Author: Yuchen Jin <[email protected]>
    Date:   Thu Apr 28 16:59:38 2022 -0700

        [AST][Type] Introduce ObjectType; Infer the type of call_packed by type_args; Refactor InferType/InferShape. (#132)

    commit 7e2038a8b662659dd6ba2e2a86bedbc6c3891bfa
    Author: Yuchen Jin <[email protected]>
    Date:   Mon Apr 25 17:20:19 2022 -0700

        [AST][BlockBuilder] Normalize relax.Function; Refactor BlockBuilder to take optional input IRModule. (#133)

    commit f1eca6d74365c6b0665b64c86ececce86fd76df3
    Author: Prakalp Srivastava <[email protected]>
    Date:   Sun Apr 24 07:09:11 2022 -0700

        [Printer][Parser] Modify Tensor annotation printing and parsing. (#128)

    commit 296876eaf1246ea7948c69d2111cfea2ca51ca0c
    Author: Lesheng Jin <[email protected]>
    Date:   Fri Apr 22 08:05:13 2022 -0700

        [Pass] Python pass decorator and ExprFunctor (#126)

        * Relax ExprFunctor in Python

        * fix the register bug

        * Expr_functor in relax

        * function/dataflowblock Pass in python

        * testcases

        * reformat

        * fix Tensor annotation()

        * add return type hint

        * type hint

        * new test

        * fix typo

        * remove memo

    commit 5199a206cc86cee9e43b0c8ddddf704acdc4b513
    Author: Ruihang Lai <[email protected]>
    Date:   Thu Apr 21 22:20:33 2022 +0800

        [Relax][MS] Task extraction with proper weights (#129)

        * [Relax][MS] Task extraction with proper weights (hzfengsy#32)

        * Add a unit test

        * Update the deduplication mapping / Update the unit test

        * Update test for DummyDB reusing

        * Remove unnecessary args

        * Remove unused import

    commit badee2add6700f12…
junrushao pushed a commit to junrushao/relax that referenced this pull request Jan 25, 2023
It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

This pass will take chains of definitions and replace references to later definitions with references to the original one. It works by checking `LookupBinding` for each var use-site and replacing the var with its definition if the definition was another var. (Note: this required updating `BlockBuilder` to also update its binding map for `MatchShape` nodes; that was arguably a bug.) Additionally, `MatchShape` bindings where the LHS and the RHS are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.
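The chain-collapsing idea can be sketched as a single pass over the bindings. This is a minimal, simplified model: bindings are `(lhs, rhs)` string pairs, and it ignores the real pass's concerns such as DataflowVar escape rules and shape/type annotations.

```python
# Minimal sketch of canonicalizing binding chains: walk each binding in order
# and, when the right-hand side is itself a variable with a known definition,
# rebind to that variable's (already-canonicalized) definition.

def canonicalize_bindings(bindings):
    lookup = {}   # var -> canonical definition (mirrors a LookupBinding map)
    result = []
    for lhs, rhs in bindings:
        # If rhs is a var we have seen defined, replace it with the original.
        canonical = lookup.get(rhs, rhs)
        lookup[lhs] = canonical
        result.append((lhs, canonical))
    return result

chain = [("y", "x"), ("z", "y"), ("w", "z"), ("o", "w"), ("p", "o")]
print(canonicalize_bindings(chain))
# [('y', 'x'), ('z', 'x'), ('w', 'x'), ('o', 'x'), ('p', 'x')]
```

Because each entry stored in `lookup` is already canonical, every chain collapses in one forward pass; removing the now-unused bindings is left to a separate dead-code pass, as the description notes.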
junrushao pushed a commit to junrushao/relax that referenced this pull request Jan 26, 2023
junrushao pushed a commit to junrushao/relax that referenced this pull request Jan 29, 2023
junrushao pushed a commit to junrushao/relax that referenced this pull request Feb 5, 2023
junrushao pushed a commit to junrushao/relax that referenced this pull request Feb 6, 2023
junrushao pushed a commit that referenced this pull request Feb 8, 2023
junrushao pushed a commit that referenced this pull request Feb 8, 2023
junrushao pushed a commit that referenced this pull request Feb 8, 2023
junrushao pushed a commit that referenced this pull request Feb 8, 2023
junrushao pushed a commit that referenced this pull request Feb 9, 2023
junrushao pushed a commit that referenced this pull request Feb 9, 2023
MasterJH5574 pushed a commit to apache/tvm that referenced this pull request Feb 22, 2023
It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

This pass will take chains of definitions and replace references to later definitions with references to the original one. It works by checking `LookupBinding` for each var use-site and replacing the var with its definition if the definition was another var. Additionally, `MatchCast` bindings where the LHS and the RHS are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.

Example:
```python
y = x
z = y
w = z
o = w
p = o
```
Will be replaced with
```python
y = x
z = x
w = x
o = x
p = x
```

Original PR: tlc-pack/relax#233

Co-authored-by: Steven S. Lyubomirsky <[email protected]>
tqchen pushed a commit to apache/tvm that referenced this pull request Feb 22, 2023
tqchen pushed a commit to apache/tvm that referenced this pull request Feb 22, 2023
tqchen pushed a commit to apache/tvm that referenced this pull request Feb 24, 2023
tqchen pushed a commit to apache/tvm that referenced this pull request Mar 4, 2023
tqchen pushed a commit to apache/tvm that referenced this pull request Mar 5, 2023
tqchen pushed a commit to apache/tvm that referenced this pull request Mar 5, 2023
jwfromm pushed a commit to octoml/relax that referenced this pull request Mar 8, 2023
* [microTVM] Fix tvmc tutorial (#14076)

This PR applies appropriate changes to make sure the CI fails if the micro_tvmc.sh tutorial fails. This issue was captured in #14074.
This PR also makes changes to avoid this kind of breakage in bash-script tutorials in the future. In addition, it fixes the bug in running the TVMC tutorial, which happened due to renaming `zephyr_board` to `board`.

* [MetaSchedule] Introduce Async Pipeline in MultiLevelTiling (#14009)

This PR introduces an async pipeline in TVM's current MultiLevelTiling rules. It is based on apache/tvm#13966, which is already merged. This is because some conv2d workloads use `tir.if_then_else` to pad the input to the correct size, and this PR uses async copy in such copy statements.

1. Add a subrule in `src/meta_schedule/schedule_rule/multi_level_tiling.h/.cc` that annotates async copy for multi-level tiling on supported architectures (>= sm80).

On CUDA cores, this PR yields a performance boost of around 1 TFLOP/s in most Conv2d test cases and 1–2 TFLOP/s in most GEMM test cases.

All generated codes, scripts, and traces are available at https://github.com/Rainy-Memory/tvm-async-rule-benchmark.

Currently tested on commit `afbfb7aa7e43732cb716f8e443df696110be6afc` in conv2d NHWC workload, with a RTX 3080 GPU.

**Notice: given the stochastic nature of evolutionary search, performance might become worse if this PR is enabled.**

Workload: Conv2d NHWC

|Shape|Mainline TVM|Mainline TVM with Async|Performance Boost|
|-|-|-|-|
|N=1_H=224_W=224_C=3_K=64_R=7_S=7_STR=2_PAD=3_DIL=1|13838.05219|14687.89452|6.141343581679319%|
|N=1_H=56_W=56_C=64_K=64_R=1_S=1_STR=1_PAD=0_DIL=1|5398.305085|5613.892553|3.9936140067192905%|
|N=1_H=56_W=56_C=64_K=64_R=3_S=3_STR=1_PAD=1_DIL=1|11652.96825|13157.88249|12.91442839038028%|
|N=1_H=56_W=56_C=64_K=256_R=1_S=1_STR=1_PAD=0_DIL=1|10638.8309|11674.68499|9.736540600527816%|
|N=1_H=56_W=56_C=256_K=64_R=1_S=1_STR=1_PAD=0_DIL=1|8692.32829|9469.264089|8.938178277203573%|
|N=1_H=56_W=56_C=256_K=128_R=1_S=1_STR=2_PAD=0_DIL=1|4685.767442|5698.19634|21.606469175684712%|
|N=1_H=28_W=28_C=128_K=128_R=3_S=3_STR=1_PAD=1_DIL=1|9872.787087|10404.60405|5.38669535070061%|
|N=1_H=28_W=28_C=128_K=512_R=1_S=1_STR=1_PAD=0_DIL=1|9974.281496|10073.31657|0.9929043414276753%|
|N=1_H=28_W=28_C=512_K=128_R=1_S=1_STR=1_PAD=0_DIL=1|7075.866932|8564.572712|21.039199780135142%|
|N=1_H=28_W=28_C=512_K=256_R=1_S=1_STR=2_PAD=0_DIL=1|3648.330914|4021.923142|10.240086132713124%|
|N=1_H=14_W=14_C=256_K=256_R=3_S=3_STR=1_PAD=1_DIL=1|8192.954618|9160.182054|11.805599824451525%|
|N=1_H=14_W=14_C=256_K=1024_R=1_S=1_STR=1_PAD=0_DIL=1|8008.870153|9362.825279|16.90569456283206%|
|N=1_H=14_W=14_C=1024_K=256_R=1_S=1_STR=1_PAD=0_DIL=1|5210.062241|6051.208379|16.144646629759908%|
|N=1_H=14_W=14_C=1024_K=512_R=1_S=1_STR=2_PAD=0_DIL=1|2550.787202|3587.902938|40.65865373586739%|
|N=1_H=7_W=7_C=512_K=512_R=3_S=3_STR=1_PAD=1_DIL=1|4350.626084|5432.788068|24.873706981617943%|
|N=1_H=7_W=7_C=512_K=2048_R=1_S=1_STR=1_PAD=0_DIL=1|6672.068026|7663.725217|14.862815953549454%|
|N=1_H=7_W=7_C=2048_K=512_R=1_S=1_STR=1_PAD=0_DIL=1|3142.564263|4297.988014|36.766909259541826%|

Workload: GEMM NN

|Shape|Mainline TVM|Mainline TVM with Async|Performance Boost|
|-|-|-|-|
|M=512_N=256_K=640|8678.46|10607.37|22.226408832903555%|
|M=512_N=384_K=256|8109.13|10290.72|26.902886006267003%|
|M=512_N=512_K=512|11419.83|14000.86|22.601299669084398%|
|M=512_N=3072_K=768|19709.39|18351.61|-6.8890006235606425%|
|M=512_N=768_K=3072|12844.59|13730.88|6.90010346768561%|
|M=896_N=896_K=896|16149.91|16131.39|-0.11467556165947945%|
|M=1024_N=1024_K=1024|18842.11|19662.8|4.355616223448428%|
|M=1152_N=1152_K=1152|15386.79|16736.1|8.769275462913303%|
|M=1536_N=1536_K=1536|18522.67|18872.06|1.88628313304725%|
|M=2048_N=2048_K=2048|19515.42|18874.85|-3.282378754851291%|
|M=3072_N=3072_K=3072|19233.9|19291.42|0.2990553137948975%|
|M=4096_N=4096_K=4096|17122.17|19259.01|12.479960191961652%|
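For reference, the "Performance Boost" column in both tables is the relative throughput change between the two GFLOP/s columns; a minimal sketch:

```python
def boost(base, new):
    """Relative throughput change in percent, as used in the tables above."""
    return (new - base) / base * 100
```

For example, `boost(8678.46, 10607.37)` reproduces the 22.23% figure for the M=512_N=256_K=640 row.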

* [TVMScript] Use op attribute to control whether to print dtype in TVMScript (#14111)

This PR adds an op attribute `TScriptDtypePrintLocation` and modifies the dtype printing logic of the builtin ops to check this attribute, so that user-defined operators can specify how their dtype arguments are printed by attaching this attribute instead of appending members to `dtype_first_arg`/`dtype_last_arg`.

* [Fix][TVMScript] Fix index of metadata in printed script (#14130)

Currently, if the same metadata object (e.g. a multi-line `tir.StringImm`) is referenced more than once in an IRModule, each reference gets a different index into the metadata array. For example, this code

```
str_imm = T.StringImm("aaa\nbbb\n")
@I.ir_module
class Module:
    @T.prim_func
    def foo() -> None:
        A = str_imm
        B = str_imm

    @T.prim_func
    def foo1() -> None:
        A = str_imm
Module.show()
```

where `str_imm` is referenced three times, generates the following output:

```
@I.ir_module
class Module:
    @T.prim_func
    def foo():
        A: T.handle = metadata["tir.StringImm"][0]
        B: T.handle = metadata["tir.StringImm"][1]
        T.evaluate(0)

    @T.prim_func
    def foo1():
        A: T.handle = metadata["tir.StringImm"][2]
        T.evaluate(0)
```

Each reference gets a different metadata index.

This PR fixes the problem by detecting duplicate items in `IRDocsifierNode::AddMetadata`.
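The dedup logic can be sketched in plain Python (a hypothetical analogue of the C++ fix, not the actual `IRDocsifier` API): keep an index keyed by object identity so adding the same object twice returns the same slot.

```python
class MetadataRegistry:
    """Sketch: metadata array that deduplicates entries by object identity."""

    def __init__(self):
        self._index = {}  # id(obj) -> position in the metadata array
        self._items = []

    def add(self, obj):
        key = id(obj)
        if key not in self._index:      # first time we see this object
            self._index[key] = len(self._items)
            self._items.append(obj)
        return self._index[key]          # same object -> same index
```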

* [Pytorch] frontend full_impl fix (#14122)

A minor fix in the PyTorch frontend to compile the GPT-2 model; reproduction script:
torch_version = 1.13.1
transformers_version = 4.26.1

```
from transformers import GPT2LMHeadModel
import torch
import tvm
from tvm import relay

inp = torch.ones((1, 128)).to(torch.int64)
input_shapes = [("input_ids", ((1, 128), "int64"))]

model = GPT2LMHeadModel.from_pretrained('gpt2', return_dict=False)
trace_model = torch.jit.trace(model, inp, strict=False)
outputs = trace_model(inp)

mod, params = relay.frontend.from_pytorch(trace_model, input_shapes)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target='llvm', params=params)

runtime = tvm.contrib.graph_executor.GraphModule(lib["default"](tvm.device('cpu', 0)))
runtime.set_input("input_ids", inp.numpy())
runtime.run()
out = runtime.get_output(0).numpy()
print(out)
print('Done...')
```

Before the fix, the error message was:
```
Traceback (most recent call last):
  File "gpt2_compile.py", line 13, in <module>
    mod, params = relay.frontend.from_pytorch(trace_model, input_shapes)
  File "/home/ubuntu/apache_tvm/tvm/python/tvm/relay/frontend/pytorch.py", line 4791, in from_pytorch
    outputs = converter.convert_operators(_get_operator_nodes(graph.nodes()), outputs, ret_name)
  File "/home/ubuntu/apache_tvm/tvm/python/tvm/relay/frontend/pytorch.py", line 4164, in convert_operators
    relay_out = relay_op(
  File "/home/ubuntu/apache_tvm/tvm/python/tvm/relay/frontend/pytorch.py", line 841, in full
    return self.full_impl(data, fill_value, dtype)
  File "/home/ubuntu/apache_tvm/tvm/python/tvm/relay/frontend/pytorch.py", line 743, in full_impl
    fill_value = _expr.const(fill_value, dtype=dtype)
  File "/home/ubuntu/apache_tvm/tvm/python/tvm/relay/expr.py", line 707, in const
    raise ValueError("value has to be scalar or NDArray")
ValueError: value has to be scalar or NDArray
```

because `fill_value` is
```
%0 = cast(64, dtype="float32");
power(%0, 0.5f)
```

* [DOCKER] Configurable NDK version support (#14000)

Make the Android NDK version configurable as a command-line argument.

* [Fix][TIR] SampleCategorical apply-to-schedule (#14133)

This PR is another way to fix the issue described in #14118.

Since there is no standard for the format of float numbers in JSON files
(for example, we cannot require a JSON producer to print "integer" float
numbers with at least one decimal), and the JSON parser is not responsible
for determining whether an integer in a JSON file should be parsed as a
float or an int, the most convenient fix for the SampleCategorical issue
is to allow both FloatImms and IntImms as input, and to convert all IntImms
to FloatImms accordingly.

This PR fixes the issue in this way.
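The accepted-input widening can be illustrated in plain Python (a sketch of the idea, not TVM's `IntImm`/`FloatImm` API): accept whatever numeric type the JSON parser produced and normalize to float.

```python
def normalize_probs(raw):
    """Sketch: a JSON parser may yield 1 (int) or 1.0 (float) for the same
    source text; accept both and convert everything to float, mirroring the
    IntImm -> FloatImm conversion applied to SampleCategorical's weights."""
    return [float(x) for x in raw]
```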

* [Arith] ConstIntBound was incorrectly assuming bounds were over int64… (#13918)

[Arith] ConstIntBound was incorrectly assuming bounds were over int64_t range

This commit simplifies the floormod and floordiv conversion check for the
negative range and adds a test covering all integer data types.
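The subtle part of such checks is floor-division semantics on negative operands. Python's `//` and the derived modulo already follow floor semantics, matching TIR's floordiv/floormod, which makes the tricky cases easy to spot-check:

```python
def floordiv(a, b):
    # Floor division: rounds toward negative infinity (TIR floordiv semantics).
    return a // b

def floormod(a, b):
    # Defined so that a == floordiv(a, b) * b + floormod(a, b).
    return a - floordiv(a, b) * b
```

Note e.g. `floordiv(-7, 3)` is -3 (not -2), and `floormod(-7, 3)` is 2, i.e. the result takes the sign of the divisor.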

* [CMSIS-NN] Reduction in code size of AOT test runner binary (#13815)

* [CMSIS-NN] Reduction in code size of AOT test runner binary


Co-authored-by: Ashutosh Parkhi <[email protected]>

* [CMSIS-NN] Add a runtime error message (#13643)

[CMSIS-NN] Add a runtime error message

APIs TVMAPISetLastError and TVMGetLastError are used to propagate CMSIS-NN
errors caught in the backend. AOT test runner was improved to observe the contents
of this global variable. A test was added to check for the last set error as part of this
commit.

* [CRT]Cleanup unused macros in crt_config.h.template (#14125)

This PR removes old macros in crt_config.h.template.

* [Fix][Relay] Fix axis transformation in squeeze shape function (#14135)

* fix squeeze shape function issue and add testcase.

* fix lint

* [Unittest] merge test_cp_async_in_if_then_else into test_tir_transform_inject_ptx_async_copy (#14138)

This PR merges two related unit tests into one.

* [Frontend][TFLite] Fix conv2d import bug (#14124)

* Fix TFLite frontend bug and add test

* lint

* [ONNX][TORCH] Replace scatter op by scatter_elements (#14019)

* remove scatter attr class

* update pytorch: scatter was replaced by scatter_elements

* remove scatter compute and strategy registration

* remove scatter attrs registration

* update onnx front-end: replace _op.scatter by _op.scatter_elements, add checks

* update oneflow front-end

* update paddlepaddle front-end

* update pytorch utils

* remove front-end scatter definition

* fix scatter strategy for rocm

* small update

* remove scatter definition in back-end

* remove scatter strategy for cuda, gpu. transfer special case to scatter_elements

* fix test

* small fix

* upstream scatter with torch description

* last upstream of scatter in pytorch front-end

* fix reduction attribute in cuda strategy

* set scalar to test instead of tensor. update check for dynamic dim

* skip scalar source check in tests for scatter due to issue on torch side

* remove scatter op implementation from topi/cuda

* remove scatter op implementation from topi. small clean code

---------

Co-authored-by: Valery Chernov <[email protected]>

* [TVMScript][Printer] Remove relax prefix for now (#14140)

Remove relax prefix for now

This PR cleans up the relax prefix in the printer for now.
While these setups are useful and do not cause any technical
debt in the codebase, we remove them given requests.
They can be added back to the unity branch and later upstreamed.

* [microNPU] Sum legalization support (#13997)

Supports legalizing a relay sum operation to an equivalent series of NPU operations. It supports the case with an int8 output type and a channel axis.

* [Fix][MetaSchedule] Fix redundant stages in async pipeline for mlt (#14143)

This PR fixes redundant stages when `InitializeWithTuneContext` is visited
multiple times.

* [COMMUNITY] Cheng Wen -> Reviewer (#14153)

Please join me in welcoming @chengven027-intellif as a new Reviewer in TVM.
Cheng has contributed to the ONNX/PyTorch frontends and Relay passes, enabling TVM to support more input models.

- [Commits History](https://github.com/apache/tvm/pulls?q=author%3Achengven027-intellif+)
- [Code Review](https://github.com/apache/tvm/pulls?q=reviewed-by%3Achengven027-intellif+)

* [Runtime] Fix high RAM usage when saving / loading parameters of big models (#14147)

* add load_params_from_file

* add save_params_to_file

* avoid making another copy in save_params

* black

* add test

* update doc

* [Relay][Frontend] Span Filling PyTorch (#14050)

* [Relay][Frontend] Span Filling PyTorch

- Construct the debug name of each C graph instruction as the source name of the span for PyTorch models.
- To get a reference to the renamed nodes, add a function that exports the converted C graph after conversion.
- Add structural_equal comparisons with and without set_span to the existing test cases.
- Add span test cases for frequent conversions.
- Add a span test case for exporting model parameters.

* [SpanFillingPyTorch]

- Return the TupleGetItem expr from TupleWrapper with the span of its Tuple.
- Add a None type symbol in set_span for certain conversions.
- Add a current_op member variable to PyTorchOpConverter to track which op
  is being converted in the PyTorch frontend.

* [SpanFillingPyTorch]

- Fix the error caused by quantized params not being found after renaming
  the debug names of the C graph.

---------

Co-authored-by: Joey Tsai <[email protected]>

* [TRT][BYOC] allow strided_slice ops on selected dimensions (#14142) (#14144)

* [ONNX][TOPI] Add `DFT` operator (#13999)

* init convertor for DFT

* init test for DFT

* init DFT operator in Relay

* update topi implementation for DFT

* clean up

* update ONNX frontend

* support  attribute

* fix error: Expected Array[Tensor], but got Array[index 0: Array]

* support inverse, onesided, dft_length

* update tests for DFT

* update TOPI test for DFT

* add documentation

* fix pylint

* fix cpplint

* fix cpplint

* fix threshold for FP16 (ARM)

* add CUDA compute

* fix pylint

* fix doc string

* code review fixes for ONNX front-end

* code review fixes for TOPI

* rename: stft.py -> signal.py

* pass input_shape and output_shape to verify_dft

* [CRT][microTVM] Enable USMP by default for AoTExecutor + CRT runtime (#14107)

This PR enables USMP by default when AoTExecutor and CRT runtime are selected. Check forum discussion about this change: https://discuss.tvm.apache.org/t/enable-usmp-by-default-in-aot-executor-with-runtime-crt/14406

As a result, the workspace memory in mlperftiny project type is removed since memory allocation is not required. If we keep this workspace, the model doesn't fit since some of the memory is allocated twice.

* [Android] Fix using system libraries in Android apps (#14145)

- Starting from API level 31, `uses-native-library` is required if we
  want to open a system library:
  https://developer.android.com/about/versions/12/reference/compat-framework-changes#enforce_native_shared_library_dependencies

  We should specify the OpenCL library in `uses-native-library` in all
  applications where the OpenCL backend might be used.

- Updated README files and describe how to fix synchronization issues
  in Android Studio.

* [microTVM]Enable TVMC micro with AoT Executor  (#14077)

This PR enables AoT Executor for tvmc micro compilation.

* [bugfix] Fix the write buffer scope of `mma_store_impl` (#14174)

fix

* [Relay] Enhance EliminateCommonSubexpr to support Tuple argument (#14169)

If an argument of a call is a Tuple, we should check its fields.
Different tuples with the same fields should be treated as the same input.
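The comparison can be sketched in plain Python (an illustration of the idea, not the Relay C++ implementation): two arguments match if they are the identical node, or if both are tuples whose fields all match recursively.

```python
def args_equal(a, b):
    """Sketch: structural comparison of call arguments, descending into
    tuple fields instead of comparing tuples by identity."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(
            args_equal(x, y) for x, y in zip(a, b))
    return a is b  # leaf expressions match only if they are the same node
```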

* [TIR] Fix typo in doc (#14178)

* [microTVM] Use QNN schedules to give SOTA performance (#13752)

In #13242, I rewrote microTVM's convolution schedules to give a major improvement in performance. While I demonstrated in tests that my changes worked, they could not be used with relay.build.

This pull request expands the functionality of #13242 and adds new legalize and alter_op passes to take advantage of the quantized schedules. This dramatically improves performance on some models, dramatically cuts RAM usage, and removes the need for autotuning on microTVM. More specifically, for the vww model from MLPerf Tiny running on the nucleo_l4r5zi, this pull request:

- Improves untuned performance from 1741 ms to 137 ms - a 6.8x improvement!
- Improves tuned performance from 337 ms to 137 ms.
- Sets a new state-of-the-art for MLPerf Tiny, beating Plumerai's previous 208 ms record
- Reduces RAM consumption by 73 KB (a large amount on microcontrollers!) by eliminating intermediate buffers.
- Reduces flash consumption for model weights by 5x
- Slightly improves accuracy

@mehrdadh has kindly tested these changes himself, and has confirmed my 137 ms figure.

To enable the schedules that grant these performance improvements, this pull request:

1. Adds out_layout support to the regular and depthwise conv2d schedules from #13242.
2. Generalizes the schedules from #13242 to be more widely applicable.
3. Adds a layout alternation pass to ensure regular and depthwise conv2d schedules always get their desired input formats.
4. Adds a conv2d -> depthwise conv2d -> unpadded conv2d rewrite step to remove empty channels from conv2d operators.
5. Adds a conv2d -> average pool -> dense rewrite step to remove empty channels from conv2d operators.
6. Adds an alter_op pass to fold padding into a separate Relay operator.
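Step 6 can be sketched as a tiny stand-alone rewrite (names and the dict-based op representation are purely illustrative, not TVM's alter_op API): a conv2d carrying a padding attribute becomes an explicit pad operator followed by an unpadded conv2d.

```python
def fold_padding(conv):
    """Sketch: split a conv2d's implicit padding into a separate pad op."""
    pad = conv["padding"]
    if pad == (0, 0):
        return [conv]                      # nothing to fold
    return [
        {"op": "pad", "pad_width": pad},   # explicit padding operator
        {**conv, "padding": (0, 0)},       # conv2d now runs unpadded
    ]
```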

* Add v0.11.0 docs link to site (#14181)

Update the version menu in TVM documentation to add a specific v0.11.0 release docs link.

* [TIR] Allow TransformLayout with non-invertible index map (#14095)

* [TIR] Allow TransformLayout with non-invertible index map

TransformLayout requires the index map to have an inverse map that can be
calculated by the analyzer, in order to check whether padding is added.
However, this check doesn't always work, because the affine analysis can
only handle a limited set of supported patterns. In some cases, even if
the index map doesn't introduce padding, the schedule primitive throws
`TransformationIntroducesPaddingError` because it fails to calculate the
inverse index map.

It is safe to allow the buffer to be padded without providing pad_value
because the original loop extent is not changed and the padded region is not accessed.
This PR changes the behavior of `TransformLayout` to allow
non-invertible index maps.

Previous discussion:
https://discuss.tvm.apache.org/t/conflict-free-shared-memory-permutation-in-tensorir/13959/9
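As a concrete illustration (a plain-Python sketch, not the TIR index-map API): a tiling map like `i -> (i // 4, i % 4)` is injective and introduces no padding when the extent is a multiple of 4, yet proving that requires exactly the kind of affine inverse the analyzer can fail to derive — which is what the `assume_injective_transform` escape hatch is for.

```python
def index_map(i):
    # Tile a flat index into (outer, inner) with tile size 4.
    return (i // 4, i % 4)

def inverse(i0, i1):
    # Hand-written inverse the affine analyzer would need to discover.
    return i0 * 4 + i1
```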

* add assume_injective_transform option

* Apply suggestions from code review

Co-authored-by: Siyuan Feng <[email protected]>

---------

Co-authored-by: Siyuan Feng <[email protected]>

* [TIR][Analysis] Implement IdentifyMemCpy analysis function (#13947)

* [HotFix][MetaSchedule] Turn off database shash check (#14188)

At this moment, the structural hash values of IR in TVM are platform
dependent (e.g., the hash values of a String may differ on different
platforms). In our recent practice, we found this an obstacle to
applying an existing database on different platforms (say we tune
an IRModule with MetaSchedule on Metal, and then apply the database
on CUDA, etc.).

To clear this obstacle, we decided to remove the shash value check. The
purpose of that check is mainly to ensure safety, so turning it off
makes no difference in most of the use cases we can imagine.

Meanwhile, it is equally important to make our structural hash
platform independent. Plans for this are ongoing.

* [TOPI] Batch Norm Training Mode (#14190)

Prior to this PR, TOPI batch_norm only supports inference.

This PR adds a training: bool flag and a momentum: float argument to support training mode (updating and returning moving_mean / moving_var), which aligns with torch.nn.functional.batch_norm.
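The running-statistics update follows the torch.nn.functional.batch_norm convention (this is PyTorch's documented semantics, sketched here in plain Python rather than TOPI code):

```python
def update_running(running, batch_stat, momentum=0.1):
    """PyTorch batch-norm convention for training mode:
    running <- (1 - momentum) * running + momentum * batch_stat."""
    return (1 - momentum) * running + momentum * batch_stat
```

Note that in this convention `momentum` weights the *new* batch statistic, the opposite of the usual exponential-moving-average naming.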

* [TOPI] Group normalization (#14193)

As more and more ML models nowadays contain the group normalization
computation, we find it beneficial to introduce this op to TOPI level.
It will enable us to optimize the group normalization operation as a
whole in a more convenient way.

This PR introduces the group normalization op to TOPI. The group norm
operation was introduced in https://arxiv.org/abs/1803.08494. The
implementation uses tuple reduction, same as the implementation of layer
norm. Implemented with tuple reduction, the corresponding generated TIR
function can be optimized by cross-thread reduction or rfactor through
MetaSchedule.
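The computation itself can be sketched with NumPy (gamma/beta scaling omitted; this is an illustration of the math, not the TOPI tuple-reduction code):

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize (N, C, H, W) input within each group of C // num_groups
    channels, computing mean and variance jointly per group."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(x.shape)
```

The joint (mean, variance) reduction per group is what maps naturally onto tuple reduction, and hence onto cross-thread reduction or rfactor in MetaSchedule.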


Co-authored-by: Bohan Hou <[email protected]>

* [Fix][TIR] LowerCrossThreadReduction with write-back predicate (#14199)

Prior to this PR, the cross-thread reduction lowering pass does not
add a store predicate to the write-back block. This is in consideration
that for a certain write-back buffer position, all values being stored
(by all the threads) in the write-back block are the same. Since all
threads are writing the same value, we were assuming that not having a
write-back block predicate is fine, because the result will not be wrong
in any way.

However, we recently noticed that some GPU backend compilers treat
this behavior (multiple threads writing the same position) as a race
condition and throw a compilation error. The compiler does not take into
account the fact that all values being stored are the same, and insists
on complaining.

This means that we still need the write-back block predicate to
make things work, and this PR makes that change. I have run integration
tests locally to make sure that the generated kernels are right and
produce numerically correct results.
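The shape of the lowered write-back can be sketched in plain Python (an illustration of the predicate, not generated TIR): every thread holds the same reduced value, but only thread 0 stores it, so no compiler sees multiple writers to one position.

```python
def write_back(thread_id, reduced_value, out):
    """Sketch of the write-back block after lowering: the store is guarded
    by the predicate this PR adds, so exactly one thread writes."""
    if thread_id == 0:          # the added store predicate
        out[0] = reduced_value
```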

* [Unity] Relax VM (#13878)

This PR implements a flexible register-based VM to execute relax programs with dynamic shape and control flow. Design: https://github.com/tlc-pack/relax/wiki/Relax-VM-Design.

Co-Authored-by: Ziheng Jiang <[email protected]>
Co-Authored-by: Ruihang Lai <[email protected]>
Co-Authored-by: Sunghyun Park <[email protected]>
Co-Authored-by: Junru Shao <[email protected]>
Co-Authored-by: Prakalp Srivastava <[email protected]>
Co-Authored-by: Yong Wu <[email protected]>
Co-Authored-by: Steven S. Lyubomirsky <[email protected]>
Co-Authored-by: Tianqi Chen <[email protected]>
Co-Authored-by: Hongyi Jin <[email protected]>

* [Unity] Relax expressions and types (#13901)

* [Unity][IR] First-class StructInfo (#13907)

* [Unity][IR] First-class StructInfo

Relax tracks structural information (such as tensor shape) about Relax values via `StructInfo`.

* Fix rust build

---------

Co-authored-by: Junru Shao <[email protected]>

* [Unity][CI] Unity specific jenkins setup (do not upstream to main) (#13910)

This PR sets up a unity-specific Jenkins with a minimal Jenkinsfile
without sharding and disables most of the tests to reduce overall
cost. We can add tests for the unity branch by configuring the specific
groovy file.

* [Unity] Basic StructInfo Analysis and Expr construction (#13916)

[Unity] Basic StructInfo Analysis and Expr construction.

This PR adds struct info analysis and expr support.
These are logics to construct the IR node and perform
struct info related analysis.

Testcases are added to cover the IR node construction
and related struct info analysis checks.

Co-authored-by: Tianqi Chen <[email protected]>
Co-authored-by: Altan Haan <[email protected]>
Co-authored-by: Andrew Liu <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
Co-authored-by: Jiawei Liu <[email protected]>
Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Lesheng Jin <[email protected]>
Co-authored-by: masahi <[email protected]>
Co-authored-by: Prakalp Srivastava <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Steven S. Lyubomirsky <[email protected]>
Co-authored-by: Sunghyun Park <[email protected]>
Co-authored-by: Yixin Dong <[email protected]>
Co-authored-by: Yong Wu <[email protected]>
Co-authored-by: Ziheng Jiang <[email protected]>

* [Unity] Relax BlockBuilder and ExprMutator (#13926)

This PR adds BlockBuilder: the core data structure to construct Relax AST, and ExprMutator: performs AST mutation for implementing transformation passes.

Co-Authored-by: Tianqi Chen <[email protected]>
Co-Authored-by: Altan Haan <[email protected]>
Co-Authored-by: Andrew Liu <[email protected]>
Co-Authored-by: Hongyi Jin <[email protected]>
Co-Authored-by: Jiawei Liu <[email protected]>
Co-Authored-by: Junru Shao <[email protected]>
Co-Authored-by: Lesheng Jin <[email protected]>
Co-Authored-by: masahi <[email protected]>
Co-Authored-by: Prakalp Srivastava <[email protected]>
Co-Authored-by: Ruihang Lai <[email protected]>
Co-Authored-by: Siyuan Feng <[email protected]>
Co-Authored-by: Steven S. Lyubomirsky <[email protected]>
Co-Authored-by: Sunghyun Park <[email protected]>
Co-Authored-by: Yixin Dong <[email protected]>
Co-Authored-by: Yong Wu <[email protected]>
Co-Authored-by: Ziheng Jiang <[email protected]>

* [Unity] Relax TVMScript Parser. (#13932)

This PR adds the TVMScript parser/ir_builder support based on the blockbuilder.

Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Tianqi Chen <[email protected]>
Co-authored-by: Yuchen Jin <[email protected]>
Co-authored-by: Steven S. Lyubomirsky <[email protected]>
Co-authored-by: Yong Wu <[email protected]>

* [Unity] Relax TVMScript Printer (#13944)

This PR introduces Relax as a dialect supported by the TVMScript
Printer. Some caveats:
- Needs to rebase to mainline before merging.
- Some tests are skipped because some operators are not upstreamed to
  the unity branch yet.

Co-authored-by: Tianqi Chen <[email protected]>
Co-authored-by: Yuchen Jin <[email protected]>
Co-authored-by: Steven S. Lyubomirsky <[email protected]>
Co-authored-by: Yong Wu <[email protected]>
Co-authored-by: Prakalp Srivastava <[email protected]>
Co-authored-by: Sunghyun Park <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>

* [Unity] Relax VM codegen (#13954)

* [Unity] Relax VM shape lowering pass (#13956)

This PR introduces Relax `FunctionPass` and `DataflowBlockPass` API, and the `VMShapeLower` pass to lower the shape expression in Relax to TIR functions and VM shape heap builtin functions.

Co-Authored-by: Ziheng Jiang <[email protected]>
Co-Authored-by: Lesheng Jin <[email protected]>
Co-Authored-by: Altan Haan <[email protected]>
Co-Authored-by: Junru Shao <[email protected]>
Co-Authored-by: Prakalp Srivastava <[email protected]>
Co-Authored-by: Ruihang Lai <[email protected]>
Co-Authored-by: Siyuan Feng <[email protected]>
Co-Authored-by: Steven S. Lyubomirsky <[email protected]>
Co-Authored-by: Sunghyun Park <[email protected]>
Co-Authored-by: Tianqi Chen <[email protected]>
Co-Authored-by: Yong Wu <[email protected]>

* [Unity] e2e Relax minimum build flow (#13961)

This PR introduces the e2e Relax lowering flow (`relax.vm.build`). Tests for each pass in the flow are added.

Co-Authored-by: Altan Haan <[email protected]>
Co-Authored-by: Andrew Liu <[email protected]>
Co-Authored-by: Hongyi Jin <[email protected]>
Co-Authored-by: Jiawei Liu <[email protected]>
Co-Authored-by: Junru Shao <[email protected]>
Co-Authored-by: Prakalp Srivastava <[email protected]>
Co-Authored-by: Ruihang Lai <[email protected]>
Co-Authored-by: Siyuan Feng <[email protected]>
Co-Authored-by: Steven S. Lyubomirsky <[email protected]>
Co-Authored-by: Sunghyun Park <[email protected]>
Co-Authored-by: Tianqi Chen <[email protected]>
Co-Authored-by: Yong Wu <[email protected]>
Co-Authored-by: Ziheng Jiang <[email protected]>

* [Unity][TVMScript] Use explicit `R.shape` in TVMScript (#13979)

As we've introduced `arg_sinfo` in CallNode, the implicit shape constructor
is not widely used in TVMScript. This PR removes the implicit shape since
it may cause confusion between shape and tuple.

* [Unity] Relax op: index (#13987)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the tensor indexing operators.

* [Unity] Relax op: datatype (#13986)

* [Unity] Relax op: set (#13990)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the set operators.

Co-authored-by: Prakalp Srivastava <[email protected]>

* [Unity] Relax op: image (#13994)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the image operators.

* [Unity] Relax op: arithmetic, comparison (#13983)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the unary, binary and ternary arithmetic and
comparison operators.

Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Chaofan Lin <[email protected]>

* [Unity] Relax op: statistical (#13991)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the statistical operators.

* [Unity] Relax op: neural networks (#13993)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the neural network operators.

* [Unity] Relax op: creation (#13984)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the tensor creation operators.

* [Unity] Relax op: linear algebra (#13988)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the linear algebra operators.

Co-authored-by: Siyuan Feng <[email protected]>

* [Unity] Relax op: search (#13992)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the search operators.

* [Unity] Relax op: manipulation (#13989)

This PR is about the high-level tensor computation operators in Relax.

This PR includes the tensor manipulation operators.

Co-authored-by: Prakalp Srivastava <[email protected]>

* [Unity] NestedMsg Support utility (#13995)

This PR introduce NestedMsg to robustly handle nested-tuple analysis.

Relax supports nested tuple structures in the IR.
Nested tuple structure is important for advanced groupings in
cases such as gradient calculation and other scenarios.

The possible presence of nested tuples means that we need to
robustly handle analyses over nested tuple structures in a dataflow graph.

This PR introduces a NestedMsg<T> class that corresponds to a possibly
nested message tuple for a given leaf message class T.
We also introduce various helper functions to compose and decompose messages.
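A plain-Python analogue of the idea (not the C++ `NestedMsg<T>` API): a message is either `None`, a leaf value, or a possibly nested tuple of messages, and helpers map over it structurally.

```python
def map_nested_msg(msg, fleaf):
    """Sketch: apply fleaf to every leaf of a possibly nested message
    tuple, preserving the tuple structure and None (no-message) slots."""
    if msg is None:
        return None
    if isinstance(msg, tuple):
        return tuple(map_nested_msg(m, fleaf) for m in msg)
    return fleaf(msg)
```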

Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Yixin Dong <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>

* [Unity][Pass] Operator Fusion Passes (#14001)

[Unity][Pass] Operator fusion passes

This PR introduces three passes for operator fusion:
1. AnnotateTIROpPattern: analyzes the operator kind from the PrimFunc.
2. FuseOps: fuses operators for Relax functions, adding new fused
relax primitive functions.
3. FuseTIR: fuses the corresponding TIR PrimFuncs for the fused relax functions.

* [Unity][Pass] LambdaLift pass (#14012)

* [Unity][VM] Supporting "compiled" exec mode. (#14015)

[VM] Supporting "compiled" exec mode.

This PR adds support for "compiled" mode to the VM. The compiled mode translates
the relax function into a TIR function and drives execution through that TIR function.

It is different from the micro AOT codegen, which generates TIR code that targets
the micro C runtime environment and is useful for resource-limited settings with a
smaller set of features. Both leverage the low-level TIR build that is also shared with TensorIR.

The current implementation targets the full TVM (VM) runtime, which comes with PackedFunc,
object, tuple, closure and all kinds of rich structure support. This also means that
we can leverage the full runtime support to handle things like allocation, dynamic shape,
easy plugins and Python interaction, which are not available in more limited runtimes.

The user uses the same API to load the generated code regardless of
compiled mode or bytecode, and just needs to change one line

```python
ex = relax.vm.build(mod, target, exec_mode="compiled")
```

The simplicity is thanks to the TVM runtime architecture, which allows us to compose things together as objects.
The only difference is how the high-level driving PackedFunc is provided: in the case of bytecode
it is normal interpretation, and in the case of compiled mode it is TIR.

It is a complete implementation. Unit test cases are added. All codegen build tests are updated to include both
exec_modes and have passed locally.

Co-authored-by: Junru Shao <[email protected]>

* [Unity][Pass] BindParams pass, FoldConstant pass (#14016)

This PR introduces FoldConstant/BindParam passes.

Co-authored-by: Yong Wu <[email protected]>
Co-Authored-by: Hongyi Jin <[email protected]>
Co-Authored-by: Siyuan Feng <[email protected]>

* [Unity][Pass][TuningAPI] Introduce TuningAPI and MetaSchedule pass (#14014)

Add TuningAPI and MetaSchedule tuning pass

* [Unity] Relay -> Relax translator  (#14026)

This PR implements a Relay to Relax translator, which allows us to import Relay workloads to Relax for benchmarking and development purposes (tests and examples are added).

* [Unity][Pass] Normalize Pass (#14031)

This PR implements the relax `Normalize` pass, which allows users to transform Relax IR to normal form, i.e., the expressions are normalized (no nesting, hence the AST is in ANF), and all `checked_type_` and `shape_` of expressions are available. (Tests are added.)
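A toy illustration of normalization into ANF, using tuples to stand for call nodes (purely illustrative, not the Relax API): nested calls are flattened so every intermediate result is bound to a fresh variable.

```python
def normalize(expr, bindings):
    """Sketch: expr is a variable name or a nested ("call", fn, arg) tuple;
    append let-bindings so no call nests inside another call (ANF)."""
    if isinstance(expr, tuple) and expr[0] == "call":
        arg = normalize(expr[2], bindings)       # normalize argument first
        var = "t{}".format(len(bindings))        # fresh variable
        bindings.append((var, ("call", expr[1], arg)))
        return var
    return expr

# f(g(x)) becomes: t0 = g(x); t1 = f(t0)
binds = []
result = normalize(("call", "f", ("call", "g", "x")), binds)
```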

Co-Authored-by: Yuchen Jin <[email protected]>
Co-Authored-by: Ruihang Lai <[email protected]>
Co-Authored-by: Siyuan Feng <[email protected]>
Co-Authored-by: Tianqi Chen <[email protected]>

* [Unity][BlockBuilder] CallTE convert PrimValue args  (#14028)

Prior to this PR, the `call_te` method of BlockBuilder could not convert PrimValue arguments and directly rejected them. This PR adds PrimValue conversion support and a regression test.

Co-authored-by: Siyuan Feng <[email protected]>

* [Unity][Pass] Wellformed Analysis (#14032)

This PR implements relax wellformed analysis, which checks if the IRModule is well-formed. (tests and examples are added).

Co-Authored-by: Ruihang Lai <[email protected]>
Co-Authored-by: Siyuan Feng <[email protected]>
Co-Authored-by: Tianqi Chen <[email protected]>
Co-authored-by: Steven S. Lyubomirsky <[email protected]>
Co-authored-by: Yong Wu <[email protected]>
Co-Authored-by: Yuchen Jin <[email protected]>
Co-Authored-by: Yixin Dong <[email protected]>
Co-Authored-by: Chaofan Lin <[email protected]>
Co-Authored-by: Prakalp Srivastava <[email protected]>
Co-Authored-by: Junru Shao <[email protected]>

* [Unity][TVMScript] Move tir/relax import in script out of __init__.py (#14033)

Prior to this PR, `python/tvm/script/__init__.py` imports both the tir and relax
submodules. This leads to the phenomenon that when people do
```python
from tvm.script import tir as T
```
, the relax submodule is implicitly imported by `__init__.py` as well.

Since TIR does not rely on Relax, it is good not to import both of them
at the same time. (This can prevent cyclic imports sometimes.)

This PR does this decoupling by introducing two files

* `python/tvm/script/relax.py`
* `python/tvm/script/tir.py`

and removing the imports from `python/tvm/script/__init__.py` and
`python/tvm/script/parser/__init__.py`. With this change, we require people to
manually do `from tvm.script import tir` and `from tvm.script import relax`
to use the TVMScript parser, which is our conventional way.

* [Unity][Pass] Operator legalization (#14029)

This PR is the operator legalization pass, which transforms high-level
operator calls to `call_tir`s of corresponding low-level TIR PrimFuncs.

- The legalization pass provides customizability, which enables people
to pass in a customized legalization map to override the default
legalization method.

- The legalization supports symbolic shapes. (At this moment only pooling
does not support symbolic shapes, as TOPI pooling does not support them. This
needs to be fixed in follow-up PRs.)


Co-authored-by: Chaofan Lin <[email protected]>
Co-authored-by: Yixin Dong <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>

* [Unity][Op] Add ShapeExpr Tests for Reshape Op (#14035)

This PR specifically checks the relax.reshape operator when the input is a ShapeExpr.

* [Unity] Initial PyTorch Frontend (#14037)

[Unity] Initial PyTorch Frontend

This PR introduces initial pytorch frontend components of Relax, including
- an FX translator that translates a Torch FX graph module to a TVM IRModule,
- a Relax backend for Torch Dynamo, which brings the mechanism to build PyTorch models using the Relax compilation pipeline,
- a pipeline prototype that contains a collection of pre-defined pipelines that optimize and lower an IRModule before passing it to the minimum build.

Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Tianqi Chen <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>

* [Unity][Pass] Block-level static memory planning (#14038)

This PR introduces the static memory planning pass at the binding block level, as well as an analysis function that estimates the memory usage after the memory planning pass. It supports the following features: nested tuples, reusing the memory of the input of reshape ops, and an estimator that returns the total memory size needed to be allocated before and after memory planning, as well as the number of tensors / memory blocks to be allocated before and after memory planning.

The estimation is static -- it does not consider control flows (such as “if” and cross-function calls). It simply accumulates the size of every alloc_tensor and alloc_storage.

Memory planning produces "`relax.memory.alloc_tensor/storage`" as its results.
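As a rough illustration of the static estimation described above (a toy sketch, not the real pass: the function name and the 4-byte float32 element size are assumptions), the estimator simply sums the byte size of every allocation it sees, ignoring control flow:

```python
# Toy static memory estimator: accumulate the size of every allocation
# site, before and after a hypothetical planning step that lets two
# same-shaped tensors share one storage. All names are illustrative.
from math import prod

def estimate_total_bytes(allocs, itemsize=4):
    """allocs: list of shapes for alloc_tensor/alloc_storage sites."""
    return sum(prod(shape) * itemsize for shape in allocs)

before = [(1, 64, 56, 56), (1, 64, 56, 56), (1, 64, 54, 54)]
# Suppose planning lets the two identically shaped tensors share storage:
after = [(1, 64, 56, 56), (1, 64, 54, 54)]
print(estimate_total_bytes(before), estimate_total_bytes(after))
```

The same accumulation also yields the tensor/block counts (just `len(allocs)`), matching the estimator's description above.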

* [Unity] Disallow inline prim_func in relax IR (#14040)

Disallow inline prim_func in relax IR

* [Unity] Update tests to adapt to latest TVMScript syntax (#14039)

Given that some latest changes of TVMScript syntax have been merged,
some test files are now containing deprecated uses of TVMScript syntax.
This PR updates the test files with latest TVMScript syntax so that
running the tests will not trigger deprecation warnings.

Co-authored-by: Tianqi Chen <[email protected]>

* [Unity] Relax dataflow pattern language (matching) (#14041)

The dataflow pattern language for Relax (originally from https://github.com/tlc-pack/relax/pull/163).

The implementation splits patterns into two parts:

- Match an Expression: match an expression syntactically (MatchExprPattern, i.e., DFPatternMatcher);
- Match a Graph: match a graph (cross multiple VarBinding) topologically (MatchGraphPattern);

* [Unity] Statement rewriter for DataflowBlock (#14043)

This PR implements a few APIs to quickly perform statement-level mutation:
`add`/`remove_unused`/`remove_all_unused`/`replace_all_uses`.
It also implements `remove_all_unused` to remove dead statements inside `DataflowBlock`.

* [Unity][Pass] FuseOps FuseTIR fixes (#14044)

This PR fixes two bugs of FuseOps and FuseTIR:

It fixes FuseOps, which previously rewrote only the "main" function of
the IRModule. After the fix, FuseOps goes through each non-primitive
Relax function. Test cases for both the FuseOps and FuseTIR sides are
added to ensure that both passes work for cases with multiple Relax
functions.

It also fixes FuseOps and FuseTIR, which did not take "call_dps_packed"-style
"call_tir" into account. The previous behavior directly downcast
the first argument of "call_tir" to GlobalVar, which is not right when
the "call_tir" is in "call_dps_packed" style and the first argument is
a PackedFunc. With this fix, FuseOps and FuseTIR skip such
"call_tir"s. Tests for both CallTIR and CallOps are added accordingly.

* [Unity][TVMScript] Overload `__neg__` for relax expr (#14045)

This PR overloads `__neg__` given that `relax.negative` is now supported. Besides, it adds `test_op_misc.py` and brings tests for calling overloaded operators.

* [Unity][VM] Add per-op profiling support  (#14053)

Adds per-op profiling support to Relax VM, in a way similar to how Relay VM is instrumented via the common profiling infra in the runtime. Profiling over RPC is supported.

Example output:
```
Name                          Duration (us)  Percent    Device  Count                                 Argument Shapes  
conv2d1                          705,779.00    51.22  hexagon0      1  float32[1, 64, 56, 56], float32[1, 64, 54, 54]  
conv2d                           669,589.00    48.60  hexagon0      1  float32[1, 64, 56, 56], float32[1, 64, 56, 56]  
relu                                 683.00     0.05  hexagon0      1  float32[1, 64, 56, 56], float32[1, 64, 56, 56]  
relu1                                679.00     0.05  hexagon0      1  float32[1, 64, 54, 54], float32[1, 64, 54, 54]  
vm.builtin.check_tensor_info          28.00     0.00  hexagon0      1                          float32[1, 64, 56, 56]  
vm.builtin.match_shape                25.00     0.00  hexagon0      1                          float32[1, 64, 56, 56]  
----------                                                                                                             
Sum                            1,376,783.00    99.93                6                                                  
Total                                  0.00               cpu0      1                                                  
Total                          1,377,809.00           hexagon0      1                                                  

Configuration
-------------
Number of threads: 4
Executor: VM
```


The original PR: https://github.com/tlc-pack/relax/pull/422

* [Unity][BYOC] Add pattern-based partitioning pass (#14054)

This adds a new pass, FuseOpsByPattern, which applies pattern matching to each function in the given module, and groups matched expressions into a new function. The end result is similar to FuseOps, but fusion is driven completely by
the provided patterns. The implementation also reuses OperatorFusor used by FuseOps to create grouped functions from partitioned groups, further illustrating the similarity between the two passes.

The new pass will serve the same role the MergeComposite pass plays in Relay BYOC - grouped functions are annotated with the "composite" attribute to denote what operations a given function consists of, and are offloaded to external backends. But it can also be useful in non-BYOC settings, for example to support advanced fusion that the op-kind-based one doesn't handle (fused MHA, conv2d / gemm + reduction fusion, etc.).

The original PR: https://github.com/tlc-pack/relax/pull/366
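The pattern-driven grouping idea can be sketched with a toy model (all names, including the `"dnnl.conv2d_relu"` composite name, are invented for illustration; the real pass matches dataflow patterns, not flat op lists): scan the op sequence for a registered pattern and pull each match into its own composite group.

```python
# Toy pattern-based partitioning over a linear op sequence: each match
# of `pattern` becomes a (name, ops) "composite" group; everything else
# is left as-is.

def fuse_by_pattern(ops, pattern, name):
    groups, i = [], 0
    while i < len(ops):
        window = tuple(ops[i:i + len(pattern)])
        if window == pattern:
            groups.append((name, list(window)))  # grouped into a composite fn
            i += len(pattern)
        else:
            groups.append(ops[i])
            i += 1
    return groups

ops = ["conv2d", "relu", "add", "conv2d", "relu"]
print(fuse_by_pattern(ops, ("conv2d", "relu"), "dnnl.conv2d_relu"))
```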

* [Unity] Relax op: collapse sum (#14059)

This PR brings the high-level operators `relax.collapse_sum_like` and `relax.collapse_sum_to`, which are useful when doing AD in Relax. To achieve this, it exposes the interface of `topi.collapse_sum`. Moreover, this PR also implements the legalization of these ops and adds corresponding tests.

* [Unity][Fix][Pass] Fix FuseOps for lack graph edges (#14058)

This PR fixes a mistake in #14044. In #14044, in VisitLeaf of the graph
construction of FuseOps, we first check whether the input node is a leaf
and then whether it is a Tuple. This is not right: since Tuple is not
categorized as a leaf node, when the input node is a Tuple, the function
returns early because the input is not a LeafNode, so the check for
Tuple never holds.

It is quite interesting that our existing unit tests failed to catch this
mistake. I add a regression test for this case, which ensures
that the tuple is always visited.

* [Unity][Pass] Remove Unused Function (#14061)

This PR implements a pass to clean up unused functions. 

Co-authored-by: masahi <[email protected]>

* [Unity][BYOC] Add pass to merge composite functions to offload large subgraphs (#14062)

This PR adds a pass that merges neighboring calls to composite functions offloaded to the same external backend into one function. This is important for backends that want to receive as large a subgraph as possible, for example TensorRT. It plays the same role as the MergeCompilerRegion pass in Relay BYOC does, and the algorithm follows the same idea described in https://discuss.tvm.apache.org/t/relay-improved-graph-partitioning-algorithm/5830.

Original PR
https://github.com/tlc-pack/relax/pull/372

Substantial improvement by @yelite 
https://github.com/tlc-pack/relax/pull/411

Related fix PR by @yelite 
https://github.com/tlc-pack/relax/pull/406

Co-authored-by: Lite Ye  <[email protected]>

* [Unity][Frontend] Annotate number of non-static input of FX function (#14067)

* [Unity][Transform] Add LiftTransformParams pass (#14069)

This PR adds a pass `LiftTransformParams`. It allows compiling the
end-to-end model without the weights provided. The idea is to annotate
the input parameters that are weights, identify and lift the
transformations applied to the weights, and compile them into a separate
function `transform_params` that can be executed at runtime. Users can
run `transform_params` with the weights to obtain the weights for the
optimized model as a preparation step before deployment. In this way, we
perform the same optimizations while deferring the weight
transformations to the user side; the overhead of the deferred weight
transformation can be ignored, as it only needs to run once.

This pass is integrated with the default `vm.build`. It is optional and
only necessary when the parameters are kept as inputs when importing the
model from the frontend.
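The lifting criterion can be sketched in miniature (a hypothetical model, not the real pass: the `(var, op, args)` binding format and names are invented): a binding whose arguments all depend only on weights can be moved into the ahead-of-time `transform_params` stage.

```python
# Toy LiftTransformParams: split bindings into a weight-only stage that
# can run once ahead of time, and a main stage that needs runtime inputs.

def lift_transform_params(bindings, weights):
    """bindings: list of (var, op, args). Returns (param_stage, main_stage)."""
    precomputable = set(weights)
    param_stage, main_stage = [], []
    for var, op, args in bindings:
        if all(a in precomputable for a in args):
            precomputable.add(var)           # its result is also weight-only
            param_stage.append((var, op, args))
        else:
            main_stage.append((var, op, args))
    return param_stage, main_stage

bindings = [
    ("w_t", "transpose", ("w",)),    # touches only the weight: lifted
    ("y", "matmul", ("x", "w_t")),   # needs runtime input x: stays in main
]
pre, main = lift_transform_params(bindings, weights=["w"])
print(pre, main)
```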

* [Unity][BYOC][Pass] RunCodegen and TensorRT  (#14078)

This PR introduces the fundamental workflow for BYOC and integrates TensorRT as a demonstration.

* [Unity][Pass] Canonicalize Bindings (#14079)

It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

This pass takes chains of definitions and replaces references to later definitions with the original one. It works by checking `LookupBinding` for each var use site and replacing the var with its definition if that definition is another var. Additionally, `MatchCast` bindings where the LHS and the RHS are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.

Example:
```python
y = x
z = y
w = z
o = w
p = o
```
Will be replaced with
```python
y = x
z = x
w = x
o = x
p = x
```

Original PR: https://github.com/tlc-pack/relax/pull/233

Co-authored-by: Steven S. Lyubomirsky <[email protected]>
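The chain-collapsing step can be sketched as a single forward walk (a minimal toy, assuming a flat list of `var = var` bindings; the real pass also handles non-trivial right-hand sides and must avoid replacing a Var with a DataflowVar, which this sketch omits):

```python
# Toy binding canonicalization: resolve each trivial `lhs = rhs` through
# the already-seen bindings (LookupBinding-style), so every binding
# points straight at the root definition.

def canonicalize(bindings):
    """bindings: list of (lhs, rhs) pairs where rhs may be another lhs."""
    root = {}
    out = []
    for lhs, rhs in bindings:
        rhs = root.get(rhs, rhs)   # follow one hop; roots are already flat
        root[lhs] = rhs
        out.append((lhs, rhs))
    return out

chain = [("y", "x"), ("z", "y"), ("w", "z"), ("o", "w"), ("p", "o")]
print(canonicalize(chain))  # every binding now refers directly to x
```

One hop suffices because earlier entries in `root` are already fully resolved by the time later bindings look them up.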

* [Unity] Add testcases for `expr_args_converter` (#14080)

This test file was missing when we added the `expr_args_converter`. This
PR adds it back.

* [Unity][BYOC] Add CUTLASS backend (#14081)



Co-authored-by: Lite Ye  <[email protected]>

* [Unity][BYOC] Add DNNL backend (#14082)

This PR adds dnnl backend to the unity flow.

* [Unity][Op] `log_softmax` and `cross_entropy_with_logits` (#14083)

This PR introduces two high-level operators, log_softmax and cross_entropy_with_logits, which are important when calculating CrossEntropyLoss (in torch).

Co-authored-by: Yixin Dong <[email protected]>

* [Unity][Analysis] TIR pattern kind analysis for multi-buffer write block (#14075)

This PR supports TIR pattern kind analysis for TIR blocks which write
to multiple buffers, which is helpful for normalization operators like
layernorm, groupnorm, etc.

Prior to this PR, the analyzer does not support blocks which write to
multiple buffers. On seeing such a block, the analyzer simply sets the
analysis result to "opaque". With this PR, on seeing a block which
writes to multiple buffers, the analyzer will check if all the BufferStores
have the same indices, and it will only set the result to "opaque" when
the BufferStores have different indices.

By doing this, the analysis works for common cases where a block may
write to multiple buffers, like layernorm or groupnorm.

Besides the unit test for the analysis itself, this PR also adds a unit
test for the FuseOps pass, making sure that a "layernorm + relu" pattern
can be fused together.
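The same-indices check described above can be sketched as follows (a toy model: the `(buffer, indices)` store representation and function name are invented, not the real analyzer):

```python
# Toy pattern-kind check for a block with multiple buffer writes:
# the block stays analyzable only if every store uses identical indices.

def pattern_kind(stores):
    """stores: list of (buffer_name, index_tuple) for one block."""
    indices = {idx for _buf, idx in stores}
    return "opaque" if len(indices) > 1 else "analyzable"

layernorm_like = [("mean", ("i",)), ("var", ("i",))]   # same indices: fusable
scatter_like = [("a", ("i",)), ("b", ("j",))]          # different indices
print(pattern_kind(layernorm_like), pattern_kind(scatter_like))
```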

* [Unity][Fix][Pass] FoldConstant with DCE in dataflow block (#14087)

The current FoldConstant pass does not support removing unused bindings
in the post-folding function. Therefore, for large real-world models,
the built executable will be overly large because of the redundant
unused constants.

This PR removes the redundant unused constant bindings in FoldConstant
by using the analysis function "RemoveAllUnused".

Note that "RemoveAllUnused" only works at the dataflow block level.
Therefore FoldConstant will not remove unused bindings outside of
dataflow blocks.
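The "RemoveAllUnused"-style cleanup can be sketched as a backward liveness walk over one dataflow block (a hypothetical `(var, args)` binding format, purely for illustration):

```python
# Toy dead-binding elimination: walk bindings backwards, keeping only
# those whose result is live (reachable from the block's outputs).

def remove_all_unused(bindings, outputs):
    """bindings: list of (var, args); outputs: vars escaping the block."""
    live = set(outputs)
    kept = []
    for var, args in reversed(bindings):
        if var in live:
            kept.append((var, args))
            live.update(args)          # its inputs become live too
    return list(reversed(kept))

bindings = [
    ("c1", ()),            # e.g. a folded constant that is never used again
    ("t", ("x", "c0")),
    ("out", ("t",)),
]
print(remove_all_unused(bindings, outputs=["out"]))  # c1 is dropped
```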

* [Unity] Refactor Relax Build JIT UX (#14088)

This PR refactors the relax build so it is exposed at the opt-level.
We also introduce an explicit JIT functionality to handle
live loading of compiled artifacts from CUTLASS.

We also move the relax VM to runtime so it can be clearly isolated
from the rest of the compiler stack.

* [Unity][Relax] Set Shape Function to Be Host Function (#14090)

Set shape function to be host func.

* [Unity] Fix typo in the comment (#14096)

* [Unity] Lower `shape_of` to a builtin (#14093)

This PR lowers shape_of op to a Relax VM builtin, and changes a utility function to take StructInfo as input.

Co-authored-by: Steven S. Lyubomirsky <[email protected]>

* [Unity] Relax Recursive function (#14092)

This PR adds TVMScript support for local recursive functions. It also updates the lambda lifting pass and removes CalledGlobalVars, which was no longer used. It also updates the well-formed pass to allow not-yet-defined vars for recursive calls.

* [Unity][Layout] Add layout transformation analysis for PrimFunc (#14066)

* [Layout] Add layout transformation analysis for PrimFunc.

This change adds a PrimFunc level analysis to suggest layout transformations to block and buffers in the PrimFunc based on the layout transformations to PrimFunc outputs.

* Add support for multiple blocks such as split op.

* Add negative tests and increase coverage.

* fix warning message

* fix lint

* remove unused header

* Address comments.
Moved some utility functions to support/array.h
improve doc

* fix deprecation warn T.var("int64") to T.int64()

* address comments

* [Unity] Remove attributes of relax.print, assert and unique (#14101)

Remove the attributes of operators assert, print and unique.
Use PrimValue as substitute.

Co-authored-by: Steven S. Lyubomirsky [[email protected]](mailto:[email protected])
Co-authored-by: Prakalp Srivastava [[email protected]](mailto:[email protected])

* [Unity][BYOC]Add relax backend pattern registry (#14106)

* Add relax backend pattern registry

* Add doc

* [Unity] Update tests again to adapt to latest TVMScript syntax (#14115)

* finished

* fix

* rollback merge_composite_functions

* [Unity][Fix] Fix bug in MergeCompositeFunctions (#14117)

Currently `MergeCompositeFunctions` modifies the map while iterating over it, which makes tests/python/relax/test_transform_merge_composite_functions.py fail. This PR fixes this bug.

* [Unity][BlockBuilder] Add `name_hint` argument for `emit` and `emit_output` (#14126)

This PR adds a `name_hint` argument for the `emit` and `emit_output` APIs of the Relax blockbuilder. The argument exists on the C++ side but was not exposed to the Python side (so users of the Python `bb.emit` always got the default `name_hint` of `""`).

Co-authored-by: Yixin Dong <[email protected]>

* [Unity][WEB] Relax vm on web runtime (#14131)

This PR brings initial relax vm support on web runtime

* [Unity] Add Global info (#14132)

* [Unity][BYOC] Add transposed matmul support to Relax CUTLASS BYOC (#14128)

Add transposed matmul support for Relax CUTLASS

* [Unity][TVMScript] emit_te sugar (#14123)

This PR adds the R.emit_te meta-programming mechanism to emit a TOPI operator from TVMScript.

* [Unity][BYOC] Assign group to unused bindings and ignore PrimFunc (#14139)

* [Unity][BYOC] Assign group to unused bindings and ignore PrimFunc

* Update fuse_ops.cc

* [Unity] Add callback to FuseOpsByPattern to check match result is accepted (#14109)

* [Unity] Add callback to FuseOpsByPattern to check match result is accepted

* add callnode to callback args

* update pattern registry

* fix

* [Unity][Legalize] Fix Scalar Constant Legalization (#14127)

This PR fixes the issue of loss of data type during legalization. Previously, if we used a constant scalar in operators like `multiply`, it would automatically be converted to a Python data type variable, which may lose its original data type. For example, `float16` may become a Python `float` and be interpreted as `float32` later.

This is now fixed by avoiding scalar value conversion. The conversion could be added back once we have better support for scalar prim value.

Co-authored-by: Sunghyun Park <[email protected]>
Co-authored-by: Wuwei Lin <[email protected]>
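The dtype-loss problem described above can be demonstrated with NumPy scalars (an illustration of the general pitfall, not the pass's actual code path): converting to a Python `float` silently widens the value, while keeping the NumPy scalar preserves the original dtype.

```python
# Demonstrating why scalar conversion loses dtype: a Python float is
# 64-bit, so the float16-ness of the constant is gone after conversion.
import numpy as np

c = np.float16(0.1)
lossy = float(c)        # becomes a plain Python float (double precision)
kept = np.asarray(c)    # stays a float16 array scalar
print(type(lossy).__name__, kept.dtype)
```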

* [Unity][Pass] Enhance constant folding to fold relax ops by evaluating them. (#14146)

* [Unity][Pass] Enhance constant folding to fold relax ops
by evaluating them.

This uses the registered legalization function attached to
the op to lower it to call_tir and uses the existing call_tir
folding mechanism to fold it.

This kind of op folding is only allowed within dataflow block
as ops could have side-effects.

Limitations:
* This currently does not support folding ops
that could lower to multiple call_tir bindings.
* Folding by evaluating ops is not always beneficial.
We need a heuristic to check if it is useful. This is
not implemented yet and folding is always allowed
by evaluating expressions.

* fix ci error

* fix doc

* fix bug

* [Unity][Debugging] AST printer (#14152)

This PR transfers over the AST printer from tlc-pack/relax. The AST printer is a debugging tool that prints out a Relax AST in a precise and human-readable format, which can be helpful for debugging the parser or various passes.

Co-authored-by: Yuchen Jin <[email protected]>
Co-authored-by: Lesheng Jin <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Tianqi Chen <[email protected]>

* [Unity][Pass] Support Symbolic Shape Deduction during BindParam (#14154)

`BindParam` replaces function params with constant nodes. However, it will
drop the shape information of the params. Consider the following case:

```python
@R.function
def main(
    x: R.Tensor(("batch", "m"), dtype="float32"),
    w0: R.Tensor(("n", "m"), dtype="float32"),
    b0: R.Tensor(("n",), dtype="float32"),
    w1: R.Tensor(("k", "n"), dtype="float32"),
    b1: R.Tensor(("k",), dtype="float32"),
) -> R.Tensor(("batch", "k"), dtype="float32"):
    batch = T.Var("batch", "int64")
    k = T.Var("k", "int64")
    m = T.Var("m", "int64")
    n = T.Var("n", "int64")
    with R.dataflow():
        lv0 = R.call_tir("linear0", (x, w0, b0), out_sinfo=R.Tensor((batch, n), dtype="float32"))
        out = R.call_tir("linear1", (lv0, w1, b1), out_sinfo=R.Tensor((batch, k), dtype="float32"))
        R.output(out)
    return out
```

The current pass will simply drop the symbolic vars `n` and `k`, causing
undefined vars during build:
```python
@R.function
def main(x: R.Tensor((1, "m"), dtype="float32")) -> R.Tensor(dtype="float32", ndim=2):
    m = T.Var("m", "int64")
    n = T.Var("n", "int64")
    k = T.Var("k", "int64")
    with R.dataflow():
        lv0 = R.call_tir("linear0", (x, metadata["relax.expr.Constant"][0], metadata["relax.expr.Constant"][1]), out_sinfo=R.Tensor((1, n), dtype="float32"))
        out = R.call_tir("linear1", (lv0, metadata["relax.expr.Constant"][2], metadata["relax.expr.Constant"][3]), out_sinfo=R.Tensor((1, k), dtype="float32"))
        R.output(out)
    return out
```

This PR updates the pass to bind the symbolic shape during binding.

* [Unity][Analysis] Checking function return struct info in well-formed check (#14155)

The current well-formed check misses checking the function return struct
info, and may mistakenly pass when there are undefined vars in the
function return struct info.

* [Unity][BYOC] Use Relax legalize + CPU build for reference in tests (#14162)

* clean dnnl test

* clean trt test

* clean cutlass test

* fix gelu legalize for fp16

* use memoize in dnnl and trt tests

* [Unity] Add bind_constants option to FuseOpsByPattern (#14151)

* [Unity] Add lift_constants option to FuseOpsByPattern

* lift_constants -> bind_constants

* [Unity][Analysis] Analysis for detecting recursion in Relax (#14149)

* DFS based attempt to detect mutual recursion

* Use Johnson's circuit-detecting algorithm instead

* Fix control flow test

* Detect all recursion anyway

* Add new test cases for simple recursion

* Fix mistake in test case

* Include missing dependencies

* Remove trailing whitespace

* Dependencies are simply references, not necessarily calls

* More trailing whitespace

* Newline at end of file

* Fix spacing in docstring

Co-authored-by: Siyuan Feng <[email protected]>

---------

Co-authored-by: Siyuan Feng <[email protected]>
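The recursion-detection idea above can be sketched with a compact depth-first search over a call graph (a toy that only answers "is there recursion?"; the actual analysis uses Johnson's circuit algorithm to enumerate the cycles, and all names here are invented):

```python
# DFS with gray/black coloring over a call graph: a gray node reached
# again lies on a call cycle, i.e. some function is (mutually) recursive.

def has_recursion(call_graph):
    """call_graph: dict mapping function name -> list of callees."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {f: WHITE for f in call_graph}

    def dfs(f):
        color[f] = GRAY                      # f is on the current path
        for g in call_graph[f]:
            if color[g] == GRAY or (color[g] == WHITE and dfs(g)):
                return True                  # back edge found: a cycle
        color[f] = BLACK                     # fully explored, no cycle via f
        return False

    return any(color[f] == WHITE and dfs(f) for f in call_graph)

mutual = {"even": ["odd"], "odd": ["even"]}      # mutual recursion
straight = {"main": ["helper"], "helper": []}    # no recursion
print(has_recursion(mutual), has_recursion(straight))
```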

* [Unity][BYOC] Add batch matmul support to Relax CUTLASS BYOC (#14166)

* Add batch matmul support to Relax CUTLASS BYOC

* Allow more dtypes

* Fix tests

* Revert how to get batch attr

* [Unity][Op] Full support of Relax op `power` (#14171)

This PR provides a full support of `R.power` including op registering, legalization, overloading `__power__` for Expr and torch fx frontend.

* [Unity][Analysis] Restore Python bindings for var analyses (#14180)

Restore Python bindings for var analyses

* [Unity][OP] Add an operator for fused multi head attention (#14150)

* [Unity][OP] Add an operator for fused multi head attention

This PR introduces the new relax operator `R.nn.attention` for fused multi-head attention, and adds support for fused multi-head attention to relax CUTLASS BYOC. The inputs of the operator are the query, key and value tensors, with `BSNH` layout, namely `[batch size, sequence length, number of heads, dimension of heads]`. The output shares the same layout with all input tensors.

* remove useless codes, remove attrs and add memoize

* add more dispatches

* nit and fix rebase

* fix linter

* add support for bias

* fix lint

* BNSS layout for bias

* update doc

* fix typo

* support bias broadcast
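A NumPy reference for the BSNH computation the operator fuses may help: softmax(QKᵀ/√H)V per head, with inputs and output in `[batch, seq_len, num_heads, head_dim]` layout. This is a plain reference sketch with illustrative shapes, not the fused kernel.

```python
# Reference multi-head attention with BSNH input/output layout:
# transpose to [B, N, S, H] per head, attend, transpose back.
import numpy as np

def attention_bsnh(q, k, v):
    b, s, n, h = q.shape
    q, k, v = (np.transpose(t, (0, 2, 1, 3)) for t in (q, k, v))  # -> BNSH
    scores = q @ np.transpose(k, (0, 1, 3, 2)) / np.sqrt(h)       # [B,N,S,S]
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    attn = scores / scores.sum(axis=-1, keepdims=True)
    out = attn @ v                                                # [B,N,S,H]
    return np.transpose(out, (0, 2, 1, 3))                        # back to BSNH

q = np.random.rand(2, 5, 4, 8).astype("float32")
out = attention_bsnh(q, q, q)
print(out.shape)  # (2, 5, 4, 8), same BSNH layout as the inputs
```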

* [Unity][WEBGPU] Codegen improvements and WebRuntime (#14187)

This PR makes various improvements to web codegen and the relax web runtime.

Correct support of shift operators.
Update relax vm to make most use of internal allocators.
Update the webgpu API to the latest spec.

* [Unity][Transform] LiftTransformParams handling multiple functions (#14192)

Previously, the LiftTransformParams pass only works on function
`"main"`. This is a bit restrictive as in our recent practice on stable
diffusion, there are cases where multiple Relax functions inside an
IRModule all need to be transformed.

Therefore, this PR enhances the LiftTransformParams pass, so that it
will now transform **all** functions **with attribute `num_input`**. For
functions without this attribute, the pass will simply skip them.

* [Unity][Op] Group normalization (#14194)

* [TOPI] Group normalization

As more and more ML models nowadays contain the group normalization
computation, we find it beneficial to introduce this op to TOPI level.
It will enable us to optimize the group normalization operation as a
whole in a more convenient way.

This PR introduces the group normalization op to TOPI. The group norm
operation was introduced in https://arxiv.org/abs/1803.08494. The
implementation uses tuple reduction, same as the implementation of layer
norm. Implemented with tuple reduction, the corresponding generated TIR
function can be optimized by cross-thread reduction or rfactor through
MetaSchedule.

Prior to this PR, the group normalization operations in frontend models
are translated to a series of operations, which brings inconvenience
when we want to optimize the group norm op as a whole.

With the TOPI implementation of group norm being introduced by #14193,
we can now use it to legalize the high-level group norm op and optimize
it using cross-thread reduction or rfactor via MetaSchedule.


Co-authored-by: Bohan Hou <[email protected]>
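A NumPy reference for the group normalization computation described above (from the paper at https://arxiv.org/abs/1803.08494) may be useful: channels are split into groups and each group is normalized by its own mean and variance. This is a reference sketch in NCHW layout with an assumed epsilon, not the TOPI implementation.

```python
# Reference group normalization: reshape channels into groups, then
# normalize each group by its own statistics.
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)   # per-(sample, group) stats
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

x = np.random.rand(2, 8, 4, 4).astype("float32")
y = group_norm(x, num_groups=4)
print(y.shape)  # each group now has ~zero mean and ~unit variance
```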

* [Unity][Op] Argmax and argmin (#14195)

This PR introduces full support for the argmax and argmin ops on the unity
branch, including the structure info inference, the legalization, and
the translation from Torch FX.

* [Unity][Op] Legalize `round`, `floor`, `ceil`, `sign` (#14198)

This PR implements the legalization for four unary operators:
* round,
* floor,
* ceil,
* sign.

Unit tests are provided accordingly.

* [Unity][Frontend] FX translator supporting more ops (#14196)

This PR improves the torch FX translator in the following perspectives:
* support unary op `sigmoid` and `round`,
* support in-place `fill`, `triu` and `tril`,
* support `tensor`, `arange`, `empty`,
* support `bmm` (batch matrix multiplication),
* support `astype`,
* support `chunk` and `squeeze`.

This PR also fixes `Embedding`. Previously the translation assumed that
the input to Embedding would only be 1-dimensional, and threw an
exception when the input has more than one dimension (i.e., batched).
This PR brings that support.

* [Unity][Frontend] FX translator returning weights with `keep_params_as_input` (#14197)

PR #14067 introduced the flag `keep_params_as_input` to the FX
translator, for the purpose of handling the model weights outside of the
translated Relax function.

This PR takes a further step by returning the model weights as
NDArrays when the flag `keep_params_as_input` is true. With this PR, the
translator can now return the weights upon request. Otherwise,
after the import we would lose the model weights in the given PyTorch
model.

* [Unity][Fix] FX translating dtype (#14201)

This PR fixes a bug of the current FX translator when dealing with
dtype.

Previously, the translator does not take the cases
```python
dtype = x.getattr("dtype")
```
into consideration. In this case, the dtype will be a fx.Node object,
while the translator assumes that the dtype is either a string or
a torch native datatype (e.g., torch.float32).

This PR fixes this by doing an environment table lookup first for all
dtypes.

* [Unity][TIR][Pass] ForceNarrowIndexToInt32 (#14203)

[TIR][Pass] ForceNarrowIndexToInt32

This PR introduces a pass which forces every index expression in a
PrimFunc to have dtype int32. Meanwhile, it also checks that all integer
buffers in the PrimFunc have int32 dtype, and reports an error if some
integer buffer has a dtype other than int32.

In terms of implementation, this pass leverages the
IndexDataTypeNormalizer, with the target dtype being int32.

This PR contains a few basic tests that come from
`test_tir_transform_narrow_datatype.py`, and contains some negative
tests as well.

* [Unity][Frontend] FX translator support torch.baddbmm (#14202)

This PR brings the support of translating `torch.baddbmm` into
combination of operators (matmul, add, multiply). Unit tests
are provided accordingly.

This PR also fixes the kwarg fetching issue of `torch.interpolate`.
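The decomposition described above can be checked against a NumPy reference (a sketch of the semantics of `torch.baddbmm`, not the translator's code): `baddbmm(input, b1, b2, beta, alpha)` equals `beta * input + alpha * bmm(b1, b2)`, i.e. a batch matmul combined with multiply and add.

```python
# Reference for the baddbmm decomposition: batch matmul, then scale and add.
import numpy as np

def baddbmm(inp, b1, b2, beta=1.0, alpha=1.0):
    return beta * inp + alpha * (b1 @ b2)   # @ batches over the leading axis

b1 = np.random.rand(4, 3, 5)
b2 = np.random.rand(4, 5, 2)
inp = np.random.rand(4, 3, 2)
out = baddbmm(inp, b1, b2, beta=0.5, alpha=2.0)
print(out.shape)  # (4, 3, 2)
```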

* [CI] Point cpu ci to dep with onnx (#40)

Point cpu ci to dep with onnx

* [Unity] Introduce Default GPU Schedule Pass (#14182)

* Implement default schedule.

* Add test.

* Add tests.

* Fix linting.

* Skip scheduled blocks.

* Address issues.

* Use target current.

* Minor fixes.

* Remove Mutator.

* Move pas…
tqchen pushed a commit to apache/tvm that referenced this pull request Mar 13, 2023
It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

This pass takes chains of definitions and replaces references to later definitions with the original one. It works by checking `LookupBinding` for each var use site and replacing the var with its definition if that definition is another var. Additionally, `MatchCast` bindings where the LHS and the RHS are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.

Example:
```python
y = x
z = y
w = z
o = w
p = o
```
Will be replaced with
```python
y = x
z = x
w = x
o = x
p = x
```

Original PR: tlc-pack/relax#233

Co-authored-by: Steven S. Lyubomirsky <[email protected]>
tqchen pushed a commit to apache/tvm that referenced this pull request Mar 13, 2023
It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

This pass will take chains of definitions and replace references to later definitions to the original one. It works by checking `LookupBinding` for each var use-site and replacing the var with its definition if the definition was another var. Additionally, `MatchCast` bindings where the LHS and the RHS are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.

Example:
```python
y = x
z = y
w = z
o = w
p = o
```
Will be replaced with
```python
y = x
z = x
w = x
o = x
p = x
```

Original PR: tlc-pack/relax#233

Co-authored-by: Steven S. Lyubomirsky <[email protected]>
tqchen pushed a commit to apache/tvm that referenced this pull request Mar 13, 2023
It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

This pass will take chains of definitions and replace references to later definitions to the original one. It works by checking `LookupBinding` for each var use-site and replacing the var with its definition if the definition was another var. Additionally, `MatchCast` bindings where the LHS and the RHS are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.

Example:
```python
y = x
z = y
w = z
o = w
p = o
```
Will be replaced with
```python
y = x
z = x
w = x
o = x
p = x
```

Original PR: tlc-pack/relax#233

Co-authored-by: Steven S. Lyubomirsky <[email protected]>