Drafting on some recent discussions regarding relayering/splitting Linalg, one of the ops that should graduate out of linalg is `linalg.init_tensor`.
Background
`linalg.init_tensor` was added in D93374 ([mlir][Linalg] Define a linalg.init_tensor operation) and, roughly speaking, the op creates an "empty" tensor of a given shape. However, the ODS is not really precise about what this means. The commit message gives more insight into the intended use, but uses semantically-confusing phrases like:

- "is expected to overwrite the entire tensor": the `tensor` type is value-semantic, so talking about "overwriting" it doesn't make sense from the perspective of defining the op's semantics.
- "will be used as the init tensor of such Linalg operations": having an op that can only be used as an operand to another specific op doesn't fit well with the data model of value-semantic tensors. I can pass any `tensor` to anything that accepts a `tensor`, and that should be ok (same for block arguments).
The use case
Before tightening up the semantics of this op, I would like to provide concrete insight into its most important use case. Basically, `linalg.init_tensor` is the tensor-level equivalent of `memref.alloc`. However, it has proven very useful not to jump to memrefs immediately, but rather to stay in value-semantic `tensor` land to retain SSA use-def chains. One key trick that makes this work well is the use of "tied" operands (which in linalg take the form of `outs` operands). They simplify analyses and transformations and can be used to model a restricted form of in-place updates in a program that otherwise has value semantics. The burden then falls on "bufferization" to make the updates truly in place if possible (and to make copies if not). This is a reuse analysis somewhat analogous to register allocation in a traditional compiler. In fact, LLVM's MachineIR has the same kind of "tied" operands for modeling so-called "2-address" instructions like `add eax, edx` in a "3-address" SSA IR.
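To make this concrete, here is a minimal sketch of how `linalg.init_tensor` typically feeds the `outs` operand of a structured op at the tensor level. The value names (`%m`, `%n`, `%lhs`, `%rhs`, `%zero`) are made up for illustration, and the exact `linalg.fill` syntax may differ between MLIR versions:

```mlir
// %init carries only a shape and element type; its values are never read,
// because the fill "overwrites" the whole tensor via the tied outs operand.
%init = linalg.init_tensor [%m, %n] : tensor<?x?xf32>
%acc  = linalg.fill(%zero, %init) : f32, tensor<?x?xf32> -> tensor<?x?xf32>
// The matmul is tied to %acc through outs: it reads the accumulator and
// yields a new SSA value, modeling an in-place update in value-semantic IR.
%res  = linalg.matmul ins(%lhs, %rhs : tensor<?x?xf32>, tensor<?x?xf32>)
                      outs(%acc : tensor<?x?xf32>) -> tensor<?x?xf32>
```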
Another name for this value-semantic modeling of an underlying mutable program is "destination-passing style". Destination-passing style gets the advantages of value semantics while maintaining enough predictability w.r.t. in-place updates to be useful for codegen. There isn't a specific, bulletproof contract for what becomes in-place at the buffer level, but bufferization algorithms are sufficiently sophisticated that in practice the results are unsurprising.
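For intuition, here is a rough sketch of the mutable program the snippet above is expected to bufferize to. This is illustrative only (buffer names are hypothetical, and whether the update is truly in place is up to the bufferization analysis):

```mlir
// The init_tensor becomes an allocation, and the tied updates happen in place
// on that buffer when the analysis proves it safe (copies are inserted otherwise).
%buf = memref.alloc(%m, %n) : memref<?x?xf32>
linalg.fill(%zero, %buf) : f32, memref<?x?xf32>
linalg.matmul ins(%lhs_buf, %rhs_buf : memref<?x?xf32>, memref<?x?xf32>)
              outs(%buf : memref<?x?xf32>)
```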
Non-goals
We don't intend for this op to be an "undef" in the LLVM sense (different uses seeing different values), or to touch on any of the complex UB topics that are needed to unlock certain aggressive optimizations (e.g. counterfactual reasoning with poison values); it has been found that these optimizations are not important at the level of abstraction / for the use cases that this op serves.
This allows us to propose fairly relaxed semantics that sidestep "the hard parts" of the UB/poison/undef discussions.
Proposed semantics / new op name
Rename `linalg.init_tensor` to `bufferize.unspecified_tensor` (please bikeshed), and give it the following semantics:

"`bufferize.unspecified_tensor` produces a tensor of the given shape with unspecified values. Each instance of the `bufferize.unspecified_tensor` op potentially produces different values."
Additionally:
- The op has an "allocation" side effect. This prevents it from being CSE'd (for the use cases of this op, CSE is not really useful, and I hear that the existing CSE behavior of `linalg.init_tensor` has actually caused some headaches); see the sketch after this list.
- The op is restricted to memref-compatible element types.
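As a small illustrative sketch (the op name and syntax here are just the proposal, and the exact form is up for bikeshedding), the allocation side effect keeps two textually identical ops distinct, so that each can serve as its own destination:

```mlir
// Two identical-looking ops must stay distinct: each acts as the destination
// of a different computation and is expected to bufferize to its own buffer.
// CSE'ing them would tie both fills to one SSA value, which tends to force
// extra copies instead of two independent in-place destinations.
%a = bufferize.unspecified_tensor [%n] : tensor<?xf32>
%b = bufferize.unspecified_tensor [%n] : tensor<?xf32>
%x = linalg.fill(%c0, %a) : f32, tensor<?xf32> -> tensor<?xf32>
%y = linalg.fill(%c1, %b) : f32, tensor<?xf32> -> tensor<?xf32>
```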
The reason this op fits well in the "bufferize" dialect is that, as mentioned above, the op is really `memref.alloc` but for tensor-level destination-passing style. This is intimately related to bufferization; in fact, it is so intimately related that people who use this op are usually thinking about the underlying mutable program they expect it to bufferize to, rather than the nominal value-semantic program actually written in the IR.
I believe that the proposed semantics here are fully defined and well specified at the value-semantic tensor level, while still having the right properties for the main use case of this op.
Thoughts?