[RFC] Arith dialect versions of affine.apply/min/max

Background and context

affine.apply, affine.min and affine.max are operations that take index operands plus an attribute of the builtin type AffineMap, and return a value of type index.

The particularity of the builtin type AffineMap is that it decomposes its operands into so-called “dimensions” and “symbols”. This decomposition was originally motivated by polyhedral compilation and Presburger arithmetic, in which we distinguish between:

  1. runtime quantities that follow an arithmetic progression (called “dimensions” in MLIR parlance)
  2. runtime quantities that are constant but statically unknown at compile-time (called “symbols” in MLIR parlance)

For those who remember the Polylib tools :slight_smile:, these were called “domain iterators” and “parameters”.

One interesting property is that the definition of a “dimension” and a “symbol” is not universal: depending on which “op with a region” is considered the root of a particular analysis, the classification of a given variable as a “dimension” or a “symbol” may change.
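
To make the distinction concrete, here is a minimal, self-contained C++ sketch using the existing AffineExpr/AffineMap builder APIs (the particular map is made up for illustration):

  #include "mlir/IR/AffineExpr.h"
  #include "mlir/IR/AffineMap.h"
  #include "mlir/IR/MLIRContext.h"

  using namespace mlir;

  int main() {
    MLIRContext ctx;
    // d0 plays the role of a "dimension": a runtime quantity that follows an
    // arithmetic progression (e.g. a loop induction variable).
    AffineExpr d0 = getAffineDimExpr(0, &ctx);
    // s0 plays the role of a "symbol": a runtime constant that is unknown at
    // compile time (e.g. a tensor size).
    AffineExpr s0 = getAffineSymbolExpr(0, &ctx);
    // Builds the map (d0)[s0] -> (d0 * 4 + s0).
    AffineMap map = AffineMap::get(/*dimCount=*/1, /*symbolCount=*/1, d0 * 4 + s0);
    map.dump();
  }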

affine.apply was originally introduced as a means to implement classical analysis subject to polyhedral and affine restrictions in MLIR.

Motivation

Since then, affine.apply has emerged as a good abstraction to represent more general notions of indexing that revolve around the AffineMap::compose method and that do not have restrictions on what can be considered a dim and what must be considered a symbol.
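
For illustration, here is a rough C++ sketch (same APIs as the sketch above, maps made up for the example) of the kind of folding AffineMap::compose enables: a chain of two maps collapses into a single one.

  #include "mlir/IR/AffineExpr.h"
  #include "mlir/IR/AffineMap.h"
  #include "mlir/IR/MLIRContext.h"

  using namespace mlir;

  int main() {
    MLIRContext ctx;
    AffineExpr d0 = getAffineDimExpr(0, &ctx);
    // f: (d0) -> (d0 + 2) and g: (d0) -> (d0 * 8), e.g. two chained applies.
    AffineMap f = AffineMap::get(/*dimCount=*/1, /*symbolCount=*/0, d0 + 2);
    AffineMap g = AffineMap::get(/*dimCount=*/1, /*symbolCount=*/0, d0 * 8);
    // f.compose(g) substitutes g's result for f's dim:
    // (d0) -> (d0 * 8 + 2), i.e. the chain folds into a single map.
    AffineMap fg = f.compose(g);
    fg.dump();
  }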

As usage evolved in MLIR, it now appears clear that the implementation of affine.apply conflates 2 independent things:

  1. the machinery and rewrites based on mathematical properties of builtin::AffineMap that are universal and reusable.
  2. the polyhedral-era restrictions on what is allowed to be a “dimension” and a “symbol” in the context of an enclosing op with a region, also known as AffineScope in MLIR (not to be confused with a traditional polyhedral SCoP (static control part)).

As a result of this conflation, affine.apply depends on memref::DimOp (through isValidDim).
This makes it impossible to have a tensor.expand/collapse_shape that matches memref.expand/collapse_shape due to cyclic dependencies.

More generally, such an apply operation is a better representation to manipulate in the IR than chains of AddIOp/MulIOp, and it generally composes better, irrespective of polyhedral-inspired analyses.

Additionally, in the longer term, we anticipate benefiting from canonicalizations and rewrites that mix affine and non-affine indexing. In particular, I have always been a fan of the conciseness of the Halide folding rules, which become possible once a more general notion of IndexingExpr is available; see for instance Halide/Simplify_And.cpp at master · halide/Halide · GitHub. We also hope to look at recursive indexing for FFT/sort and other recursive quantities such as shifts, exp and log.

Proposal

Decouple the machinery and rewrites based on mathematical properties of builtin::AffineMap into arith dialect operations and keep AffineScope-related considerations to the affine dialect where they belong. Since AffineMap is already builtin, this should be relatively mechanical.

I am not too attached to the specifics but here are a few questions I expect others will have strong opinions on:

  1. start a new index dialect vs plop ops into arith and split out later on a per-need basis
  2. arith/index.affine_apply/affine_min/affine_max vs other names
  3. how to actually carve out the AffineScope related constraints and make them compose properly. This may be tricky as affine has historically not been very composable with other things.

Thoughts?

Thanks for reading!

@ftynse , @pifon2a , @bondhugula , @ThomasRaoux


Can you clarify this part? I see this in the isValidDim code:

  // The dim op is okay if its operand memref/tensor is defined at the top
  // level.
  if (auto dimOp = dyn_cast<memref::DimOp>(op))
    return isTopLevelValue(dimOp.source());
  if (auto dimOp = dyn_cast<tensor::DimOp>(op))
    return isTopLevelValue(dimOp.source());
  return false;

+1 on my side. It has happened several times that I wanted to generate some arithmetic expression for tiling, distribution, etc., and having to deal with the affine expression restrictions around symbols vs dimensions doesn’t make sense and prevents us from using affine expressions in some cases.

I assume the arith.affine expr would have relaxed validity conditions (for instance, affine expressions don’t allow multiplying two dimensions)?

I’ll punt to @pifon2a to refresh our memories on what the exact issue was there :slight_smile:


Seems like you would want something like a PolynomialExpr which is a different beast?

Atm I am only proposing to make the AffineExpr-based op more reusable, so you’d still use dim * sym or sym * sym if you wanted to use that. The AffineMap::compose rules on that would be slightly different so it really depends on your use case.

+1

I’m in this camp. arith is “simple integer and floating point arithmetic operations”.

We have other contenders: arith.ceildivsi and arith.floordivsi. These fall more under an index umbrella and I don’t feel they fully belong in arith. And given Adding unsigned integer ceil and floor in Std Dialect, which I just saw, I think now would be a good time to make a new dialect with these ops.

Linalg depends on Affine:
reifyResultShapes for linalg::TensorExpand/CollapseShapeOp calls createFoldedComposedAffineApply from Linalg/IR/LinalgInterfaces.cpp, which creates an AffineApplyOp.

Affine depends on TensorOps. This is an artefact of the refactoring when we moved some of the ops from Std to MemRefOps and TensorOps. Most likely we can get rid of

  if (auto dimOp = dyn_cast<tensor::DimOp>(op))
    return isTopLevelValue(dimOp.source());

and the other 2 appearances of tensor::DimOp without changing anything.

The cyclic dependency arises when we add Tensor->Affine dependency because MemRef dialect depends on Tensor dialect.

So if we move linalg::ExpandShapeOp to TensorOps, we would get a TensorOps → AffineOps → MemRefOps → TensorOps cycle.

The interesting part is that most of the code in MemRefOps.cpp that uses IR/Tensor.h is related to bufferization. I remember there was a discussion about creating a dialect that has BufferCastOp and TensorLoadOp and all related patterns. @herhut

@nicolasvasilache, @gysit In order not to speculate about what would happen if tensor::DimOp is removed from AffineOps.cpp, here is the PR that does exactly that: ⚙ D113117 [mlir] Remove `tensor::DimOp` from `AffineOps.cpp`. It affects 4 tests in Linalg. I am not sure how important these simplifications are.

Getting back to the proposal to move apply/min/max out of AffineOps, I fully support it. I also think that these ops can stay in the Arith dialect for a while. Maybe later we will have a more general class of expressions that we want to evaluate; then it would make sense to have a separate dialect.


I am generally supportive of this direction, this has come up several times in the past. A couple of remarks from my side:

  • affine.apply itself does not enforce dimension/symbol constraints on its operands - llvm-project/AffineOps.cpp at 2a7c3f8b02bf9c692bd28e6560ced1aaae53ad7e · llvm/llvm-project · GitHub; I think the same is true for min/max.
  • Some canonicalizations of these ops may depend on tensor/memref.dim. Since canonicalizations need not be registered by the dialect of the root operation (however surprising that may seem), canonicalization patterns can be moved to another dialect when switching the dependency direction. Alone, they should not prevent changing the dependency direction. Foldings should not depend on other dialects at all, but rather get the operand that comes from dim as an attribute resulting from folding the dim itself independently.
  • It is not exactly clear what the semantics of dimensions/symbols in affine maps would be if they move (it isn’t precisely clear even in the current state). We have canonicalizations that “promote” dimensions to symbols based on the affine provenance rules…

At the same time, I am not supportive of creating an “index” dialect that would also include floor/ceildiv. Instead, I would like the apply/min/max operations to be extended to support other integer types. One can argue for an “affine_map” dialect instead, but it shouldn’t include floor/ceildiv either.

Yes, I made the effort of quarantining such requirements in the affine dialect a while back (1-2 years ago? the details are now fuzzy). We should be in a good place wrt this point.

True; still, since @pifon2a seems to have a way to break the cycle (it’s not like affine does anything useful with tensor::DimOp anyway), this seems like a nice knot to untangle greedily.

I’m guilty here, I’ll surface the relevant portion of my comment in this review:


affine.apply canonicalization has assumptions related to AffineScope; these happen in canonicalizePromotedSymbols.
I had to introduce this in 071ca8da918a5aed4758c4b4e27b946663adce58 as a counterpart to promoteComposedSymbolsAsDims.

The rationale at the time was to reduce the mess created by the multi-result + chains of AffineApplyOp design, which had its own separate implementation that did not agree with the mathematical form of AffineMap::compose.

As time passed, things got better and all this technical debt could be cleaned up at long last.
I think it is time to drop this AffineScope-dependent rewrite from affine.apply canonicalization and make it a separate opt-in pattern (if it is still needed; I’d be fine just dropping it altogether).


More precisely, to your point:

I am not sure what is unclear:

  • generally, any Value may be used as a dim or a symbol locally at the place it is introduced
  • AffineExpr/AffineMap cannot express polynomials over dims
  • composition of AffineMaps “composes dims” and “concatenates symbols” (see the sketch after this list)
  • any containing op that wants to give dims or symbols a more universal meaning, across the scope it owns, is free to do so (e.g. AffineScope for affine.loop operations).
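
To illustrate the composition bullet, here is a rough, self-contained C++ sketch (the maps are made up for the example; the exact symbol numbering printed by dump() is an implementation detail):

  #include "mlir/IR/AffineExpr.h"
  #include "mlir/IR/AffineMap.h"
  #include "mlir/IR/MLIRContext.h"

  using namespace mlir;

  int main() {
    MLIRContext ctx;
    AffineExpr d0 = getAffineDimExpr(0, &ctx);
    AffineExpr s0 = getAffineSymbolExpr(0, &ctx);
    // f: (d0)[s0] -> (d0 + s0) and g: (d0)[s0] -> (d0 * 8 + s0).
    AffineMap f = AffineMap::get(/*dimCount=*/1, /*symbolCount=*/1, d0 + s0);
    AffineMap g = AffineMap::get(/*dimCount=*/1, /*symbolCount=*/1, d0 * 8 + s0);
    // "compose dims": f's dim is substituted with g's result.
    // "concatenate symbols": the composed map keeps g's single dim and carries
    // the symbols of both maps, so it has 1 dim and 2 symbols.
    AffineMap fg = f.compose(g);
    fg.dump();
  }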

This is how we’ve been operating for 1-2 years already. Current proposed changes to AffineScope leak into the rest of the world via the affine.apply canonicalization; we can remedy those independently of moving the op into arith, and later proceed with the refactoring proposed here and future extensions.

I think the key issue here is the canonical form of the operation. We can already have any value as either symbol or dimension in affine apply (we cannot use results of such operations in other affine operations such as load or for, but this is an argument for moving apply out of affine), and we never could have polynomials with dimensions. Composition is where the difference starts to matter. But composition itself is a tool, which may be made orthogonal to op semantics. It currently is not, because the canonical form requires operand sources that are also affine apply to be folded in, and also requires anything that can be proven to have the affine symbol value category to become a symbol in the expression. I think you just want to remove the part about symbols. (I haven’t thought enough about the issue to claim whether just transforming everything into symbols outside of affine scopes is okay.) This will make @bondhugula’s change possible without affecting linalg tests. I suspect he, on the other hand, may have use cases in affine that actually want dimension-to-symbol promotion. Maybe there is a way to only do that inside ops that extend affine scope and are already subject to affine value categorization.

Yes, this would be interesting to investigate given the origin story of this promotion function…

Promoting dimensional operands to symbols is obviously extremely important, and such a form where Value symbols are in symbolic positions would be the canonical form for the application of operands to maps – which is also the point @ftynse makes in the opening line of the paragraph. That’s key. Without it, the IR wouldn’t have a canonical form and there could be n different ways to express map + operands as far as dim and symbol choices go – since every symbol Value is also valid as a dim. Knowing something is a symbol lets one do strictly more in general as far as analysis and transformation go (since the value would be known to be a constant, albeit an unknown one).