What does NoSideEffect mean?

Is the TCP "matmul" op marked NoSideEffect? - #45 by sanjoy_das_google raised some big questions w.r.t. the meaning of NoSideEffect. I wanted to give a very concrete example of where we miscompile today due to this lack of precision about what NoSideEffect means.

If --loop-invariant-code-motion is run on the following MLIR, both the divi_unsigned and cmpi get hoisted out of the loop, which are both miscompiles. In the case of %trip_count being zero, the transformed program executes undefined behavior / an error, whereas it does not prior to the transformation.

func @maybe_divide_by_zero(%lhs: i32, %rhs: i32, %trip_count: index) {
  %ci0 = constant 0 : index
  %ci1 = constant 1 : index
  loop.for %_ = %ci0 to %trip_count step %ci1 {
    // Could divide by zero!
    %div = divi_unsigned %lhs, %rhs : i32
  }
  return
}
func @maybe_tensor_shape_mismatch(%lhs: tensor<?xi32>, %rhs: tensor<?xi32>, %trip_count: index) {
  %ci0 = constant 0 : index
  %ci1 = constant 1 : index
  loop.for %_ = %ci0 to %trip_count step %ci1 {
    // What if tensor sizes mismatch? Error or UB.
    %cmp = cmpi "eq", %lhs, %rhs : tensor<?xi32>
  }
  return
}

How should we model this to allow doing LICM? Do we need a “speculatable” trait? Do we need to remove NoSideEffect from divi_unsigned and cmpi? Can we model this with the effects system?

Also, the basic arithmetic std ops (addi, etc.) allow tensors as operands, but we don’t define what happens if the shapes mismatch dynamically (UB or an error?). That’s probably a discussion for another day and ties into the TCP discussion of modeling errors in the presence of dynamically shaped tensors.

1 Like

I thought we already covered this exact problem? The current definition in MLIR of NoSideEffect is the absence of memory effects
LLVM differentiate side effects and can be speculated for this exact purpose. Since MLIR does not model the can/can’t speculate trait of operation and so right now passes have been “assuming” that nothing can ever trap.

1 Like

Let’s rename NoSideEffect to NoMemoryEffect then?

And can we use the effect system to model this speculatability trait?

I think it makes sense to differentiate between side-effects on memory, whose order has to be preserved for correctness and trapping, where one might be ok with reordering but not speculation. I don’t think the current effects system would allow to model the reordering, so just making trapping an effect would constrain us more then required.

Another angle of this is whether side-effects also keep an operation alive. For your divi example, one could also argue that it is ok to not execute the operation even though it is trapping. As long as one is clear about the semantics.

At the very least, we need to model LLVM’s speculation notion. Maybe we could have effects that allow reordering in the effects system for this purpose? That seems generally useful.

We likely need to represent the “can be speculated” predicate on operations, I don’t know if we should use the Effect machinery for this purpose though or if we should just use a trait to model such “flags” on these operations?

Like with NoSideEffects, starting with a trait is a good idea until we figure out a use case for and design of a more sophisticated solution.