Adding min/max (int/float) ops to standard dialect

Hi,
We would like to do min/max detection in reduction loops which is used in the vectorizer and parallelization pass in our project. At the moment, min/max can of course already be done via compare/select, but adding explicit min/max ops to the standard dialect, and then just support them in some other utility functions (e.g. isLoopParallel) later would make the changes we would need simpler.
Do you think that is something we could add to standard dialect; And secondly, do you know if anybody is currently already working on this?
Please let me know @ftynse .

At this point, nothing should be added to the standard dialect, the intention is to remove it or minimize it to the very bare minimum - [RFC] Splitting the Standard dialect. These operations may be suitable for the math dialect though.

The usual question we ask for any proposal of min/max operation is their more detailed semantics: are these integer or floating point? signed or unsigned semantics for integers? how they handle nans in floats? do they apply to vectors and tensors?

Thank you for the reply. Placing the suggested ops in the math dialect should be OK for us.

In regard to your questions:
We are proposing to add min/max for int (signed+unsigned) and fp, for scalars/vectors/tensors.
Regarding NaN: We are thinking to implement handling with minnum/maxnum semantics as done in llvm.minnum/maxnum intrinsics.
Please let me know what you think.

+1 to having stock max/min operations in MLIR core, I’ve been wanting this for a while too.

I spent some time thinking through the details of this a while ago, but so far it hasn’t been a priority for anyone to actually implement it. From that investigation, there are a few far-from-obvious considerations to keep in mind.

Just to have the background readily at hand for reference: IEEE-754 standards define(d) a few variants of min and max:

  • IEEE-754-2008 minNum and maxNum are NaN-suppressing (if one operand is qNaN, return the other) and treat different-signed zeros as equal, returning an arbitrary one of them. This is non-associative as described by https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf, and conforming implementations can be non-commutative by choosing to return the LHS when both operands are zeros. These more-or-less correspond to the LLVM intrinsics @llvm.minnum and @llvm.maxnum, and were removed from IEEE-754-2019 entirely because of the associativity/commutativity problems.
  • IEEE-754-2019 minimumNumber and maximumNumber replace the above directly; they explicitly treat -0 as less than +0, and also suppress signaling NaNs. These are a bit closer to @llvm.minnum and @llvm.maxnum. As far as I understand, the only differences are that the IEEE versions set an error flag when given any signaling NaNs, and the LLVM version leaves signed zero behavior unspecified.
  • IEEE-754-2019 minimum and maximum are NaN-propagating, i.e. return NaN if either input is NaN, and distinguish signed zeros. These correspond exactly to @llvm.minimum and @llvm.maximum.

I’m told some (many?) ML models rely critically on min/max propagating NaNs (@sanjoy_das_google for context), so ops whose semantics are defined to be the IEEE-754 minNum/maxNum functions would be unsuitable as the lowering of the MHLO (TensorFlow/XLA lowering dialect) min/max ops. It’d be a shame to have stock min/max ops and not be able to use them for lowering MHLO, so IMO we should make sure they cover the NaN-propagating variant. If you do need to support the NaN-suppressing variant too, that would probably mean either two pairs of ops or an attribute configuring the NaN behavior.

Regarding the signed zero behavior, there’s a bit of an issue in the current state of MLIR: there’s no way to write IR that can be correctly compiled to a hardware implementation of the IEEE-754-2019 minimum, because the closest equivalent compare-select sequence cannot distinguish signed zeros (float comparisons consider them equal), and writing that compare-select sequence specifically requests taking the LHS (or RHS, depending on how it’s written) of two zeros.

The HLO min/max ops don’t document their signed zero behavior, but anecdotally users tend not to care whether they distinguish signed zeros or not; as such I’d be inclined to treat them as unspecified signed zero behavior, to give more flexibility to choose efficient implementations for a given platform. To achieve this, we’ll need MLIR ops that can leave signed-zero behavior unspecified, and a way to plumb that down into LLVM IR to be handled by the targets.

Currently LLVM doesn’t have intrinsics for NaN-propagating min/max with unspecified signed zero behavior. @llvm.minimum is the equivalent that explicitly distinguishes signed zeros, so it would seem appropriate to me to make it accept an nsz flag that permits it to choose arbitrarily between signed zeros, i.e. to be implemented by a compare-select sequence.

All in all, the tentative plan I had come up with before was:

  • Add NaN-propagating math.minimum and math.maximum, with optional nsz flag.
  • Lower math.minimum and math.maximum to @llvm.minimum and @llvm.maximum, copying over the nsz attribute if present.
  • Lower lmhlo.min and lmhlo.max to the new math operations, with nsz set.
  • [In LLVM] Accept nsz on @llvm.maximum and @llvm.minimum calls, and propagate it to the FMINIMUM and FMAXIMUM SelectionDAG nodes.
  • [In LLVM] If the target doesn’t mark FMINIMUM and FMAXIMUM as Legal , legalize FMINIMUM and FMAXIMUM with nsz to the compare-and-select sequence.

If we need the NaN-suppressing variants too, it seems reasonable to add math.minimumNumber and math.maximumNumber as well (following IEEE-754-2019 nomenclature), or to specify the NaN behavior with an attribute on math.minimum and math.maximum. With nsz, these are @llvm.minnum and @llvm.maxnum, but without it, there’s no direct equivalent – since it seems like a very niche case, it’s probably sufficient to lower it to a sequence of NaN checks to implement NaN suppression followed by @llvm.minimum to implement distinguishing signed zeros.

As an intermediate step to make implementation easier, we could start by lowering math.minimum {nsz} to LLVM as the compare-and-select sequence, to decouple the MLIR changes from the LLVM changes; this (temporarily) foregoes the benefit of letting the target choose a hardware implementation of IEEE-754-2019 minimum for it, but unblocks the pattern-recognition use case described earlier in this thread.

The main open questions I’m aware of are:

  • Do we in fact need a NaN-suppressing version at all?
  • If so, do we prefer two separate sets of ops, or using attributes to configure NaN behavior?
  • Are there any other glaring problems with this plan?

This sounds like a reasonable plan. Initially we also wanted to implement NaN-propagating minimum/maximum. The concern is that there are llvm reduce intrinsics, such as llvm.vector.reduce.fmin.* that use the same semantics as llvm.minnum.* intrinsic (and max equivalents) which would then lead to inconsistencies in behavior with math.minimum/maximum. Perhaps we should add to this plan creating new LLVM intrinsics for llvm.vector.reduce.fmin/fmax that mimic llvm.minimum/maximum as well?