LLVM Discussion Forums

RFC: Alignment support

Hi,

As of today, there isn’t a way to specify the alignment of a memref and preserve that in LLVM. At Standard level, I can think of several options to support alignment:
a) Similar to LLVM, allow loads and stores to carry alignment information.
b) Similar to LLVM, allow function args (that are memrefs) to carry alignment information.
c) Add alignment to MemRefType as part of the type information.

Although c) is still debatable from a language-design perspective (whether alignment should be part of a type), a) and b) look like worse choices to me:

  • a) doesn’t let users specify alignment for function args;
  • As for b), inlining will drop all function argument attribute information on the floor.
  • Besides, for b), MLIR values don’t have attributes in the first place.

I’m working on c) at the moment, in the hope that nothing surprising comes up.

Thanks!


b) Similar to LLVM, allow function args (that are memrefs) to carry alignment information.

Can you be more specific about the LLVM feature you’re referring to here?

  • a) doesn’t let users specify alignment for function args;

Correct, but you’re not saying why it is useful/important to have.

b) Similar to LLVM, allow function args (that are memrefs) to carry alignment information.

Can you be more specific about the LLVM feature you’re referring to here?

It’s the “align” attribute in “parameter attributes”.

  • a) doesn’t let users specify alignment for function args;

Correct, but you’re not saying why it is useful/important to have.

One example is GPU kernels. They are functions that often take large buffers allocated with good alignment (e.g. 256 bytes). Knowing this information allows heavily vectorized loads and stores within the function.

Ah, somehow I couldn’t find it earlier. Do you know if this is used, though? I searched the codebase but couldn’t really find it beyond the verifier and the C++ unit test.

Sure, but LLVM does that with a) by annotating the loads and stores, what is missing?

You mentioned that a) “doesn’t let users specify alignment for function args”: this is true, but you could still add function parameter attributes separately (like LLVM).

My main concern with adding it to the type itself is that I believe it would need to be handled and preserved by every transformation that uses this type.

We’ve been pondering this question with @nicolasvasilache, Andy and Albert for quite some time. The tricky part is the trade-off between preserving this information on the function boundary vs. having to specialize functions.

In practice, if you see the allocation operation (i.e. it happens in the same function), you can derive alignment from it. But you’d lose it when passing the allocated buffer to a function. The principal way to ensure this information is transferred to the callee is to embed it in the type, but that would make memref<?xf32> different from memref<?xf32, align=64>, potentially requiring you to replicate the callee for every alignment. It would also require everything that touches the type to be aware of the alignment.
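To make the cost of type-level alignment concrete, here is a minimal Python sketch. It is a toy model and not MLIR's API: MemRefType and its align field are hypothetical names introduced only for illustration, and structural equality of a dataclass stands in for type equality.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MemRefType:
    """Toy stand-in for a memref type (hypothetical, not the MLIR class)."""
    shape: Tuple[Optional[int], ...]  # None models a dynamic '?' dimension
    element: str
    align: Optional[int] = None       # hypothetical alignment field (option c)


plain = MemRefType((None,), "f32")                # memref<?xf32>
aligned64 = MemRefType((None,), "f32", align=64)  # memref<?xf32, align=64>
aligned32 = MemRefType((None,), "f32", align=32)  # memref<?xf32, align=32>

# With alignment in the type, these are three distinct types, so a callee
# declared for one of them cannot be called with another without a cast
# or a per-alignment copy of the callee.
distinct_signatures = {plain, aligned64, aligned32}
```

Under this model, every alignment a caller uses adds another entry to distinct_signatures, which is the replication concern described above.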

We also discussed having attributes on values and decided we did not have a strong case for adding this complexity to the IR. We can have attributes on function arguments and on individual operations that define or use values. The consensus was to “annotate” values using dummy operations instead, as in

func @foo(%arg0: memref<?xf32>) {
  %0 = annotate %arg0 {foo=64: i32} : memref<?xf32>
  // use %0 instead of %arg0
  // rewrite patterns can differentiate between annotated and non-annotated values
}

I haven’t thought about this for alignment, though.

I also see great potential in c): since alignment is a contract (assume/guarantee) between memref allocation and use, it feels very natural to embed this contract in the type. I do not underestimate the risks, but I suggest keeping it as the ideal solution until we identify a critical risk that cannot be mitigated.

[The main alternative is to use vectors. I suspect this is what @nicolasvasilache will propose as a temporary solution, and then we are back to something closer to a). But it seems like overkill and misuse of vectors to me.]

Regarding code duplication (function cloning), if the function takes advantage of memref argument alignment to optimize and simplify its implementation, supporting misaligned accesses will necessarily lead to code duplication, and this seems legitimate. One does need to extend the sub-typing rules of memref, along the lines of vector sub-typing, but implicitly.

Regarding type preservation and handling in transformations, I’m not too worried as long as it is part of the type. Most transformations don’t have to care; the type will just be carried along without anyone inspecting its alignment property. It would be more challenging if alignment were a “type attribute”: that would need new infrastructure and extra care in transformations. This is one more argument in favor of c).

Regarding @ftynse’s dummy op with alignment attribute, it looks like a pretty neat trick and also a clean SSA-friendly solution with minimal impact on the infra. But it does not cross function boundaries.

Albert

The function-boundary question has many aspects indeed. Something that I find harder when alignment is part of the type rather than an attribute on the parameter is how rigid it makes the system: if a function is declared with a memref aligned on 8B and you have a memref aligned on 16B, you can’t call it directly.
So we’ll need to have some cast / conversion operation readily available for this as well.

Something that I find harder when alignment is part of the type rather than an attribute on the parameter is how rigid it makes the system: if a function is declared with a memref aligned on 8B and you have a memref aligned on 16B, you can’t call it directly.
So we’ll need to have some cast / conversion operation readily available for this as well.

I’d like to evaluate how bad the rigidity is. For example, in C++ you can’t compare two vectors with different allocators: vector<int, Allocator1>() == vector<int, Allocator2>(). To me, that has gone too far.

As of now, I don’t see any operation that takes more than one memref (except for memref_cast), so I guess we are safe at the moment.

I too find it too heavy to attach alignment information to the type. First, having multiple call sites with different alignments for the memref arg is a rather rare scenario, and even for those, the specialization needed is often just about choosing the right load/store instructions late. Using function argument attributes together with an assume-style operation on the value to specify alignment (effectively what @ftynse describes above) sounds better to me. Either

%0 = annotate %arg0 {foo=64: i32} : memref<?xf32>
or
assume_alignment %arg0, 64 : memref<?xf32>

IIUC putting alignment in the memref type will let us do this via memref_cast. Is that correct?

If you put it in the type, you won’t need to cast. The func memref argument will have the alignment info, and one will have to define the default semantics when it’s not present. (The memref_cast is currently meant only to cast between compatible shapes, not to change such information.)

The issue is much more complex than just adding an alignment flag. Memrefs are a structured multi-dimensional type. Having the base pointer aligned does not mean much. To support this properly in the presence of strided layouts and symbolic size/stride we need much more information.

The solutions proposed so far seem to work only for static shapes where the most minor size is also a multiple of the alignment (or some similarly restrictive case).

I meant that if the memref type has an alignment attribute, a func could take a memref<...xf32, align=1> and memref_cast it to (say) memref<...xf32, align=32>. This would subsume the functionality that annotate or assume_alignment provides, and would be conceptually “leaner” (I will concede that this is a subjective point).

Isn’t that the most we can practically guarantee (by making the allocator alignment aware)?

Can you give examples of what cannot be supported by the current proposal?

I intentionally used a generic “annotate” because the previous discussion was about attributes on values in general rather than alignment in particular. @bondhugula’s assume_alignment would work as well, and so would memref_cast. The latter may need more careful consideration. Currently, memref_cast can either introduce or remove a constant value in place of an unknown one, but cannot change one constant value into another directly. There’s a bit of a semantic mismatch with alignment, where we can assume unknown alignment is equivalent to alignment=1 and never need unknown alignment.

A possible alternative is to consider relaxing the typing rules on function calls. That is, allow calling a function that declares a looser alignment (e.g. 4) with a value of a compatible stricter alignment (e.g. 8, but not 10). Personally, I don’t like the implicit casting, even if it’s restricted to std.call, and casting explicitly before the function call is only marginally more verbose.
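As a rough illustration of what such a relaxed rule could check, here is a small Python sketch. The function name call_compatible is hypothetical, and the divisibility test is my reading of “compatible” from the 4 / 8 / 10 example, not an agreed-upon rule.

```python
def call_compatible(declared_align: int, actual_align: int) -> bool:
    """Return True if a buffer aligned to actual_align bytes also satisfies
    a callee that declares declared_align bytes.

    Assumption: "compatible" means the actual alignment is a multiple of
    the declared one. Alignment 1 models "unknown alignment".
    """
    if declared_align <= 0 or actual_align <= 0:
        raise ValueError("alignments must be positive")
    return actual_align % declared_align == 0


# The example from the text: a callee declared with alignment 4.
ok_with_8 = call_compatible(4, 8)    # 8 is a multiple of 4
ok_with_10 = call_compatible(4, 10)  # 10 is not a multiple of 4
```

With this check, a callee declared with alignment 1 accepts any argument, while a callee declared with alignment 8 rejects an argument of unknown (1) alignment.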

Isn’t that the most we can practically guarantee (by making the allocator alignment aware)?

If the information is encoded as a single alignment value in the type system then I would agree this is as much as one could hope for.

Can you give examples of what cannot be supported by the current proposal?

Consider memref<?x7xf32, alignment = 64> or more generally memref<?x?xf32, alignment = 64>.
A load[%i, 0] is only ever aligned to 64 bytes when %i % (64 / sizeof(f32)) == 0.
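The arithmetic for the ?x7 case can be checked with a short Python sketch. It assumes a contiguous row-major layout, a base pointer that is itself 64-byte aligned (offset 0), and 4-byte f32 elements; the helper name row_start_is_aligned is made up for this example.

```python
F32_BYTES = 4  # size of one f32 element in bytes


def row_start_is_aligned(i: int, row_elems: int = 7, align: int = 64) -> bool:
    """Check whether element [i, 0] of a contiguous row-major buffer starts
    on an align-byte boundary, assuming the base pointer is aligned."""
    byte_offset = i * row_elems * F32_BYTES
    return byte_offset % align == 0


# Rows of memref<?x7xf32> whose first element is 64-byte aligned.
aligned_rows = [i for i in range(40) if row_start_is_aligned(i)]
# aligned_rows == [0, 16, 32]: only every 16th row, i.e. i % (64 / 4) == 0
```

So even with a 64-byte-aligned base pointer, 15 out of every 16 row starts are not 64-byte aligned in this layout.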

As @albertcohen was hinting, I have been thinking about how vectors carry this type of information naturally. I agree that using vector just for that purpose is not a reasonable solution. But this comes in a more global context where we have been thinking about these interactions and phase orderings holistically (e.g. an early-vectorization transformation).

At the moment, due to the structured nature of memref, I think annotating the loads is the only option that would work short-term to guarantee the information is available.

Even something like:

assume_alignment %arg0, 64 : memref<?x?xf32>

would not be enough to guarantee alignment given what I point out above and the need to propagate constraints conservatively.

What is the alternative? Are you suggesting padding the memref to (say) memref<?x8xf32>? If so, won’t that be part of the memref dimension bounds?

To summarize, we have two problems under discussion:

  1. There is a need to inform load/stores with alignment hints for vectorized loads.
  2. There might be a need to exhibit alignment information in the function signature using some MLIR mechanism.

Embedding alignment in types gives us both 1) and 2) as a package. Putting attributes on function arguments also gives us both. On the other hand, the assume_alignment operation gives us 1) but not 2).

For 2), we have some annotation mechanism, implemented either as attributes on function args or as alignment in types. I don’t see much benefit beyond better compile-time checking. For example, without any implementation of 2), it’s good enough for the callee’s author to document (in prose) the callee’s alignment requirements and declare UB on invalid input.

I’d argue against having 2) in the Standard dialect, as 2) looks like a problem that should be solved in a frontend language / dialect / upper level.

This observation points to adding assume_alignment as the path forward, leaving 2) for another time (at least for my project).

My understanding is that assume_alignment is just a hint. It doesn’t change the semantics of a valid program; it just makes invalid ones UB per the user contract (aka the alignment contract). From this perspective, a no-op is a valid implementation of assume_alignment for your case.

To echo Sanjoy, is there more we can do to improve this specific case? If so, maybe we do want to put all the solutions into a coherent framework.

I didn’t mean assume_alignment to be a hint; it’s exactly like LLVM’s @llvm.assume intrinsic. It just says the alignment is that, and it would be UB if that doesn’t hold true. (It’s not an assert, so we won’t abort.) Also, it works in both dominance and post-dominance scenarios. But since we are modeling an alignment that is constant and obviously holds for the value’s entire lifetime, it doesn’t make sense to have the assume_alignment ops on only a subset of the control flow paths of the value. So, the assume will in practice always either dominate or post-dominate each use of that value.

And yes, this will be a no-op for final code generation.
https://llvm.org/docs/LangRef.html#llvm-assume-intrinsic