Memrefs and maps for tiling

@bondhugula, Thank you for always answering. I have another question about normalizing (static) memrefs. Could you give me any comments or suggestions?

I would like to normalize the following example, but I couldn't. However, when I removed spv.EntryPoint "GLCompute" @empty, the memrefs were normalized. Is it possible to make the pass ignore that line, or do you have any other suggestions? @AlexEichenberger suggested that the normalization may not be able to handle code outside of the func.

(I looked for an example similar to ours in llvm-project/mlir/test, and I created this example from misc-ops-to-llvm.mlir.)

- Example (not normalized)

$ cat misc-ops-to-llvm_entrypoint.mlir
#map0 = affine_map<(d0, d1) -> (d0 floordiv 32, d1 floordiv 64, d0 mod 32, d1 mod 64)>

module {
  func @empty() {
    %0 = alloc() : memref<10x10xf32, #map0>
    return
  }
  spv.EntryPoint "GLCompute" @empty**
}

I saw the following error message when running mlir-opt --normalize-memrefs on this code:

mlir-opt: llvm-project/llvm/include/llvm/Support/Casting.h:269: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = mlir::CallOp; Y = mlir::Operation; typename llvm::cast_retty<X, Y*>::ret_type = mlir::CallOp]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

$ ../../../llvm-project/build/bin/mlir-opt  -normalize-memrefs  misc-ops-to-llvm_entrypoint.mlir
mlir-opt: /home/imaihal/docker/imaihal-ubuntu/work/llvm-project/llvm/include/llvm/Support/Casting.h:269: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = mlir::CallOp; Y = mlir::Operation; typename llvm::cast_retty<X, Y*>::ret_type = mlir::CallOp]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ../../../llvm-project/build/bin/mlir-opt -normalize-memrefs misc-ops-to-llvm_entrypoint.mlir 
 #0 0x000002aa2aa044e8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../../llvm-project/build/bin/mlir-opt+0x3044e8)
 #1 0x000002aa2aa02366 llvm::sys::RunSignalHandlers() (../../../llvm-project/build/bin/mlir-opt+0x302366)
 #2 0x000002aa2aa024fe SignalHandler(int) (../../../llvm-project/build/bin/mlir-opt+0x3024fe)
 #3 0x000002aa2cbf2efe 
 #4 0x000003ff9ddbdef4 raise (/lib/s390x-linux-gnu/libc.so.6+0x3def4)
 #5 0x000003ff9ddbf37a abort (/lib/s390x-linux-gnu/libc.so.6+0x3f37a)
 #6 0x000003ff9ddb5ee4 (/lib/s390x-linux-gnu/libc.so.6+0x35ee4)
 #7 0x000003ff9ddb5f64 (/lib/s390x-linux-gnu/libc.so.6+0x35f64)
 #8 0x000002aa2b3eb9c6 (anonymous namespace)::NormalizeMemRefs::updateFunctionSignature(mlir::FuncOp, mlir::ModuleOp) (../../../llvm-project/build/bin/mlir-opt+0xceb9c6)
 #9 0x000002aa2b3edcf6 (anonymous namespace)::NormalizeMemRefs::runOnOperation() (../../../llvm-project/build/bin/mlir-opt+0xcedcf6)
#10 0x000002aa2b3714b2 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager) (../../../llvm-project/build/bin/mlir-opt+0xc714b2)
#11 0x000002aa2b37160e mlir::detail::OpToOpPassAdaptor::runPipeline(llvm::iterator_range<llvm::pointee_iterator<std::unique_ptr<mlir::Pass, std::default_delete<mlir::Pass> >*, mlir::Pass> >, mlir::Operation*, mlir::AnalysisManager) (../../../llvm-project/build/bin/mlir-opt+0xc7160e)
#12 0x000002aa2b3795da mlir::PassManager::run(mlir::ModuleOp) (../../../llvm-project/build/bin/mlir-opt+0xc795da)
#13 0x000002aa2b340a2e performActions(llvm::raw_ostream&, bool, bool, llvm::SourceMgr&, mlir::MLIRContext*, mlir::PassPipelineCLParser const&) (.isra.26) (../../../llvm-project/build/bin/mlir-opt+0xc40a2e)
#14 0x000002aa2b340e76 processBuffer(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, bool, bool, bool, bool, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&) (../../../llvm-project/build/bin/mlir-opt+0xc40e76)
#15 0x000002aa2b341044 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&, bool, bool, bool, bool, bool) (../../../llvm-project/build/bin/mlir-opt+0xc41044)
#16 0x000002aa2b341512 mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&, bool) (../../../llvm-project/build/bin/mlir-opt+0xc41512)
#17 0x000002aa2a910dfe main (../../../llvm-project/build/bin/mlir-opt+0x210dfe)
#18 0x000003ff9dda3aca __libc_start_main (/lib/s390x-linux-gnu/libc.so.6+0x23aca)
#19 0x000002aa2a915454 _start (../../../llvm-project/build/bin/mlir-opt+0x215454)
#20 0x0000000000000000 
Aborted (core dumped)

- Example (Removed spv.EntryPoint ==> Normalized correctly)

$ cat misc-ops-to-llvm_entrypoint.mlir
#map0 = affine_map<(d0, d1) -> (d0 floordiv 32, d1 floordiv 64, d0 mod 32, d1 mod 64)>

module {
  func @empty() {
    %0 = alloc() : memref<10x10xf32, #map0>
    return
  }
//  spv.EntryPoint "GLCompute" @empty
}
$ ../../../llvm-project/build/bin/mlir-opt  -normalize-memrefs  misc-ops-to-llvm_entrypoint.mlir

module {
  func @empty() {
    %0 = alloc() : memref<1x1x32x64xf32>
    return
  }
}
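
To make the two shapes concrete, here is a minimal Python sketch (an illustration only, not code from the pass) that applies #map0 to every index of the logical 10x10 memref and checks that each mapped index fits inside the 1x1x32x64 shape printed above. The names layout_map, mapped, and tight_box are hypothetical helpers introduced for this sketch.

```python
from itertools import product

def layout_map(d0, d1):
    # Mirrors #map0: (d0, d1) -> (d0 floordiv 32, d1 floordiv 64, d0 mod 32, d1 mod 64)
    return (d0 // 32, d1 // 64, d0 % 32, d1 % 64)

# Shape printed by mlir-opt for memref<10x10xf32, #map0> in the output above.
normalized_shape = (1, 1, 32, 64)

# Map every logical index of the 10x10 memref.
mapped = [layout_map(i, j) for i, j in product(range(10), range(10))]

# Every mapped index must lie inside the normalized shape.
in_bounds = all(
    all(0 <= idx < size for idx, size in zip(m, normalized_shape))
    for m in mapped
)

# Distinct logical indices must land on distinct normalized indices.
injective = len(set(mapped)) == len(mapped)

# Tightest box that would hold the mapped indices, for comparison.
tight_box = tuple(max(m[k] for m in mapped) + 1 for k in range(4))

print(in_bounds, injective, tight_box)
```

The tightest box for this input would be 1x1x10x10, whereas the shape printed by the pass uses the full tile extents 32 and 64 for the two mod results.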

@imaihal Irrespective of the current support, this behavior is a bug. This should be easily fixable. Btw, what are the trailing *s at the end? Was this just a typo?

Drive-by comment, since I am not really familiar with the overall conversation topic here: it is strange that you have spv.EntryPoint directly in the module. It should exist only inside spv.module, so we have a missing verification there. How is it being added?

@bondhugula Thanks for your comment. Sorry, the *s are a typo. (I was just trying to make the line bold.)
Can you fix it? Or should I investigate more?

@MaheshRavishankar Thanks for checking. Sorry, this code might not be a good example. My actual code produced a similar error, but it is not appropriate to post here because it requires additional code from our own dialect.
My code places a similar entry-point op outside of the func, within the module.

#map0 = affine_map<(d0, d1) -> (d0 floordiv 32, d1 floordiv 64, d0 mod 32, d1 mod 64)>

module {
  func @empty() {
    %0 = alloc() : memref<10x10xf32, #map0>
    return
  }
  <I wanted to add some line here to reproduce my error>
}

@imaihal thanks for clarifying.

Side note though. If you are using the SPIR-V dialect for the OpenGL case, it would be interesting to know whether you see any gaps in the SPIR-V dialect for this. There have been some contributions to enable graphics mode in the SPIR-V dialect, but more would be needed there I think. If there are specific things that you need for your use case, we can try to create tasks for the community to work on.

Please do go ahead and fix it - I won’t be able to get to this in the next few days.

OK. I’ll try.

I found that the error happens at NormalizeMemRefs.cpp#L268.

When I remove the line ( spv.EntryPoint "GLCompute" @empty ) from the example, the loop at NormalizeMemRefs.cpp#L265-L331 is not entered. I am considering whether I can also avoid going through the loop when the line is present.

I created a patch to solve the error https://reviews.llvm.org/D87746
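
The failure mode can be modeled outside MLIR. The sketch below is a toy Python analogy, assuming only what the assertion message and the example show: the function symbol has a user that is not a CallOp, and an unconditional cast to CallOp aborts on it. The classes and helpers (Operation, CallOp, strict_cast, call_users) are hypothetical stand-ins, and this is not the change made in D87746.

```python
class Operation:
    """Toy stand-in for a generic operation that references a function symbol."""
    def __init__(self, name):
        self.name = name

class CallOp(Operation):
    """Toy stand-in for a call operation."""

def strict_cast(op, cls):
    # Behaves like an asserting cast: fails if the user is not of the expected kind.
    if not isinstance(op, cls):
        raise TypeError("cast<Ty>() argument of incompatible type!")
    return op

def call_users(symbol_users):
    # Behaves like a checked cast: keeps call users and skips everything else.
    return [op for op in symbol_users if isinstance(op, CallOp)]

# Hypothetical users of @empty: one call and one entry-point declaration.
users = [CallOp("call @empty"), Operation('spv.EntryPoint "GLCompute" @empty')]

try:
    for op in users:
        strict_cast(op, CallOp)
    crashed = False
except TypeError:
    crashed = True

safe = [op.name for op in call_users(users)]
print(crashed, safe)
```

Whether skipping non-call users is the right behavior for the pass is a separate design question; the sketch only shows why the unconditional cast fails on such a user.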

@bondhugula I started thinking about how we can normalize dynamic memrefs, but I’m not sure how to do it. Could you give me a bit more detail about your suggestion?
I think dynamic memrefs in alloc ops are handled when lowering to LLVM (-convert-std-to-llvm). Is normalizing dynamic memrefs also possible as an MLIR-level transformation (--normalize-memrefs)?

Yes, dynamic memrefs are all properly supported on the path to LLVM.

Normalizing alloc ops and load/store with dynamic memrefs should be straightforward. It’s the dim op that’s tricky as @AlexEichenberger points out upthread. Could you provide a simple example you have in mind for discussion?

@bondhugula This is an artificial example, but I would like to normalize this kind of dynamic memref with an affine_map.

#map0 = affine_map<(d0, d1) -> (d0, d1 floordiv 32, d1 mod 32)>

func @test_norm_dynamic(%arg0 : memref<?x256xf32, #map0>) -> () {
    %0 = alloc() : memref<?x256xf32, #map0>
    "test.op_norm"(%arg0, %0) : (memref<?x256xf32, #map0>, memref<?x256xf32, #map0>) -> ()
    dealloc %0 :  memref<?x256xf32, #map0>
    return
}

Where should I start with this example?

You’ll need an argument for the alloc.

%0 = alloc(%N) : memref<?x256xf32, #map0>

The key step here is to deduce the sizes of the normalized memref. This is a simpler example and it’s going to be %N x 8 x 32. 8 and 32 are already given to you by the existing logic. The %N would map to a symbol in the constraint system. So this is to be replaced with:

%m = alloc(%N) : memref<?x8x32xf32>

Load/store subscripts are multidimensional and are not affected by the symbols bound to the ?s, so all of that replacement works as is. You just need to add size symbols to the constraint system used to compute the ranges; the upper bound for a normalized memref dimension will then in general be a function of such symbols (instead of a constant).
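
The %N x 8 x 32 result can be reproduced with a small Python sketch. It is a simplification under stated assumptions: it handles only three kinds of map results (identity, floordiv by a constant, mod by a constant) with hand-written rules, and it does not model the constraint system that the pass actually uses. The function normalized_shape and the tuple encoding of the map are hypothetical.

```python
def ceildiv(a, b):
    # Ceiling division for positive integers.
    return -(-a // b)

def normalized_shape(shape, results):
    """Compute a normalized shape.

    shape:   list of sizes, with None standing for a dynamic '?' dimension.
    results: list of ("id", dim), ("floordiv", dim, c) or ("mod", dim, c).
    """
    out = []
    for kind, dim, *rest in results:
        size = shape[dim]
        if kind == "id":
            # The dimension passes through unchanged, dynamic or not.
            out.append(size)
        elif kind == "floordiv":
            # Number of tiles; stays dynamic if the input size is dynamic.
            out.append(None if size is None else ceildiv(size, rest[0]))
        elif kind == "mod":
            # Full tile extent, matching the 32 and 64 seen for the static case.
            out.append(rest[0])
        else:
            raise ValueError(f"unsupported result kind: {kind}")
    return out

# (d0, d1) -> (d0, d1 floordiv 32, d1 mod 32) applied to memref<?x256xf32>
map0 = [("id", 0), ("floordiv", 1, 32), ("mod", 1, 32)]
print(normalized_shape([None, 256], map0))  # [None, 8, 32]
```

Here the dynamic dimension only passes through the identity result, which is why the existing static logic already supplies 8 and 32.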

@bondhugula Thanks! Sorry for the late response. I have resumed working on normalizing dynamic memrefs.

In my previous example, the unknown dimensions (?s) were not affected by normalization. It was too simple compared with my actual case.
Excuse me again, but I would like to update my example as follows.

#map0 = affine_map<(d0, d1) -> (d0, d1 floordiv 32, d1 mod 32)>

func @test_norm_dynamic(%arg0 : memref<8x?xf32, #map0>) -> () {
    %0 = alloc() : memref<8x?xf32, #map0>
    "test.op_norm"(%arg0, %0) : (memref<8x?xf32, #map0>, memref<8x?xf32, #map0>) -> ()
    dealloc %0 :  memref<8x?xf32, #map0>
    return
}

How can I normalize memref<8x?xf32, #map0>?

I think this is the answer you wrote before. It would be very helpful if I could see an example of this affine function.

For example, if the size of the memref has to be %N + 1 (where %N is the one corresponding to the symbol), this information is obtained from the constraint system and used to construct the AllocOp.

%S = affine.apply affine_map<(d0) -> (d0 + 1)> (%N)
%M = alloc(%S) : memref<?xf32>

As to the constraint system, you’ll for example have:

d0  s0  const  >=/== 0
-1    1     1    >= 0
# This means d0 <= s0 + 1.
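
Reading such a row can be checked numerically. The Python sketch below is an illustration with a hypothetical helper row_holds and an arbitrarily chosen symbol value; it evaluates the row (-1, 1, 1) over the columns (d0, s0, const) and confirms that the largest admissible d0 is s0 + 1.

```python
def row_holds(row, d0, s0):
    # row = (coeff_d0, coeff_s0, const) encodes coeff_d0*d0 + coeff_s0*s0 + const >= 0
    c_d0, c_s0, const = row
    return c_d0 * d0 + c_s0 * s0 + const >= 0

row = (-1, 1, 1)   # -d0 + s0 + 1 >= 0
s0 = 5             # example value for the symbol, chosen arbitrarily

# All non-negative d0 values below 10 that satisfy the inequality.
allowed = [d0 for d0 in range(10) if row_holds(row, d0, s0)]
print(allowed, max(allowed))
```

With s0 = 5 the admissible values are 0 through 6, so the bound is a function of the symbol rather than a constant.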

@bondhugula Thanks for the detailed explanation!
I wrote the MLIR that I expect after normalizing the following example (handwritten, not implemented yet). This is my understanding. Is this code reasonable?

  • Input (before normalization)
#map0 = affine_map<(d0, d1) -> (d0, d1 floordiv 32, d1 mod 32)>

func @test_norm_dynamic(%arg0 : memref<4x?xf32>) -> () {
  %c1 = constant 1 : index
  %0 = dim %arg0, %c1 : memref<4x?xf32>
  %1 = alloc(%0) : memref<4x?xf32, #map0>
  "test.op_norm"(%arg0, %1) : (memref<4x?xf32>, memref<4x?xf32, #map0>) -> ()
  dealloc %1 :  memref<4x?xf32, #map0>
  return
}
  • Output (after normalization)
func @test_norm_dynamic_affine(%arg0 : memref<4x?xf32>) -> () {
  %c1 = constant 1 : index
  %0 = dim %arg0, %c1 : memref<4x?xf32>
  
  %1 = affine.apply affine_map<(d1) -> (d1 floordiv 32)> (%0)
  %2 = affine.apply affine_map<(d1) -> (d1 mod 32)> (%0)
  
  %3 = alloc(%1, %2) : memref<4x?x?xf32>
  "test.op_norm"(%arg0, %3) : (memref<4x?xf32>, memref<4x?x?xf32>) -> ()
  dealloc %3 :  memref<4x?x?xf32>
  return
}

This looks fine to me!

@bondhugula I have implemented code that produces the output of the above example. However, I found that the output is not correct.
For example, if the 2nd dimension of <4x?xf32> is 8, <4x8xf32, #map0> must become <4x1x32xf32>, but the example produces <4x0x8xf32>. So I need to update the output. For static memrefs, the normalization seems to perform a complex computation in C++ code. Is it easy to express that in MLIR? Or is there a more appropriate way to write it?

I didn’t realize it then - those affine.apply rules are actually wrong. It should be ceildiv instead of floordiv? Secondly, it shouldn’t be d1 mod 32 but just 32. With the mod, you implicitly get padding here if the size isn’t a multiple of 32.
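
The difference between the two rule sets is easy to check with plain arithmetic. This Python sketch (illustrative only; the function names and the sample sizes are hypothetical) computes the two trailing dimension sizes for a tile size of 32, first with floordiv/mod as in the handwritten output and then with ceildiv and the constant 32.

```python
TILE = 32

def ceildiv(a, b):
    # Ceiling division for positive integers.
    return -(-a // b)

def sizes_floordiv_mod(n):
    # Rules from the handwritten output: (n floordiv 32, n mod 32)
    return (n // TILE, n % TILE)

def sizes_ceildiv_const(n):
    # Suggested rules: (n ceildiv 32, 32)
    return (ceildiv(n, TILE), TILE)

for n in (8, 32, 40):
    print(n, sizes_floordiv_mod(n), sizes_ceildiv_const(n))

# With the suggested rules the allocation always holds at least n elements;
# any excess is the implicit padding.
covers_all = all(ceildiv(n, TILE) * TILE >= n for n in range(1, 257))
print(covers_all)
```

For a size of 8 the first rule set gives 0x8, which holds no elements, while the second gives 1x32 with 24 elements of padding.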

@bondhugula Thanks for replying. Yes, the example becomes correct when I use them:

  %1 = affine.apply affine_map<(d1) -> (d1 ceildiv 32)> (%0)
  %2 = affine.apply affine_map<(d1) -> (32)> (%0)

Then I’m wondering how these can be generated automatically. In this example, it is enough to replace floordiv with ceildiv and d1 mod 32 with 32, but is that enough to support other cases?

The algorithm for calculating the upper bound of a static memref with a map seems complex (https://github.com/llvm/llvm-project/blob/main/mlir/lib/Transforms/Utils/Utils.cpp#L466). Can this algorithm be realized by converting the affine map?