LLVM Discussion Forums

Memrefs and maps for tiling

@bondhugula: Our use case is that we have library calls implementing high-level ops such as CONV, LSTM, etc., which require the data laid out as represented by the map. By definition, these “external” functions will know how to deal with this data. In the simplest terms, we really only need to pass the pointers to the data, plus a separate library-specific descriptor that defines the size of the data. Note that the test functions will both load some data (read) and store some other data (write).
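
Purely for illustration (everything named here is hypothetical, not an existing API), the kind of external entry point we have in mind looks roughly like this:

#include <cstdint>

// Hypothetical library-side descriptor: rank plus the logical sizes.
struct TensorDesc {
  int64_t rank;
  int64_t sizes[4];
};

// Hypothetical external implementation of a high-level op: it receives only
// the raw data pointers and the descriptors, and internally knows how to
// deal with the tiled layout described by the map.
extern "C" void library_conv2d(const float *input, float *output,
                               const TensorDesc *inDesc,
                               const TensorDesc *outDesc);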

Thanks for the collective inputs provided on these forums, much appreciated.

Would you be able to suggest an approach where we can “register” these library calls (“test.test” in this example) so that they tolerate the maps and do not prevent the lowering of the maps, in the same way as when the “test.test” call is commented out?

Hi @AlexEichenberger, @imaihal: someone on my team is actually working on the interprocedural version (a module pass) of memref normalization. That will handle function argument rewriting, call args, and return signature conversion - so it would be comprehensive.

Over here, it looks like all you need is to replace the memref SSA value on your op despite it being a non-dereferencing use. This should be pretty straightforward if you need a temporary fix - by patching normalizeMemRef and replaceAllMemRefUsesWith (RAMUW). (Just do a regular use replacement (setOperand) for its use on your test.test; a minimal sketch follows.)
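
Something along these lines (sketch only; the helper name is made up, not an upstream API):

#include "mlir/IR/Value.h"
#include "llvm/ADT/STLExtras.h"

// After replaceAllMemRefUsesWith has rewritten the dereferencing uses,
// point every remaining use of the old memref (by now only the
// non-dereferencing ones, e.g. the operand of "test.test") at the
// normalized value. This is the same as calling setOperand on each user.
static void replaceRemainingUses(mlir::Value oldMemRef, mlir::Value newMemRef) {
  for (mlir::OpOperand &use : llvm::make_early_inc_range(oldMemRef.getUses()))
    use.set(newMemRef);
}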

@AlexEichenberger @imaihal please see here: https://reviews.llvm.org/D84490

@bondhugula

I checked the patch, and it performs the advertised functionality well, as shown by the small example below:

#map0 = affine_map<(d0, d1) -> (d0, d1 floordiv 32, d1 mod 32)>
module {
  func @test(%in : memref<5x10xf32>, %out : memref<5x10xf32, #map0>) {
      affine.for %i = 0 to 5 {
          affine.for %j = 0 to 10 {
              %v = affine.load %in[%i, %j] :  memref<5x10xf32>
              affine.store %v, %out[%i, %j] : memref<5x10xf32, #map0>
          }
      }
      return
  }
  func @test_simplification() {
    %0 = alloc() : memref<5x10xf32>
    %1 = alloc() : memref<5x10xf32, #map0>
    //"test.test"(%0, %1) : (memref<5x10xf32>, memref<5x10xf32, #map0>) -> ()
    call @test(%0, %1) : (memref<5x10xf32>, memref<5x10xf32, #map0>) -> ()
    dealloc %1 : memref<5x10xf32, #map0>
    dealloc %0 : memref<5x10xf32>
    return
  }
}

This is transformed into the following:

module {
  func @test(%arg0: memref<5x10xf32>, %arg1: memref<5x1x32xf32>) {
    affine.for %arg2 = 0 to 5 {
      affine.for %arg3 = 0 to 10 {
        %0 = affine.load %arg0[%arg2, %arg3] : memref<5x10xf32>
        affine.store %0, %arg1[%arg2, %arg3 floordiv 32, %arg3 mod 32] : memref<5x1x32xf32>
      }
    }
    return
  }
  func @test_simplification() {
    %0 = alloc() : memref<5x10xf32>
    %1 = alloc() : memref<5x1x32xf32>
    call @test(%0, %1) : (memref<5x10xf32>, memref<5x1x32xf32>) -> ()
    dealloc %1 : memref<5x1x32xf32>
    dealloc %0 : memref<5x10xf32>
    return
  }
}

Here, all the maps are eliminated from the loads/stores and the alloc/dealloc ops.

However, a pattern that we see often is lowering to external implementations not expressed in MLIR (think cuDNN calls or the like). In the above example, if we comment out the call to @test and uncomment the line with “test.test”, the pass performs none of the simplifications, as shown below.

func @test_simplification() {
  %0 = alloc() : memref<5x10xf32>
  %1 = alloc() : memref<5x10xf32, #map0>
  "test.test"(%0, %1) : (memref<5x10xf32>, memref<5x10xf32, #map0>) -> ()
  dealloc %1 : memref<5x10xf32, #map0>
  dealloc %0 : memref<5x10xf32>
  return
}

Is there a way to extend the approach so that the normalization is forced through dialect operations?

In our case, these maps were introduced specifically to satisfy the layout expected by the “test” dialect, so we can take full responsibility that accesses within test.test will be fine. We would be happy with either a declarative approach or a flag for a given dialect.

Sure - I think we should discuss a clean way to support this in a subsequent patch. The mechanics to achieve this are really trivial - not more than a couple of lines. I assume the “test.test” operation you are using is in reality a registered dialect operation. Traits or effects could be one way to model this cleanly.
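
For concreteness, here is a sketch of what the trait-based opt-in could look like on the C++ side (the op is a stand-in; the trait name matches what later landed upstream as MemRefsNormalizable):

#include "mlir/IR/OpDefinition.h"

// Stand-in op definition: attaching the trait is the whole opt-in; the
// normalization pass can then check for it before rewriting operand types.
class TestOp : public mlir::Op<TestOp, mlir::OpTrait::MemRefsNormalizable> {
public:
  using Op::Op;
  static llvm::StringRef getOperationName() { return "test.test"; }
};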

@abhishek.varma

@bondhugula, you got it: we have a dialect which relies on a custom data layout for its operations, and we would like to continue representing memrefs using the original “logical” indices while hiding the actual projection of the data in a map. That way we preserve the original dimensions of the arrays, which we need at times, while being able to alloc and reference data within MLIR using the projected dimensions.

Operations that have a corresponding operation in that dialect will use that dialect, with their functionality implemented outside of MLIR. Operations that have no corresponding operation in that dialect will be implemented natively in MLIR, and having the memref maps will be very useful for accessing the data produced by the dialect operations.

This should enable dialects for many custom accelerators that rely on memory with custom data layouts.

Happy to help

Sure - that was exactly the objective behind having layout maps in the memref type from the beginning. D84490 has been committed now. There is another revision in the pipeline that completes this normalization by handling ReturnOps as well, which is non-trivial. I will be happy to review if you are able to submit a revision that adds the desired support.

Hi @AlexEichenberger, @imaihal an update has been made to memref map layout normalization that deals with the ReturnOps. Please see here: https://reviews.llvm.org/D85226

Hi, @bondhugula, @abhishek.varma,
Our test case, which includes maps on dialect operations, is now successfully normalized by your and @AlexEichenberger’s patch (https://reviews.llvm.org/D86236). Thanks for your help!

We have another requirement: normalizing an affine_map with a dynamic dimension, as in the code below. (I just changed the dimensions of the test code here: https://github.com/llvm/llvm-project/blob/master/mlir/test/Transforms/normalize-memrefs.mlir#L7-L17)

func @permute() {
  %c64 = constant 64 : index
  %A = alloc(%c64) : memref<?x256xf32, affine_map<(d0, d1) -> (d1, d0)>>
  affine.for %i = 0 to %c64 {
    affine.for %j = 0 to 256 {
      %1 = affine.load %A[%i, %j] : memref<?x256xf32, affine_map<(d0, d1) -> (d1, d0)>>
      "prevent.dce"(%1) : (f32) -> ()
    }
  }
  dealloc %A : memref<?x256xf32, affine_map<(d0, d1) -> (d1, d0)>>
  return
}

Currently this is not normalized, but we found that you noted it as a TODO in the comments (https://github.com/llvm/llvm-project/blob/master/mlir/lib/Transforms/Utils/Utils.cpp#L455-L457).
Do you plan to support it?

Hi @imaihal, this isn’t really in our immediate TODO list. Will be happy to help review it if someone takes it up.

Hi @bondhugula, do you think that handling a case where the dynamic dimension is trivially mapped would be an easier stepping stone? See d0 mapping to ? below:

 memref<?x256xf32, affine_map<(d0, d1) -> (d0, d1 floordiv 32, d1 mod 32)>>

I don’t think it’ll make a big difference or any difference at all. It could be done in one shot for the general case I think. An extra “symbol” column would be needed in the constraint system for each dynamic dim, and the upper bound obtained subsequently would be an affine function potentially involving symbols (as opposed to just a constant as was the case for a static memref). It can then be used to construct the allocation for the new memref type. The access replacement logic remains unchanged, right?
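
For example, for memref<?x256xf32, affine_map<(d0, d1) -> (d0, d1 floordiv 32, d1 mod 32)>>, d1 ranges over [0, 255], so d1 floordiv 32 ranges over [0, 7] and d1 mod 32 over [0, 31]; with d0 passing through, the normalized type would be memref<?x8x32xf32>, allocated with the same dynamic size operand as before, and the loads/stores rewritten exactly as in the static case.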

@bondhugula, thank you for always answering my questions. I have another question, about normalizing (static) memrefs. Could you give me any comments or suggestions?

I would like to normalize the following example, but I couldn’t. However, when I removed spv.EntryPoint "GLCompute" @empty, the memrefs were normalized. Is it possible to ignore that line, or do you have any other suggestions? @AlexEichenberger suggested that the normalization may not be able to handle code outside of the func.

(I looked for an example similar to ours in llvm-project/mlir/test, and created this one from misc-ops-to-llvm.mlir.)

- Example (not normalized)

$ cat misc-ops-to-llvm_entrypoint.mlir
#map0 = affine_map<(d0, d1) -> (d0 floordiv 32, d1 floordiv 64, d0 mod 32, d1 mod 64)>

module {
  func @empty() {
    %0 = alloc() : memref<10x10xf32, #map0>
    return
  }
  spv.EntryPoint "GLCompute" @empty**
}

I saw the following error message from mlir-opt -normalize-memrefs on this code:

$ ../../../llvm-project/build/bin/mlir-opt  -normalize-memrefs  misc-ops-to-llvm_entrypoint.mlir
mlir-opt: /home/imaihal/docker/imaihal-ubuntu/work/llvm-project/llvm/include/llvm/Support/Casting.h:269: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = mlir::CallOp; Y = mlir::Operation; typename llvm::cast_retty<X, Y*>::ret_type = mlir::CallOp]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ../../../llvm-project/build/bin/mlir-opt -normalize-memrefs misc-ops-to-llvm_entrypoint.mlir 
 #0 0x000002aa2aa044e8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../../llvm-project/build/bin/mlir-opt+0x3044e8)
 #1 0x000002aa2aa02366 llvm::sys::RunSignalHandlers() (../../../llvm-project/build/bin/mlir-opt+0x302366)
 #2 0x000002aa2aa024fe SignalHandler(int) (../../../llvm-project/build/bin/mlir-opt+0x3024fe)
 #3 0x000002aa2cbf2efe 
 #4 0x000003ff9ddbdef4 raise (/lib/s390x-linux-gnu/libc.so.6+0x3def4)
 #5 0x000003ff9ddbf37a abort (/lib/s390x-linux-gnu/libc.so.6+0x3f37a)
 #6 0x000003ff9ddb5ee4 (/lib/s390x-linux-gnu/libc.so.6+0x35ee4)
 #7 0x000003ff9ddb5f64 (/lib/s390x-linux-gnu/libc.so.6+0x35f64)
 #8 0x000002aa2b3eb9c6 (anonymous namespace)::NormalizeMemRefs::updateFunctionSignature(mlir::FuncOp, mlir::ModuleOp) (../../../llvm-project/build/bin/mlir-opt+0xceb9c6)
 #9 0x000002aa2b3edcf6 (anonymous namespace)::NormalizeMemRefs::runOnOperation() (../../../llvm-project/build/bin/mlir-opt+0xcedcf6)
#10 0x000002aa2b3714b2 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager) (../../../llvm-project/build/bin/mlir-opt+0xc714b2)
#11 0x000002aa2b37160e mlir::detail::OpToOpPassAdaptor::runPipeline(llvm::iterator_range<llvm::pointee_iterator<std::unique_ptr<mlir::Pass, std::default_delete<mlir::Pass> >*, mlir::Pass> >, mlir::Operation*, mlir::AnalysisManager) (../../../llvm-project/build/bin/mlir-opt+0xc7160e)
#12 0x000002aa2b3795da mlir::PassManager::run(mlir::ModuleOp) (../../../llvm-project/build/bin/mlir-opt+0xc795da)
#13 0x000002aa2b340a2e performActions(llvm::raw_ostream&, bool, bool, llvm::SourceMgr&, mlir::MLIRContext*, mlir::PassPipelineCLParser const&) (.isra.26) (../../../llvm-project/build/bin/mlir-opt+0xc40a2e)
#14 0x000002aa2b340e76 processBuffer(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, bool, bool, bool, bool, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&) (../../../llvm-project/build/bin/mlir-opt+0xc40e76)
#15 0x000002aa2b341044 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&, bool, bool, bool, bool, bool) (../../../llvm-project/build/bin/mlir-opt+0xc41044)
#16 0x000002aa2b341512 mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&, bool) (../../../llvm-project/build/bin/mlir-opt+0xc41512)
#17 0x000002aa2a910dfe main (../../../llvm-project/build/bin/mlir-opt+0x210dfe)
#18 0x000003ff9dda3aca __libc_start_main (/lib/s390x-linux-gnu/libc.so.6+0x23aca)
#19 0x000002aa2a915454 _start (../../../llvm-project/build/bin/mlir-opt+0x215454)
#20 0x0000000000000000 
Aborted (core dumped)

- Example (Removed spv.EntryPoint ==> Normalized correctly)

$ cat misc-ops-to-llvm_entrypoint.mlir
#map0 = affine_map<(d0, d1) -> (d0 floordiv 32, d1 floordiv 64, d0 mod 32, d1 mod 64)>

module {
  func @empty() {
    %0 = alloc() : memref<10x10xf32, #map0>
    return
  }
//  spv.EntryPoint "GLCompute" @empty
}
$ ../../../llvm-project/build/bin/mlir-opt -normalize-memrefs misc-ops-to-llvm_entrypoint.mlir

module {
  func @empty() {
    %0 = alloc() : memref<1x1x32x64xf32>
    return
  }
}

@imaihal Irrespective of the current support, this behavior is a bug. This should be easily fixable. Btw, what are the trailing *s at the end? Was this just a typo?

Drive-by comment, since I am not really familiar with the overall topic of the conversation here, but it is strange that you have spv.EntryPoint directly in the module. It should exist only in a spv.module, so we have a missing verification there. How is that op being added?

@bondhugula Thanks for your comment. Sorry, the *s are a typo (I just tried to make the line bold).
Can you fix it? Or should I investigate more?

@MaheshRavishankar Thanks for checking. Sorry, this code might not be a good example. My actual code hit a similar error, but it is not appropriate to post it here because it requires additional code from our own dialect.
My code puts a similar entry-point line outside of the func but within the module:

#map0 = affine_map<(d0, d1) -> (d0 floordiv 32, d1 floordiv 64, d0 mod 32, d1 mod 64)>

module {
  func @empty() {
    %0 = alloc() : memref<10x10xf32, #map0>
    return
  }
  <I wanted to add some line here to reproduce my error>
}

@imaihal thanks for clarifying.

Side note, though: if you are using the SPIR-V dialect for an OpenGL use case, it would be interesting to know what gaps the SPIR-V dialect has for this. There have been some contributions to enable graphics mode in the SPIR-V dialect, but I think more would be needed there. If there are specific things that you need for your use case, we can try to create tasks for the community to work on.

Please do go ahead and fix it - I won’t be able to get to this in the next few days.

OK. I’ll try.

I found that the error happens at NormalizeMemRefs.cpp#L268.

When I remove the line (spv.EntryPoint "GLCompute" @empty) from the example, this loop (NormalizeMemRefs.cpp#L265-L331) is not entered. I am considering whether I can avoid entering the loop even when the line is present.
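
The kind of guard I have in mind looks like this (sketch only; processSymbolUser is a made-up name, not the actual function in the pass): skip symbol users that are not calls, instead of unconditionally cast<CallOp>'ing every user of @empty, which is what trips on the spv.EntryPoint reference.

#include "mlir/Dialect/StandardOps/IR/Ops.h" // mlir::CallOp lives here at this point
#include "mlir/IR/Operation.h"

// Only treat symbol users that really are calls as calls; other users of
// @empty (such as spv.EntryPoint) are skipped instead of asserting.
static void processSymbolUser(mlir::Operation *userOp) {
  auto callOp = llvm::dyn_cast<mlir::CallOp>(userOp);
  if (!callOp)
    return; // non-call symbol use: nothing to rewrite
  // ... update the call's operand and result memref types here ...
}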

I created a patch to fix the error: https://reviews.llvm.org/D87746