Bug in --convert-linalg-to-affine-loops?

Hello,

This looks like a bug, but it’s a big one, so maybe I’m just not getting the semantics right.

I’m converting the following code:

  func @test(%750: tensor<1x7x7x512xf32>) -> (tensor<1x7x7x512xf32>) {
    %247 = "tf.Const"() {value = dense<1.0> : tensor<3x3x512x512xf32>} : () -> tensor<3x3x512x512xf32>
    %751 = "tf.Conv2D"(%750, %247) {data_format = "NHWC", device = "", dilations = [1, 1, 1, 1], explicit_paddings = [], padding = "SAME", strides = [1, 1, 1, 1], use_cudnn_on_gpu = true} : (tensor<1x7x7x512xf32>, tensor<3x3x512x512xf32>) -> tensor<1x7x7x512xf32>
    return %751: tensor<1x7x7x512xf32>
  }

Through several translations, I get to linalg on buffers, as follows:

global_memref "private" constant @__constant_3x3x512x512xf32 : memref<3x3x512x512xf32> = dense<1.000000e+00>
func @test(%arg0: memref<1x7x7x512xf32>, %arg1: memref<1x7x7x512xf32>) {
    %0 = get_global_memref @__constant_3x3x512x512xf32 : memref<3x3x512x512xf32>
    %cst = constant 0.000000e+00 : f32
    linalg.fill(%arg1, %cst) : memref<1x7x7x512xf32>, f32 
    linalg.conv_2d_input_nhwc_filter_hwcf {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%arg0, %0 : memref<1x7x7x512xf32>, memref<3x3x512x512xf32>) outs(%arg1 : memref<1x7x7x512xf32>)
    return
}

Then, by applying mlir-opt --convert-linalg-to-affine-loops I obtain:

global_memref "private" constant @__constant_3x3x512x512xf32 : memref<3x3x512x512xf32> = dense<1.000000e+00>
func @test(%arg0: memref<1x7x7x512xf32>, %arg1: memref<1x7x7x512xf32>) {
    %cst = constant 0.000000e+00 : f32
    %0 = get_global_memref @__constant_3x3x512x512xf32 : memref<3x3x512x512xf32>
    affine.for %arg2 = 0 to 1 {
      affine.for %arg3 = 0 to 7 {
        affine.for %arg4 = 0 to 7 {
          affine.for %arg5 = 0 to 512 {
            affine.store %cst, %arg1[%arg2, %arg3, %arg4, %arg5] : memref<1x7x7x512xf32>
          }
        }
      }
    }
    affine.for %arg2 = 0 to 1 {
      affine.for %arg3 = 0 to 7 {
        affine.for %arg4 = 0 to 7 {
          affine.for %arg5 = 0 to 512 {
            affine.for %arg6 = 0 to 3 {
              affine.for %arg7 = 0 to 3 {
                affine.for %arg8 = 0 to 512 {
                  %1 = affine.apply #map(%arg3, %arg6)
                  %2 = affine.apply #map(%arg4, %arg7)
                  %3 = affine.load %arg0[%arg2, %1, %2, %arg8] : memref<1x7x7x512xf32>
                  %4 = affine.load %0[%arg6, %arg7, %arg8, %arg5] : memref<3x3x512x512xf32>
                  %5 = affine.load %arg1[%arg2, %arg3, %arg4, %arg5] : memref<1x7x7x512xf32>
                  %6 = mulf %3, %4 : f32
                  %7 = addf %5, %6 : f32
                  affine.store %7, %arg1[%arg2, %arg3, %arg4, %arg5] : memref<1x7x7x512xf32>
                }
              }
            }
          }
        }
      }
    }
    return
}

There are two problems here:

  • First of all, the semantics of tf.Conv2D with “SAME” padding are not respected (the padding is never made explicit).
  • Second, the accesses to %arg0 in the innermost loop will go out of bounds on indices %1 and %2.
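For reference, the padding implied by TensorFlow’s “SAME” scheme can be computed from the input size, filter size, and stride. A minimal Python sketch (the helper name is my own):

```python
import math

def same_padding(in_size, filter_size, stride):
    """Total "SAME" padding along one dimension, split as (before, after)."""
    out_size = math.ceil(in_size / stride)
    pad = max((out_size - 1) * stride + filter_size - in_size, 0)
    # When the total is odd, TensorFlow puts the extra unit at the end.
    return pad // 2, pad - pad // 2

# For the 1x7x7x512 input with a 3x3 filter and stride 1 from the example:
print(same_padding(7, 3, 1))  # (1, 1): one zero row/column on each side
```

So a semantics-preserving lowering would have to pad the 7x7 spatial extent to 9x9 (or handle the boundary in some equivalent way) before the loops sweep the full 7x7 output range.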

D.

I feel there are errors or issues before you get to Linalg on buffers. Linalg infers the loop bounds from the operand shapes; if an operand shape is incorrect, you will access out of bounds. Currently, Linalg ops don’t verify that the shapes are consistent. Inho is working on static verification for Linalg ops: D98390 Added static verification for Linalg Ops.
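To make the out-of-bounds point concrete for this example: without explicit padding, a 3x3 filter over a 7x7 input only has a 5x5 range of valid positions, yet the generated loops sweep the output over the full 7x7 range. A quick check in Python:

```python
in_size, filter_size = 7, 3  # spatial extent and filter size from the example

# Valid (unpadded) output size at stride 1.
valid_out = in_size - filter_size + 1
print(valid_out)  # 5, not the 7 the loops iterate over

# Largest input index the loops touch: output index + filter index.
max_index = (in_size - 1) + (filter_size - 1)
print(max_index)  # 8, which is out of bounds for a dimension of size 7
```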

For the first issue, it matters how you get to Linalg on buffers. E.g., in IREE we extract the padding attributes from mhlo.conv, so you will see a linalg.pad_tensor in one dispatch function and linalg.fill + linalg.conv_* in another dispatch function. If the shapes are not correct, maybe something went wrong before you got here.

For the second issue, I feel the inputs to linalg.conv_2d_input_nhwc_filter_hwcf are not valid, i.e., the shapes are not correct. In that case, you will get out-of-bounds accesses.

Ok, I found the bug, in my code. Thanks!
D.

Hi,

If possible, would you mind sharing the commands for the several translations used to lower from tf to the linalg/affine dialects?

Thank you in advance

No problem, but there’s a bug. Let me first find a correct way of doing it. :)

Dumitru


Here’s a simpler example of the bug for linalg.matmul.

I’m using a file named linalg-matmul.mlir with just a simple call to linalg.matmul:

func @matmul(%arg0: memref<2x4xf32>,
             %arg1: memref<4x8xf32>,
             %arg2: memref<2x8xf32>) {
  linalg.matmul
    ins(%arg0, %arg1: memref<2x4xf32>,
                      memref<4x8xf32>)
    outs(%arg2: memref<2x8xf32>)
  return
}

Run the following command from the llvm-project/build directory:

./bin/mlir-opt linalg-matmul.mlir --convert-linalg-to-affine-loops="interchange-vector=2,0,1"

This yields

module  {
  func @matmul(%arg0: memref<2x4xf32>, %arg1: memref<4x8xf32>, %arg2: memref<2x8xf32>) {
    affine.for %arg3 = 0 to 4 {
      affine.for %arg4 = 0 to 2 {
        affine.for %arg5 = 0 to 8 {
          %0 = affine.load %arg0[%arg3, %arg5] : memref<2x4xf32>
          %1 = affine.load %arg1[%arg5, %arg4] : memref<4x8xf32>
          %2 = affine.load %arg2[%arg3, %arg4] : memref<2x8xf32>
          %3 = mulf %0, %1 : f32
          %4 = addf %2, %3 : f32
          affine.store %4, %arg2[%arg3, %arg4] : memref<2x8xf32>
        }
      }
    }
    return
  }
}

It gets the loop reordering right, but the access indices are not reordered accordingly, resulting in incorrect accesses to the matrices.
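To see why the unpermuted subscripts are wrong, here is a small Python sketch that replays the buggy indexing from the IR above on a 2x4 matrix (the variable names mirror the IR; the check itself is mine):

```python
M, K, N = 2, 4, 8  # %arg0 is MxK, %arg1 is KxN, %arg2 is MxN

out_of_bounds = False
for arg3 in range(K):            # interchanged loop order from the output above
    for arg4 in range(M):
        for arg5 in range(N):
            # Buggy access: affine.load %arg0[%arg3, %arg5] on a 2x4 memref.
            if arg3 >= M or arg5 >= K:
                out_of_bounds = True

print(out_of_bounds)  # True: the subscripts were not permuted with the loops
```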

It should be

module  {
  func @matmul(%arg0: memref<2x4xf32>, %arg1: memref<4x8xf32>, %arg2: memref<2x8xf32>) {
    affine.for %arg3 = 0 to 4 {
      affine.for %arg4 = 0 to 2 {
        affine.for %arg5 = 0 to 8 {
          %0 = affine.load %arg0[%arg4, %arg3] : memref<2x4xf32>
          %1 = affine.load %arg1[%arg3, %arg5] : memref<4x8xf32>
          %2 = affine.load %arg2[%arg4, %arg5] : memref<2x8xf32>
          %3 = mulf %0, %1 : f32
          %4 = addf %2, %3 : f32
          affine.store %4, %arg2[%arg4, %arg5] : memref<2x8xf32>
        }
      }
    }
    return
  }
}

There is a test pass in the affine dialect that does this correctly from affine loops. Run the command

./bin/mlir-opt linalg-matmul.mlir -convert-linalg-to-affine-loops -test-loop-permutation="permutation-map=1,2,0"

(Note that, for some reason, permutation-map is the inverse of the interchange vector.) This will yield

module  {
  func @matmul(%arg0: memref<2x4xf32>, %arg1: memref<4x8xf32>, %arg2: memref<2x8xf32>) {
    affine.for %arg3 = 0 to 4 {
      affine.for %arg4 = 0 to 2 {
        affine.for %arg5 = 0 to 8 {
          %0 = affine.load %arg0[%arg4, %arg3] : memref<2x4xf32>
          %1 = affine.load %arg1[%arg3, %arg5] : memref<4x8xf32>
          %2 = affine.load %arg2[%arg4, %arg5] : memref<2x8xf32>
          %3 = mulf %0, %1 : f32
          %4 = addf %2, %3 : f32
          affine.store %4, %arg2[%arg4, %arg5] : memref<2x8xf32>
        }
      }
    }
    return
  }
}
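As an aside, the relationship between the two options is ordinary permutation inversion, which is easy to check with a few lines of Python (the helper name is mine):

```python
def invert(perm):
    """Inverse of a permutation given as a list of target positions."""
    inv = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inv[old_pos] = new_pos
    return inv

print(invert([2, 0, 1]))  # [1, 2, 0], matching the permutation-map above
```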

If you pass the output through -canonicalize, the accesses will be easier to see (the affine.apply ops will go away).

Looks like a bug with the interchange vector.
I’ll take a look within a few days, thanks for reporting!


Thank you. By the way, this also happens with -convert-linalg-to-loops when using the interchange-vector option.

Yes, it’s the use of interchange when lowering to loops, which only does half of the work.
The simple step forward is to kill that redundant option and just call interchange on the linalg op before lowering to loops, so that this composes more nicely.

I think @gysit is going to take a look at that sometime next week.

Yeah, that would be great. I was wondering why there was no standalone loop-permutation pass for the linalg dialect.

There are a transformation and a pattern to do this. Grep for interchange in linalg/Transforms.h. There is no pass because, in general, passes are not a good level of control for this type of transformation.

Oh cool. Thanks for pointing me to that

It looks like this transformation wouldn’t work for linalg.matmul because it doesn’t satisfy the preconditions (it’s not a generic op). Is this expected? What is the canonical way to interchange loops for matmul in the linalg dialect? Thanks again.

You can call -linalg-generalize before that to turn a named op into generic form.

Thanks, that works pretty well, although it looks like some permutations don’t work. Maybe I’m making a mistake, but has anyone noticed a bug with this? I’ll try to post a reproducible example soon. In the meantime, for example, an interchange vector of {2, 0, 1} worked and {2, 1, 0} didn’t. I’ve also made sure that both interchangeGenericLinalgOpPrecondition() and LinalgBaseInterchangePattern() return success().

Here’s my attempt at using LinalgInterchangePattern. Sorry if there is more code here than there needs to be; I tried to trim it down from a bigger file, but I’m kinda new to this.

I wrote a file called interchange.cpp with the following contents

#include "llvm/Support/CommandLine.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/ToolOutputFile.h"
#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/Linalg/Utils/Utils.h"
#include "mlir/Dialect/Linalg/Transforms/CodegenStrategy.h"
#include "mlir/Dialect/StandardOps/Transforms/Passes.h"
#include "mlir/ExecutionEngine/OptUtils.h"
#include "mlir/IR/Dialect.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/InitAllDialects.h"
#include "mlir/InitAllPasses.h"
#include "mlir/Parser.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Support/FileUtilities.h"
#include "mlir/Support/MlirOptMain.h"
#include "mlir/Transforms/Passes.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <unistd.h>

using namespace mlir;
using namespace mlir::linalg;
using llvm::Error;
using llvm::Expected;
using llvm::StringError;
using llvm::Twine;

namespace mlir {

namespace {
namespace cl = llvm::cl;
struct Options {
  cl::opt<std::string> inputFile{cl::Positional,
    cl::desc("the input .mlir file"),
    cl::init("")};

  // Matrix multiplication sizes
  cl::opt<int> M{"M", cl::Required,
    cl::desc("Number of rows of first matrix"),
    cl::init(2)};
  cl::opt<int> N{"N", cl::Required,
    cl::desc("Number of column of second matrix"),
    cl::init(4)};
  cl::opt<int> K{"K", cl::Required,
    cl::desc("Number of columns of first matrix"),
    cl::init(8)};

  cl::opt<std::string> interchangeVector{"interchange-vector",
    cl::desc("' '-separated tuple. Specifies the desired permutation"),
    cl::init("0 1 2")};
};
}

template <class T>
static void convertToVector(const std::string &tileSizes, llvm::SmallVectorImpl<T> &sizes) {
  std::stringstream ss(tileSizes);
  int size;
  while (ss >> size) {
    sizes.push_back(size);
  }
}

namespace {
struct InterchangeMatmulLoops : public PassWrapper<InterchangeMatmulLoops, FunctionPass> {

  InterchangeMatmulLoops() = default;
  InterchangeMatmulLoops(SmallVector<unsigned, 4> interchange) {
    interchangeVector = interchange;
  }

  void runOnFunction() override {
    MLIRContext *context = getFunction().getContext();
    OwningRewritePatternList patterns;
    patterns.insert<LinalgInterchangePattern<GenericOp>>(context, interchangeVector);
    DimOp::getCanonicalizationPatterns(patterns, context); // not sure if i need this
    applyPatternsAndFoldGreedily(getFunction(), std::move(patterns));
  }

  SmallVector<unsigned, 4> interchangeVector = {0, 1, 2};
};

std::unique_ptr<OperationPass<FuncOp>> createInterchangeMatmulLoopsPass(std::string interchange_str) {
  llvm::SmallVector<unsigned int, 4> interchange;
  convertToVector(interchange_str, interchange);
  return std::make_unique<InterchangeMatmulLoops>(interchange);
}
}
}

//===----------------------------------------------------------------------===//


/// Wrap a string into an llvm::StringError.
static Error make_string_error(const Twine &message) {
  return llvm::make_error<StringError>(message.str(), llvm::inconvertibleErrorCode());
}

Error compile(Options &options, mlir::DialectRegistry &registry) {
  MLIRContext context;
  registry.loadAll(&context);
  llvm::errs() << "Read file: " << options.inputFile << "\n";
  OwningModuleRef moduleRef = parseSourceFile(options.inputFile, &context);
  if (!moduleRef)
    return make_string_error(Twine("could not open ") + options.inputFile);

  ModuleOp module = *moduleRef;
  PassManager pm(module.getContext(), OpPassManager::Nesting::Implicit);
  pm.addPass(createLinalgGeneralizationPass());
  pm.addPass(createInterchangeMatmulLoopsPass(options.interchangeVector));
  pm.addPass(createConvertLinalgToLoopsPass());

  if (failed(pm.run(module))) {
    return make_string_error(Twine("error compiling to llvm backend"));
  }

  std::string moduleStr;
  llvm::raw_string_ostream ss(moduleStr);
  ss << module;

  std::string name = std::filesystem::path(std::string(options.inputFile)).stem();
  name += "-interchanged.mlir";
  std::ofstream output(name);
  output << moduleStr;
  output.close();

  return Error::success();
}

int main(int argc, char **argv) {
  mlir::DialectRegistry registry;
  mlir::registerAllDialects(registry);
  mlir::registerAllPasses();

  llvm::InitializeNativeTarget();
  mlir::registerPassManagerCLOptions();

  Options options;
  llvm::cl::ParseCommandLineOptions(argc, argv, "interchange\n");

  auto error = compile(options, registry);
  int exitCode = EXIT_SUCCESS;
  llvm::handleAllErrors(std::move(error), [&exitCode](const llvm::ErrorInfoBase &info) {
    llvm::errs() << "Error: ";
    info.log(llvm::errs());
    llvm::errs() << '\n';
    exitCode = EXIT_FAILURE;
  });
  return exitCode;
}

I’ll be using the same starting mlir as before (a call to linalg.matmul of size 2x4x8).

Run the command

./interchange linalg-matmul.mlir -M 2 -N 4 -K 8 -interchange-vector "2 0 1"

This yields the following

module  {
  func @matmul(%arg0: memref<2x8xf64>, %arg1: memref<8x4xf64>, %arg2: memref<2x4xf64>) {
    %c2 = constant 2 : index
    %c8 = constant 8 : index
    %c4 = constant 4 : index
    %c0 = constant 0 : index
    %c1 = constant 1 : index
    scf.for %arg3 = %c0 to %c8 step %c1 {
      scf.for %arg4 = %c0 to %c2 step %c1 {
        scf.for %arg5 = %c0 to %c4 step %c1 {
          %0 = load %arg0[%arg4, %arg3] : memref<2x8xf64>
          %1 = load %arg1[%arg3, %arg5] : memref<8x4xf64>
          %2 = load %arg2[%arg4, %arg5] : memref<2x4xf64>
          %3 = mulf %0, %1 : f64
          %4 = addf %2, %3 : f64
          store %4, %arg2[%arg4, %arg5] : memref<2x4xf64>
        }
      }
    }
    return
  }
}

It correctly permutes the loop order as well as the indices correspondingly.

Now run

./interchange linalg-matmul.mlir -M 2 -N 4 -K 8 -interchange-vector "2 1 0"

This produces the following

module  {
  func @matmul(%arg0: memref<2x8xf64>, %arg1: memref<8x4xf64>, %arg2: memref<2x4xf64>) {
    %c2 = constant 2 : index
    %c8 = constant 8 : index
    %c4 = constant 4 : index
    %c0 = constant 0 : index
    %c1 = constant 1 : index
    scf.for %arg3 = %c0 to %c2 step %c1 {
      scf.for %arg4 = %c0 to %c4 step %c1 {
        scf.for %arg5 = %c0 to %c8 step %c1 {
          %0 = load %arg0[%arg3, %arg5] : memref<2x8xf64>
          %1 = load %arg1[%arg5, %arg4] : memref<8x4xf64>
          %2 = load %arg2[%arg3, %arg4] : memref<2x4xf64>
          %3 = mulf %0, %1 : f64
          %4 = addf %2, %3 : f64
          store %4, %arg2[%arg3, %arg4] : memref<2x4xf64>
        }
      }
    }
    return
  }
}

which isn’t correct. In fact, nothing has been permuted.

Am I doing something wrong here, or is this a bug? Thanks again for your help.

I suspect the problem is that the pattern is applied multiple times, meaning the resulting loop order may end up equivalent to the original one. Note that, in contrast to a conversion pattern that, for example, transforms a linalg op into a loop nest, the interchange pattern applies again to its own output. The pattern will thus apply repeatedly until some upper limit on the number of pattern applications is reached.
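This would also be consistent with the symptoms reported above: {2, 1, 0} is an involution (applying it twice restores the original order), so an even number of pattern applications nets out to no permutation at all, while {2, 0, 1} has order 3. A quick Python sketch of the composition (my own helper, not MLIR code):

```python
def compose(p, q):
    """Loop order after applying permutation p and then q."""
    return [q[p[i]] for i in range(len(p))]

identity = [0, 1, 2]
print(compose([2, 1, 0], [2, 1, 0]) == identity)  # True: an involution
print(compose([2, 0, 1], [2, 0, 1]) == identity)  # False: order 3
```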

One way to avoid the problem is to add a transformation filter that checks if the pattern has been applied before (using an attribute):

  patterns.add<LinalgInterchangePattern<GenericOp>>(
      context, interchangeVector,
      LinalgTransformationFilter(ArrayRef<Identifier>{},
                                 Identifier::get("interchange", context)));

Also note that I am working on a patch to make the interchange pattern available from the command line: D100627 [mlir][linalg] adding pass to run the interchange pattern. Feel free to give it a try and let me know if the problems persist. The command line to invoke the pattern should look as follows:

mlir-opt linalg-matmul.mlir --linalg-generalize-named-ops -test-linalg-transform-patterns=test-interchange-pattern=2,1,0 --convert-linalg-to-loops

I hope to land this later this week.

Tobias

Thanks a bunch. It might take me a couple days to get back to you, but I’ll try this out and let you know.

Both your suggestion for LinalgInterchangePattern and your patch worked great. Thanks again!