[HELP!] How to write loops with two induction variables and non-constant stepsize for each of them

Hi,

I want to write a MLIR code to implement a function of merging two ordered arrays into one (the merged one also ordered). The c++ of code this function looks like the following:

        void Merge(int array1[N], int array2[N], int array3[N], int n1, int n2)    {
            int i = 0, j = 0, k = 0;       
            while (i < n1 && j < n2)   { // Traverse both array.
                if (array1[i] < array2[j]) 
                     array3[k++] = array1[i++];
                else
                    array3[k++] = array2[j++];
            }
            while (i < n1)  // Copy remaining elements of first array.
                array3[k++] = array1[i++];
            while (j < n2) // Copy remaining elements of second array
                array3[k++] = array2[j++];
        }

I am trying to implement a loop which can do the same thing with loop “while (i < n1 && j < n2) { …}”, this loop need to have two induction variables and the step of the variables are not constant. I checked scf dialect, and I haven’t found a operation to implement such a loop. In Affine dialect, there is affine.parallel, it says it supports two induction variables at the same time in the loop, but the step size should be constant.

So could anyone please give me any suggestions that I can implement such a loop, which has two induction variables and non-constant step size for each of them, in mlir?

Thank you so much for your time.

Best,
Ruiqin
Student in W&M

Let me first disambiguate terminology here. By “non-constant step size” we usually understand that we have a “for” loop with an induction variable that is incremented by another variable, the value of which is unknown at compile time. That is, we do “i += K” for unknown K instead of “i += 1” and don’t know K. It is however the same value for all iterations. This is different from what we call “non-counted” loops where we don’t actually have induction variables, but some generic exit condition, for example “while” loops.

There are no “while” loops in upstream MLIR dialects, nobody actually needed them so far.

In your particular case, it is possible to rewrite your input code to use for loops and more conditionals

int pos_in_1 = 0;
int pos_in_2 = 0;
for (int i = 0; i < max(n1, n2); ++i) {
  if (pos_in_1 < n1 && pos_in_2 < n2) {
    if (array1[pos_in_1] < array2[pos_in_2])
      array3[i] = array1[pos_in_1++];
    else
      array3[i] = array2[pos_in_2++];
  } else if (pos_in_1 < n1) {
    array3[i] = array1[pos_in_1++];
  } else if (pos_in_2 < n2) {
    array3[i] = array2[pos_in_2++];
  }
}

This way, you will only have one induction variable that corresponds to the position in array3 and is always incremented (usually, an induction variable is supposed to change on every iteration of the loop).

This can be transformed into SCF in different ways, either by storing pos_in_1/pos_in_2 in memory (using a memref, which one needs to load and store) or passing them as iteration arguments of the loop. Below is a sketch of the second approach. Every scf. operation must return a pair of values that correspond to the new contents of pos_in_1 and pos_in_2.

%c0 = constant 0 : index
%c1 = constant 1 : index
%maxn = "compute the max here"
scf.for %i = %c0 to %maxn step %c1 iter_args(%pos_in_1 = %c0, %pos_in_2 = %c0) {
  %0 = cmpi "lt" %pos_in_1, %n1 : index
  %1 = cmpi "lt" %pos_in_2, %n2 : index
  %2 = and %0, %1 : i1
  %upd:2 = scf.if %2 {
    %3 = load %array1[%pos_in_1] : memref<?xi32>
    %4 = load %array2[%pos_in_2] : memref<?xi32>
    %5 = cmpi "lt" %3, %4 : i32 
    %innerupd:2 = scf.if %6 {
      store %array3[%i] : memref<?xi32>
      %6 = addi %pos_in_1, %c1 : index
      scf.yield %6, %pos_in_2 : index, index
    } else {
      scf.yield "(%pos_in_1 + 1), %pos_in_2"
    }
    scf.yield %innerupd#0, %innerupd#1 : index, index
  } else {
    scf.yield "yield new values for both pos_in_1 and pos_in_2"
  }
  scf.yield %upd#0, %upd#1 : index, index
}

You are certainly welcome to propose a generic while loop for inclusion to the SCF dialect.

Hi Alex,

Thank you so much for your suggestions in both c++ code and mlir code. It was so so helpful! Based on your c++ code and mlir code, I finished the mlir code which works functionally well!

A small modification is that I change max in to sum in the for loop, then it works well except a little problems in performance. The case is that if Array1={1, 2, 3, 6}; Array2={1,2,3}, i.e. n1=4, n2=3. In this case, we have to iterate 7 times in the following loops, actually there is no value update for Array3 in the last three loops, since we have read all the values in Array1 and Array2. If in c++, we can just use “break;” to stop the for loops before i reached the upper bound n1+n1. Is there any way to “break” the for loop in mlir?

int pos_in_1 = 0;
int pos_in_2 = 0;
for (int i = 0; i < n1+n2; ++i) {
  if (pos_in_1 < n1 && pos_in_2 < n2) {
    if (array1[pos_in_1] < array2[pos_in_2])
      array3[i] = array1[pos_in_1++];
    else if(array1[pos_in_1] > array2[pos_in_2]){
      array3[i] = array2[pos_in_2++];
   } else{ // ==
      array3[i] = array2[pos_in_2++];
      pos_in_1++;
   }
  } else if (pos_in_1 < n1) {
    array3[i] = array1[pos_in_1++];
  } else if (pos_in_2 < n2) {
    array3[i] = array2[pos_in_2++];
  }else {
     break; // HELP: How to do this in MLIR?
  }
}

Thank you so much~

(sorry for the delay).

No, there is no “break” in upstream MLIR. We did not have a use case for it. Just doing nothing in the loop body is safe, although it might be slightly less efficient (I expect later stages of CFG optimization in LLVM to be able to handle that).

Hi Alex,

Thank you for your information! I really appreciate it!

Best,
Ruiqin