The current *structured for* operation does not support loop-carried dependencies without going through memory. This has been brought up as a limitation multiple times and I propose to add support. The proposal is a write-up of ideas that were previously discussed in chat and in conversations. It is a focused change to `loop.for`, but we might extend it to `affine.for` if desired.

The basic structure would be as follows:

```
%res = loop.for %iv = %lb to %ub step %step args %arg0 {
^bb0(%0 : tensor<32xf32>):
...
loop.yield %val0 : tensor<32xf32>
} : (tensor<32xf32>) -> tensor<32xf32>
```

with the expected constraint that `%arg0`, `%val0`, `%0` and `%res` have the same type. The semantics would be that `%0` is assigned the value of `%arg0` before the first iteration and the value of `%val0` at the end of each iteration. On termination of the loop, the last value of `%0` is assigned to `%res`.
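To make the intended semantics concrete, here is a minimal Python model of the single-value case. It is only a sketch: `loop_for` and `body` are hypothetical names, not part of any MLIR API. It assumes a positive step, and it assumes that a loop with zero iterations simply returns the initial value, which follows from the description above but is not stated explicitly.

```python
def loop_for(lb, ub, step, init, body):
    """Hypothetical model of the proposed single-value loop.for.

    `body(iv, carried)` stands in for the loop region; its return value
    stands in for the operand of loop.yield.
    """
    carried = init                      # %0 takes the value of %arg0
    for iv in range(lb, ub, step):      # assumes step > 0
        carried = body(iv, carried)     # %0 takes the value of %val0
    return carried                      # %res is the last value of %0


# Sum of 0..9, carried through the loop as a value rather than via memory.
total = loop_for(0, 10, 1, 0, lambda iv, acc: acc + iv)
print(total)  # prints 45
```

Under this reading, when `%lb >= %ub` the body never runs and `%res` equals `%arg0`.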

The example above has a single carried value. It extends to multiple values in the expected way:

```
%res:2 = loop.for %iv = %lb to %ub step %step args %arg0, %arg1 {
^bb0(%0 : tensor<32xf32>, %1 : tensor<16xf64>):
...
loop.yield %val0, %val1 : tensor<32xf32>, tensor<16xf64>
} : (tensor<32xf32>, tensor<16xf64>) -> (tensor<32xf32>, tensor<16xf64>)
```
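The multi-value form can be modelled in Python as below. This is a sketch under assumptions, not a specification: `loop_for_multi` and `body` are hypothetical names, the step is assumed positive, and the arity check reflects my reading that the yield must provide exactly one value per carried argument. All carried values are updated together at the end of an iteration, so the body always sees the values from the previous iteration.

```python
def loop_for_multi(lb, ub, step, inits, body):
    """Hypothetical model of loop.for with several carried values.

    `body(iv, *carried)` stands in for the region and returns the values
    that loop.yield would forward to the next iteration.
    """
    carried = tuple(inits)                      # %0, %1 take %arg0, %arg1
    for iv in range(lb, ub, step):              # assumes step > 0
        yielded = tuple(body(iv, *carried))
        if len(yielded) != len(carried):
            raise ValueError("yield must provide one value per carried argument")
        carried = yielded                       # simultaneous update of %0, %1
    return carried                              # %res:2


# Fibonacci: each iteration yields (b, a + b), so both values depend on
# the previous iteration's pair.
a, b = loop_for_multi(0, 10, 1, (0, 1), lambda iv, a, b: (b, a + b))
print(a, b)  # prints 55 89
```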

I assume the structure is non-controversial. I welcome suggestions on syntax.