Here is an interesting design question regarding a more linalg-y direction versus a more xla_hlo-y representation for Tensor Compute Primitives. As a concrete example, take a simple pointwise add, expressed in these two IR representations:

the “tensor” approach:

```
%add = "tensor.add"(%lhs, %rhs) : (tensor<?xf32>, tensor<?xf32>) -> tensor<?xf32>
```

the “lifting” approach:

```
%add = "lifting.pointwise"(%lhs, %rhs) : (tensor<?xf32>, tensor<?xf32>) -> tensor<?xf32> {
^bb0(%0: f32, %1: f32):
  %2 = "scalar.add"(%0, %1) : (f32, f32) -> f32
  yield %2
}
```

(note that “lifting.pointwise” is a very special case of linalg.generic)

The lifting approach has the appealing property that we only need to define scalar versions of the operations, and all logic related to shape propagation, shape functions, broadcasting, tensor-vs-memref, etc. is concentrated in just the “lifting.pointwise” op.
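To make that property concrete, here is a minimal sketch in plain Python (not MLIR) of the lifting idea: a single hypothetical `lift_pointwise` combinator concentrates all the elementwise plumbing in one place, so each new op only needs a scalar definition. Tensors are modeled as flat Python lists; broadcasting and real shape functions are elided.

```python
def lift_pointwise(scalar_fn, lhs, rhs):
    """Apply a binary scalar function elementwise over two 'tensors'
    (modeled as equal-length Python lists). Shape checking,
    broadcasting, tensor-vs-memref handling, etc. would all live
    here, once, rather than in every op."""
    assert len(lhs) == len(rhs), "shape mismatch"
    return [scalar_fn(a, b) for a, b in zip(lhs, rhs)]

# Only scalar versions of the ops need to be defined...
def scalar_add(a, b):
    return a + b

def scalar_sub(a, b):
    return a - b

# ...and the tensor-level ops fall out of the one lifting combinator.
def tensor_add(lhs, rhs):
    return lift_pointwise(scalar_add, lhs, rhs)

def tensor_sub(lhs, rhs):
    return lift_pointwise(scalar_sub, lhs, rhs)
```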

However, the lifting approach has the downside that (at least naively) it relies on the ability to fuse large amounts of computation into the body of the region in order to do even simple arithmetic simplification. Consider the user code:

```
x = add(t0, t1)
if cond():
  y = sub(x, t1)
  print(y)
else:
  y = sub(x, t0)
  print(y)
# No other uses of `x` or `y`.
```

We would like to optimize this into the code

```
if cond():
  print(t0)
else:
  print(t1)
```

With the “tensor” approach, this is a relatively straightforward application of a fold like `sub(add(t0, t1), t1) -> t0`. However, with the “lifting” approach the IR for this becomes much harder to follow and requires looking through regions to find this simple identity. Here, control flow presented a barrier that made it impossible to fuse everything into a single “lifting.pointwise” (at which point the scalar application of the `sub(add(t0, t1), t1) -> t0` pattern would trivially apply).
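To illustrate why the tensor-level fold is so cheap, here is a hedged sketch of `sub(add(t0, t1), t1) -> t0` over a tiny invented expression tree. With ops represented as opaque tensor-level nodes, the rewrite is a shallow structural match on two levels of the tree; nothing needs to look through regions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    """A toy tensor-level IR node (invented for illustration)."""
    op: str             # "add", "sub", or "leaf"
    args: tuple = ()    # operand Exprs
    name: str = ""      # identifier, for leaves

def fold_sub_of_add(e: Expr) -> Expr:
    """Match sub(add(x, y), y) -> x (and the symmetric
    sub(add(x, y), x) -> y); otherwise return e unchanged."""
    if e.op == "sub" and e.args[0].op == "add":
        x, y = e.args[0].args
        if e.args[1] == y:
            return x
        if e.args[1] == x:
            return y
    return e

t0 = Expr("leaf", name="t0")
t1 = Expr("leaf", name="t1")
x = Expr("add", (t0, t1))
```

Running `fold_sub_of_add` on `sub(x, t1)` yields `t0` directly, which is what lets the `if`/`else` example above collapse to the two `print` calls. In the lifting representation, the same identity only becomes visible after the `add` and `sub` bodies are fused into one region, which the control flow here prevents.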

Thoughts? In some ways, I feel like the “lifting” approach feels more principled, and linalg shows that it scales to a large family of ops. However, it seems hard to beat the intuitive simplicity of the “tensor” approach when it comes to extending many basic optimizations to the tensor domain. On the other hand, I feel like the “tensor” approach might become less “intuitively simple” once broadcasting, shape propagation, tensor-vs-memref, etc. come into the picture. With the right traits though, perhaps it will be palatable? Maybe we need to discuss those other topics first.