What is the semantics of memref<0xf32> and tensor<0xf32>

Hello,

It appears that 0 is an acceptable size for a tensor dimension.
For instance, memref<0xf32> and tensor<0xf32> are accepted by mlir-opt.
I was wondering what is the semantics of memrefs/tensors of dimension 0.
Are there particular corner cases to watch when dealing with them?

Best,
D.

I believe tensors of dimension zero are disallowed. (I ran into a case with tensors recently where this was improperly handled).

Steve

Tensors of dim 0 are allowed and produced by frontend like TF. You may have hit a bug in a lowering pass somewhere.

Currently they are primarily used to convey an element type and use due to a general lowering/generation rule. It happens in a few places and one can actually produce a non-empty tensor from an empty one (slicing out from a 3x0x3 a 3x3 tensor, these are funky for me but this is at the application level where different frontends can make different decisions). In the type self, there is no semantics additionally for it. It just carries no value and has size of 0 (even though the shape is different between 3x0x2 and 3x0x4, and shape is retained, the number of elements match) AFAIK.

Exactly, I got it from Keras/TF input from a slicing operation (in the kws_streaming back-end). But still, if there is no intended use for it, why not simply outlaw it - in our case, it was something incorrect.

Sorry, maybe I misunderstood the original question and was confusing in my response.
I think there are two different things:

  1. a tensor with a dimension that has size zero. e.g. tensor<0xf32> or tensor<0x4x0xf32>
  2. a tensor with zero dimensions. I don’t think there is a way to write this?

Case 2) came up recently for me because I tried vectorizing a matrix multiply and got this:

#map = affine_map<(d0, d1) -> (0)>
        affine.for %arg1 = 0 to 16 {
        affine.for %arg2 = 0 to 16 step 8 {
          affine.for %arg3 = 0 to 16 {
            %63 = vector.transfer_read %52[%arg1, %arg3], %cst {permutation_map = #map} : memref<16x16xf32, 2>, vector<8xf32>
            %64 = vector.transfer_read %50[%arg3, %arg2], %cst : memref<16x16xf32, 2>, vector<8xf32>
            %65 = vector.transfer_read %48[%arg1, %arg2], %cst : memref<16x16xf32, 2>, vector<8xf32>
            %66 = mulf %63, %64 : vector<8xf32>
            %67 = addf %65, %66 : vector<8xf32>
            vector.transfer_write %67, %48[%arg1, %arg2] : vector<8xf32>, memref<16x16xf32, 2>
          }
        }
      }

But vector-to-llvm conversion didn’t handle permutation_map, so I tried to remove it using some of the vectortransformations that rewrite permutation_maps like this into a simpler transfer_read plus a broadcast. However, this too failed because the code involved tried to generate a transfer_read referring to a memref type with zero dimensions. My solution was to convert the transfer_read to a scalar load. (see ⚙ D103432 [mlir] Handle cases where transform_read should turn into a scalar load)

tensor<f32>?

Yes, tensor<f32> is correct, and it’s handled correctly. It seems to be a “semantic hack” that allows every scalar to be treated as a tensor to please some verification algorithms (even though an f32 already has the value semantics the tensor<XXX> is intended to have).

My question concerned a dimension that is set to 0, like in memref<0xf32> or memref<1x0x3xf32>. In C such a construct would be forbidden by the language. But then, it can also be interpreted as an empty list…

I haven’t followed closely what actually happens in code with 0x stuff.

I’d classify scalar <-> tensor interpretations as semantic hacks too as @dpotop says. They seem to be considered to satisfy invariants that are closer to “frontend” considerations. IMO this bears resemblance to the signed/unsigned discussion.

memref<f32> always reduces to a pointer to an f32.

vector<f32> is currently disallowed and should be added but I haven’t gotten to it yet (see. Should we have 0-D Vectors?). The lack of a vector<f32> is the source of annoying corner cases like the one @stephenneuendorffer fixes.

IMO, 0xf32 isn’t a thing because I think its interpretation is ambiguous and depends on the intent of whoever produces it. It may be either:
a. empty (i.e. 0 elements is nothing),
b. a [0] shape of an f32 which may be interpreted as just an f32 (i.e. a 1xf32 but with a [0]shape …)
c. an empty shape of an f32 which may be interpreted as just an f32 (i.e. a 1xf32 but “without a / with an empty” shape …)
d. the example @jpienaar gave.

I’d personally prefer we’d handle the 0x the same way as we do for signed/unsigned: they have to disappear before getting too low in the stack of dialects (i.e. should not be visible below TF/xHLO).

I’d argue that to keep the separation clean, TF/xHLO should have their own type that allow 0x and we should flatly disallow it from tensor, memref and vector.

Ok, so upto now this was an abstract question. However, here what I find while looking into the training function of a very simple network (an input layer followed by a dense layer):
%21 = "tf.Sum"(%20, %3) {device = "", keep_dims = false} : (tensor<f32>, tensor<0xi32>) -> tensor<f32>
Note the type of the second argument, which is defined as:
%3 = "tf.Const"() {value = dense<> : tensor<0xi32>} : () -> tensor<0xi32>
What’s the meaning of this?

It’s in a part of the code that does not influence the result of the gradient descent, but still. BTW, the amount of generated TF code that does not serve is impressing.

Does it do anything? I’m trying to remember ancient conversations – I think such things are an artifact of some internal loops in gradient computation that sometimes have zero iterations, and instead of eliding them at the source, the frontend splats out the structure but operating on these “empty” tensors.

It should be a compiler’s job to reduce/strip these out at the frontend, and they should never survive to codegen I believe.

“Impressing” is one word for it.

This seems to be boilerplate code that computes some stats on the quality of the model. But the results are simply stored (to be used by the driver, I guess), never used in the gradient descent process…

What is the issue with empty arrays at the code generation level? I’d argue that it is a bug if something breaks due to them.

This is similar to loops with tripcount 0. Yes, if statically known one might want to elide them but a code generator should be able to handle them without breaking.

From a language perspective, having empty arrays and 0-d arrays is convenient, as one does not have to handle these special cases when writing code.

For the slicing operation, slicing out of an empty array is well typed but known to always fail at runtime, as any indexing into an empty array is out of bounds.

For tf.Sum with empty reduce dimensions, I would expect it to be the identity.

I think that tensors with a dimension of size 0 are not “just a frontend concern” that is going to boil away before we get lower into the system. I claim that the following PyTorch functions (which have trivial analogs in every tensor programming system) are legal and should be supported by every reasonable compiler for those systems.

>>> import torch
>>> def f(t):
...     return t[1:1]
... 
>>> f(torch.tensor([2,3,4]))
tensor([], dtype=torch.int64)
>>> def g(t: torch.Tensor, start: int, end: int):
...     return t[start:end]
... 
>>> g(torch.tensor([2,3,4]), 1, 1)
tensor([], dtype=torch.int64)

A real-world example where this is relevant is e.g. a program that is maintaining a buffer of accumulated audio samples and a function to get the accumulated audio samples. If 0 samples have been accumulated, the result will be a tensor with a dimension of size 0.

I also don’t see any relevance to the question of whether “0x” in the type is allowed or not. It’s really a question about the semantics of the type, and whether dynamically 0 is allowed (whether we happen to statically know it is 0 is immaterial to the discussion). Examples like the one I showed above suggest that we do need to support the use case of a dimension of size 0 in the builtin tensor type.

Also, as a frontend author, it’s quite inconvenient for me to multiversion my code around every single potentially-zero case. E.g. in the following example, if the size of tt is not known statically, I would expect that lower levels of the stack should create a matmul kernel that cleanly handles the zero case. Is there some fundamental reason this is hard? It seems like using half-open loops for iteration naturally handles this.

>>> tt
tensor([], size=(0, 1), dtype=torch.int64)
>>> torch.matmul(tt, tt.t())
tensor([], size=(0, 0), dtype=torch.int64)