[RFC][Standard] Memref cast ops

Compilation time is a bit hard to predict: we get more ops to traverse (and potentially a couple of extra switches/dyn-casts in the traversal code), but each op is simpler (so fewer switches on the “subtype” of the op in the traversal code). So ultimately I think the two should be roughly equivalent.

As for runtime, shape manipulation is insertelement/extractelement. Currently each cast unconditionally writes all elements of the new descriptor, and conditionally reads elements of the old descriptor (dynamic components only). I don’t expect much difference in these. If we allow the descriptor to contain undef/garbage for static parts of the shapes, refine_rank would become a no-op in the lowered code, but that is beyond the scope of this proposal. Rank is trickier because unranked descriptors need allocation. With a special pair of ops to create unranked memrefs, it will become easier to track the lifetime of these allocations IMO.
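To make the read/write cost above concrete, here is a rough Python model of a shape cast on the sizes part of a descriptor. It is only a sketch: `cast_sizes` is a made-up helper, only the sizes array is modeled (no pointers, offset or strides), and each list read or append stands in for one extractelement or insertelement.

```python
from typing import List, Optional, Tuple


def cast_sizes(
    old_sizes: List[int], target: Tuple[Optional[int], ...]
) -> Tuple[List[int], int, int]:
    """Model a shape cast; `None` in `target` marks a dynamic dimension.

    Returns (new_sizes, reads, writes), where reads/writes count the
    extractelement/insertelement operations this toy lowering would emit.
    """
    new_sizes: List[int] = []
    reads = 0
    for i, static in enumerate(target):
        if static is None:
            # Dynamic dim: the value has to be read from the old descriptor.
            new_sizes.append(old_sizes[i])
            reads += 1
        else:
            # Static dim: the value is a compile-time constant, no read needed.
            new_sizes.append(static)
    # Every element of the new descriptor is written unconditionally.
    writes = len(new_sizes)
    return new_sizes, reads, writes


new_sizes, reads, writes = cast_sizes([2, 4], (None, 4))
```

In this model, allowing garbage in the static slots would let the static branch skip its write as well, which is the case where the cast stops costing anything.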

Regarding canonicalization, again, I expect the patterns to become simpler: erase_rank / refine_rank can cancel out in the absence of intermediate uses, and the same is true for the shapes. It should be possible to fold chains of refines into a single op and bring it closer to its use.
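A toy version of these two patterns, written in Python rather than against the MLIR pattern API, might look like the sketch below. Everything here is hypothetical: `Op`, `canonicalize` and the `refine_shape` op name are illustration only, and the cancellation assumes that refine_rank restores exactly the type that erase_rank erased.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Shape = Tuple[Optional[int], ...]  # None marks a dynamic dimension


@dataclass(frozen=True)
class Op:
    name: str                      # "erase_rank", "refine_rank" or "refine_shape"
    result: str
    operand: str
    shape: Optional[Shape] = None  # only meaningful for refine_shape


def _uses(ops: List[Op], value: str, live: Set[str]) -> int:
    """Count uses of `value`; a value that escapes counts as one extra use."""
    return sum(op.operand == value for op in ops) + (value in live)


def _merge(inner: Shape, outer: Shape) -> Optional[Shape]:
    """Combine two refinements; None if they disagree on a static dim."""
    if len(inner) != len(outer):
        return None
    merged = []
    for a, b in zip(inner, outer):
        if a is not None and b is not None and a != b:
            return None
        merged.append(a if b is None else b)
    return tuple(merged)


def canonicalize(ops: List[Op], live: Set[str]) -> Tuple[List[Op], Set[str]]:
    """Apply the two patterns until nothing changes."""
    ops, live = list(ops), set(live)
    changed = True
    while changed:
        changed = False
        for outer in ops:
            inner = next((o for o in ops if o.result == outer.operand), None)
            # Only rewrite when the intermediate value has no other use.
            if inner is None or _uses(ops, inner.result, live) != 1:
                continue
            if (inner.name, outer.name) == ("erase_rank", "refine_rank"):
                # Cancel the pair and forward the original value to all users.
                src, dead = inner.operand, outer.result
                rest = [o for o in ops if o is not inner and o is not outer]
                ops = [
                    Op(o.name, o.result, src if o.operand == dead else o.operand, o.shape)
                    for o in rest
                ]
                live = {src if v == dead else v for v in live}
                changed = True
                break
            if inner.name == outer.name == "refine_shape":
                merged = _merge(inner.shape, outer.shape)
                if merged is None:
                    continue
                # Fold the chain into one refine that reads the original value.
                folded = Op("refine_shape", outer.result, inner.operand, merged)
                ops = [folded if o is outer else o for o in ops if o is not inner]
                changed = True
                break
    return ops, live
```

The single-use check is what "absence of intermediate uses" amounts to here: if anything else reads the erased or partially refined value, both ops have to stay.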

We may also want a story for casts from a pseudo-void * to a memref (currently done by std.view) and for the inverse (currently impossible), which together would give you an equivalent of reinterpret.

We already have an std.assert, so we can add a pass that sprinkles assertions around memref ops and let the user call it. Alternatively, we can have an option for the std-to-llvm conversion that injects such assertions and lets them get converted by the infra.
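As a rough illustration of the opt-in pass, the Python sketch below walks a toy op list and puts an assertion in front of each checked op. The names `insert_assertions` and `CHECKED_OPS` are invented for this example, which ops get checked is an assumption, and the condition being asserted is left abstract.

```python
from typing import List, Tuple

ToyOp = Tuple[str, str]  # (op name, operand)

# Assumption: these are the ops that would get a runtime check.
CHECKED_OPS = {"refine_rank", "refine_shape"}


def insert_assertions(ops: List[ToyOp], enabled: bool = True) -> List[ToyOp]:
    """Return a new op list with an ("assert", operand) before each checked op.

    `enabled` plays the role of the user running the pass or setting the
    conversion option; when it is off, the ops are returned unchanged.
    """
    if not enabled:
        return list(ops)
    out: List[ToyOp] = []
    for name, operand in ops:
        if name in CHECKED_OPS:
            out.append(("assert", operand))
        out.append((name, operand))
    return out


checked = insert_assertions([("erase_rank", "%0"), ("refine_rank", "%1")])
```

Either variant keeps the checks out of the default pipeline, so builds that do not ask for them pay nothing.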