LLVM Discussion Forums

[RFC] Python casts for attribute values

In the python bindings, we have generally overloaded the __str__ method to return the corresponding MLIR assembly of an object. Example:

module = ctx.parse_module("...")
print(module)  # Calls __str__ and prints assembly

attr = ctx.parse_attr('"mystring"')
print(attr)  # Calls __str__ and prints the assembly form "mystring"

In trying to use the API, I’ve found this more awkward than intended and propose that we avoid defining the __str__ method unless if we are intending that the instance represents something that should be considered a Python str. I propose that instead we:

  • Define the __repr__ method to print a debug-friendly decorated ASM form for “small” objects like Type and Attribute (i.e. so that repr(my_tensor_type) prints a debug friendly string like Type(tensor<...>)).
  • Define the __repr__ method for “large” objects like Operation and Module to print a debug summary such as Module(<id>) or Operation(mydialect.my_op) for debugging what the object is without getting into trouble of accidentally dumping megabytes of text for logs and prints.
  • Define a to_asm method on any ASM-printable object. By default, it would print the ASM with default options. We could extend it with an AsmPrintOptions optional argument at a later date.

This frees up the __str__ method for what I believe to be a better aligned purpose: casting things that should be treated as strings according to the rules Python has for such things. This will be primarily helpful with attributes.

Consider that currently, extracting the Pythong str value from a StringAttribute looks like this:

sattr = ctx.parse_attr('"mystring"')
sval = StringAttr(sattr).value

I would like to define the __str__ method on the generic Attribute base class so that it implies a “cast” through StringAttr:

sval = str(sattr)  # Returns the str containing text "mystring" (no quotes)
print(sattr)  # Prints "mystring" (no quotes)

I would make this strict so that it only works if the attribute is a StringAttr. This would throw an exception:

iattr = ctx.parse_attr(1)
sval = str(iattr) # Raises ValueError: attribute cannot be implicitly converted to a str

We could then also define __int__ and __float__ to similar effect for IntAttr, FloatAttr, and IndexAttr access:

iattr = ctx.parse_attr(1)
i = int(iattr) # i = 1
s = str(iattr) # Raises ValueError: attribute cannot be implicitly converted to a str

The last point does create a speed-bump because it breaks the Python notion that things that are “int-ey” or “float-ey” should be convertible to a str via a call to str(foo). We can either accept this (and be more strict/correct) or special case the __str__ attribute conversion to follow the python rules by attempting to downcast to an IntAttr or FloatAttr or StringAttr and doing the conversion. If we do diverge, it means that printing and logging objects of unknown providence needs to be done this way to generically print (which is arguably the most right anyway, since presuming that things are str convertible is trouble – but many will do it anyway):

myattr = ... # Some attribute of unknown providence
print(repr(myattr))
print("some format message: {!r}".format(myattr))
print("some format %r" % (myattr,))
print(logging.info("some message: %r", myattr)

I think I prefer this approach because it gives us some nice, pythonic shorthands for accessing values, preserves the MLIR type hierarchy, and gives us a more full featured way to access the assembly form.

Preferences?

In my opinion, we should not raise a ValueError when calling the str(iattr). If the int(iattr) can return 1, as for users, they do want to get a '1' when call the str(iattr), so It’s better to implement the str conversion separately for different attributes I think.

I am curious about the attribute of unknown providence and the generically printing approach. Does it mean we only need to bind the __repr__, without binding __str__, __int__ and __float__, and different attributes (StringAttr, IntAttr, etc.) share the same __repr__ binding?

Yes, we should just bind __repr__ on the PyMlirAttribute base class.

I think I’m ok with this specifically for the int/float case (but not others). MLIR attributes have specific types and the default behavior where if an object can’t be __str__ converted, its __repr__ is returned instead is problematic: it is really easy to get debug printouts and such in files when you should have gotten an error.

I’m with you on that!

I’ve been fiddling more with this and with some other IR python bindings and believe I have convinced myself to discard this RFC. I think that as an IR, the common case of getting assembly from the __str__ cast makes sense and is consistent with others. We may want to provide a convenience value attribute on PyAttribute that applies some default checks/conversions for extracting an appropriate Python primitive, but I think this is the minority case, has a number of complexities to it, and we shouldn’t bend the rest of the API around it.

I agree with this analysis.

My first reaction was that I wouldn’t expect str(x) to throw. It may give useless information, like <mlir.ir.Type at 0x7f1f3f3490>, but it should work. We may still want to control the size of the output, and just trim the result of __str__ to some, ideally configurable, value while .to_asm() provides the full assembly. For core IR objects, we can be more inventive about trimming if we do some of the traversal ourselves.