Does MLIR support a string (or char*) type?

In MLIR, we can define an int variable like this:

%a = constant 10 : i32

Does it support a string variable (or char*) defined in a similar way? For example:

%s = constant "I am a string" : string

The standard-dialect constant operation does not support strings, and there is no builtin string type (see Builtin Dialect - MLIR). However, you can define a string type in your own dialect and create an operation that constructs a value of that type. For instance, the TensorFlow dialect has a string type: tensorflow/tf_types.def at 104959a3051b4df05fb380588e9ef517b1a422e2 · tensorflow/tensorflow · GitHub (though its definition has a few extra layers of macros and such). Defining Dialect Attributes and Types - MLIR explains how to create your own type.

N.B.: The OpaqueType in the builtin dialect is represented as a string literal, but I don’t think that’s what you want.

Thank you @gcmn, I appreciate the information.

One more question: I want to call a runtime function from MLIR, like this: call @getString(){stringvalue="aaaa"} : () -> (). The runtime function is implemented in C as void getString() {...}. Is there a way to pass the string from MLIR to C, i.e., can I read the stringvalue attribute in the C implementation?

Thank you in advance!

Hi @rqtian, I recently faced the same problem: How to represent values of type “string” in MLIR and how to pass them to pre-compiled C functions at run-time. Note that I’m still a learner in MLIR and LLVM, so the solution might not be perfect (and I’d indeed appreciate feedback from other users). In my case, making it work at all was the main goal, not making it work efficiently. Here is a sketch of how I did it:

String type

As @gcmn suggests, I defined a string type in my own dialect, as simple as:


def String : MyDialect_Type<"String"> {
  let summary = "string";
}

How to carry strings through the IR

One option is to attach a StringAttr to your operation, as in your comment above (but for that, you don’t even need a string type). Another option is to create a kind of StringConstantOp in your dialect, which is mostly similar to the existing ConstantOp, but has a StringAttr and is of result type String (the one defined above).

Passing a string known at compile-time to a run-time C function:

Chapter 6 of the Toy tutorial (Chapter 6: Lowering to LLVM and CodeGeneration - MLIR and llvm-project/LowerToLLVM.cpp at main · llvm/llvm-project · GitHub) shows how to rewrite a custom operation (toy::PrintOp) to a call to the printf C function during the lowering to the LLVM dialect. The trick is to pass the string as a !llvm.ptr<i8>, which corresponds to a char* in C.
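
For the runtime side, a minimal sketch of what the receiving function could look like is shown here. It assumes the signature of getString is changed to accept a const char* (the original question declares it without parameters), and the lastReceived variable is purely hypothetical, added only so the received text can be inspected.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical runtime-side state, only here so the result can be inspected.
static std::string lastReceived;

// extern "C" disables C++ name mangling, so the symbol is exactly "getString"
// and a call emitted from the lowered IR can be resolved against it.
// Assumption: the function takes the string as a NUL-terminated char pointer.
extern "C" void getString(const char *stringvalue) {
  assert(stringvalue != nullptr && "caller must pass a valid pointer");
  // The length is not passed separately, so the NUL terminator is required.
  lastReceived.assign(stringvalue, std::strlen(stringvalue));
  std::printf("runtime received: %s\n", stringvalue);
}
```

Since only a pointer crosses the boundary, the runtime relies on the terminating \0 to find the end of the string. Passing the length as a second argument would be an alternative design.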

Approach A) In the IR, you can get an mlir::Value of LLVM pointer type (mlir::LLVM::LLVMPointerType) that points to your string using mlir::LLVM::createGlobalString(). This creates a global storing your string, takes its address, and computes a pointer to its first element. The source code of the Toy tutorial seems to do something similar to this function (see getOrCreateGlobalString in the file mentioned above), but implements it itself. Note that you need to specify a name for the global string.

Approach B) Another option, which does not require the specification of a name, could be to create a buffer using the mlir::LLVM::AllocaOp with type !llvm.ptr<i8> and the size of your string. The result of this operation is the pointer you can pass to your function call. To store your string in this buffer, you could copy over the characters of your string one-by-one using mlir::LLVM::GEPOp and mlir::LLVM::StoreOp (make sure to append a \0 at the end if required).
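
To make the buffer layout of approach B concrete, here is a plain C++ model of it. This is not MLIR builder code. The function name materializeString is hypothetical, and a std::vector stands in for the stack buffer that the alloca would produce.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Models the buffer of approach B: one byte per character plus a terminator.
std::vector<char> materializeString(const std::string &text) {
  // Corresponds to an alloca of text.size() + 1 elements of type i8.
  std::vector<char> buffer(text.size() + 1);
  for (std::size_t i = 0; i < text.size(); ++i)
    buffer[i] = text[i]; // one address computation (GEP) and one store each
  buffer[text.size()] = '\0'; // the final store writes the terminator
  assert(buffer.back() == '\0');
  return buffer;
}
```

The point of the sketch is the size: a string of N characters needs N + 1 bytes, and the lowering emits one store per byte, so the amount of generated IR grows with the length of the string.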

If you chose to attach your string as an attribute to your operation, you need to lower that operation using either approach A or B, followed by a call to your C function.

If you chose to create that StringConstantOp, you need to lower it using either approach A or B, and the C function call is separate from it. Finally, in your lowering pass to the LLVM dialect, you need to add a type conversion of your String type to !llvm.ptr<i8>.

Hope that helps.


As stated above, I’d be happy about feedback on this solution. I personally chose approach B in combination with the StringConstantOp. With approach A, I wasn’t sure how to choose the names of the globals. There could be many such globals in my case, so I would have just used a counter, but I wasn’t sure whether it must be thread-safe etc. (Remember, my main goal was to make it work at all…)

Great! I have the same problem. I am trying to define a new type in the Toy dialect that can emit a string literal to MLIR.
Once I finish, I will share my code with you.

Thank you so much @pdamme, this is super helpful information. I will try your suggestions and share my experience.

FYI, the LLVM dialect has string globals - 'llvm' Dialect - MLIR.

Is there a standard way to create a unique symbol name for these globals?

SymbolTable::insert will autorename on collision.

How does insertion of operations into the symbol table work?
When I create two GlobalOps with the same name (e.g. using createGlobalString), then I get: error: redefinition of symbol named '...'.
Why are they not autorenamed?

Because you need to create a SymbolTable instance and call .insert on it after creating the op, like here: llvm-project/TensorConstantBufferize.cpp at 9d4896f50e441ea5b9e8ae78ebe328e006cb6b67 · llvm/llvm-project · GitHub. Symbol handling is orthogonal to op creation; there may be cases that actually want duplicate symbols to exist, e.g., to merge them in a later pass. See also Symbols and Symbol Tables - MLIR.
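
To illustrate the idea of renaming on collision, here is a small self-contained model. It is not MLIR's SymbolTable implementation. The class UniqueNameTable and its "_N" suffix scheme are assumptions made for this sketch only.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Toy model of a table that hands out unique names on insertion.
class UniqueNameTable {
public:
  // Returns the name that was actually registered, which differs from the
  // requested one if the requested name was already taken.
  std::string insert(const std::string &requested) {
    assert(!requested.empty() && "symbol names must not be empty");
    std::string name = requested;
    unsigned suffix = 0;
    // Try numeric suffixes until an unused name is found.
    while (used.count(name) != 0)
      name = requested + "_" + std::to_string(suffix++);
    used.insert(name);
    return name;
  }

private:
  std::unordered_set<std::string> used;
};
```

The relevant design point is that uniqueness is decided by the table at insertion time, not when the object is created. That is why creating two globals with the same name and never inserting them into a table leaves both with the colliding name.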

When creating my SymbolTable(moduleOp) instance, I run into an assertion: expected region to contain uniquely named symbol operations.
My code, which uses the SymbolTable to create a global string, runs within an OpConversionPattern inside an OperationPass<ModuleOp>. The symbols reported as duplicates belong to FunctionOp, which indeed seems to get duplicated during this pass.
Could this be an artifact of other lowerings happening at the same time (LoopToStd, MemRefToLLVM, StdToLLVM)? Is creating a SymbolTable not safe within an OpConversionPattern? But your example seems to do it in the same way.
Any ideas?

SymbolTable does not know about OpConversionPattern and vice versa. The situation is a bit complex here, but your diagnosis looks right. Inside conversion patterns, and when used with the dialect conversion infrastructure (i.e. applyPartial/FullConversion), replacing an operation with another one does not delete the old operation immediately. Instead, the new operation is inserted next to the original one. This is necessary for several reasons, in particular for the conversion to be reversible and for type conversion purposes. Depending on the entire conversion being successful or not, either the original or the replacement operation will be actually erased at the end of the conversion process. As a result, functions with the same name may co-exist in the module when you construct a symbol table inside a pattern in case another pattern has previously “replaced” functions, which is what likely happens in your case.

I don’t immediately have a good suggestion for you on how to proceed. One possibility is to create a SymbolTable instance before running the conversion, pass a reference to the table to all relevant patterns, and use it to update the table in all of them. This will lead to newly created functions having different names than the original functions, and you’ll need some cleanup to rename them back after the conversion completes.
Another possibility is to split the conversion into two separate calls to the infrastructure: one that converts functions and another that produces symbols. The IR can be temporarily invalid within one pass; the difficulty here is correctly setting up the operation legality in the conversion targets for both calls.

Hope this helps.

Thanks, that helps a lot!

I created a SymbolTable within the LoweringPass.runOnOperation() and it seems to work.
(Use SymbolTable instead of static counter · tali/sclang@32b6b23 · GitHub).

How are the patterns applied within this pass?
The matchAndRewrite function is const. This way it is guaranteed that it can be called concurrently from separate threads. Is MLIR doing this? Does it work when the pattern contains a reference to the SymbolTable, which may then be changed concurrently through several patterns?

Roughly, blocks are ordered topologically and their operations are traversed in textual order; operation regions are visited recursively. There is no parallelization at the pattern level AFAIK, and I don’t think matchAndRewrite is const for parallelism purposes.

MLIR does run function passes in parallel, though, and one must not modify anything above the individual function (e.g., the parent module) in such passes.
