More work needed for TypeID duplication (help wanted)

(Forked from a discussion that started out specific to Python: Checking types between different python extensions - #33 by stellaraccident)

Mehdi did some great work in D105903, "Emit strong definition for TypeID storage in Op definition" (WIP), to begin giving TypeIDs a strongly linked home for Attributes, Types, Dialects and Operations. That has started to chip away at our long-standing problems using MLIR in combination with dynamic linking (tl;dr: some people have no problems, but it is really easy to break, with symptoms of TypeID mismatches across shared libraries).

However, there are still issues. See a recent dump here (ignore the IREE symbols; this just happened to be from that project). It indicates that OpTraits, Interface IDs and a handful of other things are still being emitted in the fragile way. This isn’t just academic: I was able to trivially reproduce TypeID mismatches on operator terminator detection in an MLIR-HLO build with -DBUILD_SHARED_LIBS=ON linked to mlir-hlo-opt, without anything special going on.

I know this kind of thing can be intimidating, but you don’t need to understand the mechanics in order to contribute fixes to any of the above. I think we should be shooting for a world where we remove the ability to have weakly linked TypeIDs from TypeID.h, and to do that, we need to eliminate some more of these classes of uses, following the template of the above patch. I can verify that strong linking, where we have done it, solves the problems.
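For anyone who wants to see the difference concretely, here is a minimal single-file sketch of the two styles. It is not the actual MLIR code: the `TypeID` class, `getImplicit`, `fromStorage`, `MyOp` and `checkTypeIDs` are all hypothetical names made up for illustration, and the real TypeID.h is more involved. The assumption is only that an ID is the address of a per-type storage object, so two copies of that storage mean two different IDs.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a type ID: identity is the address
// of a per-type storage object. This is NOT the real mlir::TypeID.
class TypeID {
public:
  static TypeID fromStorage(const void *ptr) { return TypeID(ptr); }

  // "Fragile" style: the storage is a static local inside a function
  // template that lives in a header. Every shared library instantiating it
  // can end up with its own copy, depending on linker and visibility rules.
  template <typename T>
  static TypeID getImplicit() {
    static char storage;
    return TypeID(&storage);
  }

  bool operator==(const TypeID &other) const { return ptr == other.ptr; }
  bool operator!=(const TypeID &other) const { return ptr != other.ptr; }

private:
  explicit TypeID(const void *ptr) : ptr(ptr) {}
  const void *ptr;
};

// "Strong" style: the header only declares the accessor...
struct MyOp {
  static TypeID getTypeID();
};

// ...and exactly one .cpp file defines it, so the storage has a single home.
// (Shown in the same file here only to keep the sketch self-contained.)
TypeID MyOp::getTypeID() {
  static char storage;
  return TypeID::fromStorage(&storage);
}

// Within a single binary both styles behave the same; the difference only
// shows up once several shared libraries are involved.
void checkTypeIDs() {
  assert(TypeID::getImplicit<int>() == TypeID::getImplicit<int>());
  assert(TypeID::getImplicit<int>() != TypeID::getImplicit<float>());
  assert(MyOp::getTypeID() == MyOp::getTypeID());
}
```

With the strong style, a library that uses `MyOp::getTypeID()` without linking against the library that defines it gets an undefined-symbol error at link time rather than a silent second copy of the storage.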

Patches welcome :)

I’ll also point out that strongly linking the TypeIDs makes it a lot harder to accidentally forget LINK_LIBS on things you should be depending on but are getting away without because of the weak/header-only use. Where we’ve done it, this style has helped highlight unambiguously where such issues exist, as link errors.

Thanks Stella and Mehdi for all your hard work sorting this out. This indeed does seem like a good project for community contribution.

Not to sidetrack the discussion, but would solving all these issues enable the Python bindings to link directly to MLIR C++ rather than going through the CAPI? That’s very attractive to me. (Just saw this post.)

Conditionally. I believe this is the primary outstanding issue keeping that from happening on POSIX. And making sure that every symbol has a single .obj home is straight-up required on PE/Windows. I will just caution that dynamic C++ linking is a minefield, generally. The upstream layering in this way, while costly, is good for avoiding weird pitfalls, and it puts the Python bindings at the same level as ~every other language (which typically are going to want/require a C API to build against vs a C++ one).

With that said, I think there are a lot of prototypey reasons to want an easier path, and I do think that leaving the option open for downstreams who want to go that way wouldn’t be bad. I just don’t want to personally support it: on the best of days, such things are really fragile.

Yeah, that’s what I thought (since you’ve mentioned the C++ dynamic linking minefield in the past). Needing to go through the CAPI defeats some of the attractiveness of pybind11, especially for random C++ code which just happens to be in the CIRCT code base since it internally uses MLIR constructs.

Right behind the TypeID issues are RTTI/exception interop issues. The only surefire way is to build the whole project with the same flags (i.e. exceptions/RTTI enabled), which upstream isn’t going to do. But if a specific downstream implementation wanted to, I think having the option for more direct linking is fine.

In case you haven’t noticed, C++ is not what I would call great.

Ugh… I keep forgetting that LLVM compiles without those. This is actually something which has bitten me in the past. I think it’s fine for my use case to compile with RTTI and exceptions disabled… assuming pybind11 supports it. The C++ is literally just wrapping CIRCT code.

That’s putting it lightly. One of my best friends is on the C++ standard committee. Every time I run into something like this, I give him shit about it. I’ve given him a lot of it over the years.

C++ is a beast, but to be fair: for something like TypeID, C++ is quite limited by what the various platforms offer: shared libraries / DSO stuff is just hard. I think C++ is paying the price of being “native” to each platform: everything is built using the low-level bits of the system, whereas something like Java entirely abstracts the platform away for this kind of thing.

Pybind requires both RTTI and exceptions to be enabled. Which then, unless you like pain, means that the whole LLVM project should be built that way.

Yeah, all of the stuff at this intersection is just painfully the result of a lot of semi-convergent evolution… and is just a mess of sharp edges.

Well, C++ does have RTTI, which could implement TypeID, I think… It seems that the problem is that it also implies a bunch of other things with significant cost and always gets applied to all classes. Essentially we want RTTI only for some classes…

Good point, I wonder how RTTI works cross-DSO with hidden visibility? (does it?).

On Windows, I think someone told me it falls back to string comparisons of some mangled names?

It doesn’t, honestly. C++ RTTI is fraught with peril, and isn’t any better (in general). Most implementations fall back to comparing the string name (which two different classes can share) when the identity comparison fails.
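As a rough illustration of that fallback, here is a sketch of the two comparison strategies written against the standard `std::type_info` interface. It assumes RTTI is enabled, the helper names (`sameTypeByAddress`, `sameTypeByName`, `checkNameFallback`) are made up for this example, and it is not a copy of what any particular standard library does internally.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

// Identity comparison: cheap, but only reliable if each type has exactly one
// type_info object in the whole process, which is the property that tends to
// break across shared libraries.
inline bool sameTypeByAddress(const std::type_info &a,
                              const std::type_info &b) {
  return &a == &b;
}

// Name-based fallback: if the addresses differ, compare the
// implementation-defined (typically mangled) names instead. Two unrelated
// classes that happen to share a name would compare equal here.
inline bool sameTypeByName(const std::type_info &a, const std::type_info &b) {
  return &a == &b || std::strcmp(a.name(), b.name()) == 0;
}

// Small self-check for the name-based path.
void checkNameFallback() {
  assert(sameTypeByName(typeid(int), typeid(int)));
  assert(!sameTypeByName(typeid(int), typeid(long)));
}
```

The name-based path survives duplicated `type_info` objects, but it trades a pointer compare for a string compare and inherits the name-collision problem mentioned above.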

– River

+1 to what River says. The way that RTTI works across different std libs and platforms is… interesting (if you like watching how things all evolve without much of a plan to a point of meta-stability). Several of the implementations are at least as fragile as what we are dealing with in the TypeID stuff, and it is not uncommon for projects that lean into that to have to ifdef around specific C++ standard library versions, etc.