Strange Swift issues with Dialect Registration Hooks

This one seems really bizarre, so I wouldn’t be surprised if the answer is just 🤷, but I’ve started getting an intermittent build failure in the Swift bindings since I last updated MLIR. I’m betting this is a Swift bug (and I filed a ticket here), but figured it couldn’t hurt to ask. Basically, my project has three targets that import MLIR headers: CMLIR, CSCF, and CStandard. The first imports only core IR functionality, while the other two import dialect-specific headers. When compiling any two of these targets, I get intermittent failures that are some variation on the following (about 50% of the time the build succeeds with no issues):

.../include/mlir-c/Registration.h:61:8: error: 'MlirDialectRegistrationHooks' has different definitions in different modules; definition in module 'CSCF' is here
struct MlirDialectRegistrationHooks {
       ^
.../include/mlir-c/Registration.h:61:8: note: definition in module 'CMLIR' is here
struct MlirDialectRegistrationHooks {
       ^

This seems fairly strange to me. There are tons of other structs defined in header files that have been working just fine, and this never happens with anything other than MlirDialectRegistrationHooks. As an added bonus, I’ve only been able to reproduce this on macOS Big Sur, not on Ubuntu 20.04. I’m not even really sure what emits that error message; the closest thing I found was this test file in LLVM.

Is there something I’m missing that might make MlirDialectRegistrationHooks special?

That is really bizarre - no ideas from me.

Sorry no suggestion to help: that looks like an issue with the Swift importer to me…

I haven’t diagnosed the root issue, but removing any reference to MlirDialectRegistrationHooks seems to have reliably fixed the issue. I’ve added a slightly-nasty workaround in MLIRSwift to accomplish this. My best guess is that clang uses some kind of hashing scheme to determine when two types defined in separate modules are actually the same type, and that scheme nondeterministically fails for MlirDialectRegistrationHooks for some reason.

I’ve done some more digging and here is my completely uneducated guess (emphasis on “completely”, “uneducated” and “guess”) as to what is going on:

If you have two clang modules, ModuleA and ModuleB, they might reference the same type SameStruct defined in some header file. In order for these two modules to interoperate (either in C or in a higher-level language which leverages clang modules, like Swift), you would need to assess whether or not ModuleA.SameStruct and ModuleB.SameStruct are indeed the same struct (they can be different if, for instance, there is a #define set differently in the two modules). Clang uses a hash of the struct's structure to determine this. Unfortunately, it seems like there is a nondeterminism or other bug which causes structs with function pointers (again, my best guess) to sometimes hash improperly, and this is what is happening with MlirDialectRegistrationHooks.
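To make the legitimate case concrete (a #define set differently in two modules), here is a minimal sketch in plain C. It does not touch clang's actual hashing; it only shows how one header body can produce two genuinely different structs. The macro DECLARE_SAME_STRUCT, the struct names, and the userData field are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "header body", written as a macro so that it can be expanded
   twice in one file. EXTRA stands in for a field guarded by a #define that
   only one of the two modules sets. */
#define DECLARE_SAME_STRUCT(NAME, EXTRA)   \
  struct NAME {                            \
    void (*registerHook)(void *context);   \
    int id;                                \
    EXTRA                                  \
  }

/* What ModuleA would see: the guarding macro is not set. */
DECLARE_SAME_STRUCT(ModuleA_SameStruct, );

/* What ModuleB would see: the guarding macro adds a field. */
DECLARE_SAME_STRUCT(ModuleB_SameStruct, void *userData;);

/* Number of extra bytes in ModuleB's view of the struct. A nonzero result
   means the two definitions really do differ and must not be merged. */
size_t extra_bytes_in_module_b(void) {
  assert(sizeof(struct ModuleB_SameStruct) >= sizeof(struct ModuleA_SameStruct));
  return sizeof(struct ModuleB_SameStruct) - sizeof(struct ModuleA_SameStruct);
}
```

In this situation a "different definitions in different modules" error would be correct. What is odd about MlirDialectRegistrationHooks is that, as far as I can tell, both modules see the exact same definition.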

Now, I’ve put together a patch (D96229: [MLIR] Replace dialect registration hooks with dialect handle) which fixes this for me, though I think even outside of the context of this issue the patch is a good change, as it allows consumers of the C API to not depend on the actual structure of MlirDialectRegistrationHooks.
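For readers unfamiliar with the pattern, here is a rough sketch of what a "handle instead of a hooks struct" API can look like in C. Every name here (DemoContext, DemoDialectHandle, DemoDialectHooks, and the demo* functions) is made up for illustration and is not the API from the patch. The point is only that the consumer-visible type carries a single opaque pointer, so no struct of function pointers ever appears in a public header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a context object. */
typedef struct DemoContext {
  int registeredCount;
  const char *lastNamespace;
} DemoContext;

/* Public type: an opaque, pointer-sized token with no function pointers. */
typedef struct DemoDialectHandle {
  const void *ptr;
} DemoDialectHandle;

/* Private to the library: the hooks table never appears in a public header. */
typedef struct DemoDialectHooks {
  void (*registerDialect)(DemoContext *context);
  const char *(*getNamespace)(void);
} DemoDialectHooks;

static void scfRegister(DemoContext *context) {
  context->registeredCount++;
  context->lastNamespace = "scf";
}

static const char *scfNamespace(void) { return "scf"; }

static const DemoDialectHooks scfHooks = {scfRegister, scfNamespace};

/* Public accessor returning the handle for one (hypothetical) dialect. */
DemoDialectHandle demoGetDialectHandleSCF(void) {
  DemoDialectHandle handle = {&scfHooks};
  return handle;
}

/* Public operations dispatch through the hidden hooks table. */
void demoDialectHandleRegister(DemoDialectHandle handle, DemoContext *context) {
  assert(handle.ptr != NULL && context != NULL);
  ((const DemoDialectHooks *)handle.ptr)->registerDialect(context);
}

const char *demoDialectHandleGetNamespace(DemoDialectHandle handle) {
  assert(handle.ptr != NULL);
  return ((const DemoDialectHooks *)handle.ptr)->getNamespace();
}
```

A consumer (or a Swift importer) only ever sees DemoDialectHandle and the three demo* functions, so the library can change the layout of the hooks table without affecting anyone downstream.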

I don’t love not really understanding the root cause, but glancing at the patch, it seems reasonable either way. Afk, but will review in detail tmw.

(Also, sorry something that should have been simple was so costly – super annoying)

Yeah, to be clear I think we should evaluate the patch outside of the context of this issue. And no worries, I’m pretty sure dealing with seemingly innocuous things that end up causing outsized problems is what we do for a living 🙂