Recent improvements to the IR parser

I’d like to share a couple of improvements to the (textual) IR parser I have made recently, as they may be useful to other people as well.

Intrinsics no longer require declarations

To use an intrinsic like call i64 @llvm.ctpop.i64(i64 %value) it was previously necessary to add a declaration like declare i64 @llvm.ctpop.i64(i64) as well.

This is no longer necessary, and the IR parser will automatically add the declaration. This makes it a bit easier to write IR tests and alive2 proofs.

Unnamed values no longer have to be consecutive

It’s now allowed to write something like this:

define i32 @test(i32 %0, i32 %1) {
  %3 = add i32 %0, 1
  %5 = add i32 %3, 1
  ret i32 %5
}

Note that the value %4 has been skipped. This means you can now easily delete unnamed instructions in the IR, without having to renumber all the following instructions by hand. (This is also supported for unnamed globals.)

Parsing of incomplete IR

opt supports a new -allow-incomplete-ir option, which will drop references to undefined metadata and try to insert missing global declarations.

For example

define i32 @test() {
  %ptr = call ptr @foo()
  %v = load i32, ptr %ptr, !tbaa !2
  ret i32 %v 
}

will be accepted and converted into

define i32 @test() {
  %ptr = call ptr @foo()
  %v = load i32, ptr %ptr, align 4
  ret i32 %v
}

declare ptr @foo()

if you pass -allow-incomplete-ir.

This makes it easier to work with the output of -print-after-all and similar, which contains dumps of individual functions only, without references declarations. Of course, this is all done on a best-effort basis only.

40 Likes

These may be the 3 greatest changes to LLVM ever made.

1 Like

Thanks for working on this, this is awesome!

It could be exceeded by getting rid of the float-constant-encoded-as-double thing

I might not go quite that far, but the first and third would have saved me considerable headaches when first dealing with Alive2 unsoundness reports.

Very Uesful Feature !!! :smiley:

I added -allow-incomplete-ir to my latest Alive2 documentation PR (Hints for uniquing unsoundness reports by FlashSheridan · Pull Request #1016 · AliveToolkit/alive2 · GitHub), so that others won’t have to go through what I did. Thanks again.

I’ve noticed an odd behaviour that might be related to this. The following IR fails to parse with error: use of undefined value '@llvm.ptrmask':

define void @ptrmask() {
  call ptr addrspace(1) @llvm.ptrmask(ptr addrspace(1) poison, i64 poison)
  call ptr addrspace(2) @llvm.ptrmask(ptr addrspace(2) poison, i64 poison)
  ret void
}

But if I comment out either of the calls to ptrmask, the whole IR file parses just fine.

I can fix it by adding explicit type-mangling suffixes to the intrinsic names, llvm.ptrmask.p1.i64 and llvm.ptrmask.p2.i64. But the error above still seem strange, and strangely worded.

Yeah, this is sort of related. The fact that you can use llvm.ptrmask instead of llvm.ptrmask.p1.i64 if there is just one signature is due to intrinsic auto-upgrade – this will trigger a remangling upgrade that adds the suffix. That’s pre-existing behavior.

Now, what is new is that the intrinsic declaration will only be inserted if all calls use the same type, as otherwise the IR is ill-formed anyway. This results in the undefined value error message here.

I see two ways to improve this: The first would be to just add an error message specific to this case, that mentions that an intrinsic was used with two different signatures.

The other one would be to actually support this. I have always understood the ability to call suffix-free intrinsics as an accident that falls out from general intrinsic auto-upgrade support, rather than an intentional feature. But we could turn this around and explicitly support it, including for the case where the “same” intrinsic is used with different signatures.

So in your example, we’d automatically convert the IR into this:

define void @ptrmask() {
  call ptr addrspace(1) @llvm.ptrmask.p1.i64(ptr addrspace(1) poison, i64 poison)
  call ptr addrspace(2) @llvm.ptrmask.p2.i64(ptr addrspace(2) poison, i64 poison)
  ret void
}

declare ptr addrspace(1) @llvm.ptrmask.p1.i64(ptr addrspace(1), i64)
declare ptr addrspace(2) @llvm.ptrmask.p2.i64(ptr addrspace(2), i64)

What do folks think about that? Is there any good reason why we force people to write out the mangling suffix, even though it is fully derivable from the call type?

I don’t think so, and I’d be quite happy for this to change. At least to me, the mangling just feels like noise that makes the IR more difficult to read and write.

FWIW, in llvm-dialects I decided to define operations (“intrinsics”) as varargs if the arguments are overloaded, so mangling is only used for overloaded return types. So in the example, you’d end up with:

define void @ptrmask() {
  call ptr addrspace(1) (...) @llvm.ptrmask.p1(ptr addrspace(1) poison, i64 poison)
  call ptr addrspace(2) (...) @llvm.ptrmask.p2(ptr addrspace(2) poison, i64 poison)
  ret void
}

declare ptr addrspace(1) @llvm.ptrmask.p1(...)
declare ptr addrspace(2) @llvm.ptrmask.p2(...)

This works pretty well for reducing noise without changing LLVM itself. (The mangling by return type is unavoidable given current rules, but perhaps those rules could be changed for intrinsics.)

The pre-existing behavior allows you to omit the suffix, or even use a completely incorrect suffix, right? That makes it seem even more pointless to ever “force people to write out [a] mangling suffix”.

I would love this. It seems really ergonomic and I don’t see any downsides.

1 Like

Okay, I went ahead and implemented this here: [AsmParser] Support calling intrinsics without mangling suffix by nikic · Pull Request #89172 · llvm/llvm-project · GitHub

A big downside of that is that you can no longer have attributes on the intrinsic arguments, which is quite important for intrinsics…

1 Like