What’s the proper way to use asm for PTX mma in LLVM IR?
I tried to emit the mma instruction discussed here using inline asm in LLVM IR, as in the code below.
However, something is apparently wrong with the inline asm part, since it generates PTX like this:
// begin inline asm
mma.sync.aligned.m16n8k8.row.col.f16.f16.f16.f16 {%0,%1},{%2,%3},{%4},{%5,%6};
// end inline asm
It seems the registers are not substituted for %0, %1, …
I am also not sure whether I am returning the proper data type for an instruction like this, which has multiple return values. I made it a vector, <2 x i16>. There is no explanation in the documentation.
define <2 x i16> @thefunction(<2 x i16> %0, <1 x i16> %1, <2 x i16> %2) {
BB:
  %a0 = extractelement <2 x i16> %0, i32 0
  %a1 = extractelement <2 x i16> %0, i32 1
  %b0 = extractelement <1 x i16> %1, i32 0
  %c0 = extractelement <2 x i16> %2, i32 0
  %c1 = extractelement <2 x i16> %2, i32 1
  %value = call <2 x i16> asm "mma.sync.aligned.m16n8k8.row.col.f16.f16.f16.f16 {%0,%1},{%2,%3},{%4},{%5,%6};", "=r,=r,r,r,r,r,r"(i16 %a0, i16 %a1, i16 %b0, i16 %c0, i16 %c1)
  ret <2 x i16> %value
}
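For comparison, here is a minimal, untested sketch of how the same call might be written. It rests on three assumptions drawn from the LLVM Language Reference rather than from a verified build: inline asm operands in LLVM IR are written `$0`, `$1`, … (not `%0`, as in GCC-style C inline asm); an asm call with more than one output returns a literal struct rather than a vector; and the NVPTX `r` constraint names a 32-bit register, assumed here to hold one packed `.f16x2` value. The function name `@mma_sketch` and the `i32` parameter packing are hypothetical.

```llvm
; Hypothetical sketch: every i32 is assumed to carry two packed f16 values
; (one .f16x2 register), matching the m16n8k8 f16 fragment layout in PTX.
define { i32, i32 } @mma_sketch(i32 %a0, i32 %a1, i32 %b0, i32 %c0, i32 %c1) {
entry:
  ; Operands use $N placeholders; the two "=r" outputs come back as a struct.
  ; "sideeffect" is a conservative choice so the call is not dropped or merged.
  %d = call { i32, i32 } asm sideeffect "mma.sync.aligned.m16n8k8.row.col.f16.f16.f16.f16 {$0,$1},{$2,$3},{$4},{$5,$6};", "=r,=r,r,r,r,r,r"(i32 %a0, i32 %a1, i32 %b0, i32 %c0, i32 %c1)
  ret { i32, i32 } %d
}
```

If the sketch holds, the individual results could be read with `extractvalue { i32, i32 } %d, 0` and `extractvalue { i32, i32 } %d, 1`, and bitcast to `<2 x half>` where the f16 lanes are needed.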