[RFC] Add RISC-V Vector Extension (RVV) Dialect

Thanks for the information!

Yes, it’s weird. It seems that the CPU is not detected in QEMU. If I run lli on the x86 side, it shows that the host CPU is cascadelake.

LLVM (http://llvm.org/):
  LLVM version 14.0.0git
  Optimized build with assertions.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: cascadelake

Apart from that, only the default target is detected. Having set -DLLVM_TARGETS_TO_BUILD="X86;RISCV", I would expect the registered targets to include:

Registered Targets:
    riscv32 - 32-bit RISC-V
    riscv64 - 64-bit RISC-V

But these targets never appear, so now I’m not sure whether the registered targets are controlled by LLVM_TARGETS_TO_BUILD.

Try to build the newlib version of the RISC-V toolchain:
change “riscv64-unknown-linux-gnu-xxxxx” → “riscv64-unknown-elf-gnu-xxxxx”
Check out this configuration with newlib:

cmake -G Ninja -DCMAKE_BUILD_TYPE="Debug" \
  -DBUILD_SHARED_LIBS=True -DLLVM_USE_SPLIT_DWARF=True \
  -DLLVM_OPTIMIZED_TABLEGEN=True \
  -DLLVM_BUILD_TESTS=True \
  -DDEFAULT_SYSROOT="/path/to/riscv-gcc-install-path/riscv32-unknown-elf" \
  -DGCC_INSTALL_PREFIX="/path/to/riscv-gcc-install-path" \
  -DLLVM_DEFAULT_TARGET_TRIPLE="riscv32-unknown-elf" \
  -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="RISCV" ../
cmake --build .

or with this flag

 -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="RISCV"

Thanks! I will give it a try. But before that, I have a question about this toolchain. As far as I know, the newlib version is used for embedded systems, so it seems unreasonable to use it here. If you make it work, could you figure out why the newlib toolchain works but the Linux version does not?

Sorry,
The low-riscv solution doesn’t work either. llvm/tools/lli depends on the LLVM JIT, and it seems that JITLink for RISC-V is still under development (rG0ad562b48bfd). After checking the build system, LLVM currently does not support a JIT target for RISC-V; see llvm/CMakeLists.txt and llvm/tools/lli/CMakeLists.txt.

And here’s the relevant commit: “Supports building with a list of targets that does not contain…” (llvm/llvm-project@465f936 on GitHub).

Update

  • Naming (RVV → RISCVV)
  • Scalable Vector Type (Common Scalable Vector Type + Separate RISC-V Scalable Vector Type)

The first RISCVV dialect patch is ready for review now.

Naming

@River707 gave a good suggestion (in the previous review message) for the naming: “RVV” seems too general as a dialect name. RISC-V is an extensible ISA, so the idea is to combine the prefix of the architecture name (RISCV/riscv) with the abbreviation of the extension name (V/v). I now use “RISCVV” and “riscvv” to name files, functions, namespaces, the dialect, etc.

Scalable Vector Type

The previous introduction shows a mapping between the LLVM IR vector type and the RVV vector features (LMUL and SEW). In the initial version, I used the same strategy as the LLVM IR vector type. Although it is easy to implement, it provides poor semantics to higher-level dialects. As @ftynse said, a separate vector type is better.

After further thought, I implemented a RISCVV-specific vector type that looks like this:

!riscvv.vector<!riscvv.m4,i32>

!riscvv.m4 means LMUL = 4, and i32 means SEW = 32. The RISCVV vector type depends on the LMUL and SEW settings, so the main idea is to expose these two settings in the vector type. This then requires a type mapping process to ensure the semantics are lowered correctly. Specifically, there are the following changes:

  • Lift Scalable Vector Type to Vector Dialect

The previous version redefined the scalable vector type in the same way as the SVE side. @ftynse and @aartbik suggested (in the previous review messages) lifting the definition to a more appropriate place.

Now I lift part of the definition (the scalable vector type TableGen class) into the Vector dialect. This way, dialect-specific scalable vector types can be derived from the same source, and each can have its own definition (parameters, parser, printer, etc.) to provide different semantics.

  • Define RISCVV LMUL Type, RISCVV Mask Type, and RISCVV Vector Type

The SEW setting can be inferred directly from the element type. For example, as the element type, i64 means SEW = 64, i32 means SEW = 32, and so on.

Element Type    SEW Setting
------------    -----------
i64             64
i32             32
i16             16
i8              8

Unlike SEW, the LMUL setting cannot be directly expressed by a built-in type because RISCVV supports fractions in the LMUL setting. Although the fractional LMUL values are not in the “must support” list (“Implementations must support LMUL integer values of 1, 2, 4, 8”), they are an important feature for performance in mixed-width value cases. According to the RISCVV specification, “fractional LMUL is used to increase the number of usable architectural registers when operating on mixed-width values, by not requiring that larger-width vectors occupy multiple vector registers.” I therefore implement an LMUL type to provide better support for the fractional setting.

LMUL Type       LMUL Setting
----------      ------------
!riscvv.mf8     1/8
!riscvv.mf4     1/4
!riscvv.mf2     1/2
!riscvv.m1      1
!riscvv.m2      2
!riscvv.m4      4
!riscvv.m8      8
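To make the register-group arithmetic concrete: VLMAX, the number of elements one register group can hold, is LMUL × VLEN / SEW, and fractional LMUL shrinks the group below a single register. A small Python sketch (illustrative only, not part of the patch; the helper name `vlmax` is hypothetical):

```python
from fractions import Fraction

def vlmax(vlen: int, sew: int, lmul: Fraction) -> int:
    """Elements per register group: VLMAX = LMUL * VLEN / SEW."""
    return int(lmul * Fraction(vlen, sew))

# With VLEN = 128 bits:
#   LMUL = 1/2, SEW = 32 -> 2 elements (half of one register)
#   LMUL = 4,   SEW = 32 -> 16 elements (a group of four registers)
```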

Given the LMUL and SEW types, the mask type can be determined: the ratio SEW/LMUL is the size parameter of the mask type. I also define a RISCVV mask type to provide better semantics.

Mask Type        SEW/LMUL
----------       --------
!riscvv.mask1    1
!riscvv.mask2    2
!riscvv.mask4    4
!riscvv.mask8    8
!riscvv.mask16   16
!riscvv.mask32   32
!riscvv.mask64   64

For example, when the LMUL type is !riscvv.m4 and the SEW type is i32, the mask type will be !riscvv.mask8.
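The ratio computation can be sketched in Python (illustrative only; `mask_ratio` is a hypothetical helper, with fractional LMUL handled via Fraction):

```python
from fractions import Fraction

def mask_ratio(sew: int, lmul: Fraction) -> int:
    """SEW/LMUL ratio that selects the RISCVV mask type."""
    return int(Fraction(sew) / lmul)

# SEW = 32, LMUL = 4   -> ratio 8  (!riscvv.mask8)
# SEW = 8,  LMUL = 1/2 -> ratio 16 (!riscvv.mask16)
```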

As for the RISCVV scalable vector type, it takes the LMUL/mask type and the SEW type as parameters, and it uses the “vector” keyword to stay consistent with the SVE side. Reusing the example from my first post: suppose we want four vector registers to form a group to handle the i32 element type.

Previous version:

!rvv.vector<8xi32> (mask type: !rvv.vector<8xi1>)

Current version:

!riscvv.vector<!riscvv.m4, i32> (mask type: !riscvv.vector<!riscvv.mask8, i1>)

Obviously, the current version provides better semantics: users can now set the spec aside when they use the RISCVV dialect.

  • Implement Type Mapping Process

The price of better semantics is the type syntax gap between the RISCVV dialect and the LLVM dialect. I therefore add a type mapping process based on the mapping table. With this, users of the RISCVV dialect do not need to consider the type mapping themselves: the lowering pass handles it, and users only need to decide what data type to use and how many registers form a register group.

For the above example, when we use the -convert-vector-to-llvm="enable-riscvv" option, the type mapping process will be triggered.

RISCVV Scalable Vector Type:

!riscvv.vector<!riscvv.m4, i32> (mask type: !riscvv.vector<!riscvv.mask8, i1>)

LLVM Scalable Vector Type:

!llvm.vec<? x 8 x i32> (mask type: !llvm.vec<? x 8 x i1>)
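The mapping follows the convention used by the RVV LLVM backend, where vscale corresponds to VLEN/64, so the minimum element count of the LLVM scalable vector is LMUL × 64 / SEW. A rough sketch (hypothetical helper name; the i1 mask vector carries one bit per element, so it gets the same count):

```python
from fractions import Fraction

def llvm_min_elements(lmul: Fraction, sew: int) -> int:
    """Minimum element count n in !llvm.vec<? x n x iSEW>,
    assuming vscale = VLEN/64 as in the RVV backend convention."""
    return int(lmul * Fraction(64, sew))

# LMUL = 4,   SEW = 32 -> !llvm.vec<? x 8 x i32> (mask: ? x 8 x i1)
# LMUL = 1/2, SEW = 32 -> !llvm.vec<? x 1 x i32>
```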

Example

After changing the naming and adding the new types, the example changes accordingly.
The latest code of the example can be found here. The compilation path is the same as in my first post; don’t forget to add the -reconcile-unrealized-casts option for the current mlir-opt.

Integration Test

As for the integration test, I tried to run the cross-compiled lli and mlir-cpu-runner with QEMU, but both reported that they were unable to find the target machine. As shown in the example, the AOT method works well with QEMU, which demonstrates correctness. But I am not sure whether the JIT currently supports running in RISC-V QEMU, so I think we can include just the unit tests in the first patch and leave the integration tests for future exploration.

FYI - The RISC-V vector extension v1.0 has been frozen for public review.

Hi Hongbin,

I’m the author of the ArmSVE dialect and, obviously, I’m also interested in scalable vectorization in MLIR, and I’ll be happy to discuss the topic and possible solutions :slight_smile: I see you’ve based this vector dialect on my own, so I thought I should give you a couple of warnings about the decisions I made and their implications, since you have implicitly accepted them for RVV.

First of all, the obvious one: the dialect is quite disconnected from the rest of the infrastructure. It works as a back end to generate scalable vector code, but none of the existing transformations will work with it. Adapting existing passes and ops to work with both fixed-length and scalable-length vectors, even when possible, is not trivial. But, as is, you can’t even do that without making those passes dependent on a close-to-hardware back-end dialect (be it RVV or SVE).

I went this way because it was the fastest, easiest, least intrusive way to get started with scalable vectors, but I think we should start thinking about how to promote scalable vectors to a built-in type. There are a bunch of arithmetic and comparison ops that are there as a workaround, simply because the ones in StandardOps won’t accept scalable vectors as operands (again, without making them dependent on a back-end dialect), but all of those are unnecessary and should go if scalable vectors become a built-in type.

This means that there’s a lot of work left to do on the dialect from a maintenance point of view, work that requires a long-term commitment. Correct me if I am wrong, but I believe you’re doing this work as part of an internship. Are there any stakeholders on your side who can commit to “inherit” the responsibility once you’ve finished? It might be worth reaching out to people in industry and public research institutions with a long-term interest in RISC-V Vector; the extension looks ready to leave the “Draft” state, so there should be a few.

That aside, I’ll be happy to discuss and collaborate with you on the topic :smiley:


Looping in @clattner @topperc

Hi Javier,

Thanks for your reply! I am very willing to discuss this topic with you :grin:

I am a PhD student in the PLCT Lab, ISCAS (the Institute of Software, Chinese Academy of Sciences). Supporting RVV in MLIR is part of my work, and I am interested in exploring compilation technology for vector architectures. I have about four years until I graduate, and I can contribute to this direction during that time. My laboratory has plans for continuous contributions and also has a range of RVV development experience, including the LLVM RVV backend and OpenCV RVV support. Apart from that, we have a project on the LFX platform to attract more contributors to explore how to make good use of RVV.

Hi javiersetoain,

That’s true. Implementing and maintaining the RVV dialect is a long-term project, and one contributor cannot get everything done alone. This project is supported by the PLCT Lab. Hongbin Zhang is a PhD candidate who is leading the MLIR-related projects in the PLCT Lab.

My name is Wei Wu, and I’m the director and co-founder of the PLCT Lab. PLCT has an engineering team with 30+ staff and 50+ students, focusing on compilers, simulators, and language virtual machines (VMs), and devotes significant effort to fundamental open-source projects, especially in the RISC-V ecosystem, including GCC, LLVM, OpenCV, V8, and MLIR. The PLCT Lab is also one of the first Development Partners of RISC-V International, contributing to the implementations of Bitmanip, Krypto Scalar, Zfinx, Zce, and many other unratified specs. We also maintained an RVV implementation in LLVM (0.8, 0.9, ~0.10) until early 2021, when we merged our efforts with the team from SiFive and EPI/BSC. We contributed the RVV support for OpenCV, which is believed to be one of the first RVV applications in a large open-source project.

The PLCT Lab has several success stories of continuously contributing to and maintaining open-source projects. Take OpenCV as an example: another of our graduates, Yin Zhang, contributed the initial RVV support for OpenCV as a GSoC project in 2020, and he has remained an active contributor since. Furthermore, a new contributor, Liutong Han, has been extending the RVV support for OpenCV since 2021. Each project in PLCT has at least one senior staff member supervising it; Mingjie Xing is the senior staff member supervising the RISC-V support projects for MLIR, LLVM, and OpenCV.

Feel free to contact me if you have any further concerns about long-term support. :slight_smile:


Brilliant! :smiley: I believe the dialect is in good hands :slight_smile: Once this has landed I’ll reach out to Hongbin, there’s a lot of work to do around scalable vectors outside of backend dialects, we should coordinate :slight_smile:

Thanks for taking the time to answer!


FYI, the type mapping from LMUL and SEW to LLVM vscale types falls apart if VLEN == 32 instead of >= 64. We haven’t figured out how to address this yet. The implementation defines vscale as VLENB/8, but if VLEN == 32 then VLENB == 4 and VLENB/8 == 0. Changing the mapping to support VLEN = 32 leaves us no way to encode LMUL = 1/8 for SEW = 8.
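The arithmetic behind the breakdown, as a quick sketch (illustrative Python, not part of any patch):

```python
def vscale(vlen: int) -> int:
    """vscale as the implementation defines it: VLENB / 8 (VLENB = VLEN / 8)."""
    vlenb = vlen // 8   # vector register width in bytes
    return vlenb // 8

# VLEN = 64  -> VLENB = 8 -> vscale = 1
# VLEN = 32  -> VLENB = 4 -> vscale = 0  (the mapping collapses)
```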

Is the plan to support every RISC-V vector operation, or just the basic arithmetic, loads, stores, and conversions? There is an ongoing effort to add intrinsic versions of basic IR instructions that take a mask and vector length argument (LLVM Language Reference Manual — LLVM 13 documentation). It might make sense to target those instead of the RISC-V vector intrinsics. In theory, those are supposed to work on multiple targets.
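For reference, the vector predication intrinsics mentioned here take both a mask and an explicit vector length (EVL); their per-lane semantics can be modelled roughly as follows (illustrative Python, not the actual LLVM definition; lanes beyond EVL or with a false mask bit yield unspecified values, shown as None):

```python
def vp_add(a, b, mask, evl):
    """Rough model of a vector-predicated add: a lane is computed only
    when its index is below evl and its mask bit is set."""
    return [x + y if i < evl and mask[i] else None
            for i, (x, y) in enumerate(zip(a, b))]

# vp_add([1, 2, 3, 4], [10, 20, 30, 40], [True, True, False, True], 3)
# -> [11, 22, None, None]
```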

Thanks for the RFC!
I’m trying to support multi-backend with MLIR, and this could be really helpful!

IMO, supporting all the RVV operations is the ideal end state, but we should add the frequently used operations first and then gradually support the others. The reason I only implement the basic arithmetic, loads, and stores in the initial patch is that I want to keep the RFC simple enough to show the basic idea (and flexible enough to modify or change direction), and these operations suffice to build an executable example.

Thanks for pointing this out! I think this work can help us create a unified vector abstraction layer in MLIR. I will learn more about the details of this work.


Is the plan to support every RISCV vector operation or just the basic arithmetic, loads, stores, conversions?

There is no need for hw-specific basic arithmetic operations. Standard arithmetic ops on scalable vectors already map neatly to whatever scalable hardware you want to target through LLVM IR. We should only need hw-specific instructions for operations that don’t map cleanly onto LLVM IR ones (e.g., matrix multiply or dot products). If we find ourselves with a 1:1 map between an MLIR dialect and a whole ISA, we’re very likely doing something wrong. 99% of the work will be adapting passes to work with scalable vectors and building new passes to deal with scalable vectorization. These dialects should be just an outlet for specialized instructions. The only reason we need them right now is that MLIR built-in vector types are fixed-length only.

There is an ongoing effort to add intrinsics versions of basic IR instructions that take a mask and vector length argument. LLVM Language Reference Manual — LLVM 13 documentation It might make sense to target those instead of RISCV vector intrinsics. In theory those are supposed to work on multiple targets.

Indeed, those are the natural target for all masked vector operations. The reason “masked” instructions in the Arm SVE dialect map to SVE intrinsics (a choice that ended up replicated in RISC-V Vector) is that something was failing in instruction selection; I was advised it was a work in progress, and I decided to work around it. Eventually, similarly to basic arithmetic instructions, masked operations in the Vector dialect should map to masked vector operations in LLVM IR. Whether those are fixed-length or scalable vectors, RISC-V or SVE, can be determined by the type of the vector operands in the Vector dialect and by the target hw in LLVM, respectively.


Please read and provide feedback on [RFC] Add built-in support for scalable vector types. If that patch or something to that effect gets accepted, it would significantly simplify this change, as well as the approval process for it.

Thank you!
Javier


The built-in support will be very helpful for the RVV side. I have replied to your RFC and expressed my thoughts. In general, I think it is challenging to design a unified scalable vector type, and I am very willing to discuss and contribute to this direction :grin:


Indeed, I’m counting on that :smiley: Thanks, Hongbin!

I am writing to share the current state of the dialect. This work relies on two ongoing parts.

  • Built-in Scalable Vector Type

We discussed this part in @javiersetoain’s RFC. After the patch lands, I will replace the current RVV-specific type with the built-in scalable vector type.

  • Integration Test

The integration test needs lli or mlir-cpu-runner to work for the RISC-V backend. However, RuntimeDyld doesn’t support the RISC-V side yet. My teammate suggested that we use JITLink instead, and we are working on supporting this. Once the JIT supports the RISC-V backend, the integration testing problem can be solved.


Update

  • Sync to the vector type with scalable dimensions.
  • Set the vta as an attribute.
  • Add setvl operation.
  • Some RISC-V + JIT progress (needed by integration test)

Here is the current RISCVV dialect patch.

Sync to the vector type with scalable dimensions.

Following the previous discussion, I synced the type to the built-in vector type with scalable dimensions.

Set the vta as an attribute.

The LLVM intrinsics add a vta argument to let users control the tail policy; see the patch for more details. Here I quote the patch to explain the meaning of tail agnostic and tail undisturbed:

Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation.

Since the vta parameter is a tail policy option, it is more appropriate to model it as an attribute in MLIR; the lowering pass is responsible for converting the attribute into an intrinsic argument.
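The two policies can be modelled like this (illustrative sketch; `apply_tail_policy` is a hypothetical name, with unspecified tail values shown as None):

```python
def apply_tail_policy(result, dest, vl, vta):
    """Combine the vl active result elements with the tail of the destination.
    vta=True  (tail agnostic):    tail values are unspecified (None here).
    vta=False (tail undisturbed): tail keeps dest's old values."""
    tail = [None] * (len(dest) - vl) if vta else list(dest[vl:])
    return list(result[:vl]) + tail

# apply_tail_policy([1, 2, 3, 4], [9, 9, 9, 9], 2, vta=False) -> [1, 2, 9, 9]
# apply_tail_policy([1, 2, 3, 4], [9, 9, 9, 9], 2, vta=True)  -> [1, 2, None, None]
```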

Add setvl operation.

vsetvli is a useful RISC-V vector extension instruction that sets the vector length according to the AVL, SEW, and LMUL configurations. RVV uses it to achieve a direct and portable strip-mining approach for handling a large number of elements. The return value of this instruction is the number of elements processed in a single iteration. vsetvli can thus drive strip-mined loop iterations, which differs from the SIMD style (using masks for tail processing).

After adding this operation, we can use strip-mining-style loop iterations in MLIR for the RVV target. I have prepared an example to show this:

https://gist.github.com/zhanghb97/db87cd22d330ba6424b31c70b135b0ca#file-test-rvv-stripmining-mlir
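In scalar pseudocode, the strip-mining pattern looks roughly like this (illustrative Python; `setvl` here models only the min(AVL, VLMAX) clamping, not the full vsetvli encoding, and the helper names are hypothetical):

```python
def setvl(avl: int, vlmax: int) -> int:
    """Model of vsetvli's return value: elements handled this iteration."""
    return min(avl, vlmax)

def strip_mined_sum(data, vlmax=4):
    """Process `data` in chunks whose size the 'hardware' chooses."""
    total, i, avl = 0, 0, len(data)
    while avl > 0:
        vl = setvl(avl, vlmax)        # vector length for this iteration
        total += sum(data[i:i + vl])  # stand-in for a vector operation on vl elements
        i += vl
        avl -= vl
    return total
```

The loop needs no separate tail-masking epilogue: the final iteration simply gets a smaller vl.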

Some RISC-V + JIT progress (needed by integration test)

My teammate has sent some patches aiming to support the JIT for the RISC-V side.
Here I quote part of his summary to show the crux of the challenge:

In RISCV, temporary symbols will be used to generate dwarf, eh_frame sections…, and will be placed in object code’s symbol table. However, LLVM does not use names on these temporary symbols.

For more details, please see his patches:

https://reviews.llvm.org/D116475

https://reviews.llvm.org/D116794