RFC for omp.target construct

This post is aimed at starting a discussion for the design and implementation of the target operation in the OpenMP dialect('omp' Dialect - MLIR). The specification of the target construct can be found in Section 2.12 of the OpenMP 5.0 specification document.
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf.

If one looks at the earlier OpenMP construct implementations for the OpenMP dialect, the omp.parallel operation is very close and can be used as a template for the target operation.

https://mlir.llvm.org/docs/Dialects/OpenMPDialect/#ompparallel-mlirompparallelop

The current effort for OpenMP dialect is towards Flang and OpenMP target support has not been implemented in the front end yet. The following path is being adopted by the community to lower the OpenMP support to LLVM IR for Flang.

[Fortran code] → Parser → [AST] → Lowering → [FIR + OpenMP MLIR] → Conversion → [LLVM + OpenMP MLIR] → Translation (Use OpenMPIRBuilder) → [LLVM IR]

OpenMPIRBuilder does not support the OpenMP target construct yet. Based on the current support, one possible path to implementing the target operation is:

[C code] → Parser → [AST] → Lowering → [OpenMP MLIR] → Conversion → [LLVM + OpenMP MLIR] → [LLVM IR]

Since MLIR operations are front-end agnostic, the implementation can be used by the Flang front end when it has the target support.

Regarding design issues:

  1. Representation of the target Construct: The construct can be represented by an operation (omp.target) with one region.
  2. Representation of the clauses: It should be similar to the omp.parallel operations
    a. [OpenMP] Parallel Operation design issues
    b. 'omp' Dialect - MLIR

omp.target (::mlir::omp::TargetOp)

target construct

Map variables to a device data environment and execute the construct on that device

The $if_expr_var parameter specifies a Boolean result of a conditional check. If this value is 0, the target region is executed by the host device in the host data environment.

The $num_device_var specifies the device ID which is a non-negative integer value less than the value of omp_get_num_device().

The $default_map_val attribute specifies the mapping rule from firstprivate to tofrom for scalar variables.

The $private_vars and $firstprivate_vars parameters are a variadic list of values that specify the data-sharing attribute of those values.

The $map_var attribute specifies the data variables in the list to be explicitly mapped from the original variables in the host data environment to the corresponding variables in the device data environment of the device specified by the construct.

The $in_reduction_var specifies target task variables that are subject to reduction operation at the end of the regions. The GPU dialect has a reduction clause.

The $is_device_ptr_var indicates the data variables that are device pointers that exist within the device data environment.

The $depend_var establishes scheduling dependencies between the target task and sibling tasks that share a list of variables.

The $allocate_var specifies an integer expression of omp_allocator_kind.

The $uses_allocators_var specifies $allocate_var available in the target region.

The $ancestor_var specifies the parent device.

The $device_num_var indicates the device number.

Thanks @abidmalikwaterloo for this RFC. Great to see interest and progress in the OpenMP target side.

For the host-side constructs we are using the OpenMP IRBuilder during translation (LLVM dialect → LLVM IR) for lowering the OpenMP operations. As you might know, @clementval is working on the OpenACC dialect which lowers by making calls to the OpenMP runtime. The initial plan is to lower the data mapping constructs like enter_data which is similar to omp target enter data map. I believe Valentin is planning to implement it at the translation phase and use the OpenMP IRBuilder. Would you like to join efforts with Valentin and implement the data mapping constructs first and also implement using the OpenMP IRBuilder?

The representation of if, device and nowait clauses is straightforward. We are currently reviewing the way data-sharing clauses are being represented and handled in the OpenMP dialect. It might be good to wait till we have clarity on these clauses (private, firstprivate, in_reduction). It might also be good to spend some time thinking about the representation of the other clauses (depend, map, alloc, is_device_ptr). Like what should be the representation of dependencies? Should this have a specific type or can be anytype? The async dialect has a value and token-type and there is an operation async.await ('async' Dialect - MLIR).

FYI: @jdoerfert @clementval @kirankumartp @SouraVX

@kiranchandramohan Thank you, Kiran for the comments and feedback. I am already in contact with Valentin for the target construct and how the two dialects’ operations and paths can benefit from each other. Data mapping clauses seem the right first step in this direction. I am thinking of implementing the basic omp.target operation in MLIR first. By then the data mapping clause for OpenACC dialect will be done.

Thanks for the RFC! I have several comments.

The largest concern is typing. MLIR does not have the concept of a “variable” and at least half of the proposal uses it. Please clarify how OpenMP concepts map to MLIR concepts, namely the relation between variables and immutable SSA values as well as their types.

The second-largest is the relation with OpenMPIRBuilder. While I myself won’t oppose having a temporary flow that emits LLVM IR directly in the OpenMP dialect translation, we need a clear path and consensus on moving that functionality to OpenMPIRBuilder as soon as possible. Otherwise we create a double load on the maintainers of the translation who will have to understand and support both the custom translation code and OpenMPIRBuilder code. The alternative is to emit all of LLVM IR directly from MLIR (using an openmp-to-llvm dialect conversion, rather than translation) bypassing OpenMPIRBuilder, but this is against the current consensus. But please let’s not mix the two in the long run.

Global naming comment: I would consider dropping the _var suffix from op arguments unless it means something specific. For example, I can understand why private_vars has the suffix – it lists the equivalent of private variables, – but not why ancestor_var has it – it is merely the identifier of a device, which isn’t even mutable.

Which types are supported? There’s no “Boolean” in upstream MLIR.

Can this be an SSA value or is it always a constant (at which point it should be an attribute)?
omp_gen_num_device does not exist in MLIR.
What types are supported here?

There are no variables in MLIR.
I suppose the “implicit-behavior” part of the clause is specified as an enum attribute, please clarify if it is something else.
It is unclear to me how the “variable-category” part is represented here and how it maps to the MLIR’s open type system, i.e. what qualifies as “scalar”, “aggregate”, “allocatable” and “pointer” and how these are made available to MLIR types other than those representing Fortran types.

Again, there are no variables in MLIR. What happens when somebody wants to map a variable “from” a device to host. Do we define a new SSA value that is return from omp.target? Do we expect a pointer-like (define what pointer-like types are?) value as argument and modify the memory it points to? Something else?

There is no support for reductions in the OpenMP dialect so it’s unclear how this can identify reductions, where the results are stored (remember, no variables), etc.

The fact that GPU dialect has a reduction operation doesn’t sound relevant here. Several MLIR dialects have reduction-like constructs, including Affine, Linalg and SCF.

Can’t we just define such pointers in the region of omp.target?

I suppose this is another enum attribute, please clarify.

Specifies how? Another integer number?

The largest concern is typing. MLIR does not have the concept of a “variable” and at least half of the proposal uses it. Please clarify how OpenMP concepts map to MLIR concepts, namely the relation between variables and immutable SSA values as well as their types.

---------->

Thank you for the detailed reply and feed back. First, I misused the terminology . I wanted to say “operands” instead of variables. I was using the following discussion as a reference for this operation:

We can consider “variadic operands” for this operation as well.

----------------->

The second-largest is the relation with OpenMPIRBuilder. While I myself won’t oppose having a temporary flow that emits LLVM IR directly in the OpenMP dialect translation, we need a clear path and consensus on moving that functionality to OpenMPIRBuilder as soon as possible. Otherwise we create a double load on the maintainers of the translation who will have to understand and support both the custom translation code and OpenMPIRBuilder code. The alternative is to emit all of LLVM IR directly from MLIR (using an openmp-to-llvm dialect conversion, rather than translation) bypassing OpenMPIRBuilder, but this is against the current consensus. But please let’s not mix the two in the long run.

----->

I brought up this discussion because OpenMPIRBuilder is still under development and does not fully support the target construct. However, I fully agree that we should not bypass the path that includes OpenMPIRBuilder. Kiran also suggested the same. Since data scope operations for OpenACC dialect will be supported soon, we can start by using them first.

---->

For the following comments, the main confusion arises because of the term “variable” used in the discussion. As I mentioned earlier, these are operands. I am reproducing your comments here for the parallel operation which are also true for this operation:

“More importantly, OpenMP lives in core MLIR and should not be anyhow bound to FIR. I would suggest these clauses to accept any type since it cannot know whether a type defined in an external dialect has reference semantics (unless we implement something like type interfaces).”

I think it makes sense to make operands “anytype”. However, as kiran mentioned we need to think and discuss before adopting it for every operand.

Kiran comments

“The representation of if, device and nowait clauses is straightforward.”

—>These operands can be “anytype”.

“We are currently reviewing the way data-sharing clauses are being represented and handled in the OpenMP dialect. It might be good to wait till we have clarity on these clauses (private, firstprivate, in_reduction).”

—>I will wait . Although “anytype” would work??? I would like to consult the group before moving ahead.

“ It might also be good to spend some time thinking about the representation of the other clauses (depend, map, alloc, is_device_ptr). Like what should be the representation of dependencies? Should this have a specific type or can it be anytype? The async dialect has a value and token-type and there is an operation async.await (‘async’ Dialect - MLIR 1).

—> I am not sure if there is a thread that discusses this already. I would like to see where the discussion is heading.

Global naming comment: I would consider dropping the _var suffix from op arguments unless it means something specific. For example, I can understand why private_vars has the suffix – it lists the equivalent of private variables, – but not why ancestor_var has it – it is merely the identifier of a device, which isn’t even mutable.

----> Thanks. I will do that.

abidmalikwaterloo:

The $if_expr_var parameter specifies a Boolean result of a conditional check.

Which types are supported? There’s no “Boolean” in upstream MLIR.

—> “anytype”

abidmalikwaterloo:

The $num_device_var specifies the device ID which is a non-negative integer value less than the value of omp_get_num_device().

Can this be an SSA value or is it always a constant (at which point it should be an attribute)?

omp_gen_num_device does not exist in MLIR.

What types are supported here?

----> device num is an integer type but it is not constant. omp_gen_num_device() is a runtime routine. It will not be called within the operation. OpenACC has the same service. Would like to see how OpenACC dialect is handling it. As far as I know acc.parallel does not support acc_set_device_num().

abidmalikwaterloo:

The $default_map_val attribute specifies the mapping rule from firstprivate to tofrom for scalar variables.

There are no variables in MLIR.

I suppose the “implicit-behavior” part of the clause is specified as an enum attribute, please clarify if it is something else.

It is unclear to me how the “variable-category” part is represented here and how it maps to the MLIR’s open type system, i.e. what qualifies as “scalar”, “aggregate”, “allocatable” and “pointer” and how these are made available to MLIR types other than those representing Fortran types.

—> it makes sense to treat it as::mlir::StringAttr

abidmalikwaterloo:

The $map_var attribute specifies the data variables in the list to be explicitly mapped from the original variables in the host data environment to the corresponding variables in the device data environment of the device specified by the construct.

Again, there are no variables in MLIR. What happens when somebody wants to map a variable “from” a device to host. Do we define a new SSA value that is return from omp.target? Do we expect a pointer-like (define what pointer-like types are?) value as argument and modify the memory it points to? Something else?

—> As suggested by kiran we need more thinking before finalizing it.

abidmalikwaterloo:

The $in_reduction_var specifies target task variables that are subject to reduction operation at the end of the regions. The GPU dialect has a reduction clause.

There is no support for reductions in the OpenMP dialect so it’s unclear how this can identify reductions, where the results are stored (remember, no variables), etc.

The fact that GPU dialect has a reduction operation doesn’t sound relevant here. Several MLIR dialects have reduction-like constructs, including Affine, Linalg and SCF.

—> Agreed. Currently it is not supported for parallel operation as well. I would defer this as suggested by Kiran

abidmalikwaterloo:

The $is_device_ptr_var indicates the data variables that are device pointers that exist within the device data environment.

Can’t we just define such pointers in the region of omp.target?

—> As suggested by kiran we need more thinking before finalizing it.

abidmalikwaterloo:

The $allocate_var specifies an integer expression of omp_allocator_kind.

I suppose this is another enum attribute, please clarify.

—> Your observation is correct but , we need more thinking before finalizing it.

abidmalikwaterloo:

The $ancestor_var specifies the parent device.

Specifies how? Another integer number?

—> It’s an integer type.