RFC: OpenMP dialect in MLIR

This post introduces the MLIR dialect for OpenMP. The dialect was briefly discussed before on the MLIR mailing list (link below). The primary user of this dialect will be the Flang/F18 compiler, currently under construction. It is hoped that other frontends can also use this dialect as and when they are ready to use MLIR. The primary target for the dialect is LLVM IR. The design intends to reuse code from Clang to achieve this.
https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw

  • The proposed design for adding OpenMP support to Flang/F18 can be seen in slide 10 of the presentation (link below) given to the Flang/F18 community.
    openmp_design_for_flang.pdf - Google Drive
    The design uses the following two components.
  1. MLIR: The Flang/F18 compiler uses the MLIR-based FIR dialect as its IR. FIR models the Fortran language portion but does not have a representation for OpenMP constructs. By using MLIR for OpenMP we get a common representation for OpenMP and Fortran constructs in the MLIR framework, and can thereby take advantage of optimisations and avoid black boxes.
  2. OpenMP IRBuilder: for reusing the OpenMP codegen of Clang. The OpenMP IRBuilder project refactors codegen for OpenMP directives out of Clang and places it in the LLVM directory, so that both Clang and Flang can share the code. For details see the link below.
    http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html
  • Current and Proposed Flow in F18/Flang
  1. The current sequential code flow in Flang (Slide 5 of the presentation, link below) can be summarised as follows:
    openmp_design_for_flang.pdf - Google Drive
    [Fortran code] -> Parser -> [AST] -> Lowering -> [FIR MLIR] -> Conversion -> [LLVM MLIR] -> Translation -> [LLVM IR]
  2. The modified flow with OpenMP (Slide 10) will lower the AST to a mix of the FIR and OpenMP dialects. These are then optimised and finally converted to a mix of the OpenMP and LLVM MLIR dialects. This mix is translated to LLVM IR using the existing translation library for the LLVM dialect and the OpenMP IRBuilder currently under construction.
    [Fortran code] -> Parser -> [AST] -> Lowering -> [FIR + OpenMP MLIR] -> Conversion -> [LLVM + OpenMP MLIR] -> Translation (Use OpenMP IRBuilder) -> [LLVM IR]
  3. The MLIR infrastructure provides a lot of optimisation passes for loops, and it is desirable that we take advantage of some of these. But the LLVM infrastructure also provides several optimisations, so there are questions about where each optimisation should be carried out. We will decide which framework to use only after some experimentation. If we find that an OpenMP construct (e.g. collapse) can be handled fully in MLIR, and that MLIR is the best place to do it (based on experiments), then we will not use the OpenMP IRBuilder for those constructs.
  • OpenMP MLIR dialect
    Operations of the dialect will be a mix of fine- and coarse-grained operations, e.g. coarse: omp.parallel, omp.target; fine: omp.flush. Operations in MLIR can have regions, hence there is no need for outlining at the MLIR level. While the detailed design of the dialect is TBD, the next section provides some walkthrough examples which summarise the full flow as well as the use of MLIR operations for OpenMP directives and attributes for representing clauses that are constant. The proposed plan involves a) lowering the F18 AST with OpenMP directly to a mix of the OpenMP and FIR dialects, and b) finally converting this to a mix of the OpenMP and LLVM dialects. This requires that the OpenMP dialect can coexist and operate with other dialects. The design is also intended to be modular so that other frontends (C/C++) can reuse the OpenMP dialect in the future.

  • Examples
    A few example walkthroughs were sent earlier to the flang mailing lists. These walkthroughs illustrate, with an example, the flow for a few constructs (parallel, target, collapse, simd). I am including the parallel and collapse constructs here and will leave pointers for the others.
    Example 1: Parallel construct

  1. Example OpenMP code
!Fortran code
!$omp parallel
c = a + b
!$omp end parallel
!More Fortran code
  2. Parse tree (relevant section copied from -fdebug-dump-parse-tree)
<Fortran parse tree>
| | ExecutionPartConstruct -> ExecutableConstruct -> OpenMPConstruct -> OpenMPBlockConstruct
| | | OmpBlockDirective -> Directive = Parallel
| | | OmpClauseList ->
| | | Block
| | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> AssignmentStmt
| | | | | Variable -> Designator -> DataRef -> Name = 'c'
| | | | | Expr -> Add
| | | | | | Expr -> Designator -> DataRef -> Name = 'a'
| | | | | | Expr -> Designator -> DataRef -> Name = 'b'
| | | OmpEndBlockDirective -> OmpBlockDirective -> Directive = Parallel
<More Fortran parse tree>
  3. The first lowering will be to a mix of the FIR and OpenMP dialects. The OpenMP dialect has a parallel operation with a nested region of code. The nested region will contain FIR (and standard dialect) operations.
Mlir.region(…) {
  %1 = fir.x(…)
  …
  %20 = omp.parallel {
    %1 = addf %2, %3 : f32
  }
  %21 = <more fir>
  …
}
  4. The next lowering will be to a mix of the OpenMP and LLVM dialects.
Mlir.region(…) {
  %1 = llvm.xyz(...)
  …
  %20 = omp.parallel {
    %1 = llvm.fadd %2, %3 : !llvm.float
  }
  %21 = <more llvm dialect>
  …
}
  5. The next conversion will be to LLVM IR. Here the OpenMP dialect will be lowered using the OpenMP IRBuilder and the translation library of the LLVM dialect. The IRBuilder will see that there is a region under the omp.parallel construct. It will collect all the basic blocks inside that region and then generate outlined code from those basic blocks, inserting suitable calls to the OpenMP runtime API.
define @outlined_parallel_fn(...)
{
  ....
  %1 = fadd float %2, %3
  ...
}

define @xyz(…)
{
  %1 = alloca float
  ....
  call kmpc_fork_call(...,outlined_parallel_fn,...)
}

For simd, target refer to the links below.

Example 2: Collapse construct
A walkthrough for the collapse clause on an OpenMP loop construct is given below. This is an example where the transformation (collapse) is performed in the MLIR layer itself.

  1. Fortran OpenMP code with collapse
!$omp parallel do private(j) collapse(2)
do i=lb1,ub1
  do j=lb2,ub2
    ...
    ...
  end do
end do
  2. The Fortran source with OpenMP will be converted to an AST by the F18 parser. The parse tree is not shown here to keep it short.

  3. a) The parse tree will be lowered to a mix of the FIR and OpenMP dialects. The OpenMP dialect has omp.parallel and omp.do operations, which represent the parallel and OpenMP loop constructs respectively. The omp.do operation has a "collapse" attribute which specifies the number of loops to be collapsed.

Mlir.region(…) {
  omp.parallel {
    omp.do {collapse = 2} {
      fir.do %i = %lb1 to %ub1 : !fir.integer {
        fir.do %j = %lb2 to %ub2 : !fir.integer {
          ...
        }
      }
    }
  }
}
  3. b) A transformation pass in MLIR will perform the collapsing. The collapse attribute will cause the omp.do loop to be coalesced with the loop nested immediately inside it. Note: loop-coalescing passes already exist among the MLIR transformation passes; we should try to make use of them.
Mlir.region(…) {
  omp.parallel {
    omp.do {
      fir.do %i = 0 to %ub3 : !fir.integer {
        ...
      }
    }
  }
}
  4. The next conversion will be to a mix of the LLVM and OpenMP dialects.
Mlir.region(…) {
  omp.parallel {
    %ub3 =
    omp.do %i = 0 to %ub3 : !llvm.integer {
      ...
    }
  }
}
  5. Finally, LLVM IR will be generated for this code. The translation to LLVM IR can make use of the OpenMP IRBuilder. The LLVM IR is not shown here to keep it short.

For simd, target refer to the links below.
simd: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html
target: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html

  • Progress
  1. OpenMP MLIR
    → First patch which registers the OpenMP dialect with MLIR has been submitted and merged.
    https://github.com/tensorflow/mlir/pull/244
    → [Under Review] Implementation of a minimal OpenMP dialect with a single construct (barrier)
    Add OpenMP dialect with barrier operation by DavidTruby · Pull Request #275 · tensorflow/mlir · GitHub
    → [Under Review] Translation of barrier construct to LLVM IR by extending the translation of LLVM dialect.
    D72962 [MLIR, OpenMP] Translation of OpenMP barrier construct to LLVM IR
  2. OpenMP IRBuilder
    @jdoerfert has added a series of patches introducing preliminary support for the OpenMPIRBuilder, which are either approved or under review. The initial set adds support for the parallel and barrier constructs. Others (Roger Ferrer, Fady Ghanim, Kiran) have implemented support for constructs like taskwait, flush, critical, etc.
    OMP builder + Flang OMP tasks - Google Sheets
    D70290 [OpenMP] Use the OpenMPIRBuilder for "omp parallel"
  • Next Steps

→ Implement support on a construct-by-construct basis, starting with the barrier, flush, and parallel constructs.

→ Represent the construct in the OpenMP dialect in MLIR
→ Refactor the codegen for the construct into the OpenMP IRBuilder
→ Set up the translation library for OpenMP in MLIR to call the OpenMP IRBuilder
→ Set up the transformation from the frontend to the OpenMP dialect for this construct
→ Upstream the changes

  • Maintainers
    The Fortran teams at Nvidia (PGI), Arm, AMD and some members from the US National Labs who are part of the flang community will be maintainers. We can provide a list of people if required.

This RFC was sent to the mlir Google group in December. Since Discourse is now the primary medium for communication, I am reposting it here as suggested by @ftynse. @River707 asked for a ping while reviewing https://reviews.llvm.org/D72400.

CC: @mehdi_amini @jdoerfert

Link to RFC in mlir google groups.
https://groups.google.com/a/tensorflow.org/d/msg/mlir/SCerbBpoxng/bVqWTRY7BAAJ

It seems that the dialect can be orthogonal to FIR and its type system, which is the most important thing to me for integrating it in MLIR (it favours reusability across other frontends / compiler frameworks).

As long as this stays an important principle for the development of this dialect, that looks fine to me!

Thanks @mehdi_amini for having a look.

Yes, our plan is to construct this dialect in an orthogonal way so that other frontends can also use it.

What’s this value being returned by omp.parallel? If you need something returned, you could just use the std.return as the terminator; you could elide its printing if this op returns 0 values.

Why is the collapsing here being performed on fir.do’s? The fir.do’s should have just been converted to loop.for’s and the collapsing could be implemented there; that way it’d be reusable on other paths. Ideally, you wouldn’t want to duplicate loop.for infra on fir.do - there is probably nothing in fir.do that you want to preserve that late.

Otherwise, this overall looks good to me. Would introducing omp.parallel in the dialect be the next step?

Thanks @bondhugula for pointing this out. Yes, we do not plan to implement collapse with fir loops. fir loops will be lowered to loop.for and that is the better place to do collapsing.
When this document was initially written the plan for fir loop lowering was not clear. I will update the RFC if discourse lets me edit.

Yes, parallel and flush would be the next two constructs that we will do.

It is not required. I will fix that inconsistency. Thanks for pointing out.

Hi,

I wanted to ask if the omp.parallel construct is available now? Is there a way to generate an omp parallel for/region using mlir currently?

On another point, I tried using mlir-opt with -convert-loops-to-std on a loop.parallel and a regular loop.for, and there seems to be no difference in the IR generated for the serial and parallel cases.
I’m unclear how to parallelize a loop with mlir and generate the corresponding llvm ir with OpenMP.

We are currently working on the omp.parallel operation. We hope to have it working and available in a month. Thanks for your interest.
We discussed a little more about the parallel operation in the following discussion page.