Array layout changed by Clang/LLVM starting from version 7

Hi,
I’m trying to integrate Clang into an open-source high-level synthesis tool we are developing. Clang/LLVM works pretty well, but there is an unexpected behavior when constant arrays are compiled.

Here is a constant declaration taken from the countLeadingZeros32 function in the SoftFloat library:

static const int8 countLeadingZerosHigh[256] = {
    8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  };

Clang/LLVM (version 7 and later) translates this code into the following LLVM declaration:
@countLeadingZerosHigh = dso_local constant <{ [128 x i32], [128 x i32] }> <{ [128 x i32] [i32 8, i32 7, i32 6, i32 6, i32 5, i32 5, i32 5, i32 5, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1], [128 x i32] zeroinitializer }>, align 16

So, the compiler recognizes that there are 128 non-zero values followed by 128 zero values, and accordingly splits the initializer into a two-field struct, as shown above. I suppose this is fine for standard processor targets, but it creates overhead when Clang/LLVM is used with a Verilog/VHDL target such as the one we are working on.
Since this behavior was introduced in version 7, I was wondering whether it is possible to disable this optimization and revert to the simpler single-array layout.
If this is not possible at the command-line level, I would appreciate directions on how to control this behavior by modifying the LLVM codebase in some way.

thanks in advance,
Fabrizio

Why do you say the layout was changed? The memory layout is pretty much unchanged. The variable is also not externally visible, so the compiler is free to change/optimize it as necessary.

So, I guess it’s rather the consumer of this IR that should be changed to deal with such code. After all, you cannot “switch off” IR that optimization passes might create.

Hi,
You are correct that the memory layout is not changed, but declaring the array this way changes how we, as a consumer, manage it. When high-level synthesis is the target, a memory organized as a struct with two fields has to be decomposed into two distinct memories, which makes the pointer arithmetic more complex.
One way to sort out the issue on our side is to classify this declaration as equivalent to a simple one-dimensional array, but before implementing such a classification I wanted to ask the LLVM community whether better approaches exist for this use case. Maybe this has already been addressed for some processor targets.
From your answer, my understanding is that such a “switch off” is not foreseen. Is that correct? Do you know which LLVM step introduces this optimization?
thanks,
Fabrizio

Well, as I said, you cannot “protect” yourself from such transformations, as this is perfectly valid LLVM IR that could be generated by a frontend and/or some optimization pass. It’s a pretty common misconception among downstream users that the input can be limited by “switching off” something. This typically leads nowhere, as there are multiple ways for “something” to appear.

In your particular case, it’s the frontend that performs this transformation, not an optimization pass (and moreover, recent versions of clang tend to use a string as the initializer, since in LLVM IR a string is just an array of i8’s).

So, instead of interpreting the global literally, you’d need to perform a proper memory layout and lowering transformation. Then all of these problems go away: everything is explicit in LLVM IR, there is no implicit padding, etc. Maybe you could use AsmPrinter::emitGlobalConstant as a source of inspiration.

Thank you very much.
I’ll follow your suggestion.
Cheers,
Fabrizio