Bit fields and arbitrary-sized integers

Hi,
I’ve got a piece of C++ code that defines a struct with two bit fields. Clang is combining them into a single integer type, and I think: a) there must be a way to make it not do that, and b) there’s a bug in the process.

The code looks like:

enum enum_type1 {a, b, c};
enum enum_type2 {d, e, f};
struct bob {
enum_type1 foo : 64;
enum_type2 bar : 64;
};

When I examine the type information for a variable of type bob, it shows a single field that is an i128. (Ftr: if I increase each field to 128 bits, I again get a single integer, this time with 256 bits; if there is a non-bit-field value in between the two bit fields, the two bit fields do not combine into a single integer.)

So that’s a) – is there a way (a flag, setting, etc.) to keep clang/llvm from combining the two bit fields into a single IntegerType?

And b) – for fun, I went into include/llvm/IR/DerivedTypes.h and set the maximum allowed integer width to 64 bits. This causes IntegerType::get to fail its assertion that the requested bit width be under the max – meaning clang is automatically creating these large integers without first checking whether they'll go through the rest of the compiler, which seems like a bug…?

Thanks as usual,
Tim

Can you post a godbolt link to a small example that illustrates what you mean?

With a quick look I get this from what you posted: Compiler Explorer

Sorry – I was in the middle of editing it when family called… :)

It’d still be better with a reproducible link like I sent above.
It’s not clear to me what you mean by "When I examine the type information for a variable of type bob". Is this operating on the clang AST or on LLVM IR?

Joker,

Again, thanks for responding so quickly and thoroughly.

My example I’m posting is based on a commercial test suite, so I’ve been trying to use my own cutdown version as an example so as not to violate any legalities.

The thing that was missing from your Compiler Explorer example is that the definition of the enumeration type should read:

enum enum_type1 {a, b, c = ULONG_MAX};

…which forces enum_type1 to be represented as a long(-long?).
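For anyone following along, here is a minimal sketch of that point. It assumes enum_type2 gets the same ULONG_MAX treatment (the thread only shows enum_type1), and the names kWidth and round_trips are made up for illustration. The bit-field width is written as the width of unsigned long rather than a literal 64, so the sketch does not depend on a particular data model.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// The ULONG_MAX enumerator means the underlying type must be able to
// represent every value of unsigned long.
enum enum_type1 { a, b, c = ULONG_MAX };
// Assumption: enum_type2 is widened the same way (not stated in the thread).
enum enum_type2 { d, e, f = ULONG_MAX };

static_assert(sizeof(std::underlying_type<enum_type1>::type) >= sizeof(unsigned long),
              "underlying type must be wide enough to hold ULONG_MAX");

// Hypothetical helper constant: 64 on LP64 targets such as x86-64 Linux.
constexpr int kWidth = CHAR_BIT * sizeof(unsigned long);

struct bob {
    enum_type1 foo : kWidth;
    enum_type2 bar : kWidth;
};

// Stores the largest enumerator in each bit field and reads it back.
bool round_trips() {
    bob v{};
    v.foo = c;
    v.bar = f;
    return v.foo == c && v.bar == f;
}
```

With the enum widened like this, each bit field fills a whole unsigned long, so the two fields together occupy two full words and there are no padding bits to worry about.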

I’ve edited your example: Compiler Explorer

…to answer your second question, I’m inside of llvm at the point that I’m having trouble – it’s at this point that it tells me a variable of type bob has only a single field that is an integer of 128 bits.

Tim

OK, so the compiler explorer example shows at this point that it is modeled as a structure in LLVM {i64, i64}.

Now when there is no need to cross the function ABI boundary, the compiler may optimize things differently, like here: Compiler Explorer

In this case it’ll use an i128 to load the struct as a single block and manipulate it as such. This is all quite normal in LLVM, though. What kind of issue do you hit with this?

Cool – I was guessing that this was an efficiency optimization…

…but how does one turn it off?

For example, if one’s target architecture could not load an i128, then this optimization would be more cumbersome than helpful, so I would have thought that there’d be a clang/llvm setting or flag to make it stick to the types the user specified…?

Best,

Tim

In general, this kind of thing is indeed controlled by the target, in the form of the DataLayout.

I don’t remember all the details, but I think it is always legal to load an i128 instead of two i64 at the LLVM IR level, and rely on lowering passes in the backend to legalize this by splitting such a wide load in two smaller ones.
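As a rough source-level analogy (not LLVM's actual legalization code), the sketch below reads the same 16 bytes once as a single wide copy and once as two 8-byte copies, and checks that the results match. The names Pair, load_wide, load_split and loads_agree are hypothetical, and the sketch assumes a struct of two 64-bit integers has no padding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for a 128-bit value viewed as two 64-bit halves.
struct Pair {
    std::uint64_t lo;
    std::uint64_t hi;
};
static_assert(sizeof(Pair) == 16, "sketch assumes no padding between the halves");

// One 16-byte copy: the analogue of a single i128 load.
Pair load_wide(const unsigned char* src) {
    Pair out;
    std::memcpy(&out, src, sizeof(Pair));
    return out;
}

// Two 8-byte copies: the analogue of the load after it has been split.
Pair load_split(const unsigned char* src) {
    Pair out;
    std::memcpy(&out.lo, src, sizeof(out.lo));
    std::memcpy(&out.hi, src + sizeof(out.lo), sizeof(out.hi));
    return out;
}

// Fills a buffer with a fixed byte pattern and compares both strategies.
bool loads_agree() {
    unsigned char buf[16];
    for (int i = 0; i < 16; ++i) {
        buf[i] = static_cast<unsigned char>(i * 17 + 3);
    }
    const Pair wide = load_wide(buf);
    const Pair split = load_split(buf);
    return wide.lo == split.lo && wide.hi == split.hi;
}
```

Both functions observe the same bytes in memory, which is why the wide form is a legal thing to emit in IR even when the hardware ends up doing two narrower loads.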