LLVM Discussion Forums

Optimization generate super long function definition

In my code, I generate the following function:

define i32 @gl.qi([500 x i32] %x, i32 %i) {
  %x. = alloca [500 x i32]
  %i. = alloca i32
  %0 = alloca [500 x i32]
  store [500 x i32] %x, [500 x i32]* %x.
  store i32 %i, i32* %i.
  %x.1 = load [500 x i32], [500 x i32]* %x.
  %i.2 = load i32, i32* %i.
  store [500 x i32] %x.1, [500 x i32]* %0
  %1 = icmp slt i32 %i.2, 500
  br i1 %1, label %in-bound, label %out-of-bound

out-of-bound:                                     ; preds = %entry
  call void @gen.panic(i8* getelementptr inbounds ([22 x i8], [22 x i8]* @pool.str.2, i32 0, i32 0))

in-bound:                                         ; preds = %entry
  %2 = getelementptr inbounds [500 x i32], [500 x i32]* %0, i32 0, i32 %i.2
  %idx = load i32, i32* %2
  ret i32 %idx

the high level functionality is to use %i to index %x, and if %i is out of bound, a panic function is called instead.

consider the store line:

  store [500 x i32] %x, [500 x i32]* %x.

once I pass this function to opt -O1 -S --verify --verify-each, it generates code like this:

define i32 @gl.qi([500 x i32] %x, i32 %i) local_unnamed_addr {
  %0 = alloca [500 x i32], align 4
  %x.fca.0.extract = extractvalue [500 x i32] %x, 0
  %x.fca.1.extract = extractvalue [500 x i32] %x, 1
  %x.fca.2.extract = extractvalue [500 x i32] %x, 2
  %x.fca.3.extract = extractvalue [500 x i32] %x, 3
  %x.fca.4.extract = extractvalue [500 x i32] %x, 4
  %x.fca.5.extract = extractvalue [500 x i32] %x, 5

until 500. I put the number to 50000 and it won’t stop.

This is puzzling. I am not sure why must a store command be expanded to a sequence of etractvalues then stores? Is there a way to turn off this particular optimization without turning off the whole optimization?

Or am I looking at the wrong way to do this simple task?

store [500 x i32] %x, [500 x i32]* %x has to be somehow mapped to something the machine can execute, in general the CPU does not have instructions for this and decomposing it allows for better optimizations. LLVM decompose such “array manipulation” very early in SROA (Scalar Replacement of Aggregates): https://llvm.org/docs/Passes.html#sroa-scalar-replacement-of-aggregates