[RFC] Implicit addends for non-code sections

(This proposal is part of the CREL work, but this feature can be used standalone for SHT_REL with existing lld. I encourage you to comment on the CREL proposal as well!)

ELF defines two relocation formats, REL and RELA. REL uses implicit addends, saving space compared to RELA’s explicit addend field.
However, REL is often inadequate for static code relocations because the instruction to be modified (commonly 4 bytes on RISC architectures) limits the available space for the addend.
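To make the difference concrete, here is a minimal Python sketch of how a linker might apply a 32-bit absolute data relocation in each format. It is a simplified model, not lld's code. The helpers `apply_rela`, `apply_rel`, and `max_implicit_addend` are hypothetical, and the relocation is assumed to compute S + A truncated to 32 bits.

```python
import struct

MASK32 = 0xFFFFFFFF

def apply_rela(section: bytearray, offset: int, sym_value: int, addend: int) -> None:
    """RELA: the addend comes from the relocation entry (r_addend)."""
    struct.pack_into("<I", section, offset, (sym_value + addend) & MASK32)

def apply_rel(section: bytearray, offset: int, sym_value: int) -> None:
    """REL: the addend is read from the bytes being relocated."""
    (addend,) = struct.unpack_from("<I", section, offset)
    struct.pack_into("<I", section, offset, (sym_value + addend) & MASK32)

def max_implicit_addend(field_bits: int) -> int:
    """Largest unsigned addend that fits in a relocated field of `field_bits` bits."""
    return (1 << field_bits) - 1

# With RELA the assembler leaves the data word as zero and puts 0x20 in r_addend;
# with REL it writes 0x20 into the data word itself.
rela_data = bytearray(4)
apply_rela(rela_data, 0, sym_value=0x1000, addend=0x20)
rel_data = bytearray(struct.pack("<I", 0x20))
apply_rel(rel_data, 0, sym_value=0x1000)
assert rela_data == rel_data == struct.pack("<I", 0x1020)

# A 4-byte data word can hold any 32-bit addend, but an instruction with
# (hypothetically) only a 16-bit immediate field cannot.
print(hex(max_implicit_addend(32)), hex(max_implicit_addend(16)))
```

The REL path needs no per-entry addend field, which is where the 8 bytes per Elf64_Rel entry are saved.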

GNU assembler generates RELA static relocations for most ports and REL for only a few (AArch32, BPF, M32R, MIPS o32).
Many GNU ld ports don’t support REL.
lld supports REL for all ports.

RELA can be unnecessarily bulky for data and debug sections, where the constraints on addend size don’t apply.

I propose adding an assembler option, -Wa,--implicit-addends-for-data, that lets non-code sections use implicit addends to save space. My compact relocation format branch https://github.com/MaskRay/llvm-project/tree/demo-crel contains an implementation along with -Wa,--crel support.

  • When CREL (compact relocation) is not enabled, SHT_REL is selected.
  • When CREL is enabled, CREL with the addend_bit==0 mode is selected.
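The selection rule can be sketched as follows. This is an illustration under assumptions, not the branch's actual code: it assumes a target whose default is RELA (such as x86-64), treats a section as code when it has the SHF_EXECINSTR flag, and `choose_reloc_format` is a hypothetical name.

```python
from enum import Enum

SHF_ALLOC = 0x2      # ELF section flag: occupies memory at run time
SHF_EXECINSTR = 0x4  # ELF section flag: contains executable instructions

class RelocFormat(Enum):
    RELA = "SHT_RELA"
    REL = "SHT_REL"
    CREL_EXPLICIT = "CREL (addend_bit==1)"
    CREL_IMPLICIT = "CREL (addend_bit==0)"

def choose_reloc_format(sh_flags: int, crel: bool, implicit_addends_for_data: bool) -> RelocFormat:
    """Pick the relocation format for one section (assumes a RELA-default target)."""
    is_code = bool(sh_flags & SHF_EXECINSTR)
    if is_code or not implicit_addends_for_data:
        # Code keeps explicit addends: instruction fields are too narrow to hold them.
        return RelocFormat.CREL_EXPLICIT if crel else RelocFormat.RELA
    # Non-code sections store the addend in the section contents instead.
    return RelocFormat.CREL_IMPLICIT if crel else RelocFormat.REL

text_flags = SHF_ALLOC | SHF_EXECINSTR  # e.g. .text
debug_flags = 0                         # e.g. .debug_str_offsets
print(choose_reloc_format(text_flags, crel=False, implicit_addends_for_data=True))
print(choose_reloc_format(debug_flags, crel=True, implicit_addends_for_data=True))
```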

Let’s explore data relocations in a .debug_str_offsets section.

        .section        .debug_str_offsets,"",@progbits
        .long   .Linfo_string0
        .long   .Linfo_string1
        ...

Here is the number of bytes needed to encode one relocation:

  • Elf64_Rela: 24
  • Elf64_Rel: 16
  • CREL: 1 (for all but the first relocation). Yes, a single byte encodes the offset delta and signals that the type and symbol index do not change.
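The following Python sketch reproduces these numbers. The Elf64_Rela/Elf64_Rel sizes follow from their three and two 8-byte fields. The encoder is a simplified approximation of the CREL addend_bit==0 layout (a ULEB128 header holding count, addend bit, and offset shift; one byte per relocation carrying the scaled offset delta plus "symbol changed"/"type changed" flags; SLEB128 deltas otherwise). Treat it as illustrative and defer to the CREL proposal for the exact bit layout. `encode_crel_implicit` is a hypothetical name, and the symbol index 1 and the starting offset 8 are made-up values.

```python
import struct

# Elf64_Rela: r_offset, r_info, r_addend; Elf64_Rel omits r_addend.
ELF64_RELA_SIZE = struct.calcsize("<QQq")
ELF64_REL_SIZE = struct.calcsize("<QQ")

def uleb128(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | (0x80 if value else 0))
        if not value:
            return bytes(out)

def sleb128(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        done = (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)

def encode_crel_implicit(relocs: list[tuple[int, int, int]]) -> bytes:
    """Encode (r_offset, symidx, type) triples, sorted by offset, without addends."""
    mask = 8  # caps the common shift at 3 so it fits in the header's low 2 bits
    for offset, _, _ in relocs:
        mask |= offset
    shift = (mask & -mask).bit_length() - 1  # trailing zero bits shared by all offsets
    out = bytearray(uleb128(len(relocs) * 8 + shift))  # addend bit (4) left clear
    prev_offset = prev_sym = prev_type = 0
    for offset, sym, rtype in relocs:
        delta = (offset - prev_offset) >> shift
        flags = (1 if sym != prev_sym else 0) | (2 if rtype != prev_type else 0)
        if delta < 0x20:  # bits 0-1: flags, bits 2-6: offset delta
            out.append((delta << 2) | flags)
        else:  # bit 7 set: the remaining delta bits follow as ULEB128
            out.append(0x80 | ((delta << 2) & 0x7F) | flags)
            out += uleb128(delta >> 5)
        if sym != prev_sym:
            out += sleb128(sym - prev_sym)
        if rtype != prev_type:
            out += sleb128(rtype - prev_type)
        prev_offset, prev_sym, prev_type = offset, sym, rtype
    return bytes(out)

# A .debug_str_offsets-like section: 100 .long entries, 4 bytes apart, all
# relocated against the same section symbol with the same relocation type.
SYM, R_X86_64_32 = 1, 10
relocs = [(8 + 4 * i, SYM, R_X86_64_32) for i in range(100)]
crel = encode_crel_implicit(relocs)
print(ELF64_RELA_SIZE * len(relocs), ELF64_REL_SIZE * len(relocs), len(crel))
```

After the header and the first relocation, every entry in this example costs exactly one byte.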

Using implicit addends primarily benefits debug sections such as .debug_str_offsets, .debug_names, .debug_addr, and .debug_line, but also data sections such as .eh_frame, .data, .data.rel.ro, and .init_array.

In a -O0 -g -gpubnames build, using REL for non-code sections decreased the relocation size by 27.1% and the .o file size by 6.4%.
Using CREL (-Wa,--crel,--implicit-addends-for-data) decreased the .o file size by 21.6%!

format | reloc size (bytes) | .o size (bytes)
-------+--------------------+----------------
RELA   |          550519056 |      2339938120
REL    |          401209104 |      2190607000
CREL   |           44865612 |      1834284744
        # https://github.com/MaskRay/llvm-project/tree/demo-crel
        clang -fuse-ld=lld -Wa,--implicit-addends-for-data a.c -o a
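The percentages quoted above can be re-derived from the table with a few lines of Python. `reduction` is just a helper name for this check.

```python
# Sizes in bytes from the -O0 -g -gpubnames table above.
SIZES = {  # format: (relocation size, .o size)
    "RELA": (550519056, 2339938120),
    "REL": (401209104, 2190607000),
    "CREL": (44865612, 1834284744),
}

def reduction(baseline: int, value: int) -> float:
    """Percentage decrease of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline

base_reloc, base_obj = SIZES["RELA"]
for fmt, (reloc, obj) in SIZES.items():
    print(f"{fmt:4}  reloc -{reduction(base_reloc, reloc):.1f}%  .o -{reduction(base_obj, obj):.1f}%")
```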

Relocatable files generated with clang -Wa,--implicit-addends-for-data can be linked with lld, which supports REL for all targets.
However, many GNU ld ports don’t support REL (many do not define elf_backend_may_use_rel_p, or the support may be incomplete).

        % clang -Wa,--crel,--implicit-addends-for-data a.c b.c -fuse-ld=bfd
        /usr/bin/ld.bfd: unknown architecture of input file `/tmp/a-10a324.o' is incompatible with i386:x86-64 output
        /usr/bin/ld.bfd: unknown architecture of input file `/tmp/b-dde44a.o' is incompatible with i386:x86-64 output
        /usr/bin/ld.bfd: error in /tmp/a-10a324.o(.eh_frame); no .eh_frame_hdr table will be created
        /usr/bin/ld.bfd: error in /tmp/b-dde44a.o(.eh_frame); no .eh_frame_hdr table will be created
        clang: error: linker command failed with exit code 1 (use -v to see invocation)

binutils feature request: 31567 – gas/ld: Implicit addends for non-code sections

One complication: there exist reloc types like R_AMDGPU_ABS32_HI which resolves to the high 32 bits of a 64-bit symbol-address-plus-addend. In this case you really need to be able to represent an arbitrary 64-bit addend, even though the field being relocated is only 32 bits.

I guess this particular reloc type is never used in data, but there might be others like it?
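A small model shows the problem. It assumes only the semantics stated above (the high 32 bits of the 64-bit S + A) and uses hypothetical helper names; it is not AMDGPU backend code.

```python
MASK32 = 0xFFFFFFFF

def abs32_hi(sym_value: int, addend: int) -> int:
    """High 32 bits of the 64-bit value S + A."""
    return ((sym_value + addend) & 0xFFFFFFFFFFFFFFFF) >> 32

def stored_implicit_addend(addend: int) -> int:
    """An implicit addend must fit in the 32-bit field being relocated."""
    return addend & MASK32

S = 0x1000
A = 0x1_0000_0000  # an addend that needs more than 32 bits

explicit = abs32_hi(S, A)                          # RELA: r_addend keeps all 64 bits
implicit = abs32_hi(S, stored_implicit_addend(A))  # REL: addend truncated to 32 bits
print(explicit, implicit)
```

Because the full addend cannot be recovered from the 32-bit field, a relocation type like this would have to keep an explicit addend even in a data section.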

It’s not 33% because not all relocation sections go from 24->16 bytes, I take it? (the code sections still use rela, and so aren’t improved)
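A back-of-the-envelope check from the -O0 -g -gpubnames table is consistent with this. Assuming the "reloc size" column counts only Elf64_Rela/Elf64_Rel entries and that code-section relocations stay RELA, the 8 bytes saved per converted entry give the number of relocations that switched to REL:

```python
RELA_ENTRY, REL_ENTRY = 24, 16                # sizeof(Elf64_Rela), sizeof(Elf64_Rel)
rela_bytes, rel_bytes = 550519056, 401209104  # "reloc size" column for RELA and REL

total = rela_bytes // RELA_ENTRY                                  # every relocation
converted = (rela_bytes - rel_bytes) // (RELA_ENTRY - REL_ENTRY)  # non-code ones, now REL

# Sanity check: converted entries at 16 bytes plus the rest at 24 bytes.
assert converted * REL_ENTRY + (total - converted) * RELA_ENTRY == rel_bytes
print(f"{converted} of {total} relocations ({100 * converted / total:.1f}%) switched to REL")
```

About 81% of the entries shrink by a third, which matches the overall 27.1% (roughly 0.814 / 3).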

How’s this look with compression enabled? If we’re adding data to where relocations will be applied, that’ll get less compressible - probably still worth it, I guess - moving bytes out of an uncompressible section into a compressible section. But maybe doesn’t look quite as good that way.

e.g., summed across all .o files in clang, what’s the size of .debug_addr and .rel[a].debug_addr before and after? (ideally, try a dbg build inside Google to get numbers most representative of our needs - there are so many variables that it’s best to do exactly that rather than some approximation of it)

Right.

Thanks for the suggestion. This is worth experimenting with.

-gz=zstd compresses .debug* sections but not relocation sections.
(My demo-crel branch supports compressing REL/RELA/CREL using zstd, though I don’t know whether it is worth upstreaming. Decompressing requires just 2 extra lines in lld; compressing needs a few lines in the assembler.)

Let’s explore -g -gz=zstd builds of lld for both -O0 and -O2:

.o size    | reloc size | .debug size | .debug_addr | .debug_addr relocs | options
-----------+------------+-------------+-------------+--------------------+--------
1453265896 |  467465160 |   200379733 |       77894 |           51123648 | -g -gz=zstd
1361904480 |  345821648 |   230681356 |     1628142 |           34082432 | -g -gz=zstd -Wa,--implicit-addends-for-data
1042317288 |   56517599 |   200378501 |       77894 |            5000201 | -g -gz=zstd -Wa,--crel
1057438728 |   41336040 |   230681552 |     1628142 |            3720546 | -g -gz=zstd -Wa,--crel,--implicit-addends-for-data

 626745136 |  292634688 |   225932160 |       77920 |           47820480 | -O2 -g -gz=zstd
 564322008 |  201200656 |   254962205 |     3104850 |           31880320 | -O2 -g -gz=zstd -Wa,--implicit-addends-for-data
 363224200 |   29114818 |   225930949 |       77920 |            4513572 | -O2 -g -gz=zstd -Wa,--crel
 377970016 |   14829524 |   254962382 |     3104850 |            2118037 | -O2 -g -gz=zstd -Wa,--crel,--implicit-addends-for-data

(All sizes are in bytes; ".debug_addr relocs" is the size of the .rela.debug_addr, .rel.debug_addr, or .crel.debug_addr section.)

(With -Wa,--implicit-addends-for-data, RELA/CREL builds have very similar but not identical .debug sizes because the path components s2-custom-deb1 and s2-custom-deb3 compress differently in zstd!)

Observations:

  • With or without -gz=zstd, the .o size reduction ratios with REL are close.
  • Implicit addends make .debug* sections less compressible.
  • REL -gz=zstd is still smaller than RELA -gz=zstd. This is not surprising: the REL/RELA comparison involves uncompressed relocation sections (a larger difference), while the non-zero vs. zero addends only affect compressed .debug contents (a smaller difference).
  • For CREL -gz=zstd, using implicit addends increases .o file sizes, likely because the “less compressible” factor matters more once the relocation size becomes negligible.
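The observations can be checked against the .o size column of the -gz=zstd table. This snippet only re-reads the numbers from that table; the labels RELA, REL, CREL, and CREL+implicit stand for the four option sets in each group.

```python
# .o sizes in bytes from the -gz=zstd table above.
O_SIZES = {
    "-O0": {"RELA": 1453265896, "REL": 1361904480,
            "CREL": 1042317288, "CREL+implicit": 1057438728},
    "-O2": {"RELA": 626745136, "REL": 564322008,
            "CREL": 363224200, "CREL+implicit": 377970016},
}

def reduction_vs_rela(sizes: dict[str, int], fmt: str) -> float:
    """Percentage decrease in .o size relative to the RELA build."""
    return 100.0 * (sizes["RELA"] - sizes[fmt]) / sizes["RELA"]

for opt, sizes in O_SIZES.items():
    for fmt in ("REL", "CREL", "CREL+implicit"):
        print(f"{opt} {fmt:13} -{reduction_vs_rela(sizes, fmt):.1f}%")
```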

Conclusions:

  • The CREL reduction ratio is incredible with -gz=zstd at a high optimization level: for -O2 -g -gz=zstd, it’s a 42.0% reduction in .o size!
  • CREL with implicit addends might not be worth doing if the priority is debug sections. I’ll probably keep the assembler changes if we proceed with CREL since the additional complexity is very low.

(ideally, try a dbg build inside Google to get numbers most representative of our needs - there are so many variables that it’s best to do exactly that rather than some approximation of it)

Will try later when I don’t need to fix merge conflicts.