LLD_DEFAULT_NOSTART_STOP_GC default on release/13.x and main

This is about --gc-sections behavior for a niche feature: __start_/__stop_ references to C identifier name sections.
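For readers unfamiliar with the feature, here is a minimal sketch of the pattern in question, assuming an ELF target and a GCC- or Clang-compatible compiler. The section name `my_ids`, the variables and the two helper functions are hypothetical and chosen only for illustration. When a section name is a valid C identifier, the linker defines `__start_<name>` and `__stop_<name>` at its bounds, and code can walk the entries between them.

```c
#include <stddef.h>

/* Hypothetical registry: every entry is placed in a section named "my_ids".
   Because "my_ids" is a valid C identifier, an ELF linker defines
   __start_my_ids and __stop_my_ids at the bounds of the output section. */
static const int id_alpha __attribute__((used, section("my_ids"))) = 1;
static const int id_beta  __attribute__((used, section("my_ids"))) = 2;

/* Encapsulation symbols provided by the linker, not by any object file. */
extern const int __start_my_ids[];
extern const int __stop_my_ids[];

/* Number of registered entries. */
size_t id_count(void) {
    return (size_t)(__stop_my_ids - __start_my_ids);
}

/* Sum of all registered entries, found by walking the section. */
int id_sum(void) {
    int sum = 0;
    for (const int *p = __start_my_ids; p != __stop_my_ids; ++p)
        sum += *p;
    return sum;
}
```

Nothing references `id_alpha` or `id_beta` directly; they are reachable only through `__start_my_ids`/`__stop_my_ids`. Without --gc-sections this links and works as expected. With --gc-sections, whether the `my_ids` input sections survive depends on whether the linker treats the `__start_`/`__stop_` references as retaining them, which is exactly the -z start-stop-gc versus -z nostart-stop-gc question discussed here.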

As a follow-up to D96914 “[ELF] Add -z start-stop-gc to let __start_/__stop_ not retain C identifier name sections”, on 2021-04-16 @MaskRay pushed “[ELF] Default to -z start-stop-gc with a glibc "__libc_" special case”.

On D96914, jrtc27 stated “My approval for this change was conditional on it being opt-in.” and argued that the default flip violated the developer policy and should therefore be reverted.
hvdijk held roughly the same view and described the change as “break GCC”.
Later, jrtc27 filed issue #51726, “start-stop-gc default change breaks multiple projects with no viable transition option”, in llvm/llvm-project on GitHub.

MaskRay agrees that something went badly (there is no public review record) but disagrees with jrtc27’s claim, because the change was approved in correspondence with bd1976llvm (unfortunately not public) and:

  • FreeBSD folks (dim, emaste) were notified well before the commit, and emaste agreed that this could proceed.
  • @MaskRay disagrees that the previous behavior (-z nostart-stop-gc) was an intentional behavior contract that projects could rely on.
  • There is a subtle technical point: ld.lld’s new behavior matches what GNU ld traditionally did for most of its history (before 2015-10). There was a window of 4 years in which software could backslide and come to depend on the newer behavior, which might cause problems.
  • @MaskRay asserts that breakage was a very unusual case. In most cases the change was very likely an NFC and favored by the community. Unfortunately it did cause breakage, for instance for ldc on FreeBSD and for hvdijk’s usage of NetworkManager, but …
  • @MaskRay disagrees with the claim that there is no “viable plan” for projects
  • more lengthy discussions like https://reviews.llvm.org/D114186#3168192

Quote from D114186#3168192:

A toolchain release may contain bugfixes which can cause software to break if they happen to rely on the buggy behavior.
We fix many bugs without introducing a mechanism to go back to the original state. (See below, it’s all about the potentially affected packages.)
In this case it is probably unfair to say software relying on the __start_ GC behavior has a bug because the ELF world for a long term does not provide good facility (no good != not-exist) for marking sections.
But it is fair to say they are not written with -Wl,--gc-sections in mind.

Using -Wl,--gc-sections for the unprepared software started to work with GNU ld>2015-10. I’ll say that is by accident.
-Wl,--gc-sections is uncommon (no distro enabled it by default), so historically there is no problem.
(The addition of retain will make this more reliable, but it is no mandatory.)

Whether the default can flip and how soon it can flip, depend on the number of software relying on the accidental behavior, how they can cover from the regression, how serious the regression will be.
From Debian Code Search, I’ll say the number is small (~20 even after counting duplicates).
I have checked many and all I checked don’t have internal -Wl,--gc-sections.

With -z nostart-stop-gc, I’ll say it is easy for them to recover from the regression.
With the introduction of a diagnostic and a new documentation page, I’ll say it is straightforward for a developer/user to catch the problem.

With these, I think keeping -z start-stop-gc for 13.0.1 is still fine.
Some people tend to be more conservative and don’t agree with me, but on the other hand cannot show more evidence.
With other points (e.g. at this point, flipping the behavior back and forth can probably just lead to confusion since people may not remember why 13.0.0/13.0.1/main are different. 13.0.0 cannot be changed now.)

In a follow-up, emaste did an exp-run on FreeBSD, which showed that ldc and 3 packages depending on it were affected.

On 2022-01-11, @tstellar pushed a variation of D114186 “[lld][CMake] Add LLD_DEFAULT_NOSTART_STOP_GC”, which defaulted to LLD_DEFAULT_NOSTART_STOP_GC=on.

MaskRay made additional points (some spread over several months and numerous comments) in favor of keeping the default -z start-stop-gc behavior for both the main and release/13.x branches:

  • 13.0.0 has been out for more than 3 months, and to the best of his knowledge there have been no new reports of issues with the -z start-stop-gc behavior
  • besides being tested by FreeBSD (with the ldc breakage[1]), on the Linux side it is likely quite safe by now, as some Gentoo Linux users use ld.lld with either GCC or Clang. Neither usage has led to reported problems.
  • switching to -z nostart-stop-gc for release/13.x needs justification. jrtc27’s justification is a developer policy violation, but MaskRay disagrees with that claim.
  • @MaskRay pushed D114830 “[ELF] Hint -z nostart-stop-gc for __start_ undefined references” to both main and release/13.x, which can catch most issues if -z start-stop-gc ever causes a regression. The user will know what to do from the linker diagnostic. (They could always use --no-gc-sections, even before any of these mechanisms were available.)
  • the current LLD_DEFAULT_NOSTART_STOP_GC=on state in release/13.x penalizes ld.lld’s GC efficiency on Clang PGO/coverage and sanitizer-coverage and potentially many new instrumentation techniques.

[1]: the ldc breakage is due to FreeBSD using LLVM 12 library. ldc is fine with LLVM 13.

Therefore, @MaskRay requests LLD_DEFAULT_NOSTART_STOP_GC=off for release/13.x.
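For a project that does hit the linker diagnostic, the ways out mentioned in this thread can be sketched as follows. This is an illustrative example under stated assumptions, not taken from any affected project: the macro `KEEP_IN_SECTION`, the section name `my_probes` and the function `probe_total` are hypothetical. It assumes a compiler that understands `__has_attribute`; the `retain` attribute additionally needs a recent toolchain (roughly GCC 11 or Clang 13 with an assembler that supports SHF_GNU_RETAIN), so the sketch falls back to plain `used` elsewhere.

```c
/* Option 1 (source change): mark the entries as retained so that
   --gc-sections keeps them even though nothing references them directly.
   Option 2 (build change): leave the source alone and link with
   -Wl,-z,nostart-stop-gc, or drop -Wl,--gc-sections altogether. */
#if defined(__has_attribute)
#  if __has_attribute(retain)
#    define KEEP_IN_SECTION(name) __attribute__((used, retain, section(name)))
#  endif
#endif
#ifndef KEEP_IN_SECTION
/* Fallback for toolchains without the retain attribute: the entries are
   still emitted, but only the linker options above protect them from GC. */
#  define KEEP_IN_SECTION(name) __attribute__((used, section(name)))
#endif

/* Hypothetical entries collected in a C identifier name section. */
static const int probe_a KEEP_IN_SECTION("my_probes") = 10;
static const int probe_b KEEP_IN_SECTION("my_probes") = 32;

/* Bounds of the section, defined by the linker. */
extern const int __start_my_probes[];
extern const int __stop_my_probes[];

/* Sum of all entries between the linker-provided bounds. */
int probe_total(void) {
    int total = 0;
    for (const int *p = __start_my_probes; p != __stop_my_probes; ++p)
        total += *p;
    return total;
}
```

All entries in one section use the same macro on purpose: mixing retained and non-retained variables under one section name can trigger section attribute conflicts with some compilers.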