[RFC] Freezing C++03 headers in libc++

Background

With C++ getting a major release every three years, libc++ must support a continuously growing number of C++ versions. There are currently seven versions of the Standard and we support all of them using the same source code, which is achieved through careful application of #ifdefs. With the increasing size of the library and our desire to keep improving its quality of implementation, sharing the same source code is becoming a barrier to evolving the code base. In particular, a major relic of the past we’ve been dragging for a long time is the need to support Clang in its -std=c++03 language configuration.

Furthermore, the benefits of keeping the C++03 implementation in the same code base as the rest of the library are questionable. Indeed, the C++03 parts of the library are getting very few improvements nowadays, since the most beneficial improvements have been implemented over the past decade. On the other hand, the changes we make to keep evolving the library frequently break code that isn’t strictly conforming, which results in a maintenance burden for old projects that are only on life support. Freezing the headers would increase the stability of the library, which is a great property for legacy C++03 projects. At the same time, it would make it easier for the C++11-and-later parts to be improved. For example, we foresee improvements to compilation times, debug performance, binary size and possibly compiler diagnostics.

Proposal

We propose making a copy of the libc++ headers, with one copy only having to support the C++03 language mode, and the other copy supporting newer language modes. The Clang driver would then select the appropriate copy of the headers based on the selected language mode.

After an initial period of time where we would provide equal support for both copies of the headers, we would freeze the C++03 headers and only apply security-critical bug fixes to them. The reasoning is that these headers would be optimized for stability: even fixing bugs can often end up breaking code that was written in a certain way, which can create churn for legacy projects. In the same spirit, we would also not backport LWG issue resolutions to the C++03 language mode headers. However, note that we would reserve the right to change the C++03 headers if required for compatibility with the “new” headers. Keeping the C++03 headers 100% stable would be a goal, but not a hard promise, since that could easily paint us into a corner.

Q&A

What about library configuration?

Libc++ has a __config_site header, which makes it possible to customize the library. These configuration options sometimes change over time. There are currently four major categories of customizations:

  • build-time ABI information, e.g. the inline namespace to use

  • disabled parts of the library, e.g. whether we have support for wchar_t or localization

  • information about implementation details, e.g. which threading API to use

  • the hardening mode

We expect the ABI information and the disabled parts of the library to stay fairly stable, most likely only adding additional options that would simply not be available in C++03 mode. Information about implementation details and the hardening mode are much more likely to change. If these options change in ways that are not purely additive, we expect there to be a way to translate to similar legacy functionality. Taking the change from assertions and debug mode to hardening modes as an example, it would be possible to map the hardening modes to the old assertion/debug modes (albeit in an imperfect way).
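As a rough illustration of the kind of translation described above, a frozen header could derive legacy settings from the newer option with a few preprocessor conditionals. This is only a sketch: every DEMO_ macro is a hypothetical name made up for this example (none of them is a real libc++ configuration macro), and the particular mapping is an assumption, not a decided design.

```cpp
// All DEMO_* names are hypothetical; they are not libc++'s real macros.

// Values a __config_site-style header might use for the "new" option.
#define DEMO_HARDENING_MODE_NONE 0
#define DEMO_HARDENING_MODE_FAST 1
#define DEMO_HARDENING_MODE_EXTENSIVE 2
#define DEMO_HARDENING_MODE_DEBUG 3

// Stand-in for the value the configuration header would provide.
#ifndef DEMO_HARDENING_MODE
#  define DEMO_HARDENING_MODE DEMO_HARDENING_MODE_EXTENSIVE
#endif

// Translation layer for the frozen headers: map the new option onto
// legacy-style switches (assertions on/off, debug mode on/off).
#if DEMO_HARDENING_MODE == DEMO_HARDENING_MODE_NONE
#  define DEMO_LEGACY_ASSERTIONS 0
#  define DEMO_LEGACY_DEBUG_MODE 0
#elif DEMO_HARDENING_MODE == DEMO_HARDENING_MODE_DEBUG
#  define DEMO_LEGACY_ASSERTIONS 1
#  define DEMO_LEGACY_DEBUG_MODE 1
#else
// Assumed mapping: the intermediate modes have no exact legacy
// counterpart, so they are approximated as "assertions only".
#  define DEMO_LEGACY_ASSERTIONS 1
#  define DEMO_LEGACY_DEBUG_MODE 0
#endif
```

The final branch is where the imperfection shows up: two distinct new modes collapse into a single legacy configuration.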

Are there ABI implications?

In the stable ABI there should be no significant changes to the current policy. That is, the ABI should stay stable except for changes that are technically ABI breaks but don’t actually break any code. The ability to use the unstable ABI in the C++03 language mode would be removed, since the ABI would otherwise be broken between C++03 and C++11 when we continue evolving the unstable ABI. We don’t foresee this being a major problem, since folks who use the unstable ABI are generally not on life support.

What about older/other compilers?

Libc++ only supports Clang and GCC, and GCC only in C++11 mode or higher. Because of that, copying the headers shouldn’t have any impact on GCC. For Clang, the current release and the last two releases are supported. We would have to keep the “new” headers compatible with C++03 until we only support Clang versions that know to look in the alternate include directory, which represents a migration period of 6 months to a year.

How are ODR violations mitigated?

Libc++ uses a lot of tools to avoid ODR violations. One of the most important of these is [[gnu::abi_tag]]. The C++03 headers will get a tag distinct from the C++11-and-later headers to avoid ODR violations between the two implementations.
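To make the effect of an ABI tag concrete, here is a minimal sketch that tags a type and inspects its name. It assumes GCC or Clang on a target using the Itanium C++ ABI, where type_info::name() returns the mangled name; the tag name "cxx03" and the demo names are hypothetical and not what libc++ would necessarily use.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

namespace demo {

// "cxx03" is a hypothetical tag name chosen for this sketch.
// With GCC or Clang on Itanium-ABI targets, the tag becomes part of the
// mangled name, so an identically named type carrying a different tag
// (or no tag) in another translation unit is a different entity to the
// linker.
struct [[gnu::abi_tag("cxx03")]] frozen_widget {
  int value;
};

// Returns the implementation-defined type name; on Itanium-ABI targets
// this is the mangled name, which includes the ABI tag.
inline std::string frozen_widget_type_name() {
  return typeid(frozen_widget).name();
}

} // namespace demo
```

Because the tag participates in mangling, two definitions that differ only in their tag no longer collide at link time, which is the property the proposal relies on.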

Thanks to @ldionne, @EricWF and @AaronBallman for giving early feedback on the proposal.

“The Clang driver would then select the appropriate copy of the headers based on the selected language mode.”

Is there some reason not to just #if CXX_VERSION == 2003 #include <foo_cxx03> #else [normal header] #endif? Messing with clang driver include paths is complicated, and doesn’t really seem to provide much benefit here.
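For reference, the forwarding-header alternative suggested here could look roughly like the sketch below. __cplusplus is the standard macro (it is 199711L in C++03 mode); the include path <__cxx03/vector>, the DEMO_ macros and the demo namespace are hypothetical placeholders. The C++03 branch only compiles where such a frozen copy actually exists.

```cpp
// Sketch of a public header such as <vector> that forwards in C++03 mode.
// The include path and all DEMO_* names are hypothetical.
#ifndef DEMO_VECTOR_FORWARDING_HEADER
#define DEMO_VECTOR_FORWARDING_HEADER

#if __cplusplus < 201103L
// C++03 mode: hand over to the frozen copy and contribute nothing else.
#  include <__cxx03/vector>
#  define DEMO_HEADER_COPY 2003
#else
// C++11 and later: the regular, actively developed implementation follows.
#  define DEMO_HEADER_COPY 2011
namespace demo {
template <class T>
class vector { /* actively developed implementation would go here */ };
} // namespace demo
#endif

#endif // DEMO_VECTOR_FORWARDING_HEADER
```

In C++11 and later modes the preprocessor skips the inactive branch without opening the frozen header, so the only cost is the conditional itself.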

Could you elaborate a bit on what would be complicated about changing the driver? Something like

std::string getLibcxxSubDirectory() {
  if (getLangOpts().CPlusPlus11)
    return "v1";
  return "cxx03"; // or whatever the name would be in the end
}

should do the trick, shouldn’t it?

Generally, I’d like to do it in the driver to avoid a mess of 100+ headers whose only job it is to forward to different headers depending on the language mode. We’d have to move all the headers into sub directories and always prepend cxx03 or cxx11 (or similar) when including any detail headers. Doing it in the driver seems to me like the much cleaner option.

Forwarding headers are what would provide best separation of concerns. Consider: Are there libcxx consumers other than clang? (I was under the impression the answer is Yes.)

What concerns would that separate? I’m honestly curious, because I can’t think of any. Maybe you mean what should be in the driver vs. the library, but selecting different configurations based on command line arguments seems to me like it’s the whole reason a driver exists.

There is also GCC (which we only support in C++11 and later), but other than that I’m not aware of any. Since we very much rely on (sometimes very new) compiler extensions, and nobody has mentioned other tooling support that I’m aware of, I’m not sure there are any other consumers. Or they are very diligent about keeping up with changes in libc++.

For the search paths, we’d have to detect the version of the libc++ headers, in case someone tries to use a newer clang with an older libc++. And for different targets, we have different logic for computing the search paths, so we’d need to update all of them. And anyone specifying their paths by hand for whatever reason would need to update their code.

I was thinking you’d just stick this logic into the C++11 headers, so we don’t waste time opening additional files in C++11 mode. But in any case, it’s only a few lines of boilerplate per file that nobody ever needs to touch after it’s written.

I am (perhaps unsurprisingly) very supportive of this proposal. I think it addresses long standing pain points in libc++ that have been growing bigger recently, while also improving the stability of the library for code bases that are not actively maintained anymore.

I do think implementing this proposal without involving the driver is something that needs to be investigated as well - it could potentially end up being simple if we strip down the C++03 headers. We wouldn’t put the new headers under a separate directory, we would only put the 03 ones in a separate directory so this may not have a large impact on the rest of the codebase. In any case, I think it is worth considering both implementation approaches.

That makes sense. Assuming we want to select the headers in the driver, what would be sufficient here? Would it be OK to check whether the directory is empty? Should there be a file that specifies the libc++ version, and if that doesn’t exist the fallback would be the v1 include path?

I’ve looked through the files in clang/Driver/ToolChains (I hope they were the correct files), and these are the different ways libc++ is included:
AIX: <sysroot>/opt/IBM/openxlCSDK/include/c++/v1
AMDGPU, AMDGPUOpenMP: Seem to use the default include path
AVR: Seems to use the default include path
BareMetal: <sysroot>/usr/include/c++/v1 if it exists, otherwise <sysroot>/include/c++/v1
Clang, CommonArgs: I don’t think these are target configurations.
CrossWindows: <sysroot>/usr/include/c++/v1
CSKYToolChain, Cuda: Seem to use the default include path.
Darwin: One of <install>/include/c++/v1, <clang-executable-folder>/../include/c++/v1 or <sysroot>/usr/include/c++/v1
DragonFly: Seems to be the default include path.
Flang: I don’t think this is a target configuration.
FreeBSD: <sysroot>/usr/include/c++/v1
Fuchsia: <clang-executable-folder>/../include/<target-triple>/c++/v<MaxNumber> if it exists and <clang-executable-folder>/../include/c++/v<MaxNumber>
Gnu: This seems to be the default if nothing else is configured. <target-dir>/c++/v<MaxNumber> if it exists and <sysroot>/c++/v<MaxNumber>
Haiku: <sysroot>/boot/system/develop/headers/c++/v1
Hexagon: if musl is used <sysroot>/usr/include/c++/v1 if it exists, /usr/include/c++/v1 otherwise. If musl isn’t used /hexagon/include/c++/v1.
HIPAMD, HIPSPV, HLSL, Hurd: Seem to use the default include path.
HIPUtility, InterfaceStubs: Don’t seem to be target configurations.
Linux: Seems to use the default include path.
MinGW: <sysroot>/include/<target-dir>/c++/v1 if it exists and <sysroot>/include/c++/v1
MipsLinux: <multilib-sysroot>/c++/v1
MSP430: Seems to use the default include path.
MSVC: Doesn’t seem to have any default path? Pretty sure I got something wrong here.
NaCl: <sysroot>/<target-dir>/include/c++/v1
NetBSD: The first directory in which __config exists: <clang-executable-dir>/../include/c++/v1, <sysroot>/usr/include/c++/v1, <sysroot>/usr/include/c++
OHOS: <clang-executable-dir>/../include/c++/v1 and <clang-executable-dir>/../<target-triple>/c++/v1 if the latter exists
OpenBSD: <sysroot>/usr/include/c++/v1
PPC*: Seem to be special cases for some wrappers, not a general target configuration
PS4CPU, RISCVToolchain: Seem to use the default include path.
Solaris, SPIRV: Seem to use the default include path.
TCE: Doesn’t seem to support C++?
VEToolchain: Seems to use the default include path.
WebAssembly: <sysroot>/include/<target-triple>/c++/v<MaxNumber> and <sysroot>/c++/v<MaxNumber>
XCore: Doesn’t seem to support C++?
ZOS: <install>/include/c++/v1

While this is quite the list and I probably got a few things wrong, almost all of them follow a very similar pattern. Most are <some-dir>/include/c++/v1 and if it exists <some-dir>/include/<target-triple>/c++/v1. While the list of places that need to be updated is quite long, I’m pretty sure this could be consolidated into at most 3-4 functions with different lookup strategies.
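The common pattern described above (a per-target directory if it exists, followed by the generic one) can be sketched as a single helper. This is an illustration only: libcxxSearchDirs is a hypothetical function, not an existing Clang driver API, and the directory-existence check is passed in as a callback so the logic can be exercised without touching the file system.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not an existing Clang driver function.
// Returns the libc++ include directories in search order for one base
// directory: <base>/include/<triple>/c++/v1 (only if it exists), then
// <base>/include/c++/v1.
std::vector<std::string> libcxxSearchDirs(
    const std::string &baseDir, const std::string &targetTriple,
    const std::function<bool(const std::string &)> &dirExists) {
  std::vector<std::string> dirs;
  // The per-target directory comes first, but only if it is present.
  if (!targetTriple.empty()) {
    std::string perTarget = baseDir + "/include/" + targetTriple + "/c++/v1";
    if (dirExists(perTarget))
      dirs.push_back(perTarget);
  }
  // The generic directory is always part of the result.
  dirs.push_back(baseDir + "/include/c++/v1");
  return dirs;
}
```

Targets with a different layout (for example the v<MaxNumber> variants) would need their own strategy, which is why a handful of such functions seems more realistic than a single one.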

Yes. Luckily that is a very small minority, and they’d have to use C++03 for this to actually affect them. I don’t expect the intersection of these groups to be very significant.

I guess that would make it a bit simpler. I’m more concerned about the detail headers though.

How the library works, versus how the compiler works. I don’t dispute that libcxx has dependencies on brand new compiler features, but that doesn’t mean the compiler is tied to brand new library features. Feels very much like a layering violation.

Checking whether the directory exists should be sufficient, I guess.
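A minimal sketch of that existence check with a fallback, assuming the frozen headers would live in a sibling directory of v1. selectLibcxxSubdir is a hypothetical helper, not real driver code, and "cxx03" is just the placeholder directory name used in the snippet earlier in the thread.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper for illustration; not an existing Clang driver function.
// Picks the header subdirectory for the selected language mode, falling
// back to "v1" when the frozen C++03 copy is not installed (for example
// when a newer Clang is paired with an older libc++).
std::string selectLibcxxSubdir(
    bool isCxx03Mode, const std::string &cxxIncludeRoot,
    const std::function<bool(const std::string &)> &dirExists) {
  // "cxx03" is the placeholder name from the thread, not a decided layout.
  if (isCxx03Mode && dirExists(cxxIncludeRoot + "/cxx03"))
    return "cxx03";
  return "v1";
}
```

With this shape, an older libc++ without the extra directory keeps working in C++03 mode, because the lookup silently falls back to the existing v1 path.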

Something like this, yes.

For headers we don’t expect users to include directly, we don’t need to do anything special? The C++11 headers would use the path to the C++11 implementation headers, the C++03 headers would use the path to the C++03 implementation headers. Stick in some #error directives if you’re worried someone will misuse the headers.


One other advantage of not modifying the driver: no migration period. You can cut over whenever the code is ready.

From a distro-perspective (well, at least mine…) it would be great to simply not ship those C++03 headers – they’re not used and just dead weight to carry around.

From the OP I get the impression that libcxx won’t consider dropping C++03, but I think it’s worth a thought. If people have a legacy code base they can’t / won’t touch, how often are they going to upgrade libcxx? And if the libcxx they’d get from an update is frozen anyway, what benefit would they get from upgrading? Wouldn’t it be much easier to just freeze their libcxx version directly?

It sounds like a large, self-imposed implementation burden to me, with questionable benefit for hypothetical(?) users. I think it’s completely natural that legacy software ends up using legacy versions of their dependencies, but maybe that’s just me :person_shrugging:

I think I’m still missing something. How would the compiler be tied to new library features?

If you don’t want the headers you don’t have to install them. I don’t think we want to force anybody to keep C++03 working if they don’t want it.

We have brought this thought up and have gotten tremendous pushback. It’s simply not an option at this point.

This. Our original desire was to drop C++03 support entirely, however based on the early feedback we got we believe this might never happen, or at least definitely not in the near future.

What directory the headers live in would count as a library feature here.

I got that, but why is the compiler tied to a new library? Checking whether the directory exists and if not falling back to the old include directory sounds like a completely valid strategy. From looking through the code, that seems like it would be just a few more lines. My “code snippet” at the start was too simplistic, but that doesn’t necessarily mean the compiler can’t compile older libraries. This ties the library to a newer compiler, but that’s already the case by using numerous extensions.
I’ll try to prepare a patch for the driver. I think that would help determining whether it’s viable and avoid us talking past each other. I feel like that might be happening a bit here.

One possible complication for our embedded toolchain is the removal of the unstable ABI (IIUC this is _LIBCPP_ABI_VERSION >= 2).

As an embedded toolchain we only support static linking, but have to provide pre-compiled binaries for all the possible build configurations that a customer might use.

We’re currently using _LIBCPP_ABI_VERSION of 2 primarily to pick up the standards conformance bug-fixes. ABI stability is less of an issue for us as for an embedded device there is always the opportunity to rebuild.

If the C++03 headers always use _LIBCPP_ABI_VERSION of 1, we definitely can’t use the same precompiled binaries for the C++03 and > C++03 headers, so if we wanted to maintain support for our current configuration we’d either need to go back to _LIBCPP_ABI_VERSION 1 or double the number of pre-compiled binaries.

Right now I’m not sure how big of a problem this will be. I expect that this will be a niche use case, possibly niche enough that I can persuade my product management to drop C++03 support, so this is not a blocker for the proposed change.

One possible mitigating change that I can think of is to move the unstable version of _LIBCPP_ABI_VERSION to 3 and have the C++03 headers support _LIBCPP_ABI_VERSION 1 and 2.

I think sunsetting C++03 support is a good direction. How long do we think we need to retain C++03 support? After how many years do we think we can remove these frozen headers altogether?

Could you make a full copy/snapshot of libcxx and call it libcxx-lts? It would get low-effort security and bug fixes for X years. It might even get packaged as llvm-libcxx-lts.

The canonical libcxx would then remove all references to C++03 within X days/weeks/years/releases.

Please read this as: I want to make your life easier. I am not a consumer of C++03 from libcxx.

Is this from entities that are contributing to libcxx, or paying people that do?

If not, why do they get to externalize their costs onto libcxx developers? Would the pushback be enough for your boss to sign off on dedicating an FTE to C++03 support?[1]

If the cost/benefit of continued C++03 support is justified, great! If the libcxx-lts approach allows doing this in a low-effort way, even better! I just think it’s worth doing such an evaluation, (ideally) in public.


  1. an extreme simplification for argument’s sake