[RFC] Documentation of Clang diagnostics ... an automated approach

A few weeks ago I started a small side project with the goal of automatically generating documentation for all Clang compiler diagnostics. When I learned programming over two decades ago, I loved the MSVC compiler (compared to GCC) because every diagnostic there has a unique number and an associated article that documents its behavior. My initial idea was to likewise add a unique ID, combined with a wiki where users can help build up documentation that goes beyond what is available on Diagnostic flags in Clang — Clang 19.0.0git documentation.

So I looked into the diagnostic subsystem and was surprised to find that an ID already exists for (most) diagnostic messages; it is used internally within the compiler but never exposed to the end user. Furthermore, the diagnostics can easily be extracted from the source code by looking at the tablegen-generated .inc files. I started to extract all the information I could get for each diagnostic (ID, message, category, groups, etc.), and after playing around a bit with ChatGPT, I thought that larger parts of the documentation could perhaps be extracted automatically from the source and combined with chatbot knowledge.
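
For illustration, the extraction step can be as simple as the following sketch. The .inc path and the exact DIAG(...) field layout are assumptions (they vary between build trees and Clang versions), so treat it as a starting point rather than a finished tool:

```python
# Minimal sketch: pull diagnostic IDs and message templates out of a
# tablegen-generated .inc file. Path and field layout are assumptions.
import re
from pathlib import Path

INC = Path("build/tools/clang/include/clang/Basic/DiagnosticSemaKinds.inc")

# Entries look roughly like:
#   DIAG(err_foo, CLASS_ERROR, ..., "message text %0", ...)
DIAG_RE = re.compile(r'DIAG\((\w+),\s*(\w+),[^"]*"((?:[^"\\]|\\.)*)"', re.S)

for name, diag_class, message in DIAG_RE.findall(INC.read_text()):
    print(f"{name}\t{diag_class}\t{message}")
```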

The first thing I tried was to generate small code snippets that trigger the diagnostics and could serve as examples for the documentation. So I built my first AI application, which asked ChatGPT via the API to generate, for a specific diagnostic, code that would trigger it. I sent the source code to Compiler Explorer to get the stderr output and checked the result for the particular diagnostic message. If the code did not trigger the diagnostic, I fed the stderr output back to ChatGPT, up to five times. With that approach I had a success rate of 30–40% (for C++-related diagnostics).
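
The loop itself is straightforward; below is a rough sketch of how it could look. The model name and Compiler Explorer compiler ID are placeholders, stripping markdown fences from the model output is omitted, and the plain substring match would need to be fuzzier for messages containing %0-style placeholders:

```python
# Sketch of the generate-and-verify loop: ask the model for a snippet,
# compile it via the Compiler Explorer API, feed stderr back on failure.
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def compile_on_ce(source: str) -> str:
    """Compile on Compiler Explorer and return the stderr text."""
    r = requests.post(
        "https://godbolt.org/api/compiler/clang1701/compile",  # assumed compiler ID
        json={"source": source, "options": {"userArguments": "-std=c++17"}},
        headers={"Accept": "application/json"},
        timeout=60,
    )
    return "\n".join(line["text"] for line in r.json().get("stderr", []))

def find_trigger_example(diag_message: str, max_rounds: int = 5):
    messages = [{"role": "user", "content":
                 f"Write a short C++ program that triggers this clang "
                 f"diagnostic: {diag_message}. Reply with code only."}]
    for _ in range(max_rounds):
        code = client.chat.completions.create(
            model="gpt-4o", messages=messages,  # placeholder model name
        ).choices[0].message.content
        stderr = compile_on_ce(code)
        if diag_message in stderr:
            return code  # verified: the snippet really triggers the diagnostic
        messages += [{"role": "assistant", "content": code},
                     {"role": "user", "content":
                      f"That did not trigger the diagnostic. stderr was:\n{stderr}"}]
    return None
```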

I soon found out that ChatGPT needed more knowledge about a diagnostic to achieve better results. Some diagnostic messages are very generic, and it is hard to know what compiler feature is involved or even which input language to use. So I extracted the parts of Clang's source code where the diagnostic is triggered, searched for the git commit message that first introduced it, and looked for existing test cases that trigger it. With that information I could increase the success rate to 70–80% (for C++).
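
This context gathering can be scripted against a local llvm-project checkout, roughly like the following sketch (the git queries are an approximation of what I did, not the exact tooling):

```python
# Sketch: collect background information for one diagnostic from a local
# llvm-project checkout, assuming the usual clang/ directory layout.
import subprocess

def gather_context(diag_name: str, message: str,
                   repo: str = "llvm-project") -> dict:
    def git(*args: str) -> str:
        return subprocess.run(["git", "-C", repo, *args],
                              capture_output=True, text=True).stdout
    return {
        # Source locations that reference diag::<name>, i.e. emit it.
        "uses": git("grep", "-n", f"diag::{diag_name}", "--", "clang/lib"),
        # Oldest commit whose diff touches the identifier (its introduction).
        "introduced": git("log", "--reverse", "--oneline", "-S", diag_name,
                          "--", "clang").splitlines()[:1],
        # Tests containing the message text; %0-style placeholders make this
        # approximate, so grepping a fixed prefix works better in practice.
        "tests": git("grep", "-l", "-F", message, "--", "clang/test"),
    }
```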

Having collected a considerable amount of information per diagnostic, I uploaded everything into a wiki and started to generate articles using Lua modules. I again used the ChatGPT API to generate a generic description and an explanation of the code examples within each wiki article.

The results for the first 100 articles can be found at Category:Clang Errors - emmtrix Wiki and Category:Clang Warnings - emmtrix Wiki.

Now I am looking for feedback about the articles. What do you think about the result? What is good/bad? What other information could be included?

The current results are from Clang 17.0.6. I further plan to include information on how each diagnostic evolved over past versions: when it was introduced or removed, when its message changed, etc.

This is an interesting project, I’d say. While better (and more) documentation for our diagnostics would definitely be appreciated, I’m personally not entirely convinced that this is the best way of going about it.

This sounds like a sensible approach if you want to use AI for this, but the thing about the test cases is also that some of them are very pathological examples (by design), and there may sometimes be more ‘user-friendly’ examples, if you will, that would be simpler to understand while demonstrating the error just as well—that said, I haven’t looked at it much yet, so maybe the code it managed to pick out is actually fairly reasonable in most cases.

This is the main part that I’m rather wary about: From my personal experience w/ ChatGPT, I’ve found that it is remarkably good at generating both code and text that looks or sounds exactly like something an actual person would write, but which is at the same time completely and utterly meaningless or just flat-out wrong.

You’d probably have to double-check every last entry, and that would take a while because we have a lot of diagnostics, as you’re surely aware by now. The point is that what we definitely wouldn’t want is someone getting an error, looking it up in the AI-generated documentation, and then getting a completely wrong idea of what the problem is because the documentation is just wrong (because the AI that wrote it didn’t actually ‘understand’ a word of what it was writing about).

Using AI as a tool to try and explain things like error messages and code is (at least according to my understanding) still a fairly new approach to producing documentation, but at least from personal experience, I’m not convinced it’s quite there (yet). We ran an experiment w/ some of our students not too long ago that entailed explaining compiler errors using AI to see how good of a job it’d do, and it fared very poorly.

Moreover, in my experience, if you ask an AI to ‘explain what int x = 4 does’, it’ll tell you that ‘it declares a variable x of type int, initialised with the value 4’, which, while definitely correct, also adds nothing of value whatsoever, because that much is already evident from the code—that said, maybe this is better now; I haven’t checked this with a substantially complex piece of code in a while.

I do think that there may well be value in this as a resource, even though it may not be completely correct, but considering that it is very likely not completely correct, I don’t think we’d want to make this a part of the official Clang documentation, seeing as users probably do and honestly ought to be able to expect their compiler vendor’s docs to be authoritative and because of that also 100% correct (as best as we can manage keeping it up-to-date, that is).

The other parts (essentially anything that doesn’t involve AI-generated descriptions or code that hasn’t been compiled), e.g. when a diagnostic was introduced etc., definitely sound like useful information to have—though off the top of my head, I’m not sure how we’d integrate it or what we’d do with it.

Lastly, if it turns out (or if you already know) that the AI-generated descriptions are at least mostly correct, then (and perhaps irrespective of that, really) it might make more sense to keep this as a separate site, hosted by you, for instance, if you’d be willing to do that. That way, people could reference it as a resource to possibly get an idea as to what may be wrong w/ their code when they get an error, but since it wouldn’t be official documentation, I should hope that they’d take it with a grain of salt and be aware of the fact that it may not be 100% accurate (at least the descriptions wouldn’t be).

All of this is just my personal opinion based on my experience w/ Clang and AI tools in general, though, so I’d definitely also wait and see what other people here have to say about this.


I suppose one thing we could consider is branding it as ‘experimental AI-generated documentation that may well be incorrect in some places’; in my opinion, at least, we’d have to make a point of the fact that we can’t guarantee its correctness. I’m candidly not really convinced that this would be a good idea; I’m just bringing it up because it came to mind just now.

I’m an end user who actively manages our warning list and fixes the warnings in our code, both for MSVC and Clang, and I’m intrigued. Warnings (and even compiler errors) are hard to understand, and any improvement here is beneficial.

I’m aware of these pages in MSVC, though when I needed them, I was always disappointed. The examples you share do look better.

Right now, warning flags are either a grouping of other flags, a set of different messages, or a combination of both. Having the messages split out individually sounds good; however, as a user, I am enabling or disabling a flag, not a specific message. It would be great if we could have one ID per message that is used for enabling/disabling that message, with groups being pure groupings instead of adding messages themselves.

As an end user, I fear this would throttle the usability of this documentation a lot. If it isn’t included in or linked from Diagnostic flags in Clang — Clang 19.0.0git documentation, it will be much harder to find.

I completely agree here. Incorrect documentation is worse than no documentation.

As a user, I think it makes more sense to try to improve the messages. For example, Diagnostic flags in Clang — Clang 19.0.0git documentation: the message clearly indicates the technicalities of the problem, though it doesn’t explain why this is a problem. One could add “accessing this address is undefined behavior as it accesses the class outside of its lifetime”. As such, it might be an interesting experiment to propose new text for the warnings and errors instead of a full page of text.

Another useful thing is fix-it hints. As a user, I don’t care that much about a page of documentation if my compiler says “add typename at this location to resolve the issue”.

When these two things (message text and fix-its) fail me, that is when this elaborate documentation would really help. Though that is also the place where I would be most sceptical of an LLM, as it can write a very good-looking and convincing text which contains big mistakes.

Based on all this reasoning, I feel it would be more beneficial if:

  • we get a better split-up of messages such that each has its own ID for lookups
  • the actual messages get improved
  • fix-its are added where possible
  • documentation gets added covering the background

Given that this will require a lot of manual review time to get these things landed, try to focus on those messages where the need is highest.

For bonus points, you might want to add links to the standard (drafts), so that one can read the standardese if one likes. Maybe that could even be used to compare two standard (draft) versions to report where things changed and provide a list of elements to review.

TL;DR: I like this experiment, though I consider it a basis for improving the current diagnostics.

I think this is a neat idea and certainly we would benefit from having better documentation for warnings.

I wonder if a better long-term approach for Clang would be to do something like what we do for attributes, where each diagnostic’s definition in TableGen refers to a doc string and we can generate static doc pages in the Clang docs for each diagnostic.

I see a few advantages to that approach. One is that the documentation can be added together with new warnings when they are introduced. The other is that the documentation will be versioned with Clang releases, so you can refer to the appropriate documentation for each version of Clang.

This last bit is more important for Clang than it is for MSVC. In MSVC the public-facing warning numbers effectively become a stable reference point. MSVC does not repurpose or change the meaning of diagnostic ids between compiler versions. That is not something we promise with Clang.


Thank you for looking into ways to improve our documentation, that’s really appreciated!!

We already have this. :slight_smile:

However, unlike with attributes, diagnostics do not require a documentation field to be present because 99% of diagnostics do not have documentation associated with them. If we got to the point where most diagnostics had documentation, we could easily change this to require a docs field to encourage new additions to have appropriate documentation.

100% agreed; I have very little faith that ChatGPT or the like will produce quality documentation without extensive code review effort. I did a spot check of what’s already generated and it’s not the worst first take for documentation, but it does require careful review to catch the subtle issues:

"The absence of the semicolon disrupts the compiler’s ability to correctly parse and interpret the code, resulting in… " – no, the compiler knows exactly where it expects the semicolon which is the only reason we can issue that diagnostic in the first place; nothing is disrupted in the compiler.
“Reviewing code for proper statement termination and ensuring that each declaration ends with a semicolon is essential to resolve this error and successfully compile the program.” – basically reads as “ensuring your code is correct is essential to resolving this error” but it manages to also be incorrect because it says “each declaration” without recognition that: int x, y, z declares three declarations but would be wrong to write as int x; y; z;.

Other entries have similar issues.

Another concern is that the documentation has a ton of “fluff” in it and reads very much like AI-generated content, e.g., “This message precisely indicates the line and column where the semicolon is expected, helping to quickly identify and correct the mistake.” or “Declarations are fundamental components in C, C++, and Objective-C programming languages used to specify the interpretation and attributes of some set of identifiers.”, so we’ll likely need to edit for brevity and tone while also correcting technical mistakes.

So I think this is an interesting idea and may give us a base from which to work, but I worry that the review needed for this will be overwhelming unless effort is taken to fix each individual diagnostic before putting it up for review. Finding a way to break this work up into manageable chunks, both for the author adding the documentation and for the community reviewing it, will be crucial, IMO.

That said, there’s a lot I like about the documentation regarding the implementation details included in them. For example: the “Triggered in Clang Tests” seems like something we can automate to discover where we’re missing test coverage, which would be fantastic. The “Used in Clang Sources” seems like something we can automate to discover diagnostics we never actually emit, which is also a thing that comes up from time to time. The information may not be too useful to Clang users, but it’s still really useful to the project! It’s more related to testing than to documentation, but it might be useful to split some of this off into scripts we could run from Precommit CI (assuming they’re inexpensive enough to do so) to catch “hey you added this diagnostic but never bothered to test it” kind of situations that can be easy to miss in larger reviews adding a number of diagnostics, etc.


:exploding_head:

TIL. Maybe we should require docs on new warnings?

We could put in an “Undocumented” value like we have in Attr.td but make setting something required so that anything left undocumented is at least a conscious decision.

I’d like to get to the point of doing that at some point, but I think it’d be a lot of churn to do with the current state of things because the vast majority of our diagnostics are undocumented. Once we have something like Undocumented explicitly in the source, that’s the perfect time to require documentation on new diagnostics.

(Another thought about where this sort of automation may benefit us is that it might be nice to automatically generate a table of off-by-default warnings so it’s easier for users to discover diagnostics they wouldn’t otherwise encounter.)


I have two big concerns with using our diagnostic ID for anything:

1- These are very much not stable. We add/remove/change them pretty often.

2- We often use the same diagnostic for multiple (albeit similar) things via ‘select’ or insert/etc. So this would likely cause this documentation to be ‘useless’ at best in many cases.

I’m generally against using AI for this. I’d be OK with us having documentation on diagnostics in place and requiring documentation on new diagnostics, though IMO the process of getting there needs to be manual and actively reviewed.

This is perhaps a good GSoC/etc type thing as well, but it is a HUGE task.


“Triggered in Clang Tests” seems like something we can automate to discover where we’re missing test coverage, which would be fantastic.

Finding the Clang tests per diagnostic was more work than I initially thought. I analyzed the lit output to extract all cc1 RUN commands, re-executed them without the -verify flag, and used regular expressions plus some conflict-resolution logic to find the right diagnostic.
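
In rough form, the mining step looked like the following sketch (heavily simplified: real lit RUN lines use many more substitutions than %clang_cc1 and %s, and line continuations are ignored here):

```python
# Sketch: extract %clang_cc1 RUN lines from a test file, drop -verify,
# and re-run them so the diagnostics reach stderr.
import re
import subprocess
import sys

def rerun_without_verify(test_path: str) -> str:
    text = open(test_path).read()
    captured = []
    for m in re.finditer(r"//\s*RUN:\s*(.+)", text):
        cmd = m.group(1)
        if "%clang_cc1" not in cmd:
            continue
        cmd = cmd.replace("%clang_cc1", "clang -cc1").replace("%s", test_path)
        cmd = re.sub(r"-verify(=\S+)?", "", cmd)  # let diagnostics be emitted
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        captured.append(proc.stderr)
    return "\n".join(captured)

if __name__ == "__main__":
    print(rerun_without_verify(sys.argv[1]))
```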

One approach could be to extend the cc1 -verify mode so that at the end it outputs a small summary of the triggered diagnostics, e.g. “Triggered diagnostics: err_x, warn_y, …”. That could easily be parsed, and statistics about diagnostic coverage could be created. A new commit should (if possible) not increase the number of uncovered diagnostics.
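
Parsing such a summary would be trivial; for example (the “Triggered diagnostics:” line is the hypothetical format proposed above, not an existing clang feature):

```python
# Sketch: compute diagnostic coverage from hypothetical -verify summaries.
import re

def coverage(verify_logs: list[str], all_diag_ids: set[str]) -> float:
    triggered: set[str] = set()
    for log in verify_logs:
        m = re.search(r"Triggered diagnostics:\s*(.+)", log)
        if m:
            triggered |= {d.strip() for d in m.group(1).split(",")}
    return len(triggered & all_diag_ids) / len(all_diag_ids)
```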

Another idea is to extend the expected-warning/expected-error functionality so that the expected diagnostic ID can be specified, e.g. expected-warning {{err_extraneous_token_before_semi, ...}}. That would allow checking that new diagnostics are covered without running the tests.


Thank you all for your comments. Here are my thoughts about the AI-generated content:

The most controversial topic was the AI-generated content. I think there was agreement that AI-generated content in the official documentation is problematic and might only be acceptable with a review process. I also do not trust AI-generated content, and I fully agree that the official Clang documentation should be correct and verified.

The question now is whether (unofficial) AI-generated documentation is useful. From my point of view it is … under the following conditions:

  1. It must be marked as AI-generated content so that everybody can decide how much they want to trust it. The Internet is full of wrong content, and we all decide whether to trust information based on its source.
  2. A minimum level of quality must be ensured. I would only generate content for diagnostics where I have a small, verified example that triggers the diagnostic. The example ensures that ChatGPT understands its purpose. Just generating content without background information does not work out.
  3. It must be better than the next best alternative, and the next best alternative is googling the message text and hoping to find a forum post about it.
  4. There must be a feedback mechanism: since the current documentation is hosted on a wiki, everybody can modify and correct it. Currently, I don’t know how to incorporate user modifications into AI content, especially if the content is regenerated, e.g. for GPT-5, but that is a different topic. I am also thinking of a voting mechanism so that bad content can be flagged by the community and then reworked.

I am interested in continuing to work on the topic and also extending the non-AI-generated content. The question is whether such unofficial documentation would be supported by the project, e.g. by placing one link (or even a link per diagnostic) in the official documentation.

I have two big concerns with using our diagnostic ID for anything:

I used the ID as the central mechanism to identify the diagnostics, and I am thinking about providing a patch that would allow outputting the diagnostic ID after the diagnostic message, similar to -Xclang -fdiagnostics-show-category -Xclang name.

1- These are very much not stable. We add/remove/change them pretty often.

That might be an issue for the official documentation, which always reflects one repository state.

I just added information on how the diagnostics evolved over time, e.g. Clang error: expected addressable lvalue expression, array element...... (err_omp_expected_addressable_lvalue_or_array_item) - emmtrix Wiki. Having an unstable ID is much better than having none.

2- We often use the same diagnostic for multiple (albeit similar) things via ‘select’ or insert/etc. So this would likely cause this documentation to be ‘useless’ at best in many cases.

I have not yet found a diagnostic with a meaningful message that is used for multiple (distinct) purposes. There are some generic diagnostics (inline assembler, _Pragma, #error, …) that just consist of “%0”, which I ignore for now.

If such a situation occurs, it would be simple to split the diagnostic within the source code to improve documentability.

I have more of the opposite problem: there exist diagnostics with the same (or overlapping) messages but different IDs. Without the ID it is hard or impossible to distinguish them. See Duplicated clang diagnostics with same message · Issue #83955 · llvm/llvm-project · GitHub


Ah, good to know!

and

I think either one of these is implementable so long as we have some machinery that can take an integer ID and map it back to a named diagnostic ID that we can make a string from. We could do this from tablegen, for example (if we don’t have something for it already; I’ve not looked).

My personal preference is for printing the diagnostic IDs as part of the diagnostic output rather than having people write the diagnostic IDs into the tests directly. We sometimes rename diagnostics, so putting the IDs into the tests makes that a more involved process, whereas having a -fdiagnostic-include-ids option or some such would seem likely to lead to less maintenance burden.

However, this seems orthogonal to your original idea, so don’t feel obligated to investigate further unless the idea excites you.

There is a more fundamental question here.

We have thousands of diagnostics, and we cannot possibly provide high, tutorial-like quality documentation for all of them (nor should we, imo).
ChatGPT is, at best, a way for contributors to seed their documentation efforts, but that’s it.
I bet that in many cases we would have to rewrite everything ChatGPT provides during review.

More to the point, if a diagnostic is not clearly actionable or is hard to understand… we should improve the diagnostic.
The question of whether some diagnostics require a higher level of detail and verbosity is interesting.


Are you envisioning this documentation is hosted somewhere on llvm.org even if it’s unofficial? If so, I’m a bit worried about that – we can put all the disclaimers we want on the content, but human nature will lead to some folks ignoring/not noticing the disclaimers and it will be assumed to be official documentation by virtue of where it’s hosted. If it’s hosted off llvm.org, then that’s one more signal to folks that this is not official documentation.

I think there’s support for the idea of trying this experiment out to see how well it works in practice, but I don’t think we should link to it from official sources until we’re ready to “own” the content by hosting it on llvm.org. Hosting it ourselves would also require us to figure out processes like how we handle edits to the wiki, moderation, etc., because we don’t really have content that lives outside of the repo and doesn’t use the typical code review mechanisms. Seeing how well the unofficial documentation works in the wild would give us time to explore those kinds of things.

Are you envisioning this documentation is hosted somewhere on llvm.org even if it’s unofficial?

Currently, I want to keep it on the emmtrix Wiki to keep things simple and to see under which conditions this works out. At least emmtrix.com has sufficient Google relevance that the individual pages can be found if you enter a specific error message.

Hosting costs are negligible, and the expected OpenAI API costs of $1000 are also not an issue.

I think there’s support for the idea of trying this experiment out to see how well it works in practice, but I don’t think we should link to it from official sources until we’re ready to “own” the content by hosting it on llvm.org.

Alright, let’s see how it works in the wild. There will come a point when I have to decide what to do with it, because I don’t have unlimited time. At least I have prepared everything so that it is easy to move to another location in the future.

Maybe it also makes sense to extend the functionality to GCC or other compilers. Examples could be reused in a way both projects could profit from. Then maybe a neutral URL/domain would be better.

On the other hand, another idea would be to add a switch to Clang that outputs, for each diagnostic, a short URL like d.llvm.org/HASH that forwards to a diagnostic page. The cool thing is that you could monitor the click rates to identify diagnostics that might be problematic and are candidates for improvement. Also, changes in the click rate between two versions could be an early indicator of new bugs. What I want to say is that there would definitely be synergy effects in having it on llvm.org.

However, I’ll continue writing updates here…


I had a brief look at the MSVC documentation of compiler errors: Compiler errors C2600 Through C2699 | Microsoft Learn

And you’re right. In my memory they were much better than what I saw…