Hi all, I’m looking for some feedback on the following problem and proposed solution.
I want to add a new mode where the background index only indexes changed files and a static index is used for unchanged files.
For large codebases, a static index can take hours to generate. It is therefore infeasible to regenerate it every time a developer wants to pull changes into their workspace. The background indexer is also unable to help for large codebases, because it indexes all files, making it slow to find the files of interest and using significant system resources while doing so. If the background index is used with a static index, it also duplicates effort for files that have not changed since the static index was generated.
A developer can open the changed files so that clangd builds them in the in-memory FileIndex, but this is cumbersome and does not persist between loads.
The proposal adds an extra ‘mode’ to background indexing in which it only indexes files changed after a certain time. A static index is still supplied (remote, or with --index-file) to cover unchanged files. The background index then bridges the gap between the static index and the current state of the codebase, without the developer having to open all the changed files.
Summary of modes:
- Disabled: equivalent to the current --background-index=false.
- All files: equivalent to the current --background-index=true.
- ‘changed’: only indexes files changed after a certain time (the baseline time). The baseline time can be clangd startup or the modification timestamp of a given file.
In ‘changed’ mode, the background index does not observe updates to the compilation database (CDB), unlike --background-index=true, which currently does. Instead, a ‘BackgroundFileWatcher’ starts a thread that traverses the file system under a specific directory (given as an argument at startup), looking for files whose modification time is after the baseline time. It enqueues those files as they are found by calling a lambda that wraps the BackgroundIndexer’s ‘enqueue’ method (like BackgroundIdx, BackgroundFileWatcher is a field of ClangdServer).
I’ve created a prototype for the above solution that works, and would be interested in making a pull request if there’s interest/approval. I have found that the background index and the static index work well together as designed.
The prototype uses timestamps to find changed files, but the final implementation could support swapping this for different modules depending on how the user wants the filesystem to be monitored; e.g., a git module for git repos.
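One possible shape for such swappable modules is a small interface that answers "has this file changed?". Everything below is hypothetical and only meant to show the idea: ChangeDetector, TimestampDetector, and PathListDetector are made-up names, and a git module is assumed to work by filling a path list from something like the output of `git diff --name-only`.

```cpp
#include <filesystem>
#include <memory>
#include <set>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical interface: each module decides in its own way whether a
// file has changed relative to the static index.
class ChangeDetector {
public:
  virtual ~ChangeDetector() = default;
  virtual bool isChanged(const fs::path &File) const = 0;
};

// Timestamp-based module: a file is changed if it was modified strictly
// after the baseline time. Unreadable files count as unchanged.
class TimestampDetector : public ChangeDetector {
public:
  explicit TimestampDetector(fs::file_time_type BaselineTime)
      : Baseline(BaselineTime) {}

  bool isChanged(const fs::path &File) const override {
    std::error_code EC;
    fs::file_time_type MTime = fs::last_write_time(File, EC);
    return !EC && MTime > Baseline;
  }

private:
  fs::file_time_type Baseline;
};

// List-based module: a file is changed if it appears in a precomputed
// set of paths, e.g. one produced by a version control query (assumed).
class PathListDetector : public ChangeDetector {
public:
  explicit PathListDetector(std::set<fs::path> ChangedPaths)
      : Changed(std::move(ChangedPaths)) {}

  bool isChanged(const fs::path &File) const override {
    return Changed.count(File) != 0;
  }

private:
  std::set<fs::path> Changed;
};
```

The file watcher would then hold a std::unique_ptr<ChangeDetector> and not care which strategy is behind it.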
I’m looking for feedback on whether this problem is considered an actual problem, and, if so, whether the proposed solution sounds suitable.
Thanks in advance.