Proof of concept: Caching #23

furtib · 2025-07-24T12:50:17Z

My idea for caching is to have CodeChecker tell us which files it used during its analysis, then add every other input file to an unused_inputs_list, since this is exactly what it is for (according to the documentation).

This proof of concept proves that this is a possible path forward; however, it mustn't be used in its current state.

How it works now:

1. I give every input file as a parameter to the script.
2. I store these files for later use.
3. With strace, I monitor which files CodeChecker has opened (this must be replaced).
4. I write every file that CodeChecker hasn't opened to the unused_inputs_list file (this should be reworked).

This makes it so that Bazel can cache our results and only rerun actions it deems necessary.

This solution also works with the "--ctu" flag (in the zlib project, changing the trees.c file and re-analyzing the project re-analyses inftrees.c too)
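
The tracking steps listed above can be sketched roughly as follows. This is not the script from this PR: the function names are hypothetical, and it assumes that strace was run with text output (for example `strace -f -e trace=open,openat -o trace.log ...`) and that the traced paths have the same form as the declared input paths. It also assumes that the unused inputs file holds one path per line.

```python
import re

# Matches open()/openat() calls in strace text output, for example:
#   openat(AT_FDCWD, "src/trees.c", O_RDONLY) = 3
# Group 1 is the path, group 2 is the return value (negative on failure).
OPEN_CALL = re.compile(r'\bopen(?:at)?\((?:[^"]*?,\s*)?"([^"]+)".*\)\s*=\s*(-?\d+)')


def opened_files(strace_lines):
    """Return the set of paths whose open()/openat() call succeeded."""
    opened = set()
    for line in strace_lines:
        match = OPEN_CALL.search(line)
        if match and int(match.group(2)) >= 0:
            opened.add(match.group(1))
    return opened


def unused_inputs(inputs, opened):
    """Return the declared inputs that were never opened, in declared order."""
    return [path for path in inputs if path not in opened]


def write_unused_inputs_list(path, unused):
    """Write one unused input path per line (assumed file format)."""
    with open(path, "w", encoding="utf-8") as handle:
        for item in unused:
            handle.write(item + "\n")
```

If CodeChecker reported the files it opened itself, only `opened_files` would need to be swapped out; the other two helpers would stay the same.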

nettle · 2025-07-24T14:26:41Z

Well, to be honest, looks a bit complicated and absolutely "not Bazel" idea to me :) with "store these files for later use" and "strace", but, sure, you can try.

Basically that's the third approach :)
Previous two look like the following:

1. Use the internal CodeChecker "caching" mechanism and run CodeChecker analyze on all files using the compile_commands.json file, so that CodeChecker gets cached data. This approach is compatible with the "current" solution but may require minor updates to the current codechecker() rule implementation; ideally it should work out of the box. Note that this approach is still not Bazel-compatible!
2. Use Bazel dependency tracking and run CodeChecker analyze --file as a Bazel action for each file. As of today, this is the only correct way to get a Bazel-compatible implementation. Internally we call this the "per file" solution. We think that, if implemented correctly, this solution should cover all cases and satisfy all Bazel requirements.

nettle · 2025-07-25T15:09:38Z

I took a closer look and now I think it might not be such a bad idea.
However, I still don't understand how unused_inputs_list should work.
Here is the original commit: bazelbuild/bazel@5736381
but I do not get it yet - I see no optimization there. I think I just don't understand.

furtib · 2025-07-28T13:18:09Z

Well, as I understand it, it works something like this:
Files given as input are available in the sandbox environment when the job is running.
When deciding whether to re-run an action, Bazel checks for changes in the contents of the input files.
Files listed in the unused_inputs_list are exempt from this check.
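
A toy model of the check described above may make the effect clearer. This is not Bazel's actual implementation; `action_key` and its parameters are hypothetical names for illustration, and the command string is only a placeholder. The model hashes the command together with the contents of every input that is not listed as unused.

```python
import hashlib


def action_key(command, inputs, unused=frozenset()):
    """Digest of the command plus the contents of every input not marked unused.

    `inputs` maps a path to the file contents (bytes).
    """
    digest = hashlib.sha256()
    digest.update(command.encode("utf-8"))
    for path in sorted(inputs):
        if path in unused:
            continue  # exempt from the change check
        digest.update(path.encode("utf-8"))
        digest.update(hashlib.sha256(inputs[path]).digest())
    return digest.hexdigest()


command = "analyze trees.c"  # placeholder command string
before = {"trees.c": b"v1", "inftrees.c": b"v1"}
after = {"trees.c": b"v1", "inftrees.c": b"v2"}

# Without the exemption, a change in inftrees.c changes the key.
print(action_key(command, before) == action_key(command, after))  # False

# With inftrees.c marked unused, the same change leaves the key untouched.
unused = {"inftrees.c"}
print(action_key(command, before, unused) == action_key(command, after, unused))  # True
```

In this model an action whose key is unchanged can be served from the cache, while any change to a file outside the unused list forces a re-run.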

I marked this solution as a proof of concept because of the use of strace. Monitoring syscalls to know which files we accessed feels really wrong; it should be done by CodeChecker.

Also, there are a couple of pitfalls with the solution in its current form:

- When there are 2 files with the same name, and we only need one of them, both will be missing from the unused_inputs_list, thus the action will re-run for changes in a completely irrelevant file. (Hopefully, this won't be an issue when CodeChecker reports the files it opened.)
- Adding new files to the project: Every action will get the new file as input, and will reanalyze the entire project.
- This is also a problem with compile flags; changes in compile_commands.json will result in reanalyzing the complete project.
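
The first pitfall can be illustrated with a small sketch. How the proof of concept actually compares traced paths with declared inputs is not shown in this thread, so the base-name comparison below is an assumption, and both function names are hypothetical.

```python
import os


def unused_by_basename(inputs, opened):
    """Treat an input as used if any opened file shares its base name."""
    opened_names = {os.path.basename(path) for path in opened}
    return [path for path in inputs if os.path.basename(path) not in opened_names]


def unused_by_path(inputs, opened):
    """Treat an input as used only if its normalized path was opened."""
    opened_paths = {os.path.normpath(path) for path in opened}
    return [path for path in inputs if os.path.normpath(path) not in opened_paths]


inputs = ["lib/a/util.h", "lib/b/util.h", "src/main.c"]
opened = ["lib/a/util.h", "src/main.c"]

print(unused_by_basename(inputs, opened))  # []
print(unused_by_path(inputs, opened))      # ['lib/b/util.h']
```

With base-name matching, `lib/b/util.h` is never listed as unused, so a change to it would still trigger a re-run. Comparing full paths avoids that, provided the traced paths and the declared inputs are expressed relative to the same root.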

I'm not sure I have answered all your questions, please specify if I have missed some!

nettle · 2025-07-28T21:06:21Z

Well, yeah, using strace does not look nice.

> When there are 2 files with the same name, and we only need one of them, both will be missing from the unused_inputs_list, thus the action will re-run for changes in a completely irrelevant file. (Hopefully, this won't be an issue when CodeChecker reports the files it opened.)

The same name should not be a problem because, as far as I understand, Bazel should not tolerate ambiguity in files.

> Adding new files to the projects: Every action will get the new file as input, and will reanalyze the entire project.
> ...
> This is also a problem with compile flags; changes in compile_commands.json will result in reanalyzing the complete project.

Well, we better get rid of compile_commands.json :)

furtib · 2025-07-29T07:35:53Z

> The same name should not be a problem because, as I can understand, Bazel should not tolerate ambiguity in files.

That's a relief.

> Well, we better get rid of compile_commands.json :)

Sadly, even if we add compile_commands.json to the unused_inputs_list, adding new files would still cause a re-run of the analysis. New files will not be in the unused_inputs_list, but they will be in the input.

Why: We should have a unit test for caching! (for #24 and #23)
What: Created a test case that:
- Analyzes a two-source file target
- Edits one file from the two
- Reruns the analysis, counts the number of files analyzed, with the subcommands option
Addresses: #60

furtib added 2 commits July 23, 2025 14:12

Initial caching algo (primitive)

4be573e

Add file-opening-tracking

22f6dfb

furtib marked this pull request as draft July 24, 2025 12:50

Szelethus linked an issue Jul 25, 2025 that may be closed by this pull request

Incremental analysis doesn't work with the per-file rule #21

Closed

Szelethus mentioned this pull request Jul 28, 2025

Incremental analysis doesn't work with the per-file rule #21

Closed

Szelethus requested review from Szelethus, dkrupp and nettle July 28, 2025 14:55

furtib mentioned this pull request Aug 13, 2025

Unit test: caching #59

Merged

furtib self-assigned this Aug 25, 2025
