@furtib
Contributor

@furtib furtib commented Jul 24, 2025

My idea for caching is to have CodeChecker tell us which files it used during its analysis, then add every other input file to an unused_inputs_list, since this is exactly what it is for (according to the documentation).

This proof of concept proves that this is a possible path forward; however, it mustn't be used in its current state.

How it works now:

  • I give every input file as a parameter to the script.
  • I store these files for later use
  • With strace, I monitor which files CodeChecker has opened (this must be replaced)
  • I write every file that CodeChecker hasn't opened to the unused_inputs_list file (this should be reworked)
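
A minimal sketch of the last two steps, assuming the strace log contains ordinary `open`/`openat` lines and that the logged paths are relative to the same directory as the input paths. The helper names `opened_files` and `unused_inputs` are hypothetical and are not taken from the PR's script:

```python
import re
from pathlib import PurePosixPath

# Matches successful open/openat calls in strace output, e.g.
#   openat(AT_FDCWD, "src/trees.c", O_RDONLY) = 3
# Failed calls (return value -1) are deliberately not matched.
_OPEN_RE = re.compile(r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]+)".*\)\s*=\s*\d+')


def opened_files(strace_lines):
    """Return the set of normalized paths that were successfully opened."""
    opened = set()
    for line in strace_lines:
        match = _OPEN_RE.search(line)
        if match:
            opened.add(str(PurePosixPath(match.group(1))))
    return opened


def unused_inputs(all_inputs, opened):
    """Return the inputs that were never opened, in their original order."""
    return [
        path for path in all_inputs
        if str(PurePosixPath(path)) not in opened
    ]
```

The result of `unused_inputs` is what would be written to the unused_inputs_list file. Having CodeChecker report the files it used would replace `opened_files` and leave the second step unchanged.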

This makes it so that Bazel can cache our results and only rerun actions it deems necessary.

This solution also works with the "--ctu" flag (in the zlib project, changing the trees.c file and re-analyzing the project re-analyzes inftrees.c too).

@furtib furtib marked this pull request as draft July 24, 2025 12:50
@nettle
Collaborator

nettle commented Jul 24, 2025

Well, to be honest, this looks a bit complicated and like an absolutely "not Bazel" idea to me :) with "store these files for later use" and "strace", but sure, you can try.

Basically that's the third approach :)
Previous two look like the following:

  1. Use the internal CodeChecker "caching" mechanism and run CodeChecker analyze on all files using the compile_commands.json file, so CodeChecker will get cached data. This approach is compatible with the "current" solution, but may require minor updates to the current codechecker() rule implementation; ideally it should work out of the box. Note that this approach is still not Bazel-compatible!
  2. Use Bazel dependency tracking and run CodeChecker analyze --file as a Bazel Action for each file. This is the only correct way to get a Bazel-compatible implementation, at least as of today. Internally we call this the "per file" solution. We think that, if implemented correctly, this solution should cover all cases and satisfy all Bazel requirements.
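
To illustrate the "per file" idea, here is a rough sketch that builds one `CodeChecker analyze --file` command line per source file. The function name and the numbered output directory layout are assumptions made for this example. In the real rule each command would be registered as its own Bazel action, not built in a Python helper:

```python
def per_file_commands(compile_commands, sources, output_root):
    """Build one `CodeChecker analyze` command line per source file.

    Each command gets its own output directory (hypothetical layout) so
    that the results of different files never overwrite each other.
    """
    commands = []
    for index, source in enumerate(sources):
        commands.append([
            "CodeChecker", "analyze", compile_commands,
            "--file", source,
            "--output", f"{output_root}/{index}",
        ])
    return commands
```

Because every file gets a separate command, a build system can decide for each one independently whether it has to run again.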

@Szelethus Szelethus linked an issue Jul 25, 2025 that may be closed by this pull request
@nettle
Collaborator

nettle commented Jul 25, 2025

I took a closer look, and now I think it might not be that bad an idea.
However, I still don't understand how unused_inputs_list should work.
Here is the original commit bazelbuild/bazel@5736381
but I do not get it yet - I see no optimization there. I think I just don't understand.

@furtib
Contributor Author

furtib commented Jul 28, 2025

Well, as I understand it, it works something like this:
Files given as input are available in the sandbox environment when the job is running.
When deciding whether to re-run an action, Bazel checks for changes in the contents of the input files.
The unused_inputs_list gives an exemption from this check: changes to the files listed in it do not trigger a re-run.
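
A toy model of that exemption, assuming the re-run decision boils down to comparing a digest of the relevant inputs. This is a simplification for illustration only, not Bazel's actual action cache (which also takes the command line, the environment and more into account). `action_key` is a hypothetical name:

```python
import hashlib


def action_key(inputs, unused=()):
    """Digest over the inputs that are not marked as unused.

    `inputs` maps a path to its content as bytes. Paths listed in
    `unused` are skipped, so changing them leaves the key untouched.
    """
    skipped = set(unused)
    digest = hashlib.sha256()
    for path in sorted(inputs):
        if path in skipped:
            continue
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(hashlib.sha256(inputs[path]).digest())
    return digest.hexdigest()


before = {"trees.c": b"v1", "inftrees.c": b"v1"}
after = {"trees.c": b"v1", "inftrees.c": b"v2"}

# Without the exemption, the edit to inftrees.c changes the key.
print(action_key(before) != action_key(after))
# With inftrees.c marked unused, the key stays the same.
print(action_key(before, ["inftrees.c"]) == action_key(after, ["inftrees.c"]))
```

In this model an action is re-run exactly when its key changes, which is why listing a file as unused makes edits to it invisible to that action.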

I marked this solution as a proof of concept because of the use of strace. Monitoring syscalls to know which files we accessed feels really wrong; it should be done by CodeChecker.

Also, there are a couple of pitfalls with the solution in its current form:

  • When there are 2 files with the same name, and we only need one of them, both will be missing from the unused_inputs_list, thus the action will re-run for changes in a completely irrelevant file. (Hopefully, this won't be an issue when CodeChecker reports the files it opened.)
  • Adding new files to the project: Every action will get the new file as input, and will reanalyze the entire project.
  • This is also a problem with compile flags; changes in compile_commands.json will result in reanalyzing the complete project.

I'm not sure I have answered all your questions, please specify if I have missed some!

@nettle
Collaborator

nettle commented Jul 28, 2025

Well, yeah, using strace does not look nice.

> When there are 2 files with the same name, and we only need one of them, both will be missing from the unused_inputs_list, thus the action will re-run for changes in a completely irrelevant file. (Hopefully, this won't be an issue when CodeChecker reports the files it opened.)

The same name should not be a problem because, as far as I understand, Bazel should not tolerate ambiguity in files.

> Adding new files to the project: Every action will get the new file as input, and will reanalyze the entire project.
> ...
> This is also a problem with compile flags; changes in compile_commands.json will result in reanalyzing the complete project.

Well, we'd better get rid of compile_commands.json :)

@furtib
Contributor Author

furtib commented Jul 29, 2025

> The same name should not be a problem because, as far as I understand, Bazel should not tolerate ambiguity in files.

That's a relief.

> Well, we'd better get rid of compile_commands.json :)

Sadly, even if we add compile_commands.json to the unused_inputs_list, adding new files would still cause a re-run of the analysis. New files will not be in the unused_inputs_list, but they will be in the input.
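
The new-file case can be sketched with a small model, assuming a re-run is triggered whenever the set of relevant inputs (everything not on the previously written unused list) differs between two builds. `needs_rerun` is a hypothetical helper for illustration, not Bazel's real logic:

```python
def needs_rerun(previous_inputs, previous_unused, current_inputs):
    """Compare the relevant inputs of two builds.

    Both input arguments map a path to a content digest. The unused list
    comes from the previous run, so a file added afterwards can never be
    on it.
    """
    unused = set(previous_unused)

    def relevant(inputs):
        return {path: d for path, d in inputs.items() if path not in unused}

    return relevant(previous_inputs) != relevant(current_inputs)


previous = {"a.c": "1", "b.c": "1", "compile_commands.json": "1"}
unused = ["b.c", "compile_commands.json"]

# Changes to files on the unused list are ignored.
edited = {"a.c": "1", "b.c": "2", "compile_commands.json": "2"}
print(needs_rerun(previous, unused, edited))

# A brand-new file is an input but is not on the list, so it forces a re-run.
added = dict(edited, **{"c.c": "1"})
print(needs_rerun(previous, unused, added))
```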

@furtib furtib mentioned this pull request Aug 13, 2025
@furtib furtib self-assigned this Aug 25, 2025
Szelethus pushed a commit that referenced this pull request Aug 29, 2025
Why: We should have a unit test for caching! (for #24 and #23)

What:
Created a test case that:
- Analyzes a target with two source files
- Edits one of the two files
- Reruns the analysis and counts the number of files analyzed, using the
subcommands option

Addresses: #60
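
The counting step of such a test could look roughly like the sketch below. It assumes the build was run with Bazel's `--subcommands` option, which prints one `SUBCOMMAND:` header line per executed action, and that the analysis actions can be recognized by a substring in that header. The substring to search for and the function name are assumptions, not details from the commit:

```python
def count_subcommands(log_text, needle):
    """Count `SUBCOMMAND:` header lines that contain `needle`."""
    return sum(
        1
        for line in log_text.splitlines()
        if line.startswith("SUBCOMMAND:") and needle in line
    )
```

A caching test would then expect a smaller count from the second build than from the first once only one of the two source files has been edited.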
Development

Successfully merging this pull request may close these issues.

Incremental analysis doesn't work with the per-file rule

2 participants