_sandboxremote.py: avoid reusing failed actions #2033
abderrahim wants to merge 1 commit into master
  stub = self.exec_remote.exec_service
  request = remote_execution_pb2.ExecuteRequest(
-     instance_name=self.exec_remote.instance_name, action_digest=action_digest, skip_cache_lookup=False
+     instance_name=self.exec_remote.instance_name, action_digest=action_digest, skip_cache_lookup=True
We don't want to always skip cache lookup. If no action-service-cache is declared, internal action cache lookup by the remote execution server should still be used.
Yeah, even when the action-cache-service is defined, there is still value in having the execution re-do cache lookup (in case the action is requested and built by someone else before our action reaches the top of the queue).
You're right that this is probably working around a "broken" remote execution server. I've only tested it with buildbox-casd, and wasn't aware it was different with other servers.
I think the main issue is that failed actions are cached in the action cache in the first place. While there may be circumstances where caching failed actions is useful, I don't think we need this for BuildStream at all, given that we have a higher-level caching mechanism that already supports caching failures. Most remote execution servers default to not caching failed actions, mainly because failures may be spurious, e.g., due to a worker running out of RAM. BuildGrid caches failures by default but it can be disabled with
On the BuildStream side, might it be sufficient to skip action cache lookup (direct action cache query as well as indirectly via Execute()) if
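The BuildStream-side idea in the comment above, bypassing both the direct action cache query and the lookup done inside Execute(), could look roughly like the sketch below. It is only an illustration under assumptions: `ExecuteRequest` here is a small stand-in for `remote_execution_pb2.ExecuteRequest`, and `run_action`, `query_action_cache`, `execute` and the `bypass_cache` flag are hypothetical names, not BuildStream code. The condition for setting the flag is left to the caller, since the thread does not settle it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ExecuteRequest:
    """Stand-in for remote_execution_pb2.ExecuteRequest (only the fields from the diff)."""

    instance_name: str
    action_digest: str  # the real field is a Digest message; a string keeps the sketch small
    skip_cache_lookup: bool = False


def run_action(
    instance_name: str,
    action_digest: str,
    query_action_cache: Callable[[str], Optional[str]],
    execute: Callable[[ExecuteRequest], str],
    *,
    bypass_cache: bool = False,  # hypothetical flag; when to set it is up to the caller
) -> str:
    # Direct lookup: consult the action cache only if the caller did not ask to bypass it.
    if not bypass_cache:
        cached = query_action_cache(action_digest)
        if cached is not None:
            return cached

    # Indirect lookup: the server checks its own cache unless skip_cache_lookup is set.
    request = ExecuteRequest(instance_name, action_digest, skip_cache_lookup=bypass_cache)
    return execute(request)
```

With the default `bypass_cache=False`, both lookups stay enabled, which keeps the behaviour the reviewers asked for when no bypass is requested.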
Yeah, I was using this with buildbox-casd as a server. The failures in question were due to a bug in buildbox-fuse.
The thing is
Good point, but maybe we can find a way to forward this information in the interactive retry case as well.
This is a proposal towards #2020. It's probably not the most efficient way to do it, but at least it works.
What this does is: