@gabrielrussoc gabrielrussoc commented Nov 20, 2024

The local registry gets the image config as input, which contains the hashes of all the layers inside the image.
We were simply returning that config as-is, but it turns out the layer hashes in the config do not always match the actual layers.
This is caused by a weird interaction in Databricks' internal rules, where a docker layer can be a parent of a docker base, i.e. docker_layer -> docker_base -> docker_layer.

Essentially, if the docker layer changes we are required to rebuild the docker_base, but nothing enforces that today (see https://github.com/databricks-eng/universe/blob/bf55a02cb20c5bb72992a4bb401212bea69f9397/bazel/rules/docker.bzl#L2051).

When that happens, we get errors like the ones seen on:
https://runbot-ci.cloud.databricks.com/build/Unit-Compile-Pr/run-logs/45377642

layers from manifest don't match image configuration

To work around this, we make the behaviour of the loader_tool match the original incremental loader: we overwrite the hashes in the config with values computed from the actual layers:

"diff_ids": [$(join_by , ${diff_ids[@]})],

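For illustration, this is roughly the idea (a sketch only, not the actual loader_tool code; the function names and the assumption that layers are available as uncompressed tarballs are mine):

    import hashlib
    import json

    def layer_diff_id(layer_tar_path):
        # diff_ids are SHA-256 digests of the uncompressed layer tarballs.
        digest = hashlib.sha256()
        with open(layer_tar_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return "sha256:" + digest.hexdigest()

    def rewrite_config(config_path, layer_tar_paths):
        # Return the image config with rootfs.diff_ids rebuilt from the layers
        # we actually serve, ignoring whatever the input config claims.
        with open(config_path) as f:
            config = json.load(f)
        config["rootfs"]["diff_ids"] = [layer_diff_id(p) for p in layer_tar_paths]
        return json.dumps(config).encode("utf-8")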

We also use a ThreadingHTTPServer and retry pulls to increase reliability, to avoid errors like:

error pulling image configuration: download failed after attempts=6: net/http: TLS handshake timeout

https://runbot-ci.cloud.databricks.com/module-results/30996083051?runId=45355617
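
A minimal sketch of what these reliability changes amount to, assuming the local registry is served with Python's http.server (the handler and retry helper below are illustrative, not the real code):

    import time
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    class RegistryHandler(BaseHTTPRequestHandler):
        # Placeholder handler: the real one serves manifests, configs and layer blobs.
        def do_GET(self):
            self.send_response(200)
            self.end_headers()

    def serve(port):
        # ThreadingHTTPServer handles each request on its own thread, so one slow
        # blob download no longer blocks every other pull in flight.
        ThreadingHTTPServer(("127.0.0.1", port), RegistryHandler).serve_forever()

    def with_retries(pull, attempts=6, backoff_seconds=1.0):
        # Retry transient pull failures (e.g. TLS handshake timeouts) with backoff.
        for attempt in range(1, attempts + 1):
            try:
                return pull()
            except Exception:
                if attempt == attempts:
                    raise
                time.sleep(backoff_seconds * attempt)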


Lastly, we make sure to pass RUNFILES down from the incremental loader script into the loader binary, and we unset RUNFILES_MANIFEST_FILE since our Scala rules don't set those variables.
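
The environment handling is roughly the following (a Python sketch of the behaviour only; the actual incremental loader is a script, and the exact runfiles variables forwarded are not spelled out here):

    import os
    import subprocess

    def run_loader(loader_binary, args):
        # Forward the runfiles environment from the incremental loader into the
        # loader binary, but drop RUNFILES_MANIFEST_FILE, which our Scala rules
        # do not set.
        env = dict(os.environ)  # includes the RUNFILES variables set by the caller
        env.pop("RUNFILES_MANIFEST_FILE", None)
        subprocess.run([loader_binary] + list(args), env=env, check=True)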

@gabrielrussoc gabrielrussoc changed the title [rbe] Local registry: build config diff ids from layers [rbe] Local registry: build config diff ids from layers, use http server and better runfiles propagation Nov 22, 2024