New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dockerTools: Eliminate layer creation copypasta #51587
dockerTools: Eliminate layer creation copypasta #51587
Conversation
Docker images used to be, essentially, a linked list of layers. Each layer would have a tarball and a json document pointing to its parent, and the image pointed to the top layer: imageA ----> layerA | v layerB | v layerC The current image spec changed this format to where the Image defined the order and set of layers: imageA ---> layerA |--> layerB `--> layerC For backwards compatibility, docker produces images which follow both specs: layers point to parents, and images also point to the entire list: imageA ---> layerA | | | v |--> layerB | | | v `--> layerC This is nice for tooling which supported the older version and never updated to support the newer format. Our `buildImage` code only supported the old version, so in order for `buildImage` to properly generate an image based on another image with `fromImage`, the parent image's layers must fully support the old mechanism. This is not a problem in general, but is a problem with `buildLayeredImage`. `buildLayeredImage` creates images with newer image spec, because individual store paths don't have a guaranteed parent layer. Including a specific parent ID in the layer's json makes the output less likely to cache hit when published or pulled. This means until now, `buildLayeredImage` could not be the input to `buildImage`. The changes in this PR change `buildImage` to only use the layer's manifest when locating parent IDs. This does break buildImage on extremely old Docker images, though I do wonder how many of these exist. This work has been sponsored by Target.
Extract the layer creation code in to make-layer.sh and update each layer creation function to use it.
@GrahamcOfBorg build nixosTests.docker-tools |
tar -C image/$parentID/layer -xpf image/$parentID/layer.tar | ||
rm image/$parentID/layer.tar | ||
extractionID=0 | ||
for layerTar in $(cat layer-list); do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while IFS= read layerTar; do
# ...
done <layer-list
while [[ -n "$currentID" ]]; do | ||
layerChecksum=$(sha256sum image/$currentID/layer.tar | cut -d ' ' -f1) | ||
|
||
for layerTar in $(cat ./layer-list); do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here too.
# would fail if layer-list was completely empty. | ||
echo "$layerID/layer.tar" | ||
cat layer-list | ||
) | ${pkgs.moreutils}/bin/sponge layer-list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would break in the cross-compiling case.
Better to use:
(
echo "$layerID/layer.tar"
cat layer-list
) > layer-list-new
mv layer-list-new layer-list
This also removes the dependency on moreutils while having the same amount of external commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking of cross-compiling: All buildInputs
above should become nativeBuildInputs
@@ -673,6 +646,9 @@ rec { | |||
if [[ -n "$fromImage" ]]; then | |||
echo "Unpacking base image..." | |||
tar -C image -xpf "$fromImage" | |||
|
|||
cat ./image/manifest.json | jq -r '.[0].Layers | .[]' > layer-list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several cases of cat file |
over the whole file, that could become:
cat ./image/manifest.json | jq -r '.[0].Layers | .[]' > layer-list | |
jq < ./image/manifest.json -r '.[0].Layers | .[]' > layer-list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that cat is useless, but I find it much more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also jq -r '.[0].Layers | .[]' ./image/manifest.json > layer-list
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an improvement, but if performance is a concern, this should probably be rewritten in a 'real' language that doesn't need to start processes to perform non-trivial operations.
Motivation for this change
See the second commit, as the first commit is in #51528.
Essentially, there were several places which ran the same exact commands to build a docker layer, and now there is only one copy of that code :)
I experimented with deduplicating the image creation code, but boy howdy that is for another day.
Things done
sandbox
innix.conf
on non-NixOS)nix-shell -p nox --run "nox-review wip"
./result/bin/
)nix path-info -S
before and after)