Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dockerTools: Eliminate layer creation copypasta #51587

Conversation

grahamc
Copy link
Member

@grahamc grahamc commented Dec 5, 2018

Motivation for this change

See the second commit, as the first commit is in #51528.

Essentially, there were several places which ran the same exact commands to build a docker layer, and now there is only one copy of that code :)

I experimented with deduplicating the image creation code, but boy howdy that is for another day.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Docker images used to be, essentially, a linked list of layers. Each
layer would have a tarball and a json document pointing to its parent,
and the image pointed to the top layer:

    imageA  ----> layerA
                    |
                    v
                  layerB
                    |
                    v
                  layerC

The current image spec changed this format to where the Image defined
the order and set of layers:

    imageA  ---> layerA
            |--> layerB
            `--> layerC

For backwards compatibility, docker produces images which follow both
specs: layers point to parents, and images also point to the entire
list:

    imageA  ---> layerA
            |      |
            |      v
            |--> layerB
            |      |
            |      v
            `--> layerC

This is nice for tooling which supported the older version and never
updated to support the newer format.

Our `buildImage` code only supported the old version, so in order for
`buildImage` to properly generate an image based on another image
with `fromImage`, the parent image's layers must fully support the old
mechanism.

This is not a problem in general, but is a problem with
`buildLayeredImage`.

`buildLayeredImage` creates images with newer image spec, because
individual store paths don't have a guaranteed parent layer. Including
a specific parent ID in the layer's json makes the output less likely
to cache hit when published or pulled.

This means until now, `buildLayeredImage` could not be the input to
`buildImage`.

The changes in this PR change `buildImage` to only use the layer's
manifest when locating parent IDs. This does break buildImage on
extremely old Docker images, though I do wonder how many of these
exist.

This work has been sponsored by Target.
Extract the layer creation code in to make-layer.sh and update
each layer creation function to use it.
@grahamc grahamc requested review from nlewo, globin and samueldr and removed request for nlewo and globin December 5, 2018 20:04
@grahamc
Copy link
Member Author

grahamc commented Dec 5, 2018

@GrahamcOfBorg build nixosTests.docker-tools

tar -C image/$parentID/layer -xpf image/$parentID/layer.tar
rm image/$parentID/layer.tar
extractionID=0
for layerTar in $(cat layer-list); do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while IFS= read layerTar; do
 # ...
done <layer-list

while [[ -n "$currentID" ]]; do
layerChecksum=$(sha256sum image/$currentID/layer.tar | cut -d ' ' -f1)

for layerTar in $(cat ./layer-list); do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too.

# would fail if layer-list was completely empty.
echo "$layerID/layer.tar"
cat layer-list
) | ${pkgs.moreutils}/bin/sponge layer-list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would break in the cross-compiling case.
Better to use:

(
 echo "$layerID/layer.tar"
 cat layer-list
) > layer-list-new
mv layer-list-new layer-list

This also removes the dependency on moreutils while having the same amount of external commands.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of cross-compiling: All buildInputs above should become nativeBuildInputs

@@ -673,6 +646,9 @@ rec {
if [[ -n "$fromImage" ]]; then
echo "Unpacking base image..."
tar -C image -xpf "$fromImage"

cat ./image/manifest.json | jq -r '.[0].Layers | .[]' > layer-list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several cases of cat file | over the whole file, that could become:

Suggested change
cat ./image/manifest.json | jq -r '.[0].Layers | .[]' > layer-list
jq < ./image/manifest.json -r '.[0].Layers | .[]' > layer-list

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that cat is useless, but I find it much more readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and slower.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also jq -r '.[0].Layers | .[]' ./image/manifest.json > layer-list :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an improvement, but if performance is a concern, this should probably be rewritten in a 'real' language that doesn't need to start processes to perform non-trivial operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants