Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[18.09] layered docker images #49240

Merged
merged 4 commits into from Oct 29, 2018

Conversation

flokli
Copy link
Contributor

@flokli flokli commented Oct 27, 2018

Motivation for this change

This backports layered docker images from master to 18.09, PR for master was #47411

cc @grahamc

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

graham-at-target and others added 4 commits October 27, 2018 11:57
Using a simple algorithm, convert the references to a path in to a
sorted list of dependent paths based on how often they're referenced
and how deep in the tree they live. Equally-"popular" paths are then
sorted by name.

The existing writeReferencesToFile prints the paths in a simple
ascii-based sorting of the paths.

Sorting the paths by graph improves the chances that the difference
between two builds appear near the end of the list, instead of near
the beginning. This makes a difference for Nix builds which export a
closure for another program to consume, if that program implements its
own level of binary diffing.

For an example, Docker Images. If each store path is a separate layer
then Docker Images can be very efficiently transfered between systems,
and we get very good cache reuse between images built with the same
version of Nixpkgs. However, since Docker only reliably supports a
small number of layers (42) it is important to pick the individual
layers carefully. By storing very popular store paths in the first 40
layers, we improve the chances that the next Docker image will share
many of those layers.*

Given the dependency tree:

    A - B - C - D -\
     \   \   \      \
      \   \   \      \
       \   \ - E ---- F
        \- G

Nodes which have multiple references are duplicated:

    A - B - C - D - F
     \   \   \
      \   \   \- E - F
       \   \
        \   \- E - F
         \
          \- G

Each leaf node is now replaced by a counter defaulted to 1:

    A - B - C - D - (F:1)
     \   \   \
      \   \   \- E - (F:1)
       \   \
        \   \- E - (F:1)
         \
          \- (G:1)

Then each leaf counter is merged with its parent node, replacing the
parent node with a counter of 1, and each existing counter being
incremented by 1. That is to say `- D - (F:1)` becomes `- (D:1, F:2)`:

    A - B - C - (D:1, F:2)
     \   \   \
      \   \   \- (E:1, F:2)
       \   \
        \   \- (E:1, F:2)
         \
          \- (G:1)

Then each leaf counter is merged with its parent node again, merging
any counters, then incrementing each:

    A - B - (C:1, D:2, E:2, F:5)
     \   \
      \   \- (E:1, F:2)
       \
        \- (G:1)

And again:

    A - (B:1, C:2, D:3, E:4, F:8)
     \
      \- (G:1)

And again:

    (A:1, B:2, C:3, D:4, E:5, F:9, G:2)

and then paths have the following "popularity":

    A     1
    B     2
    C     3
    D     4
    E     5
    F     9
    G     2

and the popularity contest would result in the paths being printed as:

    F
    E
    D
    C
    B
    G
    A

* Note: People who have used a Dockerfile before assume Docker's
Layers are inherently ordered. However, this is not true -- Docker
layers are content-addressable and are not explicitly layered until
they are composed in to an Image.

(cherry picked from commit fd04517)
Create a many-layered Docker Image.

Implements much less than buildImage:

 - Doesn't support specific uids/gids
 - Doesn't support runninng commands after building
 - Doesn't require qemu
 - Doesn't create mutable copies of the files in the path
 - Doesn't support parent images

If you want those feature, I recommend using buildLayeredImage as an
input to buildImage.

Notably, it does support:

 - Caching low level, common paths based on a graph traversial
   algorithm, see referencesByPopularity in
   0a80233487993256e811f566b1c80a40394c03d6
 - Configurable number of layers. If you're not using AUFS or not
   extending the image, you can specify a larger number of layers at
   build time:

       pkgs.dockerTools.buildLayeredImage {
         name = "hello";
         maxLayers = 128;
         config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
       };

 - Parallelized creation of the layers, improving build speed.
 - The contents of the image includes the closure of the configuration,
   so you don't have to specify paths in contents and config.

   With buildImage, paths referred to by the config were not included
   automatically in the image. Thus, if you wanted to call Git, you
   had to specify it twice:

       pkgs.dockerTools.buildImage {
         name = "hello";
         contents = [ pkgs.gitFull ];
         config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
       };

   buildLayeredImage on the other hand includes the runtime closure of
   the config when calculating the contents of the image:

       pkgs.dockerTools.buildImage {
         name = "hello";
         config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
       };

Minor Problems

 - If any of the store paths change, every layer will be rebuilt in
   the nix-build. However, beacuse the layers are bit-for-bit
   reproducable, when these images are loaded in to Docker they will
   match existing layers and not be imported or uploaded twice.

Common Questions

 - Aren't Docker layers ordered?

   No. People who have used a Dockerfile before assume Docker's
   Layers are inherently ordered. However, this is not true -- Docker
   layers are content-addressable and are not explicitly layered until
   they are composed in to an Image.

 - What happens if I have more than maxLayers of store paths?

   The first (maxLayers-2) most "popular" paths will have their own
   individual layers, then layer #(maxLayers-1) will contain all the
   remaining "unpopular" paths, and finally layer #(maxLayers) will
   contain the Image configuration.

(cherry picked from commit 4fe9006)
(cherry picked from commit d1e46df)
(cherry picked from commit fb2d153)
@flokli flokli requested a review from grahamc October 27, 2018 11:12
@@ -37,29 +37,29 @@
<example xml:id='ex-dockerTools-buildImage'>
<title>Docker build</title>
<programlisting>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I incorporated some whitespace changes from 18.09 into that as well.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are basically only additions and should therefore pose very little risk.

@roberth roberth added the 8.has: port to stable A PR already has a backport to the stable release. label Oct 28, 2018
@flokli flokli merged commit ec6af33 into NixOS:release-18.09 Oct 29, 2018
@flokli flokli deleted the layered-docker-images-18.09 branch October 29, 2018 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants