
Multi-Layered Docker Images #47411

Merged
merged 5 commits into NixOS:master on Oct 1, 2018

Conversation

@grahamc (Member) commented Sep 26, 2018

Create a many-layered Docker Image.

Implements much less than buildImage:

  • Doesn't support specific uids/gids
  • Doesn't support running commands after building
  • Doesn't require qemu
  • Doesn't create mutable copies of the files in the path
  • Doesn't support parent images

If you want those features, I recommend using buildImage.

Notably, it does support:

  • Caching low-level, common paths based on a graph traversal
    algorithm, see later in this description.

  • Configurable number of layers. If you're not using AUFS or not
    extending the image, you can specify a larger number of layers at
    build time:

    pkgs.dockerTools.buildLayeredImage {
      name = "hello";
      maxLayers = 128;
      config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
    };
    
  • Parallelized creation of the layers, improving build speed.

  • The contents of the image include the closure of the configuration,
    so you don't have to specify paths in both contents and config.

    With buildImage, paths referred to by the config were not included
    automatically in the image. Thus, if you wanted to call Git, you
    had to specify it twice:

    pkgs.dockerTools.buildImage {
      name = "hello";
      contents = [ pkgs.gitFull ];
      config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
    };
    

    buildLayeredImage on the other hand includes the runtime closure of
    the config when calculating the contents of the image:

    pkgs.dockerTools.buildLayeredImage {
      name = "hello";
      config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
    };
    

Minor Problems

  • If any of the store paths change, every layer will be rebuilt in
    the nix-build. However, because the layers are bit-for-bit
    reproducible, when these images are loaded into Docker they will
    match existing layers and not be imported or uploaded twice.

Common Questions

  • Aren't Docker layers ordered?

    No. People who have used a Dockerfile before assume Docker's
    layers are inherently ordered. However, this is not true -- Docker
    layers are content-addressable and are not explicitly ordered until
    they are composed into an image.

  • What happens if I have more than maxLayers of store paths?

    The first (maxLayers-2) most "popular" paths will have their own
    individual layers, then layer #(maxLayers-1) will contain all the
    remaining "unpopular" paths, and finally layer #(maxLayers) will
    contain the Image configuration.
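
The partitioning rule above can be sketched in a few lines of Python. This is
an illustrative reimplementation with a hypothetical assign_layers helper,
not the actual Nixpkgs implementation (which is written in Nix and shell):

```python
def assign_layers(paths_by_popularity, max_layers):
    """Partition store paths into Docker layers.

    Assumes `paths_by_popularity` is already sorted most-popular first.
    Illustrative sketch only, not the Nixpkgs implementation.
    """
    # The first (maxLayers-2) most popular paths each get their own layer.
    individual = paths_by_popularity[:max_layers - 2]
    layers = [[path] for path in individual]
    # Layer #(maxLayers-1) holds all remaining "unpopular" paths at once.
    rest = paths_by_popularity[max_layers - 2:]
    if rest:
        layers.append(rest)
    # Layer #(maxLayers), the image configuration, is added separately by
    # the builder, so it is not produced here.
    return layers
```

For example, with maxLayers = 4 and five store paths, the two most popular
paths get individual layers, the remaining three share a single layer, and
the configuration occupies the fourth.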

Popularity Contest Algorithm

Using a simple algorithm, convert the references to a path into a
sorted list of dependent paths based on how often they're referenced
and how deep in the tree they live. Equally-"popular" paths are then
sorted by name.

The existing writeReferencesToFile prints the paths in simple
ASCII sort order.

Sorting the paths by graph traversal improves the chances that the
difference between two builds appears near the end of the list instead
of near the beginning. This makes a difference for Nix builds which
export a closure for another program to consume, if that program
implements its own level of binary diffing.

Take Docker images as an example. If each store path is a separate
layer then Docker images can be transferred between systems very
efficiently, and we get very good cache reuse between images built
with the same version of Nixpkgs. However, since Docker only reliably
supports a small number of layers (42) it is important to pick the
individual layers carefully. By storing very popular store paths in
the first 40 layers, we improve the chances that the next Docker image
will share many of those layers.

Given the dependency tree:

A - B - C - D -\
 \   \   \      \
  \   \   \      \
   \   \ - E ---- F
    \- G

Nodes which have multiple references are duplicated:

A - B - C - D - F
 \   \   \
  \   \   \- E - F
   \   \
    \   \- E - F
     \
      \- G

Each leaf node is now replaced by a counter defaulted to 1:

A - B - C - D - (F:1)
 \   \   \
  \   \   \- E - (F:1)
   \   \
    \   \- E - (F:1)
     \
      \- (G:1)

Then each leaf counter is merged with its parent node, replacing the
parent node with a counter of 1, and each existing counter being
incremented by 1. That is to say - D - (F:1) becomes - (D:1, F:2):

A - B - C - (D:1, F:2)
 \   \   \
  \   \   \- (E:1, F:2)
   \   \
    \   \- (E:1, F:2)
     \
      \- (G:1)

Then each leaf counter is merged with its parent node again, merging
any counters, then incrementing each:

A - B - (C:1, D:2, E:2, F:5)
 \   \
  \   \- (E:1, F:2)
   \
    \- (G:1)

And again:

A - (B:1, C:2, D:3, E:4, F:8)
 \
  \- (G:1)

And again:

(A:1, B:2, C:3, D:4, E:5, F:9, G:2)

and then paths have the following "popularity":

A     1
B     2
C     3
D     4
E     5
F     9
G     2

and the popularity contest would result in the paths being printed as:

F
E
D
C
B
G
A
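
The counter-merging walk above can be sketched in Python. This is an
illustrative reimplementation, not the actual Nixpkgs code: each node's
counters are the sum-merge of its children's counters with every entry
incremented by one, plus a count of 1 for the node itself. Note that the
sketch re-walks shared subtrees (the "duplication" step), so it is
exponential on dense graphs.

```python
from collections import Counter

def popularity(graph, root):
    """Counters for root's subtree, re-walking shared nodes
    (the "duplication" step in the description above)."""
    counts = Counter()
    for child in graph.get(root, []):
        counts.update(popularity(graph, child))  # sum-merge sibling counters
    counts = Counter({p: n + 1 for p, n in counts.items()})  # increment each
    counts[root] = 1  # the node itself appears once in this subtree
    return counts

def popularity_order(graph, root):
    # Most popular first; ties broken by name, as in the description.
    counts = popularity(graph, root)
    return sorted(counts, key=lambda p: (-counts[p], p))

# The example dependency tree from above:
graph = {"A": ["B", "G"], "B": ["C", "E"], "C": ["D", "E"],
         "D": ["F"], "E": ["F"]}
```

Here popularity(graph, "A") reproduces the table above (F: 9, E: 5, D: 4,
C: 3, B: 2, G: 2, A: 1), and popularity_order(graph, "A") returns
["F", "E", "D", "C", "B", "G", "A"] -- B sorts before the equally-popular
G by name.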

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

This PR was written while working for Target.

cc @adelbertc @shlevy @stew

@grahamc grahamc requested a review from nlewo September 26, 2018 21:59
@grahamc (Member, Author) commented Sep 26, 2018

Docs rendered: [screenshot of the rendered documentation]

@grahamc (Member, Author) commented Sep 26, 2018

cc @roberth @moretea

@srhb (Contributor) commented Sep 27, 2018

I'm not sure I understand exactly how your "automatic closure inclusion" differs from dockerTools.buildImage and the example doesn't really clear it up.

Using buildImage and starting a container from this image outputs "Hello, world!" as expected:

pkgs.dockerTools.buildImage {
  name = "hello";
  config.Cmd = [ "${pkgs.hello}/bin/hello" ];
}

Also adding it to contents just creates the moral equivalent of a "buildEnv" in /

@grahamc (Member, Author) commented Sep 27, 2018

Great catch, @srhb -- fixed up my docs. Not sure where I got that idea.

@vdemeester (Member) left a comment

Looks really promising… gotta find some time to try it out though 👼
@grahamc I wonder if at some point we could/should build OCI images (that said… docker cannot import oci image as of today 😓)

@nlewo (Member) commented Sep 27, 2018

@grahamc It would be nice to add a basic test (build and run a layered image) in nixos/test/docker-tools.nix. Otherwise, LGTM.

@vdemeester Docker cannot, but Skopeo can convert an OCI image to a Docker image! I already tried a bit to use Buildah to build an image instead of our shell scripts, but that is still WIP...

@vdemeester (Member)

@nlewo there is also umoci to build OCI images. But yeah, forgot about using skopeo in that use 👼. I wanted to investigate using it to simplify the scripts too 😝

@roberth (Member) commented Sep 27, 2018

Nice work @grahamc!
I wonder how it performs for various kinds of images, depending on languages used, use of data dependencies, etc. Do you have any info about this?

It seems to me that small packages might be a problem if there are too many, but it may not be a problem in practice. If it is, there may be another heuristic to cluster the small ones in a clever way.
Well, that's just me speculating.

It looks very compelling!

@grahamc (Member, Author) commented Sep 27, 2018

It would be nice to add a basic test (build and run a layered image) in nixos/test/docker-tools.nix.

Test added!

It seems to me that small packages might be a problem if there are too many, but it may not be a problem in practice. If it is, there may be another heuristic to cluster the small ones in a clever way.

Unfortunately, trying to combine any paths makes it much less likely that different images would share anything. Small packages don't seem to cause problems in practice.

I've tested this with images containing python, fortran, java, haskell, php, bash... sometimes all at once :)

@adisbladis (Member)

It seems to me that small packages might be a problem if there are too many, but it may not be a problem in practice. If it is, there may be another heuristic to cluster the small ones in a clever way.
Well, that's just me speculating.

I did run into this issue building images based on nodejs with ~800 packages in one image.
I'm thinking that this PR is a good general approach to the problem but we may need to take a different approach in some pathological cases.

@grahamc (Member, Author) commented Sep 28, 2018 via email

@grahamc (Member, Author) commented Sep 28, 2018

I did run into this issue building images based on nodejs with ~800 packages in one image.

I tried building images with large numbers of dependencies, like quassel-webserver and azure-cli. Not 800, but a large number nonetheless.

I found that the base layers were well selected: glibc, nodejs, ncurses, etc. I think if you want greater control over which layers are created, buildImage is for you.

@nlewo nlewo merged commit 56b4db9 into NixOS:master Oct 1, 2018
@nlewo (Member) commented Oct 1, 2018

Thanks!

@moretea (Contributor) commented Oct 1, 2018

Great! I was not aware that there was a practical limit on the number of layers in an image.

I wrote https://github.com/ContainerSolutions/nixpkgs-overlay/blob/master/docker-tools.nix last April. It differs by building the layers as separate derivations that can be cached by Nix. That might actually speed up building images quite a bit, by doing the minimal amount of work possible.

It also offers two types of output: a directory of image layers, and a final tar file that is accepted by docker load.
My plan is to write a simple script that will upload the layers from this layers directory directly to a Docker registry soonish.

@graham-at-target (Contributor)

Indeed, a hard limit at 125: https://github.com/moby/moby/blob/b3e9f7b13b0f0c414fa6253e1f17a86b2cff68b5/layer/layer_store.go#L23-L26. It looks like your code there uses IFD. Is that right?

@graham-at-target graham-at-target deleted the multi-layered-images-crafted branch October 1, 2018 16:28
@moretea (Contributor) commented Oct 1, 2018

Hm. I only tried small programs so far ;) Too bad that docker has this limitation...

My code also only relies on writeReferencesToFile. See about here.

@graham-at-target (Contributor)

Ah, yeah, unfortunately that does do a build and then imports it, so we can't use it in Nixpkgs.

@Nadrieril (Member)

@graham-at-target Cool work! I just read your blog post on this, and I was wondering: would it be possible to merge together some of the very common layers if they are often used together?
For example, I would guess most images will want bash, so having the first layer be bash plus its deps would save a few layers compared to having bash split into glibc/ncurses/readline. This wouldn't be much worse for cache hits since most people who want glibc also want bash. In a big project, the few layers saved that way can be used to cache more dependencies, which might significantly improve build time when e.g. changing only the app code.
Finding an algorithm that computes this seems hard though, in particular since it doesn't seem to be guessable from a single project.
Does that make sense? I'm guessing that your solution is already a significant improvement; I might just be nitpicking.

@adisbladis (Member) commented Oct 2, 2018

@Nadrieril I had the same thought, that it's probably better to explicitly cut the image into layers at certain known points like nodejs, python3, glibc etc.

It's by no means done since I haven't had much time yet; here is the current progress: https://gist.github.com/adisbladis/777c7d8240be35faa107bc8d6c869a9f.

I plan to add some minor features and port all the python code to nix before I consider it done.

@grahamc (Member, Author) commented Oct 2, 2018 via email

@flokli flokli mentioned this pull request Oct 27, 2018
@copumpkin (Member)

@grahamc this is delightful, thanks!
