Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some performance optimizations to dockerTools.build{,Layered}Image #87154

Merged
merged 2 commits into from May 19, 2020

Conversation

utdemir
Copy link
Member

@utdemir utdemir commented May 7, 2020

Motivation for this change

This PR aims to speed up creation of Docker images using the dockerTools utilities by reducing the disk I/O performed while creating layers.

The first commit reduces the number of of tar -r invocations, and the second one calculates tarsum's on the fly while creating the tarball. These two changes seem to make the buildLayeredImage function almost twice as fast when building large (> 4GB) images when disk I/O is the bottleneck (eg. on cloud machines).

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (nix-build release.nix -A tests.docker-tools)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

…ing layers

Appending to an existing tar archive repeatedly seems to be a quadratic
operation, since tar seems to traverse the existing archive even using
the `-r, --append` flag. This commit avoids that by passing the list of
files to a single tar invocation.
Calculating the tarsum after creating a layer is inefficient, since
we have to read the tarball we've just written from the disk.

This commit simultaneously calculates the tarsum while creating the
tarball.
@purcell
Copy link
Member

purcell commented May 7, 2020

Adding a little more context, some of the reason for the original slowness is the inefficiency of appending to TAR files - see http://tiamat.name/blogposts/fast-appending-files-to-tar-archive-is-impossible/

The portions of this change that eliminate the separate pass with tarsum are less dramatic in their effect, but still significant.

We have a case with a client where a sizeable image built with buildImage takes around 4-5 minutes, while buildLayeredImage takes 20 minutes with the same contents: the changes in this PR bring the latter down to around 12 minutes, but further opportunities exist for improvements. In particular, multiple layer tar files are currently realised on disk before being joined into an overall archve, but this intermediate storage could likely be eliminated. It might be that a tailored program for this task would be the right approach. It would also be ideal to unify the code paths for buildImage and buildLayeredImage.

Copy link
Member

@nlewo nlewo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, lgtm.

# Compute a checksum of the tarball.
tarhash=$(tarsum < $layerPath/layer.tar)
--transform 's,^nix(/|$),/nix/,' \
--transform 's,^[^/],/nix/store/\0,rS' |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, it seems my sed powers are now powerful enough ;)

  • what is the purpose of ^[^/]? Why ^ is not sufficient?
  • I also don't understand why rS flags are added? What is the default behavior, especially regarding the r flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems my sed powers are now powerful enough ;)

I think that's only because they're tar's variant of sed :).

what is the purpose of ^[^/]? Why ^ is not sufficient?

The transforms get applied one after another, and since the first one converts ^nix to ^/nix, I had to make sure that the second transform won't match to that. Otherwise that nix/ directory at the root would become /nix/store/nix.

I also don't understand why rS flags are added? What is the default behavior, especially regarding the r flag?

The previous code was only replacing the exact store path(s,$n,/nix/store/$n), but since we now archive multiple paths at the same time, we need to prepend /nix/store to everything. But turns out those transformations are applied to symlink targets too, and even when they are relative (eg. ...libpng.so.1 -> libpng.so would become ...libpng.so.1 -> /nix/store/libpng.so). That S flag disables transforming the symlinks. However, just setting S overrides all of the default flags(rsh is the default), so it doesn't apply the transformations to the regular files either. Only setting both r and S makes it so that the transformation is applied to regular files, but not to the symlinks.

Here's the doc I used: https://www.gnu.org/software/tar/manual/html_section/tar_51.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow... ok. Thanks for explanations!

@utdemir
Copy link
Member Author

utdemir commented May 19, 2020

@nlewo @bjornfor Thank you for the reviews! Do you think there is anything else I should to before we can merge this? I think I addressed @nlewo 's comment above.

# Compute a checksum of the tarball.
tarhash=$(tarsum < $layerPath/layer.tar)
--transform 's,^nix(/|$),/nix/,' \
--transform 's,^[^/],/nix/store/\0,rS' |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow... ok. Thanks for explanations!

@nlewo
Copy link
Member

nlewo commented May 19, 2020

@GrahamcOfBorg test docker-tools

@nlewo nlewo merged commit a498da3 into NixOS:master May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants