sd-image: use parallel pbzip2 implementation #85941

Closed (wants to merge 1 commit)

prusnak (Member) commented Apr 24, 2020

Motivation for this change

Using a parallel bzip2 implementation is faster on multi-processor machines. I compressed a 2.7 GB aarch64 NixOS image with each tool and got the following results:

bzip2:

  • output size: 627353716 bytes
  • duration: 217 seconds

real	3m36.963s
user	3m35.370s
sys	0m1.552s

pbzip2 -p4:

  • output size: 632445704 bytes (increase of ~0.8%)
  • duration: 96 seconds (about 56% less wall-clock time)

real	1m36.191s
user	6m11.421s
sys	0m8.036s
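
As a quick check on the percentages, here is a minimal Python sketch that recomputes the relative size and wall-clock changes directly from the reported byte counts and `real` times. The constant and function names are hypothetical, chosen only for illustration.

```python
# Byte counts and `real` times copied from the bzip2 and pbzip2 -p4 runs above.
BZIP2_BYTES = 627_353_716
PBZIP2_BYTES = 632_445_704
BZIP2_REAL_S = 3 * 60 + 36.963   # real 3m36.963s
PBZIP2_REAL_S = 1 * 60 + 36.191  # real 1m36.191s


def percent_change(old: float, new: float) -> float:
    """Signed relative change from old to new, in percent."""
    return (new - old) / old * 100.0


size_change = percent_change(BZIP2_BYTES, PBZIP2_BYTES)
time_change = percent_change(BZIP2_REAL_S, PBZIP2_REAL_S)
speedup = BZIP2_REAL_S / PBZIP2_REAL_S

print(f"output size: {size_change:+.2f}%")     # +0.81%
print(f"wall-clock time: {time_change:+.1f}%")  # -55.7%
print(f"speedup: {speedup:.2f}x")               # 2.26x
```

Expressed as a speedup rather than a percentage, pbzip2 -p4 finishes about 2.26 times faster than bzip2 on this image.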

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS Linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

prusnak (Member, Author) commented Apr 24, 2020

Another option would be to use zstd for the final compression of the image (zstd is already a dependency as it decompresses the rootfs image).

What do you think? cc @lovesegfault @AmineChikhaoui

zstd -T4:

  • output size: 614908377 bytes (decrease of ~2%)
  • duration: 10 seconds (about 95% less wall-clock time) 🤯

real	0m10.069s
user	0m36.025s
sys	0m1.431s
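
The `time` output also shows how much parallelism each tool actually achieved. Here is a rough Python sketch that assumes (user + sys) / real is a reasonable proxy for the average number of busy cores on the test machine. The names are hypothetical, and the figures are the ones reported in this thread.

```python
# `time` output copied from the three runs above, in seconds: (real, user, sys).
RUNS = {
    "bzip2": (216.963, 215.370, 1.552),
    "pbzip2 -p4": (96.191, 371.421, 8.036),
    "zstd -T4": (10.069, 36.025, 1.431),
}
BZIP2_BYTES = 627_353_716
ZSTD_BYTES = 614_908_377


def busy_cores(real_s: float, user_s: float, sys_s: float) -> float:
    """Average number of CPUs kept busy: total CPU time divided by wall-clock time."""
    return (user_s + sys_s) / real_s


for name, timing in RUNS.items():
    print(f"{name:>10}: {busy_cores(*timing):.2f} cores busy on average")

# Relative output size and wall-clock saving of zstd -T4 against plain bzip2.
zstd_size_change = (ZSTD_BYTES - BZIP2_BYTES) / BZIP2_BYTES * 100.0
zstd_time_saved = (1.0 - RUNS["zstd -T4"][0] / RUNS["bzip2"][0]) * 100.0
print(f"zstd -T4 vs bzip2: size {zstd_size_change:+.1f}%, "
      f"{zstd_time_saved:.1f}% less wall-clock time")
```

Run as-is, the sketch reports about 1.00 busy cores for bzip2 (which is single-threaded), 3.94 for pbzip2 -p4 and 3.72 for zstd -T4. Both parallel tools keep most of the four requested threads busy. zstd is faster mainly because it does far less CPU work in total: about 36 s of user time versus about 371 s for pbzip2.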

It seems that zstd is packaged even in Debian stable, so this change might be worth considering.

lovesegfault (Member) commented

@prusnak Hi!

I agree that using bzip2 is counterproductive, as it's extremely slow, especially on more constrained hardware. I think pbzip2 is okay as a stop-gap, but simply using zstd is a much better approach that will benefit even single-core systems.

FWIW, I've suggested in the past that we move all our compression to zstd, since it has an excellent compression ratio and speed.

prusnak (Member, Author) commented Apr 24, 2020

I will gladly rework the PR to use zstd instead if that's what we really want.

lovesegfault (Member) commented

IIRC, the last time I tried to do this, there was some discussion on whether the output format of a drv (in this case, the compression algorithm) constitutes a "public interface". If it does, changing it can't be done without further consideration, since it breaks things.

IMO, the benefits are significant enough to be worth it, but the change needs to be mentioned in the release notes.

prusnak (Member, Author) commented Apr 24, 2020

Okay, let's close this and pursue a bigger change (zstd) in another PR: #85947

prusnak closed this Apr 24, 2020
prusnak deleted the sd-image-pbzip2 branch April 24, 2020 16:39