importPaths: Don't copy imported NAR into memory. #2206

Closed

Conversation

@nh2 (Contributor) commented Jun 3, 2018

Fixes `error: out of memory` of `nix-store --serve --write`
when receiving packages via SSH (and perhaps other sources).

See #1681 #1969 #1988 NixOS/nixpkgs#38808.

Performance improvement on `nix-store --import` of a 2.2 GB cudatoolkit closure:

When the store path already exists:
  Before:
    10.82user 2.66system 0:20.14elapsed 66%CPU (0avgtext+0avgdata 4204664maxresident)k
  After:
    11.43user 2.94system 0:16.71elapsed 86%CPU (0avgtext+0avgdata   12556maxresident)k
When the store path doesn't yet exist (after `nix-store --delete`):
  Before:
    11.15user 2.09system 0:13.26elapsed 99%CPU (0avgtext+0avgdata 4204732maxresident)k
  After:
     5.27user 1.48system 0:06.80elapsed 99%CPU (0avgtext+0avgdata   12032maxresident)k

The reduction is 4200 MB -> 12 MB RAM usage, and it also takes less time.
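
For reference, a minimal sketch of how numbers like these can be reproduced with GNU time (the store path placeholder and the file name are illustrative, not taken from this PR):

# Export a closure to a file (the cudatoolkit store path is illustrative).
nix-store --export $(nix-store --query --requisites /nix/store/...-cudatoolkit) > closure.nar

# Import while the store path still exists; GNU time's default output
# includes the "maxresident" figure quoted above.
/usr/bin/time nix-store --import < closure.nar

# Delete the path (only possible if no GC root keeps it alive),
# then import again to measure the "doesn't yet exist" case.
nix-store --delete /nix/store/...-cudatoolkit
/usr/bin/time nix-store --import < closure.nar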

@nh2 (Contributor, Author) commented Jun 3, 2018

Convenience override to try this patch out (stick it into the machine config of the target machine, e.g. I put it in nixops's libvirtd-image.nix):

{
  # Patch nix to avoid running out of RAM when sending closures to the machine.
  # See
  #   https://github.com/NixOS/nix/issues/1681
  #   https://github.com/NixOS/nix/issues/1969
  #   https://github.com/NixOS/nixpkgs/issues/38808
  nixpkgs.config.packageOverrides = pkgs: {
    nix = pkgs.nixUnstable.overrideAttrs (oldAttrs: {
      src = pkgs.fetchFromGitHub {
        owner = "NixOS";
        repo = "nix";
        rev = "54b1c596435b0aaf3a2557652ad4bf74d5756514";
        sha256 = "0g7knsfj445r50rk0d9hm5n1pv20k542bz6xf5c47qmkgvfa40x4";
      };
      patches = [
        (pkgs.fetchpatch {
          url = "https://github.com/nh2/nix/commit/d31a4410d92790e2c27110154896445d99d7abfe.patch";
          sha256 = "08gcw2xw8yc61zz2nr1j3cnd6wagp5qs02mjfazrd9wa045y26hg";
        })
      ];
      # Changes cherry-picked from upstream nix `release-common.nix` that
      # aren't in `pkgs.nixUnstable` yet:
      buildInputs = oldAttrs.buildInputs ++ [
        (pkgs.aws-sdk-cpp.override {
          apis = ["s3" "transfer"];
          customMemoryManagement = false;
        })
      ];
    });
  };
}
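
To roll it out, a brief sketch (standard commands; the deployment name is illustrative, not from the thread): after adding the snippet above to the target machine's NixOS configuration, rebuild so the patched nix is installed:

# On a machine managed directly:
nixos-rebuild switch

# Or, for a nixops-managed deployment:
nixops deploy -d my-deployment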

@nh2 (Contributor, Author) commented Jun 3, 2018

Once this is fixed, ideally somebody would also update the 18.03 EC2 images so that people can actually use them as a base on EC2 machines with <= 4 GB RAM.

@edolstra (Member) commented Jun 4, 2018

This doubles the amount of I/O and disk space needed on the target machine.

@nh2 (Contributor, Author) commented Jun 4, 2018

> This doubles the amount of I/O and disk space needed on the target machine.

Yes, based on the idea that most machines have more than 2x as much disk as they have RAM (especially when they have a /nix/store somewhere).

In terms of I/O, my measurements show it to be faster overall. Not super sure how that happens.

But in general for machines with reasonable RAM, all the data written to disk will be in the OS buffer cache anyway (so it's practically in RAM if you have enough to store it there).

I think the best solution would be to stream the incoming data right into its final location in the nix store, but that was beyond the time I allocated for getting this fix to work at all.

@bgamari (Contributor) commented Jun 10, 2018

@edolstra, in my opinion doubling necessary disk storage in exchange for constant memory operations is absolutely the correct trade-off. Disk space is generally more plentiful than memory and is far easier to provision.

Even in the short nine months I have used nix I have run into memory issues more times than I care to recall. Just today I was unable to rebuild my mail server as nix started OOMing after a nixpkgs update. While thankfully today's event occurred during non-essential maintenance, this failure could just as easily happen during a more time-critical event. Events like this have made me think twice before using nix on smaller machines.

@sboosali commented

Do we know if/when this is getting merged?

@joepie91 commented

It's worth considering that while reducing disk space by stripping down installed packages is possible as an end user, this isn't really true for memory usage; it's very easy to hit the limit even when installing relatively few things. This suggests that the disk-space-instead-of-RAM tradeoff is worth it.

@samueldr (Member) commented

FWIW, this allowed a build to complete here when it otherwise couldn't. The disk-space-for-RAM tradeoff is important when dealing with large (multi-gigabyte) build products on machines with little RAM, like SBCs (Raspberry Pi, etc.).

@thoughtpolice (Member) commented

I'm going to chime in here and say this is a very annoying problem for us at work, where we tend to have very large objects we push into the store -- we have one binary object that's over 30GB of raw data, and right now we have to manually register it into the store with mount/remount hacks. This is really nasty and troublesome because we have to do it on nearly a dozen machines individually.

The disk/memory tradeoff here is absolutely the correct one to make, as others have mentioned here. There will always be hierarchies in the storage layer -- slower, larger storage that's further away, and faster, less plentiful storage (DRAM) that's available locally. The fact that this increases I/O on the lower tier while decreasing the requirements on the less plentiful, far more expensive tier is the only argument you need, because it's far more sustainable and scalable than expecting DRAM to magically increase in density or $/GB overnight.

You could probably even argue that the ratio of DRAM prices to storage and compute prices has skewed so much over the past few years that trading off disk space isn't just faster and doesn't just make Nix more usable -- it's actually a superior architecture for the way modern workloads and system capacity have progressed.

@angerman (Contributor) commented Aug 3, 2018

What's the holdup with this?

@edolstra (Member) commented Aug 3, 2018

I've gone for a different solution (2825e05) that doesn't require writing the NAR to disk.

@thoughtpolice (Member) commented

Thank you @edolstra! I'm excited to try this since we actually ran into NAR import issues again just this week (copying large local files into the store). I'll see how this works for our use case of extremely large objects.

@domenkozar (Member) commented

Closing as it's fixed.

@domenkozar closed this Sep 6, 2018