New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support multi threaded xz encoder #1848
Conversation
the case of hydra where the overhead of single threaded encoding is more noticeable e.g most of the time spent in "Sending inputs"/"Receiving outputs" is due to compression while the actual upload to the binary cache seems to be negligible.
some comments about possible improvements wrt memory usage/threading.
Some numbers to compare the 2 options:
So using a 1G build output, parallel compression takes 11477 ms compared to 24765 ms with single threaded xz. Compression ratio seems to deteriorate a bit though:
|
@AmineChikhaoui How many cores was that? |
Hm, maybe the setting should be moved to BinaryCacheStore. Then it can be specified in the store URI, e.g. |
@edolstra that was on my laptop with 4 cores. |
yeah I'll move the setting to the binary cache store, seems better to have it in the url 👍 |
can be done now from the url e.g s3://nix-cache?parallel-compression=1 instead of nix.conf.
May want to touchup Hydra's code and handling of its "compress_num_threads" option to avoid unexpected behavior when used with a storeURI that sets parallel-compression. |
@dtzWill from what I see in hydra
I don't think this PR has an impact on that part though ? |
@AmineChikhaoui eep my mistake! Thanks for taking a look :). |
This seems to work for me, but will probably need to test more and check if we need to handle some more corner cases e.g if the number of cpu cores cannot be detected fallback to lzma_easy_encoder or set the threads to 1 which is suggested in some comments in xz source/examples.
The intention of the PR is mainly to take a stab at reducing the overhead of compression in hydra + a binary cache where it seems there is a considerable time spent in compression whenever hydra-queue-runner sends derivations inputs to the binary cache or uploads build outputs.