Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] add support for zstd compression #2277

Closed
wants to merge 7 commits into from

Conversation

dtzWill
Copy link
Member

@dtzWill dtzWill commented Jul 5, 2018

Notes:

  • Default zstd level is '3' (range: [1,19])

  • This PR builds on top of add support for specifying 'compression-level' for NAR's #2276, so duplicates that commit

    • Seemed more useful and easier to implement level support than add later
  • Copying paths to local stores and checking with 'verify' for various closures

  • ... but I'd feel better with some more testing

  • Additional review of API usage and such would be appreciated,

    • "Some day" I'd like to revisit this with 'fresh' eyes/brain, but who can say when that'll be... :)
  • The sizes of the buffers used are much larger than used for other compression methods-- this might bias throughput tests and such. This is for three reasons:

    1. zstd suggests size of buffers to use with special methods (for each of {compress,decompress}x{input,output})
    2. it's what examples/documentation use
    3. using sufficiently sized buffers provides some nice properties like "guarantee to successfully flush at least one complete compressed block in all circumstances."
  • zstd documentation is pretty clear that stream/context objects should be reused instead of always allocated/initialized-- which isn't what our compression bits are geared for currently, but if we're seeing things like very small files being slow to compress w/zstd this might be the cause.

Fixes #2255.

Fixes Makefile.config containing:
`HAVE_BROTLI = 1]`

Which was apparently harmless (unused, I think)
but is good to fix anyway.
allocate out-of-object to keep things neat.
Copy link
Contributor

@endgame endgame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done C++ in a while, so feel free to ignore these comments if you disagree with them.

@@ -13,12 +13,37 @@
#include <brotli/encode.h>
#endif // HAVE_BROTLI

#if HAVE_ZSTD
#include <zstd.h>
// Not set by earlier versions, so be sure it's set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to do this check with autoconf, or by making a wrapper-header around zstd.h?
I think the answer depends on how many files you expect to do zstd stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, I'm not loving the macro usage either so suggestions welcome. Not sure what you're suggesting in first clause-- the HAVE_ZSTD macro is the result of an autoconf check added in this PR.
The second suggestion, generating a wrapper, has potential re:making things less messy.

We could just make zstd non-optional (same for brotli) but isn't done for various reasons-- I think the most important being that not all distributions Nix wants to support provide these packages...

Suggestions welcome, I'll think about this a bit. Popular alternative is to have build system selectively include files -- not sure if that's preferred or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the first alternative, I mean do a check in configure.ac:

AC_MSG_CHECKING([for ZSTD_CLEVEL_DEFAULT in zstd.h])
AC_PREPROC_IFELSE([AC_LANG_SOURCE([[#include <zstd.h>
#ifndef ZSTD_CLEVEL_DEFAULT
#error
#endif]])],[AC_MSG_RESULT([yes])], [AC_MSG_RESULT([no]), AC_DEFINE([ZSTD_CLEVEL_DEFAULT], [3])])

A wrapper header is probably better, that includes <zstd.h> and defines things that might not be present.

@@ -214,14 +281,21 @@ void decompress(const std::string & method, Source & source, Sink & sink)
return decompressBzip2(source, sink);
else if (method == "br")
return decompressBrotli(source, sink);
#if HAVE_ZSTD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need CPP guards, but brotli doesn't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because nix's brotli support is always "possible"-- if not built with libbrotli Nix will look for the brotli executable on PATH and try to use that with fingers crossed. Among other disadvantages, the way this is implemented operates on the entire input at once IIRC, reading into a giant std::string. Doing this for zstd didn't seem like a good idea, thinking it should be supported properly or not at all.

Brotli actually has some preprocessor bits but they're inside the decompressBrotli method instead of around it.

@@ -302,11 +376,11 @@ struct XzSink : CompressionSink
#ifdef HAVE_LZMA_MT
struct ParallelXzSink : public XzSink
{
ParallelXzSink(Sink &nextSink) : XzSink(nextSink, [this]() {
ParallelXzSink(Sink &nextSink, int level) : XzSink(nextSink, [this](unsigned level) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you capture level in the lambda, and avoid having to declare/pass the parameter in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First--yikes indeed re:this mess. :(. Apologies.

That out of the way, I actually originally captured level in the lambda but then the checkLevel(..) was either duplicated or a new function....

Hmm I think I just left this to focus on getting things working first--- but never came back to try to do something reasonable here O:). Good call, thank you.

Match what we do in other compression methods.
This is particularly important when using higher
compression levels, in order to ensure interrupts
are noticed in a somewhat responsive manner.

Breaking into chunks as hinted by zstd should help
this further, but this makes things much better already.

(not following those hints shouldn't be much worse
than what we have with other methods)
Copy link

@coretemp coretemp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that a lot of parameters are being passed around everywhere. It would be better to refactor to have a CompressionConfiguration class.

Otherwise I agree with most points raised by the other reviewer.

Good change overall.

@domenkozar
Copy link
Member

This would be nice to revive as xz issue is getting annoying :)

@tomberek
Copy link
Contributor

tomberek commented Jan 7, 2020

Too much has changed in master for a trivial rebase.

@domenkozar
Copy link
Member

@dtzWill are you going to give this one another go?

@edolstra
Copy link
Member

edolstra commented Apr 9, 2020

#3333 may be a better way to get zstd support.

@domenkozar domenkozar closed this Jan 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support zstd compression for binary caches
6 participants