[WIP] add support for zstd compression #2277

dtzWill · 2018-07-05T23:42:06Z

Notes:

Default zstd level is '3' (range: [1,19])
This PR builds on top of add support for specifying 'compression-level' for NAR's #2276, so duplicates that commit
- Seemed more useful and easier to implement level support than add later
Copying paths to local stores and checking with 'verify' for various closures
... but I'd feel better with some more testing
Additional review of API usage and such would be appreciated,
- "Some day" I'd like to revisit this with 'fresh' eyes/brain, but who can say when that'll be... :)
The sizes of the buffers used are much larger than used for other compression methods-- this might bias throughput tests and such. This is for three reasons:
1. zstd suggests size of buffers to use with special methods (for each of {compress,decompress}x{input,output})
2. it's what examples/documentation use
3. using sufficiently sized buffers provides some nice properties like "guarantee to successfully flush at least one complete compressed block in all circumstances."
zstd documentation is pretty clear that stream/context objects should be reused instead of always allocated/initialized-- which isn't what our compression bits are geared for currently, but if we're seeing things like very small files being slow to compress w/zstd this might be the cause.

Fixes Makefile.config containing: `HAVE_BROTLI = 1]` Which was apparently harmless (unused, I think) but is good to fix anyway.

allocate out-of-object to keep things neat.

endgame

I haven't done C++ in a while, so feel free to ignore these comments if you disagree with them.

endgame · 2018-07-05T23:45:29Z

src/libutil/compression.cc

@@ -13,12 +13,37 @@
 #include <brotli/encode.h>
 #endif // HAVE_BROTLI

+#if HAVE_ZSTD
+#include <zstd.h>
+// Not set by earlier versions, so be sure it's set


Would it be better to do this check with autoconf, or by making a wrapper-header around zstd.h?
I think the answer depends on how many files you expect to do zstd stuff.

Not sure, I'm not loving the macro usage either so suggestions welcome. Not sure what you're suggesting in first clause-- the HAVE_ZSTD macro is the result of an autoconf check added in this PR.
The second suggestion, generating a wrapper, has potential re:making things less messy.

We could just make zstd non-optional (same for brotli) but isn't done for various reasons-- I think the most important being that not all distributions Nix wants to support provide these packages...

Suggestions welcome, I'll think about this a bit. Popular alternative is to have build system selectively include files -- not sure if that's preferred or not.

In the first alternative, I mean do a check in configure.ac:

AC_MSG_CHECKING([for ZSTD_CLEVEL_DEFAULT in zstd.h]) AC_PREPROC_IFELSE([AC_LANG_SOURCE([[#include <zstd.h> #ifndef ZSTD_CLEVEL_DEFAULT #error #endif]])],[AC_MSG_RESULT([yes])], [AC_MSG_RESULT([no]), AC_DEFINE([ZSTD_CLEVEL_DEFAULT], [3])])

A wrapper header is probably better, that includes <zstd.h> and defines things that might not be present.

endgame · 2018-07-05T23:47:06Z

src/libutil/compression.cc

@@ -214,14 +281,21 @@ void decompress(const std::string & method, Source & source, Sink & sink)
        return decompressBzip2(source, sink);
    else if (method == "br")
        return decompressBrotli(source, sink);
+#if HAVE_ZSTD


Why does this need CPP guards, but brotli doesn't?

Because nix's brotli support is always "possible"-- if not built with libbrotli Nix will look for the brotli executable on PATH and try to use that with fingers crossed. Among other disadvantages, the way this is implemented operates on the entire input at once IIRC, reading into a giant std::string. Doing this for zstd didn't seem like a good idea, thinking it should be supported properly or not at all.

Brotli actually has some preprocessor bits but they're inside the decompressBrotli method instead of around it.

endgame · 2018-07-05T23:48:33Z

src/libutil/compression.cc

@@ -302,11 +376,11 @@ struct XzSink : CompressionSink
 #ifdef HAVE_LZMA_MT
 struct ParallelXzSink : public XzSink
 {
-  ParallelXzSink(Sink &nextSink) : XzSink(nextSink, [this]() {
+  ParallelXzSink(Sink &nextSink, int level) : XzSink(nextSink, [this](unsigned level) {


Can you capture level in the lambda, and avoid having to declare/pass the parameter in?

First--yikes indeed re:this mess. :(. Apologies.

That out of the way, I actually originally captured level in the lambda but then the checkLevel(..) was either duplicated or a new function....

Hmm I think I just left this to focus on getting things working first--- but never came back to try to do something reasonable here O:). Good call, thank you.

Match what we do in other compression methods. This is particularly important when using higher compression levels, in order to ensure interrupts are noticed in a somewhat responsive manner. Breaking into chunks as hinted by zstd should help this further, but this makes things much better already. (not following those hints shouldn't be much worse than what we have with other methods)

coretemp

It seems that a lot of parameters are being passed around everywhere. It would be better to refactor to have a CompressionConfiguration class.

Otherwise I agree with most points raised by the other reviewer.

Good change overall.

domenkozar · 2019-07-02T17:43:38Z

This would be nice to revive as xz issue is getting annoying :)

tomberek · 2020-01-07T03:10:17Z

Too much has changed in master for a trivial rebase.

domenkozar · 2020-04-09T09:26:16Z

@dtzWill are you going to give this one another go?

edolstra · 2020-04-09T09:50:29Z

#3333 may be a better way to get zstd support.

dtzWill added 6 commits July 5, 2018 14:25

add support for specifying 'compression-level' for NAR's

4d14484

configure.ac: minor touchup to brotli code

39e67d7

Fixes Makefile.config containing: `HAVE_BROTLI = 1]` Which was apparently harmless (unused, I think) but is good to fix anyway.

add initial zstd support

64445e3

zstd: use recommended buffer size

8a4ad60

allocate out-of-object to keep things neat.

binary-cache-store: zstd file extension

cdae719

zstd decompression

b41432b

endgame reviewed Jul 5, 2018

View reviewed changes

coretemp reviewed Jul 6, 2018

View reviewed changes

domenkozar closed this Jan 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] add support for zstd compression #2277

[WIP] add support for zstd compression #2277

dtzWill commented Jul 5, 2018 •

edited

endgame left a comment

endgame Jul 5, 2018

dtzWill Jul 6, 2018

endgame Jul 7, 2018

endgame Jul 5, 2018

dtzWill Jul 6, 2018

endgame Jul 5, 2018

dtzWill Jul 6, 2018

coretemp left a comment

domenkozar commented Jul 2, 2019

tomberek commented Jan 7, 2020

domenkozar commented Apr 9, 2020

edolstra commented Apr 9, 2020

[WIP] add support for zstd compression #2277

[WIP] add support for zstd compression #2277

Conversation

dtzWill commented Jul 5, 2018 • edited

endgame left a comment

Choose a reason for hiding this comment

endgame Jul 5, 2018

Choose a reason for hiding this comment

dtzWill Jul 6, 2018

Choose a reason for hiding this comment

endgame Jul 7, 2018

Choose a reason for hiding this comment

endgame Jul 5, 2018

Choose a reason for hiding this comment

dtzWill Jul 6, 2018

Choose a reason for hiding this comment

endgame Jul 5, 2018

Choose a reason for hiding this comment

dtzWill Jul 6, 2018

Choose a reason for hiding this comment

coretemp left a comment

Choose a reason for hiding this comment

domenkozar commented Jul 2, 2019

tomberek commented Jan 7, 2020

domenkozar commented Apr 9, 2020

edolstra commented Apr 9, 2020

dtzWill commented Jul 5, 2018 •

edited