glusterfs service: add support for TLS communication #27340

Merged

merged 2 commits into NixOS:master on Sep 21, 2017

Conversation

bachp
Member

@bachp bachp commented Jul 12, 2017

Motivation for this change

This makes it easy to set up GlusterFS with TLS support. The user has to provide three files:

  • Private Key
  • Certificate
  • Certificate Authority

There are still some open points on which I would appreciate feedback:

  1. Enabling TLS in GlusterFS works by placing the certificates in /etc/ssl and creating the marker file /var/lib/glusterd/secure-access. The setup is the same for servers and for clients wanting to mount from a secure server, but this PR only activates it for the server part. This means one can mount a TLS-secured volume on a machine where the server is also running. To improve this, the TLS option should probably be moved out of the service config, but I'm not sure how.
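As a rough sketch of that mechanism (a hypothetical preStart fragment mirroring what this PR's module does; `cfg.enableTLS` is the option name at this stage of the PR):

```nix
# Sketch: glusterd is told to use TLS for the management connection
# purely by the presence of a marker file, so the module toggles it
# from the unit's preStart.
systemd.services.glusterd.preStart =
  if cfg.enableTLS then ''
    touch /var/lib/glusterd/secure-access
  '' else ''
    rm -f /var/lib/glusterd/secure-access
  '';
```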
Things done
  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

/cc @nh2


@mention-bot

@bachp, thanks for your PR! By analyzing the history of the files in this pull request, we identified @nh2 to be a potential reviewer.


restartTriggers = [
config.environment.etc."ssl/glusterfs.pem".source
config.environment.etc."ssl/glusterfs.key".source
Member

Users should probably not put the private key in to the nix store.

Member Author

What is a better approach?

Contributor

What is a better approach?

Giving it a path (that's not in the nix store).

For example, nixops has a keys functionality that places secret keys into a special directory that doesn't survive reboots.
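A minimal sketch of that nixops approach, assuming the usual `deployment.keys` attributes (the glusterfs option name in the comment is illustrative):

```nix
# Sketch: nixops places the key under /run/keys (a tmpfs that does not
# survive reboots), so it never enters the nix store.
{
  deployment.keys."glusterfs.key" = {
    keyFile = ./secrets/glusterfs.key;  # local file, read at deploy time
    user = "root";
    permissions = "0400";
  };
  # The module can then be pointed at the non-store path, e.g.:
  # services.glusterfs.tlsSettings.tlsKeyPath = "/run/keys/glusterfs.key";
}
```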

''
else
''
rm -f /var/lib/glusterd/secure-access
Contributor

Deleting stuff in /var isn't something that NixOS modules typically do; this behaviour must definitely be documented in the description of enableTLS below.

I guess it is OK to do it in this case because glusterfs leaves us no other choice: There's no other way to tell it to turn encryption on or off.

Member Author

If we don't want to add the keys to the nix store and don't manipulate the file in /var/lib/glusterd, then this change actually doesn't do much, since everything (putting the keys somewhere, putting the activation file in /var) has to be done outside of nix anyway?

Contributor

I think you misunderstood me: I think this PR is good, and that deleting secure-access here is OK. You just need to document that behaviour in the option description.

Also, having the pubkey in the store is OK, it's just the private key that should be a file path instead.

Member Author

Ok, thanks for clarifying :)

I would imagine that instead of putting the private key into the config, I would just put a path to a file.
What's the mechanism to make sure this gets symlinked to the correct location in /etc? Also via config.environment.etc?

Is there a similar mechanism that could be used to put the file into /var?



};

caCert = mkOption {
default = null;
Contributor

There should be some asserts here that ensure that when enableTLS is true, tlsKey, tlsPem and caCert are not null.

Better yet, make a tlsSettings option of type submodule that can be either null (for TLS being disabled), or a {} with the 3 options inside, each non-nullable.

An example of how to do that can be found here:

https://github.com/nh2/nixops-gluster-example/blob/8a859f0a702efd38bef9a202e1b031f1db6a44d3/modules/nh2-glusterfs-server.nix#L128-L160
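The suggested shape would look roughly like this (a sketch; option names follow the later revisions of this PR):

```nix
# Either tlsSettings is null (TLS disabled), or all three fields
# must be provided -- no per-field defaults.
tlsSettings = mkOption {
  default = null;
  type = types.nullOr (types.submodule {
    options = {
      tlsKeyPath = mkOption { type = types.str; };
      tlsPem     = mkOption { type = types.path; };
      caCert     = mkOption { type = types.path; };
    };
  });
};
```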

@bachp
Member Author

bachp commented Sep 10, 2017

I reworked the PR to make use of submodule as @nh2 suggested.

@joachifm joachifm added 0.kind: enhancement Add something new 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS labels Sep 17, 2017
@joachifm
Contributor

Is this ready?

@bachp
Member Author

bachp commented Sep 17, 2017

@joachifm from my point of view it is. I would appreciate some feedback from @nh2 though.

caCert = mkOption {
default = null;
type = types.path;
description = "Path certificate authority used to signe the cluster certificates.";
Contributor

2 small typos: Path to and signe -> sign.


tlsSettings = mkOption {
description = ''
Make the server communicat via TLS.
Contributor

typo communicate


type = types.nullOr (types.submodule {
options = {
tlsKey = mkOption {
default = null;
Contributor

I think you need to remove this default = null and the other two below.

Either we want that the entire tlsSettings dict is null, or we want that it is not null, and in that case we want that the user provides all 3 of tlsKey, tlsPem and caCert, so those should not be nullable.

environment.etc = mkIf (cfg.tlsSettings != null) {
"ssl/glusterfs.pem".source = cfg.tlsSettings.tlsPem;
"ssl/glusterfs.key".source = cfg.tlsSettings.tlsKey;
"ssl/glusterfs.ca".source = cfg.tlsSettings.caCert;
Contributor

OK, here I'm not quite sure what source exactly does.

Does it just point the symlink /etc/ssl/glusterfs.key -> theGivenPath? Or does it take a local path to a file, put it into the nix store, and create a symlink /etc/ssl/glusterfs.key -> /nix/store/thefile?

The latter would be bad, because that would make the private key world readable in /nix/store.

We definitely need the former, so that e.g. in the case of nixops we can obtain a symlink like /etc/ssl/glusterfs.key -> /var/run/keys/... where the target directory is the keys directory managed by nixops and not part of the store.

Can anybody shed light on this?

Member Author

On my machine the files just end up in the nix store :(

lrwxrwxrwx 1 root root 28 Sep 17 17:33 glusterfs.ca -> /etc/static/ssl/glusterfs.ca
lrwxrwxrwx 1 root root 29 Sep 17 17:33 glusterfs.key -> /etc/static/ssl/glusterfs.key
lrwxrwxrwx 1 root root 29 Sep 17 17:33 glusterfs.pem -> /etc/static/ssl/glusterfs.pem

where:

lrwxrwxrwx 1 root root 50 Jan  1  1970 glusterfs.ca -> /nix/store/n9h1w46qjcbsrqpr38i7n6vdqb8cfj16-ca.pem
lrwxrwxrwx 1 root root 61 Jan  1  1970 glusterfs.key -> /nix/store/8skzrsdyx46kjzpz8m5kvj6y9x4qhqyy-cleopatra-key.pem
lrwxrwxrwx 1 root root 57 Jan  1  1970 glusterfs.pem -> /nix/store/54y38wlzipf142mxzki2qh3sz4hp4zxz-cleopatra.pem

I'm not sure how to do this properly?

Contributor

Is there a particular reason for requiring the use of environment.etc? Compared to, say, using preStart or a dedicated unit for this purpose?

Member Author

The original entry I have in my configuration.nix:

  services.glusterfs = {
    enable = true;
    tlsSettings = {
      tlsKey = /home/pascal/ca/cleopatra-key.pem;
      tlsPem = /home/pascal/ca/cleopatra.pem;
      caCert = /home/pascal/ca/ca.pem; 
    };
  };

Contributor

What if you enclose the paths in quotes, so they're treated as string literals and not paths (which will be added to the store on eval)?
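That is, the same configuration with string literals instead of path literals (a sketch of the suggestion above):

```nix
services.glusterfs = {
  enable = true;
  tlsSettings = {
    # Strings are not copied into the store on evaluation,
    # unlike the bare path literals used before.
    tlsKey = "/home/pascal/ca/cleopatra-key.pem";
    tlsPem = "/home/pascal/ca/cleopatra.pem";
    caCert = "/home/pascal/ca/ca.pem";
  };
};
```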

Contributor

(My brief reading of the etc builder code suggests it'll take whatever you give it, so long as it looks like a filepath).
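If that reading is right, the two behaviours would be (a sketch, assuming `environment.etc` accepts a plain string path):

```nix
# String source: /etc/ssl/glusterfs.key becomes a symlink straight to
# the given path, which stays outside the store (what we want for keys).
environment.etc."ssl/glusterfs.key".source = "/run/keys/glusterfs.key";

# Path-literal source: the file is copied into /nix/store on evaluation
# and the symlink points there -- world-readable, bad for a private key.
# environment.etc."ssl/glusterfs.key".source = /home/user/glusterfs.key;
```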

@@ -70,11 +128,12 @@ in
PIDFile="/run/glusterd.pid";
LimitNOFILE=65536;
ExecStart="${glusterfs}/sbin/glusterd -p /run/glusterd.pid --log-level=${cfg.logLevel} ${toString cfg.extraFlags}";
KillMode="process";
KillMode="control-group";
Contributor

I would not make this change part of this commit / PR, but do that separately.

I have a branch nh2-glusterfs-service-improvements that includes this commit to handle this.

My plan is that after your PR here has landed, we can merge the remaining improvements from my branch, and then do the gluster 3.12 upgrade (#29062).

Member Author

Good point I will remove this.

Contributor

My plan is that after your PR here has landed, we can merge the remaining improvements from my branch

I've PR'd the changes here now: #29868

@nh2
Contributor

nh2 commented Sep 17, 2017

I've made a couple of comments / requests; the most important one is to figure out whether or not the current approach puts the private key into the nix store. If yes, we have to find a different way.

@nh2
Contributor

nh2 commented Sep 17, 2017

Also note for myself: Once we've figured it out, I need to update my https://github.com/nh2/nixops-gluster-example/blob/8a859f0a702efd38bef9a202e1b031f1db6a44d3/modules/glusterfs-SSL-setup.nix#L87 which definitely has it in /nix/store, which is bad.

default = null;
type = types.path;
type = types.str;
Contributor

This does not protect against accidentally copying things into the store. It still accepts path literals, as in /my/secret.key which will be copied on eval.

TLS settings are implemented as submodule.
@bachp
Member Author

bachp commented Sep 17, 2017

@nh2 Thanks for your feedback. All comments are addressed.

@joachifm
Contributor

joachifm commented Sep 17, 2017

Changing the option type won't protect against accidentally copying files into the Nix store.

@bachp
Member Author

bachp commented Sep 17, 2017

@joachifm What do you mean by that? That like this I'm not able to pass a path to the option?

@joachifm
Contributor

joachifm commented Sep 17, 2017

Two things: first, any value accepted by types.path is also accepted by types.str (the latter is a superset of the former). Second, Nix copies stuff into the store as a side effect of evaluating path literals. This eval is different from module eval and may occur before option type checking, so you could still end up copying into the Nix store even if the module type check eventually fails.
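The difference can be seen directly in `nix repl` (a sketch; the store hash is illustrative):

```nix
# A path literal is copied into the store when it is coerced to a
# string (e.g. by interpolation):
"${/etc/hostname}"   # => "/nix/store/<hash>-hostname" -- file was copied
# A plain string is inert; nothing is copied:
"/etc/hostname"      # => "/etc/hostname"
```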

@joachifm
Contributor

What you really want is a Nix-level mechanism for preventing copies; I don't think you can easily achieve this via module options.

@joachifm
Contributor

See https://github.com/NixOS/nixpkgs/blob/master/lib/types.nix#L167 for the definition of the types.path, you'll note it's just a string that begins with '/'.

@bachp
Member Author

bachp commented Sep 17, 2017

@joachifm Just to clarify if:

  1. tlsKeyPath = /path/key.pem;
    a. copies the key to nix store if types.path
    b. gives an error of type mismatch if types.str;
  2. tlsKeyPath = "/path/key.pem"; creates a symlink from /etc/static/ssl/glusterfs.key -> /path/key.pem for both cases

You are saying in that case 1b fails but still has copied the key to the store?

Is case 2 always "safe" if I give it a string? And 1 is always "unsafe"?

@joachifm
Contributor

By my analysis, 1 is always unsafe, and 2 is always safe. The copying that occurs in 1 is unrelated to the declared option type.

@joachifm
Contributor

Well, I should say, safe in terms of what the Nix evaluator will do, the module could of course copy stuff in a different way (including it in writeText or what have you).

@joachifm
Contributor

joachifm commented Sep 17, 2017

To clarify, the copying I'm worried about is what occurs when Nix evaluates /foo/bar/baz versus what happens when it evaluates "/foo/bar/baz".

@bachp
Member Author

bachp commented Sep 17, 2017

@joachifm @nh2 So from what I hear, I'd say we can drop 40e122e again, as it doesn't add any safety.

@joachifm
Contributor

@bachp yes, if safety is the only motivation for the change it can be dropped IMO.

@joachifm
Contributor

joachifm commented Sep 17, 2017

I think the best you can do right now is warn/fail if a keyfile path points into the Nix store, as that indicates accidental capture. The key will have leaked but at least the user knows. Other than that I imagine you need a Nix level mechanism to really fix this (unless I'm taking crazy pills or overlooking something, which is certainly possible).
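Such a check could be sketched as a module assertion (hypothetical; `tlsKeyPath` as in this PR, `lib` in scope):

```nix
# Fail evaluation if the configured key path points into the store,
# which would mean the key has already been captured world-readably.
assertions = lib.optional (cfg.tlsSettings != null) {
  assertion = !lib.hasPrefix builtins.storeDir cfg.tlsSettings.tlsKeyPath;
  message = ''
    services.glusterfs: tlsKeyPath points into the Nix store;
    the private key would be world-readable.
  '';
};
```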

@bachp
Member Author

bachp commented Sep 17, 2017

I dropped the second commit.

@nh2
Contributor

nh2 commented Sep 18, 2017

@bachp I think we should keep the commit.

There's an important distinction here:

If you use nixops, then as you describe in #27340 (comment), method (1) copies the value into your local nix store. But afterwards, the evaluation will fail, and nixops will not get a chance to copy the world-readable key into the nix store of the remote machine(s).

That is much better, because that way the key leakage is confined to your own machine (the one you run nixops on), where you can easily undo it by deleting the built store path, and does not get deployed into the world.

And, as @joachifm says, it will ensure that the user knows that they messed up (even though they still have to manually remove the key from their store).

@joachifm joachifm added this to the 17.09 milestone Sep 20, 2017
This prevents the user from accidentally committing the key to the nix store
when providing a path instead of a string.
@bachp
Member Author

bachp commented Sep 21, 2017

@joachifm @nh2 I re-added the commit. I think it is good to go now.

@joachifm joachifm merged commit c913f71 into NixOS:master Sep 21, 2017
@joachifm
Contributor

Thank you

@nh2
Contributor

nh2 commented Sep 21, 2017

Thanks!

@nh2
Contributor

nh2 commented Oct 4, 2017

Posting a comment here that I had before in my custom config file:

Note: If the /var/lib/glusterd/secure-access thing breaks in the future and we get

error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number

in the server log, then that means that nixpkgs has switched the prefix localstatedir=/var to something different; the secure-access file must be under that prefix.

Just in case somebody needs to google that error message in the future.

nh2 added a commit to nh2/nixops-gluster-example that referenced this pull request Oct 16, 2017
So far, after a full reboot of all machines, some would sometimes
have failed systemd units.

Key changes:

* A mount-only machine is added to test that this use case works.
  This made me find all the below troubles.

* Fix SSH hang by using .mount unit instead of fstab converter.
  This apparently works around NixOS/nixpkgs#30348 for me.
  No idea why the fstab converter would have this problem.
  The nasty
    pam_systemd(sshd:session): Failed to create session: Connection timed out
  error would slow down SSH logins by 25 seconds, also making reboots slower
  (because nixops keys upload uses SSH).
  It would also show things like `session-1.scope` as failed in systemctl.

* More robustly track (via Consul) whether the Gluster volume is already
  mountable from the client (that is, up and running on the servers).
  This has come a long way; to implement this, I've tried now
  * manual sessions, but those have 10 second min TTL which gets auto-extended
    even longer when rebooting, so I tried
  * script checks, which don't kill the subprocess even when you give a
    `timeout` and don't allow to set a TTL, so I tried
  * TTL checks + manual update script, and not even those set the check to
    failed when the TTL expires
  See my filed Consul bugs:
  * hashicorp/consul#3569
  * hashicorp/consul#3563
  * hashicorp/consul#3565
  So I am using a more specific workaround now:
  A TTL check + manual update script, AND a script
  (`consul-scripting-helper.py waitUntilService --wait-for-index-change`)
  run by a service (`glusterReadyForClientMount.service`)
  that waits until the TTL of a check for the service is observed
  to be bumped at least once during the life-time of the script.
  When the script observes a TTL bump, we can be sure that at least
  one of the gluster servers has its volume up.

* `gluster volume status VOLUME_NAME detail | grep "^Online.*Y"` is used
  to check whether the volume is actually up.

* Using consul's DNS feature to automatically pick an available server
  for the mount.
  dnsmasq is used to forward DNS queries to the *.consul domain
  to the consul agent.
  `allow_stale = false` is used to ensure that the DNS queries
  are not outdated.

* Create `/etc/ssl/dhparam.pem` to avoid spurious Gluster warnings
  (see https://bugzilla.redhat.com/show_bug.cgi?id=1398237).

* `consul-scripting-helper.py` received some fixes and extra loops
  to retry when Consul is down.

This commit also switches to using `services.glusterfs.tlsSettings`
as implemented in NixOS/nixpkgs#27340
which revealed a lot of the above issues.
nh2 added a commit to nh2/nixops-gluster-example that referenced this pull request Oct 22, 2017
nh2 added a commit to nh2/nixops-gluster-example that referenced this pull request Oct 22, 2017