glusterfs service: add support for TLS communication #27340
Conversation
restartTriggers = [
  config.environment.etc."ssl/glusterfs.pem".source
  config.environment.etc."ssl/glusterfs.key".source
Users should probably not put the private key into the nix store.
What is a better approach?
> What is a better approach?

Giving it a path (that's not in the nix store). For example, nixops has a keys functionality that places secret keys into a special directory that doesn't survive reboots.
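For reference, a rough sketch of that approach with nixops (the deployment.keys attribute names and the /run/keys location are assumptions about nixops, and the secrets path is made up):

  {
    # Sketch: the key is read on the deployment machine and uploaded
    # directly to the target's /run/keys, so it never enters the Nix store.
    deployment.keys."glusterfs.key" = {
      keyFile     = ./secrets/glusterfs.key;  # illustrative local path
      user        = "root";
      group       = "root";
      permissions = "0400";
    };

    # The service would then be pointed at the runtime location as a
    # plain string, e.g. "/run/keys/glusterfs.key".
  }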
''
else
''
  rm -f /var/lib/glusterd/secure-access
Deleting stuff in /var isn't something that NixOS modules typically do; this behaviour must definitely be documented in the description of enableTLS below.
I guess it is OK to do it in this case because glusterfs leaves us no other choice: there's no other way to tell it to turn encryption on or off.
If we don't want to add the keys to the nix store and don't manipulate the file in /var/lib/glusterd, then this change doesn't actually do much, since everything (putting the keys somewhere, creating the activation file in /var) has to be done outside of Nix anyway?
I think you misunderstood me: I think this PR is good, and that deleting secure-access here is OK. You just need to document that behaviour in the option description.
Also, having the pubkey in the store is OK; it's just the private key that should be a file path instead.
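For instance, the warning could live directly in the option description (a sketch only; the exact wording is illustrative and the enableTLS name is just taken from this thread):

  enableTLS = mkOption {
    type = types.bool;
    default = false;
    description = ''
      Whether glusterd communicates via TLS.
      Note: when this is switched back to false, the module deletes
      /var/lib/glusterd/secure-access, since GlusterFS offers no other
      way to turn encryption off again.
    '';
  };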
Ok, thanks for clarifying :)
I would imagine that instead of putting the private key into the config I would just put a path to a file. What's the mechanism to make sure this gets symlinked to the correct location in etc? Also via config.environment.etc? Is there a similar mechanism that could be used to put the file into /var?
};

caCert = mkOption {
  default = null;
There should be some asserts here that ensure that when enableTLS is true, tlsKey, tlsPem and caCert are not null.
Better yet, make a tlsSettings option of type submodule that can be either null (for TLS being disabled), or a {} with the 3 options inside, each non-nullable. An example of how to do that can be found here:
I reworked the PR to make use of a submodule as @nh2 suggested.
Is this ready?
caCert = mkOption {
  default = null;
  type = types.path;
  description = "Path certificate authority used to signe the cluster certificates.";
2 small typos: "Path to", and "signe" -> "sign".
tlsSettings = mkOption {
  description = ''
    Make the server communicat via TLS.
typo communicate
I recommend adding a link to the docs: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/SSL/
type = types.nullOr (types.submodule {
  options = {
    tlsKey = mkOption {
      default = null;
I think you need to remove this default = null and the other two below.
Either we want the entire tlsSettings dict to be null, or we want it to be non-null, and in that case the user must provide all 3 of tlsKey, tlsPem and caCert, so those should not be nullable.
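A minimal sketch of the suggested shape (descriptions abbreviated; types.path is used here only to mirror the diff, and the path-vs-string question discussed further down still applies):

  tlsSettings = mkOption {
    default = null;  # null means TLS stays disabled
    type = types.nullOr (types.submodule {
      options = {
        # No per-option defaults: if tlsSettings is set at all,
        # all three files must be provided.
        tlsKey = mkOption { type = types.path; description = "Path to the private key."; };
        tlsPem = mkOption { type = types.path; description = "Path to the certificate."; };
        caCert = mkOption { type = types.path; description = "Path to the CA certificate."; };
      };
    });
  };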
environment.etc = mkIf (cfg.tlsSettings != null) {
  "ssl/glusterfs.pem".source = cfg.tlsSettings.tlsPem;
  "ssl/glusterfs.key".source = cfg.tlsSettings.tlsKey;
  "ssl/glusterfs.ca".source = cfg.tlsSettings.caCert;
OK, here I'm not quite sure what source exactly does.
Does it just point the symlink /etc/ssl/glusterfs.key -> theGivenPath? Or does it take a local path to a file, put it into the nix store, and create a symlink /etc/ssl/glusterfs.key -> /nix/store/thefile?
The latter would be bad, because that would make the private key world-readable in /nix/store.
We definitely need the former, so that e.g. in the case of nixops we can obtain a symlink like /etc/ssl/glusterfs.key -> /var/run/keys/... where the target directory is the keys directory managed by nixops and not part of the store.
Can anybody shed light on this?
On my machine the files just end up in the nix store :(
lrwxrwxrwx 1 root root 28 Sep 17 17:33 glusterfs.ca -> /etc/static/ssl/glusterfs.ca
lrwxrwxrwx 1 root root 29 Sep 17 17:33 glusterfs.key -> /etc/static/ssl/glusterfs.key
lrwxrwxrwx 1 root root 29 Sep 17 17:33 glusterfs.pem -> /etc/static/ssl/glusterfs.pem
where /etc/static/ssl contains:
lrwxrwxrwx 1 root root 50 Jan 1 1970 glusterfs.ca -> /nix/store/n9h1w46qjcbsrqpr38i7n6vdqb8cfj16-ca.pem
lrwxrwxrwx 1 root root 61 Jan 1 1970 glusterfs.key -> /nix/store/8skzrsdyx46kjzpz8m5kvj6y9x4qhqyy-cleopatra-key.pem
lrwxrwxrwx 1 root root 57 Jan 1 1970 glusterfs.pem -> /nix/store/54y38wlzipf142mxzki2qh3sz4hp4zxz-cleopatra.pem
I'm not sure how to do this properly?
Is there a particular reason for requiring the use of environment.etc? Compared to, say, using preStart or a dedicated unit for this purpose?
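For comparison, a rough, untested sketch of the preStart variant (it assumes the usual cfg binding of this module and that the three settings are plain strings, so that interpolating them does not pull anything into the store):

  systemd.services.glusterd.preStart = lib.mkIf (cfg.tlsSettings != null) ''
    # Symlink the user-supplied files to the locations glusterd expects,
    # bypassing environment.etc entirely.
    mkdir -p /etc/ssl /var/lib/glusterd
    ln -sfn ${cfg.tlsSettings.tlsPem} /etc/ssl/glusterfs.pem
    ln -sfn ${cfg.tlsSettings.tlsKey} /etc/ssl/glusterfs.key
    ln -sfn ${cfg.tlsSettings.caCert} /etc/ssl/glusterfs.ca
    touch /var/lib/glusterd/secure-access
  '';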
The original entry I have in my configuration.nix:

services.glusterfs = {
  enable = true;
  tlsSettings = {
    tlsKey = /home/pascal/ca/cleopatra-key.pem;
    tlsPem = /home/pascal/ca/cleopatra.pem;
    caCert = /home/pascal/ca/ca.pem;
  };
};
What if you enclose the paths in quotes, so they're treated as string literals and not paths (which will be added to the store on eval)?
(My brief reading of the etc builder code suggests it'll take whatever you give it, so long as it looks like a filepath).
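To make the distinction concrete, a minimal sketch (abbreviated to a single attribute; the copy-on-eval behaviour is the one described in this thread):

  {
    services.glusterfs.tlsSettings = {
      # Path literal: Nix copies the file into /nix/store when the value
      # is used, leaving the private key world-readable there.
      tlsKey = /home/pascal/ca/cleopatra-key.pem;

      # Quoted string: passed through verbatim, nothing is copied; the
      # file is only read at runtime from this location.
      # tlsKey = "/home/pascal/ca/cleopatra-key.pem";

      # tlsPem and caCert omitted for brevity.
    };
  }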
@@ -70,11 +128,12 @@ in
      PIDFile="/run/glusterd.pid";
      LimitNOFILE=65536;
      ExecStart="${glusterfs}/sbin/glusterd -p /run/glusterd.pid --log-level=${cfg.logLevel} ${toString cfg.extraFlags}";
-     KillMode="process";
+     KillMode="control-group";
I would not make this change part of this commit / PR, but do that separately.
I have a branch nh2-glusterfs-service-improvements that includes this commit to handle this.
My plan is that after your PR here has landed, we can merge the remaining improvements from my branch, and then do the gluster 3.12 upgrade (#29062).
Good point, I will remove this.
> My plan is that after your PR here has landed, we can merge the remaining improvements from my branch
I've PR'd the changes here now: #29868
I've made a couple of comments / requests; the most important one is to figure out whether or not the current approach puts the private key into the nix store. If yes, we have to find a different way.
Also a note for myself: once we've figured it out, I need to update my https://github.com/nh2/nixops-gluster-example/blob/8a859f0a702efd38bef9a202e1b031f1db6a44d3/modules/glusterfs-SSL-setup.nix#L87 which definitely has it in the nix store.
        default = null;
-       type = types.path;
+       type = types.str;
This does not protect against accidentally copying things into the store. It still accepts path literals, as in /my/secret.key, which will be copied on eval.
TLS settings are implemented as submodule.
@nh2 Thanks for your feedback. All comments are addressed.
Changing the option type won't protect against accidentally copying files into the Nix store.
@joachifm How do you mean that? Does this mean I'm not able to pass a path to the option?
Two things: first, any value accepted by
What you really want is a Nix-level mechanism for preventing copies; I don't think you can easily achieve this via module options.
See https://github.com/NixOS/nixpkgs/blob/master/lib/types.nix#L167 for the definition of the
@joachifm Just to clarify if:
You are saying that in that case 1b fails but still has copied the key to the store? Is case 2 always "safe" if I give it a string? And is 1 always "unsafe"?
By my analysis, 1 is always unsafe, and 2 is always safe. The copying that occurs in 1 is unrelated to the declared option type.
Well, I should say: safe in terms of what the Nix evaluator will do; the module could of course copy stuff in a different way (including it in
To clarify, the copying I'm worried about is what occurs when Nix evaluates
@bachp yes, if safety is the only motivation for the change it can be dropped IMO.
I think the best you can do right now is warn/fail if a keyfile path points into the Nix store, as that indicates accidental capture. The key will have leaked, but at least the user knows. Other than that, I imagine you need a Nix-level mechanism to really fix this (unless I'm taking crazy pills or overlooking something, which is certainly possible).
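A sketch of such a check as a module assertion (it assumes cfg = config.services.glusterfs with lib in scope, and only catches values that already resolve to a store path):

  assertions = [
    { assertion = cfg.tlsSettings == null
        || !(lib.hasPrefix builtins.storeDir (toString cfg.tlsSettings.tlsKey));
      message = ''
        services.glusterfs.tlsSettings.tlsKey points into the Nix store,
        which would leave the private key world-readable. Pass its
        location as a plain string outside the store instead.
      '';
    }
  ];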
I dropped the second commit.
@bachp I think we should keep the commit. There's an important distinction here: If you use nixops, then as you describe in #27340 (comment), method (1) copies the value into your local nix store. But afterwards, the evaluation will fail, and nixops will not get a chance to copy the world-readable key into the nix store of the remote machine(s). That is much better, because that way the key leakage is confined to your own machine (the one you run nixops on), where you can easily undo it by deleting the built store path, and does not get deployed into the world. And, as @joachifm says, it will ensure that the user knows that they messed up (even though they still have to manually remove the key from their store).
This prevents the user from accidentally committing the key to the nix store if they provide a path instead of a string.
Thank you
Thanks!
Posting a comment here that I had before in my custom config file:
Just in case somebody needs to google that error message in the future.
So far, after a full reboot of all machines, some would sometimes have failed systemd units. Key changes:

* A mount-only machine is added to test that this use case works. This made me find all the below troubles.
* Fix SSH hang by using .mount unit instead of fstab converter. This apparently works around NixOS/nixpkgs#30348 for me. No idea why the fstab converter would have this problem. The nasty pam_systemd(sshd:session): Failed to create session: Connection timed out error would slow down SSH logins by 25 seconds, also making reboots slower (because nixops keys upload uses SSH). It would also show things like `session-1.scope` as failed in systemctl.
* More robustly track (via Consul) whether the Gluster volume is already mountable from the client (that is, up and running on the servers). This has come a long way; to implement this, I've tried now
  * manual sessions, but those have 10 second min TTL which gets auto-extended even longer when rebooting, so I tried
  * script checks, which don't kill the subprocess even when you give a `timeout` and don't allow to set a TTL, so I tried
  * TTL checks + manual update script, and not even those set the check to failed when the TTL expires

  See my filed Consul bugs:
  * hashicorp/consul#3569
  * hashicorp/consul#3563
  * hashicorp/consul#3565

  So I am using a more specific workaround now: A TTL check + manual update script, AND a script (`consul-scripting-helper.py waitUntilService --wait-for-index-change`) run by a service (`glusterReadyForClientMount.service`) that waits until the TTL of a check for the service is observed to be bumped at least once during the life-time of the script. When the script observes a TTL bump, we can be sure that at least one of the gluster servers has its volume up.
* `gluster volume status VOLUME_NAME detail | grep "^Online.*Y"` is used to check whether the volume is actually up.
* Using consul's DNS feature to automatically pick an available server for the mount. dnsmasq is used to forward DNS queries to the *.consul domain to the consul agent. `allow_stale = false` is used to ensure that the DNS queries are not outdated.
* Create `/etc/ssl/dhparam.pem` to avoid spurious Gluster warnings (see https://bugzilla.redhat.com/show_bug.cgi?id=1398237).
* `consul-scripting-helper.py` received some fixes and extra loops to retry when Consul is down.

This commit also switches to using `services.glusterfs.tlsSettings` as implemented in NixOS/nixpkgs#27340 which revealed a lot of the above issues.
Motivation for this change
This makes it easy to set up glusterfs with TLS support. The user has to provide three files: the TLS private key (tlsKey), the TLS certificate (tlsPem) and the certificate authority used to sign the cluster certificates (caCert).

There are still some open points on which I would appreciate feedback:

Enabling TLS works by placing the certificates in /etc/ssl and creating the marker file /var/lib/glusterd/secure-access. This is the same setup for servers and for clients wanting to mount from a secure server. This PR only activates this for the server part. This means one can mount a TLS-secured volume on a machine where the server is also running. To improve this, the TLS option should probably be moved out of the service config, but I'm not sure how.

Things done
- Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
- Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
- Tested execution of all binary files (usually in ./result/bin/)

/cc @nh2