Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add package & NixOS module for journaldriver #42134

Merged
merged 2 commits into from Jun 20, 2018

Conversation

tazjin
Copy link
Member

@tazjin tazjin commented Jun 17, 2018

Motivation for this change

Add support for the journaldriver log forwarding agent, primarily for use on Google Cloud Platform instances.

This PR adds a derivation for building journaldriver, as well as a NixOS module that configures a user and service for running the log forwarder.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

We have deployed this module on a bunch of machines in our test environments (both on Google Cloud machines and VMs hosted elsewhere) and it works as expected.

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can one know whether its a GCP machine or not?

And a nitpick: Only half of the option declarations have the = lined up nicely :)

{ lib, fetchFromGitHub, rustPlatform, pkgconfig, openssl, systemd }:

rustPlatform.buildRustPackage rec {
name = "journaldriver";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version should be appended to the name, separated by a dash

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

};

# Disable tests in nixpkgs
doCheck = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for the tests being disabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly decided to do that due to the time it takes to build the entire dependency chain, which will happen twice for running tests & non-tests.

The release versions have already had tests run (otherwise they wouldn't have gone into upstream master) and since all dependencies are pinned it should be fine.

Normally I'd package a Rust tool by using the buildRustCrate (which doesn't run tests at all), but that's not resolving the dependency on libsystemd correctly for some reason and has no obvious (read: documented) way to override the dependencies it figures out.

If you think tests should be enabled I don't mind enabling them :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the tests work, I think they should be on by default. Hydra will have it in the cache anyways, so it doesn't really matter how long it takes for the end user. Usually the only good reason to disable tests is because the tests require network access or something else that Nix doesn't have in the sandbox.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Updating with tests enabled.

# Disable tests in nixpkgs
doCheck = false;

buildInputs = [ pkgconfig openssl systemd.dev ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildInputs should choose the dev outputs automatically, so I think it should also work with s/systemd.dev/systemd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, pkgconfig should go in nativeBuildInputs


config = mkIf cfg.enable {
users.extraUsers.journaldriver = {
isSystemUser = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this set to true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand the differentiation this is only to indicate that the user account is not used by a human, which results in the UID of the account being allocated in a different range.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not entirely sure, there are a couple services that use this, but others set a fixed id in https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/misc/ids.nix#L36. I hope somebody else can comment on this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment at the top of ids.nix sounds to me like the preferred way is to not hardcode an ID unless necessary.

It may be possible to use the DynamicUser= option in systemd. Journaldriver needs to have a persistent file to store its cursor position in, this should be possible to achieve with StateDirectory= in systemd - it's not obvious how to grant the dynamic user membership in the journal access group though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update:

This configuration seems to work without specifying a user at all:

      serviceConfig = {
        Restart        = "always";
        DynamicUser    = true;
        StateDirectory = "journaldriver";

        # This group is required for accessing journald.
        SupplementaryGroups = "systemd-journal";
      };

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I didn't know of that :o

};

systemd.services.journaldriver = {
inherit environment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move the section at the start here, and change it to something like this:

environment = {
  RUST_LOG = cfg.logLevel;
  LOG_NAME = cfg.logName;
  # ...
}

This works because environment allows null values which has the effect of not setting the vars at all already.


logLevel = mkOption {
type = types.string;
default = "info";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What valid values are there? You can change the type to types.enum [ "info" "debug" ... ] to only allow certain values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values aren't constrained by an enum in this case, Rust's logging library accepts pretty complex configuration via this environment variable (for example one could enable trace logging for the HTTP client used within while not touching the log-level of the rest of the service).

@tazjin
Copy link
Member Author

tazjin commented Jun 18, 2018

@infinisil Thanks for the review! Good feedback 👌

How can one know whether its a GCP machine or not?

Hmm, presumably people know who their hosting provider is ... I'd hope? Technically one could try to run some kind of detection by attempting to speak to Google's instance metadata service, but that should probably be a change in journaldriver and not something that Nix should do.

I'm pushing an updated version that addresses some of your points and will quickly test it on some test machines.

@tazjin
Copy link
Member Author

tazjin commented Jun 18, 2018

Update: Tests successful :)

};

# Disable tests in nixpkgs
doCheck = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the tests work, I think they should be on by default. Hydra will have it in the cache anyways, so it doesn't really matter how long it takes for the end user. Usually the only good reason to disable tests is because the tests require network access or something else that Nix doesn't have in the sandbox.

"LOG_NAME" = cfg.logName;
"LOG_STREAM" = cfg.logStream;
"GOOGLE_CLOUD_PROJECT" = cfg.googleCloudProject;
"GOOGLE_APPLICATION_CREDENTIALS" = cfg.applicationCredentials;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer to not use the " here, RUST_LOG = cfg.logLevel works just fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Updated


config = mkIf cfg.enable {
users.extraUsers.journaldriver = {
isSystemUser = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not entirely sure, there are a couple services that use this, but others set a fixed id in https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/misc/ids.nix#L36. I hope somebody else can comment on this

@tazjin
Copy link
Member Author

tazjin commented Jun 18, 2018

Refactored to use the DynamicUser feature in systemd instead of allocating a user explicitly.

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Another thing I'm unsure about is network.target vs network-online.target. I'd think that most services would require network-online.target, but there are vastly more services actually using network.target. I suspect that's just because of confusion regarding those two, network-online.target seems to be the right one here.

One last thing to change and then this is looking good to me :)

};

logLevel = mkOption {
type = types.string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost missed this: The type should be types.str instead:

nixpkgs/lib/types.nix

Lines 203 to 205 in 7296c26

# Deprecated; should not be used because it quietly concatenates
# strings, which is usually not what you want.
string = separatedString "";

The same for all other options that use types.string except applicationCredentials, which should be changed to a path: types.path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@tazjin
Copy link
Member Author

tazjin commented Jun 18, 2018

Regarding the network targets: I read up on that a bit before making the change and according to this article, the -online target should be the correct one.


# The directory is set up and permissions are configured by
# systemd via the `StateDirectory` configuration option.
CURSOR_POSITION_FILE = "/var/lib/private/journaldriver/cursor.pos";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading man systemd.exec on StateDirectory tells me that symlinks should be set up to point /var/lib/journaldriver -> /var/lib/private/journaldriver, so I think the private/ infix could be dropped.

@infinisil
Copy link
Member

Other than that, I'd squash the 3 module commits into 1, and then I got nothing more to say :)

@tazjin
Copy link
Member Author

tazjin commented Jun 18, 2018

@infinisil Upstream now uses /var/lib/journaldriver/ as the default folder and I've updated the sources, so the cursor position setting is removed from here completely.

I've squashed the commits together now. Thanks for your review, I learned a bunch of interesting things that aren't immediately obvious from the Nix docs :)

Adds a module for running the journaldriver log forwarding agent via
systemd.

The agent can be deployed on both GCP instances and machines hosted
elsewhere to forward all logs from journald to Stackdriver Logging.

Consult the module options and upstream documentation for more
information.

Implementation notes:

* The service unit is configured to use systemd's dynamic user feature
  which will let systemd set up the state directory and appropriate
  user configuration at unit launch time instead of hardcoding it.

* The module depends on `network-online.target` to prevent a situation
  where journaldriver is failing and restarting multiple times before
  the network is online.
@tazjin
Copy link
Member Author

tazjin commented Jun 18, 2018

Verified the final version in our test environment, good to go from my side!

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, nice!

Copy link
Member

@fpletz fpletz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks for the PR @tazjin and @infinisil for the thorough review. 🍻

@fpletz fpletz merged commit cac9f08 into NixOS:master Jun 20, 2018
@tazjin tazjin deleted the feat/journaldriver branch June 20, 2018 12:52
@tazjin
Copy link
Member Author

tazjin commented Jun 20, 2018

@fpletz Thank you! :)

tazjin added a commit to tazjin/nixpkgs that referenced this pull request Nov 15, 2018
Imports the `journaldriver` module into the top-level NixOS module
list to make it usable without extra work.

This went unnoticed in NixOS#42134 (mostly because my setup imports modules
explicitly from pinned versions).

Fixes NixOS#50390
Mic92 pushed a commit that referenced this pull request Dec 27, 2018
Imports the `journaldriver` module into the top-level NixOS module
list to make it usable without extra work.

This went unnoticed in #42134 (mostly because my setup imports modules
explicitly from pinned versions).

Fixes #50390

(cherry picked from commit d5ea097)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants