Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/buildkite-agents: support multiple buildkite agents #78373

Merged
merged 2 commits into from Feb 10, 2020

Conversation

yorickvP
Copy link
Contributor

@yorickvP yorickvP commented Jan 23, 2020

Motivation for this change

Continued from #68378, which was continued from #59358.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
cc

@flokli @mkaito @balsoft @infinisil

@mkaito
Copy link
Contributor

mkaito commented Jan 23, 2020

@GrahamcOfBorg test buildkite-agents

@mkaito
Copy link
Contributor

mkaito commented Jan 23, 2020

Oh you didn't rename the test attr

@GrahamcOfBorg test buildkite-agent

@yorickvP
Copy link
Contributor Author

(test attr renamed now)

enable = mkEnableOption "buildkite-agent";
buildkiteOptions = { name ? "", config, ... }: {
options = {
enable = mkOption {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the fact that it is declared assume that you want it enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Would be weird if you declared an agent and then you have to explicitly enable it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still want to have the option to then explicitly disable it? Seems weird, as we filter by enabled agents…

@balsoft
Copy link
Member

balsoft commented Jan 24, 2020

Do we not want to keep compatibility? Provide services.buildkite-agent, and if it's used then create a single agent in buildkite-agents (and perhaps show a warning that the option is soon to be removed)?

Comment on lines 47 to 51
dataDir = mkOption {
default = "/var/lib/buildkite-agent";
default = "/var/lib/buildkite-agent-${name}";
description = "The workdir for the agent";
type = types.str;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know usecases of where this should be somewhere else than at this location?

If we restrict it to /var/lib/buildkite-agent-${name}, we could just set StateDirectory=buildkite-agent=${name}, and DynamicUser=true, and don't need to setup a system-wide user at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we use the buildkite directory to store some gc roots that are used for Continuous Delivery. We could change to a ReadWriteDirectory, but I don't think DynamicUser is really set up for communication with the outside world.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our approach to CD is arguably horrible by abusing this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'd recommend using some post-build-hook on the global nix-daemon to sign and upload to a binary cache, but hmm… I can understand you can't change this that quick.

Still, what about defaulting to use DynamicUser=true, and StateDirectory=buidlkite-agent-${name}, and only use a system-wide user if both a user attribute and that dataDir attribute is set?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using some post-build-hook on the global nix-daemon to sign and upload to a binary cache

Been there, tried that :(. Builds become horribly slow and the host becomes almost unusable as it is uploading everything, even your user environment after nix-env -i (and that’s slow too).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the naive approach blocks the build loop, as documented in https://nixos.org/nix/manual/#chap-post-build-hook-caveats .

In your case, you might want to have the executed hook to fork away as fast as possible.

However, I see that discussion as not really related to this PR - it doesn't yet introduce DynamicUser.

nixos/tests/buildkite-agents.nix Outdated Show resolved Hide resolved
nixos/tests/buildkite-agents.nix Outdated Show resolved Hide resolved
@flokli
Copy link
Contributor

flokli commented Jan 24, 2020

Do we not want to keep compatibility? Provide services.buildkite-agent, and if it's used then create a single agent in buildkite-agents (and perhaps show a warning that the option is soon to be removed)?

I'm fine with some breaking changes, if they're documented in the release notes and remove some cruft. We did some other updates when moving to buildkite 3, so I'd expect people to take a look at the release notes anyway and update their configuration.

(mkRemovedOptionModule [ "services" "buildkite-agent" "openssh" "publicKey" ] "SSH public keys aren't necessary to clone private repos.")
(mkRemovedOptionModule [ "services" "buildkite-agent" "openssh" "publicKeyPath" ] "SSH public keys aren't necessary to clone private repos.")
(mkRenamedOptionModule [ "services" "buildkite-agent" "meta-data"] [ "services" "buildkite-agent" "tags" ])
(mkRemovedOptionModule [ "services" "buildkite-agent"] "services.buildkite-agent has been moved to an attribute set at services.buildkite-agents")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be relatively easy to make this backwards compatible with mkChangedOptionModule, which will also issue a warning when the old option is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that, but what should we make the default agent name? The state paths for it would still change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since people are expected to have used dataDir to refer to the path, I don't think it should be a problem. Also from this PR I can see that the service previously didn't even use dataDir to get its config, seemingly indicating that it was never fully supported anyways.

Also I correct myself: A simple

mkRenamedOptionModule [ "services" "buildkite-agent" ]
  [ "services" "buildkite-agents" "legacy" ]

should work (using the name legacy, or maybe single or nixos-single or so)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dataDir actually contains long-term state. People would have to move the contents of /var/lib/buildkite-agent to /var/lib/buildkite-agent-legacy, or get unexpected behavior. I think it's safer to have it break during the build, instead of during runtime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh good point, didn't think of that. Okay so a mkRenamedOptionModule won't work, but a mkChangedOptionModule should still be possible with a legacy attribute (or similar) and by defaulting its dataDir to /var/lib/buildkite-agent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think breaking old config and making sure the user did read the release notes is a better way forward here.

There have been quite some changes, and just silently assuming everything will still work as it did before is probably a bad idea anyways.

I also proposed to default to use DynamicUsers, and make the usage of a non-dynamic user, and custom StateDirectory the non-default case, which might cause nontrivial migrations aswell.

I think batching all these with the option syntax change is a reasonable path forward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the "quite some changes"? From what I can see all this PR does is to add support for multiple agents as it says, which imo really doesn't warrant breaking people's configs, especially if it only takes a couple lines to make it automatic.

Copy link
Contributor

@flokli flokli Jan 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@infinisil see the earlier PRs and the corresponding release notes:

https://github.com/NixOS/nixpkgs/blob/master/nixos/doc/manual/release-notes/rl-2003.xml#L417

As for the "defaulting to DynamicUser": #78373 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were any of those not backwards compatible? If so they should've added a warning/error themselves. This PR itself seems 100% backwards compatible with a mkChangedOptionModule. Unless backwards compat has to be broken I'd rather not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the module options themselves not, but the upgrade from buildkite 2 to 3 might have introduced some behavorial changes in how the bootstrap/hooks were configured: https://buildkite.com/docs/agent/v3/upgrading

@flokli
Copy link
Contributor

flokli commented Feb 10, 2020

@GrahamcOfBorg test buildkite-agent

@yorickvP
Copy link
Contributor Author

@GrahamcOfBorg test buildkite-agents

Copy link
Contributor

@flokli flokli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this on IRC: https://logs.nix.samueldr.com/nixos-dev/2020-02-10#1581336556-1581341910;

Looks good to me now :)

@infinisil infinisil merged commit e3c5d29 into NixOS:master Feb 10, 2020
@grahamc
Copy link
Member

grahamc commented Feb 18, 2020

After this PR I get:

Options `services.buildkite-agents.r13y.hooksPath' and
`services.buildkite-agents.r13y.hooks.<name>' are mutually exclusive

but I am not specifying hooksPath, and am only specifying hooks.environment.


assertions = [
config.assertions = mapAgents (name: cfg: [
{ assertion = cfg.hooksPath == hooksDir || all (v: v == null) (attrValues cfg.hooks);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{ assertion = cfg.hooksPath == hooksDir || all (v: v == null) (attrValues cfg.hooks);
{ assertion = cfg.hooksPath == (hooksDir cfg) || all (v: v == null) (attrValues cfg.hooks);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants