Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataDog Agent v6 #40399

Merged
merged 8 commits into from Aug 15, 2018
Merged

DataDog Agent v6 #40399

merged 8 commits into from Aug 15, 2018

Conversation

rvl
Copy link
Contributor

@rvl rvl commented May 12, 2018

Motivation for this change

I wanted to store the api_key setting in a file, saw that datadog-agent v6 supports using an environment variable, then foolishly attempted packaging datadog-agent v6.

They decided to rewrite most parts of the agent in Go. The new version is here: https://github.com/DataDog/datadog-agent

The NixOS module supports both versions using the service.dd-agent.package option.

Things done

It's hard to automatically test this because it depends on a proprietary service.
I have tested a basic setup using a nixops VM and a DataDog trial account and this works.
If someone could help test more elaborate configs, that would help.

  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

/cc @thoughtpolice @domenkozar @mbbx6spp @Fuuzetsu @bflyblue @coretemp

@Mic92
Copy link
Member

Mic92 commented May 12, 2018

Will the old version be supported for a while?

@rvl
Copy link
Contributor Author

rvl commented May 13, 2018

@Mic92 Yes I assume it will be supported by DataDog for a while because v6 still lacks features that v5 has (changes, missing_features). Also users might need time to update any custom Python check scripts that they have.
Edit: I have sent this question to DataDog support


# DataDog use paths relative to the agent binary, so fix these.
postPatch = ''
sed -e "s|PyChecksPath =.*|PyChecksPath = \"$bin/${python.sitePackages}\"|" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this to do the same and it is simpler (no escaping). Apply in multiple places.
sed -e 's|PyChecksPath =.*|PyChecksPath = "$bin/${python.sitePackages}"|' \

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think $bin would be expanded in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. In that case

sed -e 's|PyChecksPath =.*|PyChecksPath = "'"$bin"'/${python.sitePackages}"|' \

@coretemp
Copy link
Contributor

@rvl This looks exactly like what I had in mind when I opened the issue. I hope it will be merged soon.

description = "Datadog agent monitor";
} // (if newDog then {
preStart = ''
chown datadog /etc/dd-agent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not done recursively?

serviceConfig = {
ExecStart = "${cfg.package}/bin/dogstatsd";
Type = "forking";
PIDFile = "/tmp/dogstatsd.pid";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/tmp/ is a strange place for this. I hope statsd checks that no other user has created a malicious symlink there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are established locations for this stuff indeed. /tmp is not the right location.

For my use case this kind of security is not important, but for some people it might be.

@rvl
Copy link
Contributor Author

rvl commented May 20, 2018

OK thanks @Mic92 I have fixed those issues. Also I noticed that the delightful v5 dd-agent code apart from putting PID files in /tmp, also drops Python pickle files, which is reason enough to enable the systemd PrivateTmp option.

In no hurry to merge this until it gets some testing from datadog users.

@Mic92
Copy link
Member

Mic92 commented May 21, 2018

I think it is used in the core infrastructure of NixOS itself.

@tazjin
Copy link
Member

tazjin commented May 24, 2018

I haven't tested this on an actual machine yet, but from a superficial look I think this is missing the inclusion of the core integrations. The agent's go rewrite only covers some of the core functionality, most checks are still run via Python.

If you look at the derivation output of the V5 dd-agent you'll notice that the files from the integrations repository are included there. Without them the V6 agent will be able to collect basic metrics, but fail on configurations that require the integrations to be present.

Datadog's upstream packages (for Debian & co) include these in the agent package and they also note that they're included in the documentation.

I've previously read through the upstream build process that they use to create the deb & RPM packages and it's ... complicated, to say the least. I'm not really familiar with the Python ecosystem but if someone wants to take a look at this together, please ping me :)

@rvl
Copy link
Contributor Author

rvl commented May 27, 2018

Thanks @tazjin - I have added the python core integrations.

I have also decided to not make the dd-agent module work with both v5 and v6.
Instead, dd-agent will be for v5 and a new module datadog-agent will be for v6.
This allows us to make the configuration more extendable but not break the old dd-agent module API.

I have kept the v5 dd-agent temp files fix which @Mic92 pointed out.

@tazjin
Copy link
Member

tazjin commented May 27, 2018

@rvl Awesome work, I'll try it in one of our environments with a bunch of custom integration configs and report back. Probably won't get around to it before Tuesday though.

@xeji
Copy link
Contributor

xeji commented Jun 2, 2018

The merge of #41035 created a merge conflict, please rebase.

@rvl rvl force-pushed the datadog-agent branch 3 times, most recently from 7599676 to 53a1148 Compare June 2, 2018 10:43
@rvl
Copy link
Contributor Author

rvl commented Jun 2, 2018

OK thanks @xeji I have rebased.

Now that there are separate NixOS modules for v5 and v6, I am more keen to get this reviewed and merged. The testing I have done is with nixops and virtualbox, with the basic host dashboard metrics, and the postgresql integration.

@cransom
Copy link
Contributor

cransom commented Jun 8, 2018

I have a build failure when I build from the PR as well as via a nox-review:

Collecting requests (from datadog-checks-base==1.2.2)
  Could not find a version that satisfies the requirement requests (from datadog-checks-base==1.2.2) (from versions: )
No matching distribution found for requests (from datadog-checks-base==1.2.2)

I added requests to the datadog-checks-base derivation and it builds and works.

This is the new v6 version of datadog-agent.

The old v5 module is kept as dd-agent.
rm -f /etc/datadog-agent/auth_token
'';
script = ''
export DD_API_KEY=$(head -n1 ${cfg.apiKeyFile})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

head -n 1 is POSIX correct. head -n1 is not (it just happens to work).

tazjin added a commit to tazjin/nixpkgs that referenced this pull request Aug 9, 2018
Refactors the datadog-agent (i.e. V6) module to let users configure
arbitrary checks, not just a limited set, without having to resort to
linking the files manually and updating the systemd unit.

Checks are now configured via a `services.datadog-agent.checks` option
which takes an attribute set in which the keys refer directly to
Datadog check names, and the values are attribute sets representing
Datadog's configuration structure.

With this mechanism users can configure arbitrary integrations, for
example for the `ntp`-check, simply by saying:

    services.datadog-agent.checks.ntp = {
      init_config = null;
      # ... other check configuration options as per Datadog
      # documentation
    };

The previous check-specific configuration options for non-default
checks have been removed. Disk & network check configuration options
have been kept rather than making them a `default`-value of the
`checks`-option because they will be overridden by user-configurations
in that case.

Relates to NixOS#40399.
tazjin added a commit to tazjin/nixpkgs that referenced this pull request Aug 9, 2018
Refactors the process used to build the Datadog core integrations to
be more easily extensible with integrations other than the ones built
and installed by default.

Documentation has been added in relevant parts of the module to
describe how the process works.

As a high-level overview:

The `datadog-integrations-core` attribute in the top-level package set
now accepts an extra parameter.

This parameter is an attribute set where each key is the name of a
Datadog integration as it appears in Datadog's integrations-core
repository[1], and the value is a function that receives the Python
package set and returns the required dependencies of this integration.

For example:

    datadog-integrations-core {
      ntp = (ps: [ ps.ntplib ]);
    };

This would build the default integrations and, additionally, the `ntp`
integration.

To support passing the modified Python environment to the
datadog-agent itself, the `python` key has been moved inside of the
derivation which means that it will be made overridable.

This relates to NixOS#40399.

[1]: https://github.com/DataDog/integrations-core
tazjin added a commit to tazjin/nixpkgs that referenced this pull request Aug 9, 2018
Introduces an option `services.datadog-agent.extraIntegrations` that
can be set to include additional Datadog agent integrations from the
integrations-core repository.

Documentation and an example is provided with the change.

Relates to NixOS#40399
@tazjin
Copy link
Member

tazjin commented Aug 9, 2018

I am force-pushing over the current tip (e9b06d0efcffff97a8427137451d5b814f05f183) on rvl/datadog-agent with a rebase, logging fixes, the change @coretemp pointed out, the makePrefix thing and my changes.

The configuration is tested on a few machines with both custom check config and extra integrations. From my side this should be done, but I haven't explicitly tested whether this is affected by #44362 - maybe @coretemp can do that, or maybe we can look at it later.

Refactors the datadog-agent (i.e. V6) module to let users configure
arbitrary checks, not just a limited set, without having to resort to
linking the files manually and updating the systemd unit.

Checks are now configured via a `services.datadog-agent.checks` option
which takes an attribute set in which the keys refer directly to
Datadog check names, and the values are attribute sets representing
Datadog's configuration structure.

With this mechanism users can configure arbitrary integrations, for
example for the `ntp`-check, simply by saying:

    services.datadog-agent.checks.ntp = {
      init_config = null;
      # ... other check configuration options as per Datadog
      # documentation
    };

The previous check-specific configuration options for non-default
checks have been removed. Disk & network check configuration options
have been kept rather than making them a `default`-value of the
`checks`-option because they will be overridden by user-configurations
in that case.

Relates to NixOS#40399.
Refactors the process used to build the Datadog core integrations to
be more easily extensible with integrations other than the ones built
and installed by default.

Documentation has been added in relevant parts of the module to
describe how the process works.

As a high-level overview:

The `datadog-integrations-core` attribute in the top-level package set
now accepts an extra parameter.

This parameter is an attribute set where each key is the name of a
Datadog integration as it appears in Datadog's integrations-core
repository[1], and the value is a function that receives the Python
package set and returns the required dependencies of this integration.

For example:

    datadog-integrations-core {
      ntp = (ps: [ ps.ntplib ]);
    };

This would build the default integrations and, additionally, the `ntp`
integration.

To support passing the modified Python environment to the
datadog-agent itself, the `python` key has been moved inside of the
derivation which means that it will be made overridable.

This relates to NixOS#40399.

[1]: https://github.com/DataDog/integrations-core
Introduces an option `services.datadog-agent.extraIntegrations` that
can be set to include additional Datadog agent integrations from the
integrations-core repository.

Documentation and an example is provided with the change.

Relates to NixOS#40399
@coretemp
Copy link
Contributor

@tazjin Looking at it later is the option that I prefer. It's just that I raised it, because the costs of fixing something earlier is generally lower. However, I consider getting it into 18.09 even with this minor bug to be even more important. I don't plan to fix it in the next 4 months for example.

@rvl
Copy link
Contributor Author

rvl commented Aug 13, 2018

@tazjin Sorry for the delay. I have been busy moving house.

I have just tested your fixes and refactors with nginx and postgres checks.

It looks good and works well.

Thanks.

@tazjin
Copy link
Member

tazjin commented Aug 13, 2018

@rvl No worries, hope the move went well! :) And thanks!

Alright, @Mic92 - is there anything missing to get this in?

@disassembler disassembler merged commit 67b1cbb into NixOS:master Aug 15, 2018
@disassembler
Copy link
Member

LGTM! Thanks for the contribution @rvl @tazjin!

@tazjin tazjin deleted the datadog-agent branch August 15, 2018 19:53
@tazjin
Copy link
Member

tazjin commented Aug 15, 2018

Thanks @disassembler! 🥇

@coretemp
Copy link
Contributor

Great to see this merged.

Is there a work flow to use this module on 18.03 that does not involve forking?

@tazjin
Copy link
Member

tazjin commented Aug 16, 2018

@coretemp Yes, you can do something roughly like this:

{ config, pkgs, ... }:

let

# Download nixpkgs from master (you may want to pin a commit, or maybe
# the unstable channel, instead):
unstableSrc = fetchTarball "https://github.com/NixOS/nixpkgs/archive/master.zip";

# The above gives is the raw (unimported) Nix source, which we'll need
# in a second, but we also need to import the actual package set:
unstable = import unstableSrc {};

in {
  # Import the module itself from unstable:
  imports = [
    "${unstableSrc}/nixos/modules/services/monitoring/datadog-agent.nix"
  ];

  config = {
    # Pick the relevant packages into your local overrides:
    nixpkgs.config.packageOverrides = base: base // {
      inherit (unstable) datadog-agent datadog-integrations-core;
    };

    # And use the module:
    services.datadog-agent = {
      enable     = true;
      apiKeyFile = whatever;
      tags       = [ "my-env" ];
    };
  };
}

I wrote this from memory so it may be slightly off, but that's roughly how we do it atm.

@rvl
Copy link
Contributor Author

rvl commented Aug 16, 2018

Thanks all!

@coretemp
Copy link
Contributor

Thanks @tazjin.

I am using the following (I don't like a moving target):

  # repository containing datadog package version 6
  src_datadog_6 = bootstrap.fetchFromGitHub {
    owner = "NixOS";
    repo  = "nixpkgs";
    rev = "b7e50d52f018d0670614a58851b4fc2c0f93818b";
    sha256 = "0gwgflq4v9zyflpfrqdnzqwk9sfvn7aqy39m0vwbrhvlz3h8dwb2";
  };

@coretemp
Copy link
Contributor

@tazjin What do you do about?

 (datadog_agent.go:129 in LogMessage) | (disk.py:105) | Unable to get disk metrics for /sys/kernel/debug/tracing: [Errno 13] Permission denied: '/sys/kernel/debug/tracing'

Also, I am not sure why, but despite the module running fine, the Host map in Datadog doesn't show the machine anymore after the upgrade.

@coretemp
Copy link
Contributor

@tazjin I found a more serious limitation:

I added datadog-agent to environment.systemPackages, but I am not able to execute

datadog-agent per https://docs.datadoghq.com/agent/troubleshooting/#send-a-flare. (i.e. it's not on the PATH). I think the package should be added to environment.systemPackages in the module with some additional modifications (perhaps the "core" part).

@coretemp
Copy link
Contributor

@tazjin When I use /nix/store/*-datadog-agent-6.1.4-bin/bin/agent flare sending a flare also doesn't work. What is the interface you have in mind for sending such flares?

@tazjin
Copy link
Member

tazjin commented Aug 16, 2018

@coretemp Working on something else right now, can take a look at this tomorrow, but some small comments:

the Host map in Datadog doesn't show the machine anymore after the upgrade

Depending on how your hosts were set up they may end up with new hostnames. This doesn't happen in my environments because we explicitly configure the hostnames used in Datadog, so I'm not sure if there is a difference in behaviour between this and the V5 module. Will check tomorrow.

I think the package should be added to environment.systemPackages in the module

I'm not sure if it's common or expected for modules to install CLI tools system-wide, but I guess it makes sense here because of flares and retrieving the agent status. This should be a tiny change.

sending a flare also doesn't work

What sort of error do you get?

@coretemp
Copy link
Contributor

@tazjin There is no error. I am certain that if you try to send a flare, that it will also not work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants