New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Datadog: package update + module refactoring #26350
Conversation
There's a small gotcha w.r.t. Edit: I should also mention that I don't use the JMX stuff so I have 0 idea if that part still works on 5.13.x actually. |
Another comment: actually JMX is used on one of our 5.13.2 boxes and it seems to Just Work™ so should be good on that front too. I have not configured it explicitly there (must have got triggered by some other integration) so FWIW the JMX service might be useless now too, just like |
Seems more rework is needed to accommodate their whole new |
OK, all the changes I wanted to make are done and in this PR I think. The main difference is that we now use the blessed way to run datadog, i.e. through I have again ported the API compatibility patch on top in case we want to support that still. Please let me know what you think of the changes in general and whether we're keen on keeping the old API layer. |
Hm, something might not be quite right with metrics reporting on this PR. Will investigate and fix before merging. General feedback/review appreciated regardless. |
Have found out what the issue was and updated PR to include the fixes. |
Just to update, been running a version of this PR (without compat patch) on multiple machines for few days and seems to work in all aspects. |
@@ -15,102 +15,48 @@ let | |||
collector_log_file: /var/log/datadog/collector.log | |||
forwarder_log_file: /var/log/datadog/forwarder.log | |||
dogstatsd_log_file: /var/log/datadog/dogstatsd.log | |||
pup_log_file: /var/log/datadog/pup.log | |||
jmxfetch_log_file: /var/log/datadog/jmxfetch.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide a variable for the /var/log/datadog directory. Having a default is fine.
@@ -0,0 +1,73 @@ | |||
[ { goPackagePath = "github.com/cihub/seelog"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is automatically generated, I suppose? Please put a reference to the generator used to create this file in the output of the generator. If it's not your tool, please open an issue for it to be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was written by hand as I couldn't get go2nix to work on their setup as they use Rake or something and I'm not a Go guy to try and figure it out.
Needed to support datadog application tracing.
Starting things manually has been deprecated for many versions and recently even removed for some things (datadogstatsd). Further, with this approach we don't have to worry about starting things ourselves, we just have to patch the supervisor config so that it knows what to use and it should do the work for us.
This allows for explicit supported integration configurations as in the 5.11.2 version that we had before while allowing the user to specify arbitrary integrations at the same time. It should be pretty clear that we have a redundant specification mechanism: in both cases the user has to be explicit in what they want with the exception that now we force `network` and `disk` because that's what the previous version did. Specific integration configurations take precedence over `integrations` (but only if actually specified) as I think that's the least surprising behaviour.
}; | ||
|
||
buildInputs = [ | ||
python | ||
unzip | ||
makeWrapper | ||
pythonPackages.requests | ||
pythonPackages.psycopg2 | ||
pythonPackages.psutil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that defining the inclusion of package dependencies only in the configuration of nixos modules goes against the design of nix. Nix doesn't exist only in the context of nixos and never has. For those not using nixos (such as myself), the removal of this dependency effectively breaks the dd-agent package since the basic components such as network and core no longer work. That has the effect of breaking a working package for existing users. I would ask that this not be merged without finding a way not to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind the removal of these was that they are not strictly necessary. They were simply the "defaults". NixOS service happens to provide these defaults but there's nothing stopping the user from adding these and any other packages back into dd-agent with an override whereas it's much more difficult to opt out of these from the service side of things.
Having said that maybe dd-agent expression itself should be more configurable with these things enabled by default and the service could flip things that way. Seems ugly to me though…
pythonPackages.supervisor | ||
pythonPackages.tornado_3_2_2 | ||
pythonPackages.uptime | ||
] ++ extraBuildInputs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this PR seems to be the direction dd-agent is going for updates, I wanted to let you know that I discovered a problem with the existing package which could be fixed here.
dd-agent has proxy functionality which is currently not working because it relies on pycurl, which is not in the package dependencies. I've patched it for my install, but official support by adding the pycurl dependency here would fix a bug in the current package.
Another thing which is broken with the current package of dd-agent is the "Processes Memory Usage" report. I've found that it depends on the gohai package, which for some reason datadog decided not to include in their source-based installation process. gohai is a binary which simply needs to be on the path, so it could be a propagatedBuildInput, if there were a nix gohai package. I haven't tried creating one, so I don't know the difficulty level, but I thought I would share my findings. |
@binaryphile w.r.t. gohai: yeah, I saw it complain about it in logs but could not determine the cause. w.r.t. everything else; this code was written for a multiple machine NixOS deployment; we are now however in process of moving into kubernetes with containers and so this became a much lower piority: we still have NixOS boxes using it but not as many. If someone is interested in getting these changes landed sooner than later (with any amends), please don't hesitate to pick up this work. If not, I may eventually get around to addressing any comments myself though the priority for that is much lower now. |
@Fuuzetsu understandable, thanks for your reply and thanks for your work. |
If anyone here wants to take a shot at doing an agent V6 upgrade and refactoring the available NixOS options, please ping me. I've started making a Nix package, however the build process for the agent is quite frankly insane and I've lost a whole bunch of soul fragments to it so far. We have production boxes using Datadog and the current agent version (5) is starting to cause issues for us due to lacking features and whatnot. |
Personally I wrote this at a client a year-ish ago and no longer care about it. I'm closing but encourage others to pick it uo independently. |
Motivation for this change
I have previously updated and changed the API of the datadog service. As per request, I have reverted that and I'm submitting it as a pull request instead. Additionally to previous work, I'm also providing an additional patch that brings back the old API without sacrificing any of the new one. Personally I think users should simply upgrade to (what I think is) a much more flexible API.
Things done
The upgrade itself is deployed on actual machines running now and works great. The backwards compatibility patch was only tested for evaluation with
nixos-rebuild dry-run
.