Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datadog: package update + module refactoring #26350

Closed
wants to merge 4 commits into from

Conversation

Fuuzetsu
Copy link
Member

@Fuuzetsu Fuuzetsu commented Jun 3, 2017

Motivation for this change

I have previously updated and changed the API of the datadog service. As per request, I have reverted that and I'm submitting it as a pull request instead. Additionally to previous work, I'm also providing an additional patch that brings back the old API without sacrificing any of the new one. Personally I think users should simply upgrade to (what I think is) a much more flexible API.

Things done

The upgrade itself is deployed on actual machines running now and works great. The backwards compatibility patch was only tested for evaluation with nixos-rebuild dry-run.

@Fuuzetsu
Copy link
Member Author

Fuuzetsu commented Jun 3, 2017

There's a small gotcha w.r.t. jmx; it's not actually an integration so something like { name = "jmx"; } would not work as it would not find a default configuration. However { name "jmx"; config = "..."; } should work just fine.

Edit:

I should also mention that I don't use the JMX stuff so I have 0 idea if that part still works on 5.13.x actually.

@Fuuzetsu
Copy link
Member Author

Fuuzetsu commented Jun 3, 2017

Another comment: actually JMX is used on one of our 5.13.2 boxes and it seems to Just Work™ so should be good on that front too. I have not configured it explicitly there (must have got triggered by some other integration) so FWIW the JMX service might be useless now too, just like datadogd one was. They now use Python's supervisor thing to manage that sort of stuff.

@Fuuzetsu
Copy link
Member Author

Fuuzetsu commented Jun 3, 2017

Seems more rework is needed to accommodate their whole new supervisord approach… Working on that separately from this PR for now, just heads up about more changes in future.

@Fuuzetsu
Copy link
Member Author

Fuuzetsu commented Jun 3, 2017

OK, all the changes I wanted to make are done and in this PR I think. The main difference is that we now use the blessed way to run datadog, i.e. through supervisor. This means datadog itself takes care of starting stuff, we just tell it what to run and start the supervisor. I also added datadog-trace-agent which allows people to make use of application tracing on their hosts.

I have again ported the API compatibility patch on top in case we want to support that still. Please let me know what you think of the changes in general and whether we're keen on keeping the old API layer.

@Fuuzetsu
Copy link
Member Author

Fuuzetsu commented Jun 4, 2017

Hm, something might not be quite right with metrics reporting on this PR. Will investigate and fix before merging. General feedback/review appreciated regardless.

@Fuuzetsu
Copy link
Member Author

Fuuzetsu commented Jun 4, 2017

Have found out what the issue was and updated PR to include the fixes.

@domenkozar domenkozar changed the title Datadog update Datadog: package update + module refactoring Jun 5, 2017
@domenkozar domenkozar self-assigned this Jun 5, 2017
@Fuuzetsu
Copy link
Member Author

Fuuzetsu commented Jun 6, 2017

Just to update, been running a version of this PR (without compat patch) on multiple machines for few days and seems to work in all aspects.

@@ -15,102 +15,48 @@ let
collector_log_file: /var/log/datadog/collector.log
forwarder_log_file: /var/log/datadog/forwarder.log
dogstatsd_log_file: /var/log/datadog/dogstatsd.log
pup_log_file: /var/log/datadog/pup.log
jmxfetch_log_file: /var/log/datadog/jmxfetch.log
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a variable for the /var/log/datadog directory. Having a default is fine.

@@ -0,0 +1,73 @@
[ { goPackagePath = "github.com/cihub/seelog";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is automatically generated, I suppose? Please put a reference to the generator used to create this file in the output of the generator. If it's not your tool, please open an issue for it to be done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was written by hand as I couldn't get go2nix to work on their setup as they use Rake or something and I'm not a Go guy to try and figure it out.

Needed to support datadog application tracing.
Starting things manually has been deprecated for many versions and
recently even removed for some things (datadogstatsd). Further, with
this approach we don't have to worry about starting things ourselves, we
just have to patch the supervisor config so that it knows what to use
and it should do the work for us.
This allows for explicit supported integration configurations as in the
5.11.2 version that we had before while allowing the user to specify
arbitrary integrations at the same time. It should be pretty clear that
we have a redundant specification mechanism: in both cases the user has
to be explicit in what they want with the exception that now we force
`network` and `disk` because that's what the previous version did.
Specific integration configurations take precedence over
`integrations` (but only if actually specified) as I think that's the
least surprising behaviour.
};

buildInputs = [
python
unzip
makeWrapper
pythonPackages.requests
pythonPackages.psycopg2
pythonPackages.psutil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that defining the inclusion of package dependencies only in the configuration of nixos modules goes against the design of nix. Nix doesn't exist only in the context of nixos and never has. For those not using nixos (such as myself), the removal of this dependency effectively breaks the dd-agent package since the basic components such as network and core no longer work. That has the effect of breaking a working package for existing users. I would ask that this not be merged without finding a way not to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind the removal of these was that they are not strictly necessary. They were simply the "defaults". NixOS service happens to provide these defaults but there's nothing stopping the user from adding these and any other packages back into dd-agent with an override whereas it's much more difficult to opt out of these from the service side of things.

Having said that maybe dd-agent expression itself should be more configurable with these things enabled by default and the service could flip things that way. Seems ugly to me though…

pythonPackages.supervisor
pythonPackages.tornado_3_2_2
pythonPackages.uptime
] ++ extraBuildInputs;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR seems to be the direction dd-agent is going for updates, I wanted to let you know that I discovered a problem with the existing package which could be fixed here.

dd-agent has proxy functionality which is currently not working because it relies on pycurl, which is not in the package dependencies. I've patched it for my install, but official support by adding the pycurl dependency here would fix a bug in the current package.

@binaryphile
Copy link

Another thing which is broken with the current package of dd-agent is the "Processes Memory Usage" report. I've found that it depends on the gohai package, which for some reason datadog decided not to include in their source-based installation process.

gohai is a binary which simply needs to be on the path, so it could be a propagatedBuildInput, if there were a nix gohai package.

I haven't tried creating one, so I don't know the difficulty level, but I thought I would share my findings.

@Fuuzetsu
Copy link
Member Author

Fuuzetsu commented Aug 18, 2017

@binaryphile w.r.t. gohai: yeah, I saw it complain about it in logs but could not determine the cause.

w.r.t. everything else; this code was written for a multiple machine NixOS deployment; we are now however in process of moving into kubernetes with containers and so this became a much lower piority: we still have NixOS boxes using it but not as many. If someone is interested in getting these changes landed sooner than later (with any amends), please don't hesitate to pick up this work. If not, I may eventually get around to addressing any comments myself though the priority for that is much lower now.

@binaryphile
Copy link

@Fuuzetsu understandable, thanks for your reply and thanks for your work.

@tazjin
Copy link
Member

tazjin commented Apr 27, 2018

If anyone here wants to take a shot at doing an agent V6 upgrade and refactoring the available NixOS options, please ping me. I've started making a Nix package, however the build process for the agent is quite frankly insane and I've lost a whole bunch of soul fragments to it so far.

We have production boxes using Datadog and the current agent version (5) is starting to cause issues for us due to lacking features and whatnot.

@Fuuzetsu
Copy link
Member Author

Personally I wrote this at a client a year-ish ago and no longer care about it. I'm closing but encourage others to pick it uo independently.

@Fuuzetsu Fuuzetsu closed this Apr 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants