
nixos/prometheus: support v2 #56037

Closed · wants to merge 2 commits

Conversation

eonpatapon
Contributor

Motivation for this change

Be able to use prometheus v2 with the prometheus nixos module.

Since the alertmanager is configured differently between v1 and v2, there are separate options for v1 and v2. Assertions were added to check that the correct option is used depending on the Prometheus version.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

 * Added `alerting` option for prometheus v2
 * Added test for prometheus v2
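
As a rough illustration of the difference the PR describes: in the v1 module the alertmanager is addressed through the existing `alertmanagerURL` option, while the new `alerting` option would carry the v2 configuration. The exact shape of `alerting` below mirrors Prometheus 2's YAML config format and is an assumption, not code taken verbatim from this PR:

```nix
# Hedged sketch: `alertmanagerURL` is the existing v1 module option; the
# `alerting` structure is an assumption based on Prometheus 2's config format.
{
  # v1: the alertmanager is passed to prometheus via a command-line flag
  services.prometheus = {
    enable = true;
    alertmanagerURL = [ "http://localhost:9093" ];
  };

  # v2 (this PR): the same target would instead be declared through the new
  # `alerting` option, which lands in the configuration file rather than on
  # the command line, e.g.:
  #
  #   services.prometheus.alerting.alertmanagers = [
  #     { static_configs = [ { targets = [ "localhost:9093" ]; } ]; }
  #   ];
}
```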
@eonpatapon
Contributor Author

ping @fpletz @Ma27 @andir as you contributed to this module

@andir
Member

andir commented Feb 19, 2019

I do not fully recall the last conversation but shouldn't a user be able to run v1 and v2 (of prometheus, not really alertmanager) in order to still have access to historical data for some amount of time?

@bachp
Member

bachp commented Feb 19, 2019

There is already #49802 but it was never completed.

I think the main reason it was done as a separate service there is that there is no migration path from Prometheus 1 to 2.

@Ma27
Member

Ma27 commented Feb 19, 2019

In case it helps, there's at least a draft for v2 in the mayflower fork: https://github.com/mayflower/nixpkgs/blob/master/nixos/modules/services/monitoring/prometheus/2.nix

I can do a more detailed review tomorrow, but I'm fairly certain that @fpletz and @andir are way more qualified here, so before considering a merge I'd like to have an ack from them :)

@fpletz
Member

fpletz commented Feb 19, 2019

Since the 1.x branch doesn't seem to be maintained upstream anymore and isn't available for download via the official website, I'd say we remove 1.x support completely instead of making our module even more complex than it already is. People will lose their data. There is just no way around it.

@eonpatapon
Contributor Author

eonpatapon commented Feb 20, 2019

Since there is no easy migration path, my idea was to not break existing setups for this first step but still allow users to set up Prometheus v2 with the NixOS module. We could also add deprecation messages to indicate that support for v1 will be dropped. In a second step we can remove it.

@bachp
Member

bachp commented Feb 20, 2019

@fpletz I would be fine with completely removing v1.

@griff
Contributor

griff commented Feb 20, 2019

I just stumbled across this PR while looking for prometheus 2 support and thought I would chime in with my personal preferences as a user of NixOS for breaking upgrades like this.

I would prefer that Prometheus 2 support get its own module that can run side by side with existing Prometheus 1 setups, and that a warning be added to the old module saying it is deprecated and will be removed in the future. At a later date, say a year from now, we then remove the old module, leaving only the new one.

I prefer this because it gives me fair warning that I need to take action, but also gives me several options for what action to take. It allows me to just nuke the old data, like @fpletz prefers. It allows me to keep running the old version with only historic data for the deprecation period, alongside the new version with recent data. Or it allows me to copy the deprecated module and take on supporting it myself.

The one thing I never want though is for me to just lose data on an OS upgrade with no warning.

@bachp
Member

bachp commented Feb 20, 2019

@griff I can understand your point. So maybe the best approach is a copy-paste one, where we copy the prometheus module as prometheus2. This way it can be improved independently of the prometheus module, which can be left as is.

@griff
Contributor

griff commented Feb 21, 2019

@bachp I think that would be the best approach.

@azazel75
Contributor

Since the 1.x branch doesn't seem to be maintained upstream anymore and isn't available for download via the official website I'd say we remove 1.x support completely instead of making our module even more complex than it already is. People will lose their data. There is just no way around it.

My patch started as a complete removal of the Prometheus 1 code because I thought there was an upgrade path from version 1 to 2, but there isn't any (at least for the storage). I then changed the code to run the two in parallel, also convinced by the fact that the only partial upgrade path suggested by the Prometheus developers is to run a non-scraping 1.x instance to keep historical data and share it with the 2.x instance.

Unfortunately I don't have much time to spend on it now, but if you want to drop the Prometheus 1 configuration, a Prometheus-2-only patch is much smaller, and I have it running on my server. From a distribution point of view, while I don't use Prometheus 1 myself, having one release with the 1.x configuration only and the next with the 2.x only feels like the wrong thing to do.

@bachp
Member

bachp commented Feb 21, 2019

From a distribution point of view, while I don't use Prometheus 1 myself, having one release with the 1.x configuration only and the next with the 2.x only feels like the wrong thing to do.

@azazel75 Sorry, I don't understand what you want to say with that. Would having two modules prometheus and prometheus2 be an acceptable way?

@eonpatapon
Contributor Author

I also agree that we shouldn't drop prometheus 1.x out of the blue. Existing configuration should continue to work.

My proposal was to have a working module for v1 or v2, but not both at the same time. Then v1 support could be dropped after some deprecation period. This is enough for me, but it looks like some people want to be able to run v1 and v2 in parallel to keep the old history, or to be able to scrape it from the new instance.

In that case I don't think there is any other choice than to provide two different modules, with different state paths etc., like it was done in the other PR.

Whichever approach we agree on, I can work on it.

@azazel75
Contributor

azazel75 commented Feb 21, 2019 via email

azazel75 added a commit to azazel75/nixpkgs that referenced this pull request Feb 21, 2019
As the configuration for the exporters and alertmanager is unchanged between the two major versions, this patch tries to minimize duplication. At the same time, since there is no upgrade path from 1.x to 2.x, it allows running the two services in parallel. See also NixOS#56037
@azazel75
Contributor

I've completed the rebasing, in case someone wants to test it out.

Member

@Ma27 Ma27 left a comment


Just had a short look at the code. As I'm not a prometheus maintainer, I'm not sure how we shall proceed.

First of all I'm not sure if and when we should deprecate Prometheus 1.x support. The docs recommend hosting a legacy Prometheus instance to keep serving historical data. However, the last release was in 2017 and I'm not sure how long we want to keep this in nixpkgs and when to start the deprecation phase.

An idea would be to split this into two modules, deprecate v1 support for 19.03 and replace the v1 module with a minimalistic setup which only serves data, but doesn't scrape anymore on unstable after the release.

But as I said, I'm not a prometheus maintainer and I don't have a strong opinion about this, the above is just an idea :)

- name: test
  rules:
  - record: testrule
    expr: count(up{job="prometheus"})
Member


It's IMHO more readable when you do something like this:

{
  rules = [
    (builtins.toJSON {
      groups = [
        { name = "test";
          # and so on
        }
      ];
    })
  ];
}

I'm also wondering if this could be implemented directly in the module (just as an attribute set, not as a submodule)
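
Declaring the rules directly in the module as an attribute set, as suggested above, might look roughly like this. The option name `ruleGroups` is hypothetical, as is the serialization step; the idea works because Prometheus 2 rule files are YAML, and JSON is valid YAML:

```nix
# Hypothetical sketch: `ruleGroups` is an illustrative option name, not one
# defined by this PR. The module could serialize it with
# `builtins.toJSON { groups = cfg.ruleGroups; }` to produce a valid rule file.
{
  services.prometheus.ruleGroups = [
    {
      name = "test";
      rules = [
        {
          record = "testrule";
          expr = ''count(up{job="prometheus"})'';
        }
      ];
    }
  ];
}
```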

"-alertmanager.timeout=${toString cfg.alertmanagerTimeout}s"
(optionalString (cfg.alertmanagerURL != []) "-alertmanager.url=${concatStringsSep "," cfg.alertmanagerURL}")
(optionalString (cfg.webExternalUrl != null) "-web.external-url=${cfg.webExternalUrl}")
];
Member


This can probably be simplified by not duplicating options that are used on both v1 and v2:

{
  cmdlineArgs = cfg.extraFlags ++ [
    # ...
  ] ++ (if v1 then [ /* ... */ ] else [ /* ... */ ]);
}

It's probably just personal preference, but IMHO the expression currently looks heavier than it actually is.

@@ -0,0 +1,35 @@
import ./make-test.nix {
name = "prometheus-2";
Member


Any reason you didn't set meta.maintainers here?

@eonpatapon
Contributor Author

To move forward there must be a consensus on the way to go: this PR allows running v1 or v2 (not both), while PR #49802 allows running v1 and v2 side by side. I don't know how this kind of choice is usually made in nixpkgs.

@bachp
Member

bachp commented Mar 2, 2019

@eonpatapon I think we should allow running both in parallel, as one of the main points of keeping both around is to let users implement what is recommended in the Prometheus migration guide.

To retain access to your historic monitoring data we recommend you run a non-scraping Prometheus instance running at least version 1.8.1 in parallel with your Prometheus 2.0 instance [...]

If this is not possible we don't need to spend the effort and can directly replace 1 with 2.
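
The side-by-side setup recommended by the migration guide could be sketched as below. The `prometheus2` module name follows the naming proposed earlier in this thread; all option names are assumptions for illustration, not the final module interface:

```nix
# Hedged sketch of running both versions in parallel, as discussed above.
# Module and option names are illustrative assumptions.
{
  # legacy v1 instance: keeps serving historic data, no longer scrapes
  services.prometheus = {
    enable = true;
    listenAddress = "0.0.0.0:9090";
    scrapeConfigs = [ ];   # non-scraping, per the migration guide
  };

  # new v2 instance on its own port, with its own state directory
  services.prometheus2 = {
    enable = true;
    listenAddress = "0.0.0.0:9091";
    scrapeConfigs = [ /* new scrape targets go here */ ];
  };
}
```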

@eonpatapon
Contributor Author

Ok, I'm closing this in favour of #49802 then

@eonpatapon eonpatapon closed this Mar 3, 2019
jbgi pushed a commit to jbgi/nixpkgs that referenced this pull request Mar 25, 2019
shmish111 pushed a commit to shmish111/nixpkgs that referenced this pull request Apr 8, 2019
As the configuration for the exporters and alertmanager is unchanged
between the two major versions this patch tries to minimize
duplication while at the same time as there's no upgrade path from 1.x
to 2.x, it allows running the two services in parallel. See also NixOS#56037

Make it pass a minimal test

Fix alertmanager service definition. Thanks to @eonpatapon

Roll back versioning of services.prometheus.{exporters, alertmanager}.

Prometheus2: --web.external-url needs two dashes.

Use same user for both prometheus 1 and 2. Use StateDirectory.

nixos/prometheus: use ExecStart instead of a shell script

This uses fewer lines of code and one less process.

nixos/prometheus: get rid of empty arguments

Previously the prometheus.service file looked like:

  ExecStart=/nix/store/wjkhfw3xgkmavz1akkqir99w4lbqhak7-prometheus-1.8.2-bin/bin/prometheus -storage.local.path=/var/lib/prometheus/metrics \
    -config.file=/nix/store/zsnvzw51mk3n1cxjd0351bj39k1j6j27-prometheus.yml-check-config-checked \
    -web.listen-address=0.0.0.0:9090 \
    -alertmanager.notification-queue-capacity=10000 \
    -alertmanager.timeout=10s \
     \

  Restart=always

Now it's:

  ExecStart=/nix/store/wjkhfw3xgkmavz1akkqir99w4lbqhak7-prometheus-1.8.2-bin/bin/prometheus \
    -storage.local.path=/var/lib/prometheus/metrics \
    -config.file=/nix/store/zsnvzw51mk3n1cxjd0351bj39k1j6j27-prometheus.yml-check-config-checked \
    -web.listen-address=0.0.0.0:9090 \
    -alertmanager.notification-queue-capacity=10000 \
    -alertmanager.timeout=10s
  Restart=always

nixos/tests: register the prometheus2 test
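
The "get rid of empty arguments" commit above can be sketched in Nix: building the flag list with `lib.optional` (which yields either a one-element list or an empty list) instead of interpolating `optionalString` results means a disabled option contributes no list element at all, so no empty string ever reaches the command line. Names here are illustrative, not the module's actual code:

```nix
# Illustrative sketch, assuming a `cfg` with the options shown; disabled
# options simply add nothing to the list instead of an empty string.
{ lib, cfg }:
lib.concatStringsSep " \\\n  " (
  cfg.extraFlags
  ++ [
    "-storage.local.path=${cfg.stateDir}/metrics"
    "-web.listen-address=${cfg.listenAddress}"
    "-alertmanager.timeout=${toString cfg.alertmanagerTimeout}s"
  ]
  ++ lib.optional (cfg.alertmanagerURL != [ ])
       "-alertmanager.url=${lib.concatStringsSep "," cfg.alertmanagerURL}"
  ++ lib.optional (cfg.webExternalUrl != null)
       "-web.external-url=${cfg.webExternalUrl}"
)
```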
jbgi pushed a commit to input-output-hk/nixpkgs that referenced this pull request Apr 9, 2019
basvandijk pushed a commit to basvandijk/nixpkgs that referenced this pull request Apr 10, 2019

(cherry picked from commit 11b8972)