Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Prometheus 2 service in parallel with 1.x version #49802

Closed
wants to merge 3 commits into from

Conversation

azazel75
Copy link
Contributor

@azazel75 azazel75 commented Nov 5, 2018

Motivation for this change

Cleaned up version of #46069. The reason for this is that there's no clean upgrade path from 1.x to 2.x, the suggested solution by the Prometheus team is that of let them run in parallel an use the 1.x instance as source for the 2.x.

in the original PR someone suggested a more copy and paste approach. I don't know, I don't really need Prometheus 1.x. So is up to you to decide

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

promUser = "prometheus";
promGroup = "prometheus";
prom2User = "prometheus2";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it would be acceptable (preferable, even?) to have prometheus and prometheus2 share the same UID/GID, unless there's a reason that won't work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benley It's centainly preferable to have just one user. As of now the data dir /var/lib/prometheus(2) is also the home of the user and that's is used to create the dir and to ensure that the dir is automatically owned by it. I have yet to find out what's the best way to proceed if the data dirs are two. What you advise me to do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, good question. I suppose you could put prometheus 2 data in a subdirectory of prometheus 1's data, but that introduces some danger when someone goes to delete their old prometheus 1 data dir once they're finished migrating. Maybe the systemd unit could create the new data dir before starting the prometheus daemon?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see systemd.tmpfiles.rules in other modules to create dirs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also create the directory with

serviceConfig =  { StateDirectory = "prometheus2" };

in the systemd config


cmdlineArgs2 = cfg2.extraFlags ++ [
"--storage.tsdb.path=${cfg2.dataDir}/data/"
"--config.file=${prometheusYml}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"--config.file=${prometheusYml}"
"--config.file=${prometheus2Yml}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some test of your new module - seems to be a error here - new prometheus2 configuration will not be generated. Thanks for your work! BR Erik

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is was integrated

@ivan
Copy link
Member

ivan commented Feb 12, 2019

I would like to test this, could you please update it for master?

@bachp bachp mentioned this pull request Feb 19, 2019
10 tasks
As the configuration for the exporters and alertmanager is unchanged
between the two major versions this patch tries to minimize
duplication while at the same time as there's no upgrade path from 1.x
to 2.x, it allows running the two services in parallel. See also NixOS#56037
@azazel75
Copy link
Contributor Author

@ivan I've rebased the patch and integrated all the changes that happened on master, can/will you test it out?

@eonpatapon
Copy link
Contributor

eonpatapon commented Feb 21, 2019

There is some syntax error in the alertmanager module

syntax error, unexpected '=', expecting ';', at /.../nixpkgs/nixos/modules/services/monitoring/prometheus/alertmanager.nix:140:14

@azazel75
Copy link
Contributor Author

@eonpatapon should be fixed now

@griff
Copy link
Contributor

griff commented Feb 21, 2019

@azazel75 why did you also make 2 services for each of the exporters? They look to be the exact same services but configured to be under services.prometheus.exportes and services.prometheus2.exportes. Just because I switch to running both prometheus 1 & 2 I would not need to run e.g. both services.prometheus.exportes.node and services.prometheus2.exportes.node so why have them as separate services?

@azazel75
Copy link
Contributor Author

Sorry @griff but I don't understand precisely what your arguments against two set of exporters configurations are. their configuration modules access their own configuration stored as services.prometheus.exporters.foo or services.prometheus2.exporters.foo. Maybe for the "node" exporter can't be more than one exporter configured in a system, but generally speaking maybe in a situation where both prometheus1 and 2 are running there could be the need for concurrent running of two exporters of the same type. I don't have a concrete example though.

@griff
Copy link
Contributor

griff commented Feb 21, 2019

@azazel75 I may have missunderstod how the exporters work but let me try and explain my point better with my understanding.

As such exporters are http servers that will get called by the prometheus server at an interval. My understanding is that there is no problem with two prometheus servers calling the same exporter at different intervals. i.e. they don't disturb each other and state is not reset on a scrape. If this is correct running the same exporter twice is just a waste of CPU and RAM since just one can be scraped from multiple servers and having multiple services that allow you to run the same exporter twice signals that you need to run them twice if running both prometheus 1 & 2.

@azazel75
Copy link
Contributor Author

azazel75 commented Feb 21, 2019 via email

@bachp
Copy link
Member

bachp commented Feb 21, 2019

@azazel75 I think a clean copy and past of prometheus would be easier to maintain if we say that we want to get rid of prometheus 1 in the long run.

I'm with @griff for the exporters. From my experience they are completly independet of the prometheus version you are running. So I would not add additional services for them.

@griff
Copy link
Contributor

griff commented Feb 22, 2019

@azazel75 I am sorry if I wrongly gave the impression that I am any kind of authority on this subject or that I have merge permissions. I am just a normal user like yourself that wants NixOS to be as great as possible and therefore offered my personal opinions.

As for your response of wanting to run different exporters with different parameters the question I have is why then only allow 2? The prometheus exporter format is becoming pretty standard and supported by other metrics solutions (like telegraf and DataDog to mention just 2) so if the argument is that we want to support different scrapers hitting the same exporter type run with different arguments why not allow me to run the same exporter 3 times or as many times as I want?

@benley
Copy link
Member

benley commented Feb 22, 2019

why not allow me to run the same exporter 3 times or as many times as I want?

This is an issue that affects variety of services in NixOS, and probably not one we should try to solve in this PR. The existing pattern of services.<thing>.enable doesn't lend itself well to supporting multiple instances of a given service, and I'm not sure what the best approach is to improve upon it.

@eonpatapon
Copy link
Contributor

There is still an issue with alertmanager module, here's the patch to fix it:

diff --git a/nixos/modules/services/monitoring/prometheus/alertmanager.nix b/nixos/modules/services/monitoring/promethe
index f19dd709a21..e9127068175 100644
--- a/nixos/modules/services/monitoring/prometheus/alertmanager.nix
+++ b/nixos/modules/services/monitoring/prometheus/alertmanager.nix
@@ -152,7 +152,7 @@ let
           after    = [ "network.target" ];
           script = ''
             ${amCfg.package}/bin/alertmanager \
-              ${concatStringsSep " \\\n  " cmdlineArgs}
+              ${concatStringsSep " \\\n  " (mkCmdlineArgs amCfg)}
           '';
           serviceConfig = {
             User = amCfg.user;

@azazel75
Copy link
Contributor Author

Thanks @eonpatapon. @griff there's no need to worry. It's just that I don't have the power to merge this either but to answer your example of running more that two sets of exporters, it seems out of scope. Anyway I'm running out of time that I want to spend on this, feel free to take what you need and propose your own or directly modify this ( If I can do something about that, just tell me and I wil do It)

@@ -4,8 +4,10 @@ with lib;

let
cfg = config.services.prometheus.exporters;
cfg2 = config.services.prometheus2.exporters;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two sets of exports does not make any sense. Please remove these changes.

in {
options = {
services.prometheus.alertmanager = {
cfg2 = config.services.prometheus2.alertmanager;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A second instance/version of alertmanager does not make sense. Prometheus sends queries to alertmanager to trigger alerts, so one instance is enough.

@@ -339,6 +339,7 @@
rss2email = 312;
cockroachdb = 313;
zoneminder = 314;
prometheus2 = 315;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One user for both prometheus instances would be enough. Especially because we are going to remove prometheus1 soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

@jbgi jbgi mentioned this pull request Mar 20, 2019
10 tasks
@disassembler
Copy link
Member

@azazel75 are you able to finish up the requested changes? If not, @jbgi can take this over as IOHK needs this PR as well for a project we're working on.

@azazel75
Copy link
Contributor Author

@disassembler, @jbgi please go for it. The only detail you have to pay attention to is the creation of the working directory if you are going to use just one user for both the services in such case, one directory is created as home of the user, the other needs to be created when the service is enabled. The proposed solutions involve the use of systemd-tmpfiles subsystem or the StateDirectory service config entry. From my pow the former while effective seems like using something that was intented for other purposes ( see man tmpfiles.d) and the latter while elegant and effective need for the working directory to be located under /var/lib.

@arianvp
Copy link
Member

arianvp commented Mar 25, 2019

I would in both cases use StateDirectory instead of relying on home. System users in general shouldn't have a home directory unless for compatibility reasons (the service is hard coded to look stuff up in $HOME). I'm pretty sure prometheus doesn't do this so not allocating a home directory should be safe.

This also makes it possible to later switch to DynamicUser instead.

@jbgi
Copy link
Contributor

jbgi commented Mar 25, 2019

Added some more commits in #58255 to address the points raised.

@basvandijk
Copy link
Member

Let's close this PR in favour of #58255 which includes the commits of this PR and fixes the remaining issues.

@basvandijk basvandijk closed this Apr 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet