Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/geoip-updater: new service #16027

Closed
wants to merge 1 commit into from
Closed

Conversation

bjornfor
Copy link
Contributor

@bjornfor bjornfor commented Jun 6, 2016

The GeoIP Legacy databases from MaxMind have no stable URLs and change
every month (or so). Our current method of packaging these database in
Nix and playing catch-up with the ever-changing file hashes is a bad
idea. For instance, it makes it impossible to realize old NixOS
configurations.

This patch adds a NixOS service that periodically updates the GeoIP
databases in /var/lib/geoip-databases. Future patches will move users
over.

@mention-bot
Copy link

By analyzing the blame information on this pull request, we identified @edolstra, @offlinehacker and @lethalman to be potential reviewers

@bjornfor
Copy link
Contributor Author

bjornfor commented Jun 6, 2016

I guess there is little point in this service without any of the geolite-legacy users (packages) moving over to it. Hence this is WIP for now; it needs testing, community feedback and moving some packages over to use this runtime dependency, instead of using the current "geolite-legacy" Nix attribute (which breaks every month).

@danbst
Copy link
Contributor

danbst commented Jun 7, 2016

My suggestions:

  1. make it possible geoip = pkgs.geoip.override { geoipDatabase = config.services.geoip-databases.databaseDir; }
  2. fetch&unpack geoip2 dbs too, GeoLite2-Country.mmdb.gz and GeoLite2-City.mmdb.gz. I am using the v2 now in my own update script.
  3. maybe package some geoip2 tooling? libmaxminddb, mmdblookup. The former is a better library when you want to create new applications using geoip API. I have a working expression for it:
  libmaxminddb = stdenv.mkDerivation rec {
    name = "libmaxminddb-${version}";
    version = "1.2.0";
    src = fetchurl {
      url = "https://github.com/maxmind/libmaxminddb/releases/download/${version}/libmaxminddb-${version}.tar.gz";
      sha256 = "0dxdyw6sxxmpzk2a96qp323r5kdmw7vm6m0l5a8gr52gf7nmks0z";
    };
  };

@rardiol
Copy link
Contributor

rardiol commented Jun 23, 2016

What about nix outside nixos?

@bjornfor
Copy link
Contributor Author

I guess we can expose the updater script to the user (make a package), so it can be run manually.

@mbrgm
Copy link
Member

mbrgm commented Jan 2, 2017

@bjornfor Is missing adoption the only argument holding this back from merging? What about the use cases where the geolite DB is referenced in extraConfiguration by the user, not by the package itself. I would really like to see this merged and I think adoption will follow afterwards.

@bjornfor
Copy link
Contributor Author

bjornfor commented Jan 2, 2017

If people find this useful as is, I'm fine with merging (of course).

@mbrgm
Copy link
Member

mbrgm commented Jan 2, 2017

I could use this with the graylog module, for example.

@bjornfor bjornfor force-pushed the geoip-service branch 4 times, most recently from 41171ba to 0e53d08 Compare January 6, 2017 21:44
@bjornfor
Copy link
Contributor Author

bjornfor commented Jan 6, 2017

I noticed that the service would fail started at boot. It seems to be running before the network is up. I looked at man systemd.special and tried to fix it with

-      after = [ "network.target" ];
+      after = [ "network-online.target" "nss-lookup.target" ];
+      wants = [ "network-online.target" ];

It doesn't make a difference, it still fails. Also had a look at https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ (suggested by the man page). But not liking the options.

Any ideas?

I guess either I'll add a "try a few times" loop or dig into the systemd special targets and how they are implemented on NixOS. (Perhaps we have a network-online.target dependency bug?)

(I added some error handling so that we'll know when the service fails.)

@Mic92
Copy link
Member

Mic92 commented Jan 6, 2017

@bjornfor I think network-online.target will become only useful, if you use systemd-networkd:

$ systemctl cat systemd-networkd-wait-online.service
# /nix/store/v5mly2b8568snc2swxs1br7dbcwn7p9f-systemd-231/example/systemd/system/systemd-networkd-wait-online.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Wait for Network to be Configured
Documentation=man:systemd-networkd-wait-online.service(8)
DefaultDependencies=no
Conflicts=shutdown.target
Requisite=systemd-networkd.service
After=systemd-networkd.service
Before=network-online.target

[Service]
Type=oneshot
ExecStart=/nix/store/v5mly2b8568snc2swxs1br7dbcwn7p9f-systemd-231/lib/systemd/systemd-networkd-wait-online
RemainAfterExit=yes

[Install]
WantedBy=network-online.target

dhcpcd does not fit into this.

@bjornfor bjornfor force-pushed the geoip-service branch 2 times, most recently from 3113df6 to 45082c9 Compare January 7, 2017 11:54
{ description = "GeoIP Updater Service Timer";
partOf = [ "geoip-updater.service" ];
wantedBy = [ "timers.target" ];
timerConfig.OnCalendar = cfg.interval;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid peaks on maxmind's server backend, we should use a random delay here:

RandomizedDelaySec=3600

or an Interval between updates instead of fixed dates:

OnBootSec=15min
OnUnitActiveSec=1w

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I added RandomizedDelaySec=3600.

@bjornfor bjornfor force-pushed the geoip-service branch 2 times, most recently from f721512 to 666e1c4 Compare January 7, 2017 12:48
@bjornfor
Copy link
Contributor Author

bjornfor commented Jan 7, 2017

Hm, looks like I introduced a regression when adding syslog priority levels with "logger". systemd doesn't understand which unit the messages come from. So "systemctl status geoip-updater" doesn't show any messages. But the journal contains all messages (with proper error levels).

I'll see if I can make it work with "logger", or else find another solution. I really want to have "journalctl -b -perr".

@bjornfor
Copy link
Contributor Author

bjornfor commented Jan 7, 2017

Doh, it's as simple as echo "<prio>message" :-)

@danbst: Do you mind if I squash all commits into one? I'll write a small changelog in the commit message and describe your changes.

@Mic92
Copy link
Member

Mic92 commented Jan 7, 2017

@bjornfor systemd-cat is probably what you are looking for.

@bjornfor
Copy link
Contributor Author

bjornfor commented Jan 7, 2017

Looks like systemd-cat is for when you want to log from a program running outside of systemd. We're a systemd service, so our output is already directed to the log. Only the priority was missing (now fixed).

@danbst
Copy link
Contributor

danbst commented Jan 7, 2017

@bjornfor i'm ok

@bjornfor bjornfor force-pushed the geoip-service branch 3 times, most recently from 780bba3 to 3ed23ef Compare January 8, 2017 01:31
@bjornfor
Copy link
Contributor Author

Updated again. From commit message:

   Changes v5 -> v6:
      - Update database files atomically (per DB)
      - If a database is removed from the configuration, it'll be removed
        from /var/lib/geoip-databases too (on next run).
      - Add NixOS module assertion so that if user inputs non- .gz or .xz
        file there will be a build time error instead of runtime.
      - Run updater as user "nobody" instead of "root".
      - Rename NixOS service from "geoip-databases" to "geoip-updater".
      - Drop RemainAfterExit, or else the timer won't trigger the unit.
      - Bring back "curl --fail", or else we won't catch and log curl
        failures.

@danbst
Copy link
Contributor

danbst commented Jan 10, 2017

Perhaps we should bring back proxy service, that has wantedBy = multi-user.target and performs update-script with parameter "--update-if-not-exists". The parameter "--update-if-not-exists" won't download DB if file is already downloaded.

@bjornfor bjornfor closed this Jan 10, 2017
@bjornfor bjornfor reopened this Jan 10, 2017
@bjornfor bjornfor force-pushed the geoip-service branch 2 times, most recently from a8869ac to 240db1e Compare January 10, 2017 19:30
@bjornfor
Copy link
Contributor Author

Another update:

Changes v6 -> v7:
      - Add --skip-existing flag to geoip-updater, which skips updating
        existing database files. Pass that flag when we run the service on
        boot (and on any NixOS configuration change).
        (IMHO, this is somewhat a workaround for systemd persistent timers
        not being triggered immediately when a timer has never expired
        before. But it does have the nice side effect of ensuring that the
        installed databases always correspond to the configured ones, since
        the service is now always run after configuration changes.)

@bjornfor bjornfor changed the title WIP: nixos/geoip-databases: new service nixos/geoip-updater: new service Jan 10, 2017
@danbst
Copy link
Contributor

danbst commented Jan 12, 2017

Hm, I'm doing some testing and get this on rebuild

$ nixops deploy
building all machine configurations...
jolly.roger> copying closure...
fleet> closures copied successfully
jolly.roger> updating GRUB 2 menu...
jolly.roger> activating the configuration...
jolly.roger> setting up /etc...
jolly.roger> the following new units were started: geoip-updater-setup.service
jolly.roger> activation finished successfully
fleet> deployment finished successfully

This line the following new units were started: geoip-updater-setup.service is printed on every redeploy, even if nothing changed.

@danbst
Copy link
Contributor

danbst commented Jan 12, 2017

RemainAfterExit is required in setup service, so it won't be started on every activation, only when service changed.

Btw, really nice log output!

-- Logs begin at Thu 2016-09-01 10:06:58 UTC, end at Thu 2017-01-12 15:33:20 UTC. --
Jan 12 15:32:50 jolly.roger systemd[1]: Starting GeoIP Updater Setup...
Jan 12 15:32:50 jolly.roger systemd[1]: Started GeoIP Updater Setup.
Jan 12 15:32:50 jolly.roger geoip-updater[11112]: Option --skip-existing is set: not updating existing databases
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Server is reachable (try 1)
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Starting update of GeoIP databases in /var/lib/geoip-databases
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Skipping existing file: /var/lib/geoip-databases/GeoLite2-Country.mmdb
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Removed GeoIPASNum.dat (not listed in services.geoip-updater.databases)
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Removed GeoIPASNumv6.dat (not listed in services.geoip-updater.databases)
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Removed GeoIP.dat (not listed in services.geoip-updater.databases)
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Removed GeoIPv6.dat (not listed in services.geoip-updater.databases)
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Removed GeoLite2-City.mmdb (not listed in services.geoip-updater.databases)
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Removed GeoLiteCity.dat (not listed in services.geoip-updater.databases)
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Removed GeoLiteCityv6.dat (not listed in services.geoip-updater.databases)
Jan 12 15:32:51 jolly.roger geoip-updater[11112]: Completed GeoIP database update in /var/lib/geoip-databases

@danbst
Copy link
Contributor

danbst commented Jan 12, 2017

RemainAfterExit works really strange. On first activation geoip-updater-setup.service is started. On timer trigger it is stopped (because it conflicts with geoip-updater.service), so it will be started again on next activation. I hope this is minor issue.

Another strange consequence of setting RandomizedDelaySec=3600 is it affects interval in a non-obvious way when you want to test this service. For example, if I set interval to minutely, I don't get update every minute (every random 0-3600 seconds). So automated test (which is to be written) should unset RandomizedDelaySec to make test not time out.

@bjornfor
Copy link
Contributor Author

I noticed some timer weirdness too, with short interval.

Just pushed an update with RemainAfterExit=true. I guess it's better that way. (Still, the design with the two services and the "conflict" feels a bit odd...)

Is having a test for this service really needed? I was under the impression that tests couldn't use (external) networking, so either I'm wrong and we can write a test (if wanted) or else we have to write the test so that it creates a fake maxmind.com server.

@danbst
Copy link
Contributor

danbst commented Jan 12, 2017

The only thing we can do with timers and short intervals is to specify in interval option description, that it adds randomized delay to that interval, so it will be inprecise for short intervals.

As for test, I think you're right, it will be hard to write such test. ping @kampfschlaefer for his opinion on using fake maxmind server.

@Mic92
Copy link
Member

Mic92 commented Jan 13, 2017

(closed by accident)

Copy link
Contributor

@danbst danbst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using this module now on staging machine, let's see in a week how it goes. I've tested basic features (setting interval, changing database list, booting VM, rebooting VM) and all of these now work just like advertised.

@bjornfor
Copy link
Contributor Author

@Mic92:

if RandomizedDelaySec is not a good fit, I would suggest instead (this has an even better distribution as a it depends on the boot of

I personally would prefer to do the update at a fixed time, preferably at night.

@bjornfor
Copy link
Contributor Author

MaxMind say the (open source) databases are updated on the first tuesday of the month. So we actually only need to download once a month.

Because I don't like that the current service run on every boot, I've been thinking of this:

  • One service is triggered by timer, once a month.
  • One service is triggered by the /var/lib/geoip-database directory not existing (for initial setup).

Pro:

  • No needless running the service when there is nothing to do

Con:

  • A service configuration change will not cause a (re)run. I.e. if you change the list of databases, it won't be reflected in /var/lib/geoip-databases until next timer interval. (Or you trigger it yourself.)

I really like /var/lib/geoip-databases to be as accurate to the configuration as possible. But even with the current scheme (run every week, run on every boot), there is no guarantee that things are OK. You still have to check if the service ran successfully. It's doing impure IO, after all :-) Hence, I'm thinking it might be OK to just run the initial setup service once. If it fails that time, then oh well, look at "systemctl --failed" and restart it.

Thoughts?

The GeoIP databases from MaxMind have no stable URLs and change every
month (or so). Our current method of packaging these database in Nix and
playing catch-up with ever-changing file hashes is a bad idea. For
instance, it makes it impossible to realize old NixOS configurations.

This patch adds a NixOS service that periodically updates the GeoIP
databases in /var/lib/geoip-databases. Moving NixOS modules over can be
done in later patches.

I tried adding MD5 check, but not all databases have them, so i skipped
it. We are downloading over HTTPS though, it should be good. I also
tried adding zip support, but the first zip file I extracted had a
different filename inside than the archive name, which breaks an
assumption in this service, so I skipped that too.

Changes v9 -> v10:
  - Pass "--max-time" to curl to set upper bound on downloads (ensures
    no indefinite hanging if there's problem with networking).
    Timeout for network connectivity check: 60s.
    Timeout for geoip database (each): 15m.

Changes v8 -> v9:
  - Mention the random timer delay in the documentation for the
    'interval' option.

Changes v7 -> v8:
  - Add "RemainAfterExit=true" for the setup service, so it won't be
    restarted needlessly. (Thanks @danbst!)

Changes v6 -> v7:
  - Add --skip-existing flag to geoip-updater, which skips updating
    existing database files. Pass that flag when we run the service on
    boot (and on any NixOS configuration change).
    (IMHO, this is somewhat a workaround for systemd persistent timers
    not being triggered immediately when a timer has never expired
    before. But it does have the nice side effect of ensuring that the
    installed databases always correspond to the configured ones, since
    the service is now always run after configuration changes.)

Changes v5 -> v6:
  - Update database files atomically (per DB)
  - If a database is removed from the configuration, it'll be removed
    from /var/lib/geoip-databases too (on next run).
  - Add NixOS module assertion so that if user inputs non- .gz or .xz
    file there will be a build time error instead of runtime.
  - Run updater as user "nobody" instead of "root".
  - Rename NixOS service from "geoip-databases" to "geoip-updater".
  - Drop RemainAfterExit, or else the timer won't trigger the unit.
  - Bring back "curl --fail", or else we won't catch and log curl
    failures.

Changes v4 -> v5:
  - Add "GeoLite2-City.mmdb.gz" to default database list.

Changes v3 -> v4:
  - Remove unneeded geoip-updater-setup.service after adding
    'wantedBy = [ "multi-user.target" ]' directly to
    geoip-updater.service
  - Drop unneeded "Service" name from service descriptions.

Changes v2 -> v3:
  - Network may be down when starting from a cold boot, so try a few
    times. Possibly, if using systemd-networkd, it'll pass on the first
    try. But with default DHCP on NixOS, the service is started before
    hostnames can be resolved and thus we need a few extra seconds.
  - Add error handling and mark service as failed if fatal error.
  - Add proper syslog log levels.
  - Add RandomizedDelaySec=3600 to the timer to not put high load on the
    MaxMind servers. Suggested by @Mic92.
  - Set RemainAfterExit on geoip-updater.service instead of
    geoip-updater-setup.service. (The latter is only a proxy that pulls
    in the former service).

Changes v1 -> v2:
From Данило Глинський (Danylo Hlynskyi) <abcz2.uprola@gmail.com>:
  nixos/geoip-databases: add `databases` option and fix initial setup

  There were two great issues when using this service:
  - When you just enable service, databases aren't downloaded, they are
    downloaded when timer triggers. Fixed this with automatic download on
    first system activation.
  - When there is no internet, updater outputs nothing to logs, which is
    IMO misbehavior. Fixed this with removing `--fail` option, better be
    explicit here.
@danbst
Copy link
Contributor

danbst commented Feb 12, 2017

Can this be merged as-is, and tunings/thoughts separated to another issue? I find this service useful in it's current form (and I use it on-top of 16.09).

@bjornfor
Copy link
Contributor Author

Applied to master: 824d82f

I see no problem with cherry-picking this to release-16.09. I'll do that later if nobody objects.

@bjornfor bjornfor closed this Feb 12, 2017
@bjornfor bjornfor deleted the geoip-service branch February 12, 2017 14:09
@danbst
Copy link
Contributor

danbst commented Feb 12, 2017

Awesome! btw, much thanks @bjornfor for your initiative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants