Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/postgresql-wal-receiver: add module #63799

Merged
merged 1 commit into from Aug 11, 2019

Conversation

pacien
Copy link
Contributor

@pacien pacien commented Jun 26, 2019

Motivation for this change

This PR adds a service module for pg_receivewal, a program which streams the transaction log from a PostgreSQL cluster into a local directory for PITR.

To do
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@pacien pacien requested a review from infinisil as a code owner June 26, 2019 04:57
@pacien
Copy link
Contributor Author

pacien commented Jun 26, 2019

Note: I'm not sure whether the services defined in this module should use DynamicUser instead of creating a dedicated user. It is desirable for the generated backup files to have a stable owner after all.

@danbst
Copy link
Contributor

danbst commented Jul 21, 2019

I have a few nitpicks for module itself, but biggest concern is that test is missing. Can you provide a test file for this?

@pacien pacien force-pushed the postgresql-wal-receiver branch 2 times, most recently from 507962d to 4a2520e Compare July 23, 2019 08:53
@pacien
Copy link
Contributor Author

pacien commented Jul 23, 2019

Can you provide a test file for this?

Test added.

in mkIf (receivers != { }) {
users = {
groups.postgresql-wal-receiver = { };
users.postgresql-wal-receiver = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StackOverflow says better stick to short usernames. Something like walreceiver would be fine, what do you think?

And also, if it is possible to implement master-slave replication using this module, maybe the running user should be configurable? In test you explicitly do chown -R postgres:postgres, but it may not strictly required.

Copy link
Contributor Author

@pacien pacien Jul 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StackOverflow says better stick to short usernames. Something like walreceiver would be fine, what do you think?

Indeed. I've modified the user and group accordingly. The user's description should be enough.

And also, if it is possible to implement master-slave replication using this module, maybe the running user should be configurable?

This program and module are solely for proper WAL archiving.
Making the user and group configurable might be useful but I cannot think of a real use-case.

In test you explicitly do chown -R postgres:postgres, but it may not strictly required.

Indeed, but on a real setup, the WAL receiver would run on a separate machine and the restoration would have used a copy of the log.

@danbst
Copy link
Contributor

danbst commented Jul 23, 2019

Niiice! Looking forward on future module for master-slave setup using NixOS!

Most of my comments are stylistic, and I'm fine with merging this as-is if you can demonstrate it works with all PGs.

@pacien pacien force-pushed the postgresql-wal-receiver branch 3 times, most recently from 1284a5d to ee4ef75 Compare July 24, 2019 04:05
@pacien
Copy link
Contributor Author

pacien commented Jul 24, 2019

@danbst wrote:

Most of my comments are stylistic, and I'm fine with merging this as-is if you can demonstrate it works with all PGs.

Thanks for your review. I've modified the test to run it against all the available PostgreSQL package versions.

@danbst
Copy link
Contributor

danbst commented Jul 24, 2019

Still, pg 9.4 doesn't pass test. 9.5, 9.6, 10 and 11 work.

If it is complicated to fix test for pg9.4, then maybe we should skip it.

@pacien
Copy link
Contributor Author

pacien commented Jul 25, 2019

@danbst wrote:

Still, pg 9.4 doesn't pass test. 9.5, 9.6, 10 and 11 work.

I just fixed an issue with the absence of partial segments. It should work now.

@danbst
Copy link
Contributor

danbst commented Jul 25, 2019

@pacien oh, there is one more issue. You create a new user walreceiver, which stores some state on disk. Without settting uid/gid explicitly, the uid/gid of this user may change between deployments (if it is not specified, it is allocated from shared pool of uids)

https://github.com/NixOS/nixpkgs/blob/release-19.03/nixos/modules/misc/ids.nix

@pacien
Copy link
Contributor Author

pacien commented Jul 25, 2019

@danbst wrote:

@pacien oh, there is one more issue. You create a new user walreceiver, which stores some state on disk. Without settting uid/gid explicitly, the uid/gid of this user may change between deployments (if it is not specified, it is allocated from shared pool of uids)

https://github.com/NixOS/nixpkgs/blob/release-19.03/nixos/modules/misc/ids.nix

That file says "don't use uids above 399!" and we're dangerously approaching this limit.
Maybe should we just use the existing postgres user for this module to avoid taking another UID and GID?

@danbst
Copy link
Contributor

danbst commented Jul 25, 2019

@pacien yeah, it would be the best

@pacien
Copy link
Contributor Author

pacien commented Jul 25, 2019

Is it actually possible to define the same user in multiple modules?
It must be possible to use this module without enabling PostgreSQL itself.
I could use mkDefault for users but I think this would be an ugly workaround.

@danbst
Copy link
Contributor

danbst commented Jul 25, 2019

yes:

$ nix-instantiate nixos --arg configuration '
   { boot.isContainer = true; 
     imports = [ 
       { users.users.someuser.isSystemUser = true; } 
       { users.users.someuser.isSystemUser = true; }  
   ]; }' -A system

the idea is that if we assign same option same value, it would be merged correctly. It won't work only for types.unique type AFAIK.

@pacien
Copy link
Contributor Author

pacien commented Jul 25, 2019

It seems like the UID and GID attributes aren't merged when both modules are enabled:

error: The unique option `users.users.postgres.uid' is defined multiple times, in `/home/kea/code/nix/nixpkgs/nixos/modules/services/backup/postgresql-wal-receiver.nix' and `/home/kea/code/nix/nixpkgs/nixos/modules/services/databases/postgresql.nix'.

@danbst
Copy link
Contributor

danbst commented Jul 25, 2019

@pacien oh shit.

    int = mkOptionType rec {
        name = "int";
        description = "signed integer";
        check = isInt;
        merge = mergeOneOption;
      };

merge function should really be mergeEqualOption (like it is for bool option). I'll open a PR, but I can't guarantee it will be merged...

@danbst
Copy link
Contributor

danbst commented Jul 25, 2019

@pacien

I could use mkDefault for users but I think this would be an ugly workaround.

seems it is the simplest solution. We can also introduce internal option in postgresql.nix module, which enables user only. This is slightly more invasive, but removes duplication.

@infinisil
Copy link
Member

@danbst See #60732 (comment) (and later comments) for the running out of uid/gid thing

@pacien
Copy link
Contributor Author

pacien commented Jul 29, 2019

Rebased and amended.

testScript = ''
# make an initial base backup
$machine->waitForUnit('postgresql');
$machine->waitForUnit('postgresql-wal-receiver-main');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pg 9.4 still didn't work. I figured out, that it was because walreceiver polls PG every 5 seconds, and it happens that stop postgresql is done during this period "between healthchecks". Timing issue, not sure why it is fine on 9.5+

I see two solutions:

  • add
+      # WAL receiver healthchecks PG every 5 seconds, so let's be sure they have connected each other
+      # required only for 9.4
+      $machine->sleep(5);

after this line

  • or wait until some file occurs in WAL backup dir before stopping PG

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the suggested delay and amended.

I couldn't reproduce the timing issue with pg 9.4 on my machine in the first place.
Could you confirm that this solves the issue on yours?

@danbst danbst merged commit 4ff9a48 into NixOS:master Aug 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants