Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/postgresql: add statements option to replace the ensureDatabases and ensureUsers options #84146

Closed
wants to merge 2 commits into from

Conversation

aanderse
Copy link
Member

@aanderse aanderse commented Apr 3, 2020

Motivation for this change

I have been brainwashed convinced in a reasonably fashion by @grahamc that the ensure* options aren't a good fit for NixOS. I'm seeking to remove these options from both the mysql and postgresql modules, but still leave other module authors who wish to provision databases with a reasonable way to do so. My plan for postgresql and mysql was as follows:

  • remove the ensureDatabases and ensureUsers options
  • add a statements (this is a horrible name, please someone come up with a better one) option which module authors (and NixOS users) can use to write idempotent sql statements which will be executed as the mysql/postgresql super admin user to do things like provision local accounts, databases, etc... (ie. replace the ensureDatabases and ensureUsers options with a new option that provides the same functionality, but with even more flexibility)
  • replace all usage of ensureDatabases and ensureUsers in NixOS with the new statements option
  • strongly caution NixOS users against carelessly using the statements option, very explicitly explaining how it leads to systems which aren't reproducible and why that is bad
  • separate the execution of the sql in the statements option from the postgresql/mysql systemd service so a full restart of postgresql/mysql is not required every time someone alters statements (this step would be done in a separate PR immediately following the acceptance and merging of this PR)

I wrote some code for the mysql module to prove this idea works, and I was relatively happy with the results. Not being a postgresql guy myself, I decided I needed to prove this idea translated adequately into postgresql. I was surprised to learn postgresql doesn't provide a very convenient way to write idempotent statements for database and user creation like mysql does 😱 You can't write create database if not exists! I searched around and found what appears to be the "best" solution and adopted that method for this PR. For example, creating a database: select 'create database nextcloud owner nextcloud' where not exists (select from pg_database where datname = 'nextcloud')\gexec

You might ask why should we get rid of the ensure* options?
Aside from promoting systems which aren't reproducible, the ensure* options have a few other major problems with postgresql, mainly dealing with encoding and extensions (which @Ma27 ended up being the victim of, see the recent matrix-synapse fiasco for details on that 😞). Currently all attempts to extend the ensureDatabase options have led to scenarios where a users configuration.nix could very easily not match the reality of what the database actually looks like. After investigating what it would take to make the ensure* options truly reproducible and actually enforce what they describe I came to the conclusion this is simply a bad idea because any database changes we might make can lead to data corruption and require a backup before making them - something we don't want to do every time the user does a rebuild.

I am very interested in feedback from postgresql users to see how they feel about this PR. I'm guessing that some of you will argue that users can simply add sql statements to the .postStart of the postgresql systmed unit - while I'm not completely opposed to that idea, I do see it as somewhat hacky, provides no documentation, and causes postgresql restarts with no easy and user-facing-api friendly way correct that.

If you have the capacity to please take a few moments to consider this proposal and comment with your thoughts. I'm very interested in improving the database ecosystem we have here, but will need to rely on the postgresql experts because I live in a mysql (mariadb) world.

Also note that this is a draft PR and the code is very sloppy proof of concept. I'll get around to cleaning it up after I get some feedback to see if I'm heading in the right direction here.

Thanks 🎉

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

cc @flokli

@danbst
Copy link
Contributor

danbst commented Apr 3, 2020

👍 , also your idea about "reloading script" is implementation of last section of #65296

I also agree that without separate reloader script this statements is only slightly better than patching .postStart.

@flokli flokli requested review from globin, fpletz and etu April 3, 2020 14:55
@symphorien
Copy link
Member

Random thought:

I don't speak SQL at all. The ensureDatabase option was nice because it enabled me not to lookup the right SQL syntax and also hid some differences between SQL dialects. Also note that you found it difficult to discover the right idempotent postgresql incantation--think about SQL noobs like me. My conclusion is that the casual user may be better off with a solution which does not involve setting the statements option themselves.
If you are to remove ensureDatabase because it does not fit all scenarios, is it possible to provide a function in lib for example which takes arguments like ensureDatabase and evaluates to the right value of the statements option ? This would only be limited to easy cases, like ensureDatabase today.

@flokli
Copy link
Contributor

flokli commented Apr 4, 2020

@symphorien I'd assume most nixos modules will still provide some sql statements out of the box, if your database is local.

@aanderse
Copy link
Member Author

aanderse commented Apr 4, 2020

@flokli exactly.
@symphorien see nixos/modules/services/misc/gitea.nix as an example. Also keep in mind that we don't want to encourage the use of this option, especially for "SQL noobs" like you. Why? The ensureDatabases options has given you a very false sense of security. It isn't reproducible and if we extend the option any further there is serious potential using that option could end up eating your data.

Copy link
Member

@Ma27 Ma27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your work!

I have to admit though that the more I think about it, the more I think that people should take care of their databases rather than modules, but this is IMHO the right place to discuss this issue 😅

add a statements (this is a horrible name, please someone come up with a better one)

probably idempotentStatements? (I know, I'm very bad at naming, but this would make it at least a bit clearer).

replace all usage of ensureDatabases and ensureUsers in NixOS with the new statements option

How about opening e.g. a discourse thread as soon as we've discussed the basics of this change? Would like to hear opinions of a few more maintainers.

which @Ma27 ended up being the victim of, see the recent matrix-synapse fiasco for details on that

For the record, we're talking about #80447.

It isn't reproducible and if we extend the option any further there is serious potential using that option could end up eating your data.

Correct me if I'm missing something, but this usually shouldn't happen with ensure*, right? The main issue is that it gives false promises: with those options you can declare that e.g. a user X should exist with a given configuration, but this is not ensured if X exists already. In that case, the DB user X is not touched.

nixos/modules/services/databases/postgresql.nix Outdated Show resolved Hide resolved
nixos/modules/services/databases/postgresql.nix Outdated Show resolved Hide resolved
</itemizedlist>
<para>
See https://stackoverflow.com/a/18389184 for suggestions on writing idempotent sql
statements for PostgreSQL.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I'm wrong, but this wouldn't fix issues like changed collation requirements for databases (as it happened with matrix-synapse), right?

Or in other words, what would happen, if

  • the collation of a database has to be changed
  • the module-author of an arbitrary service modifies the sql statement in services.postgresql.statements
  • a user deploys this configuration to a server with an existing database

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is not to fix the situation you described, but instead push volatile db work off of nixos and onto the sysadmin.

To answer your question: release notes explaining to users what they need to do should be issued. A special mention of backups should also be mentioned.

@aanderse
Copy link
Member Author

@Ma27 thanks for your comments. The "eating your data" comment is something that would be more of a concern if we extend the current ensure* options to include things like locale. Upstream documentation states databases should always be backed up immediately before operations like changing locale.

I agree we need more people in this conversation. Who should we ping?

@datafoo
Copy link
Contributor

datafoo commented May 13, 2020

I avoid using these options because of their lack of flexibility.

Instead I uses a Sqitch project that contains all the things I need to ensure. I have created a sqitch NixOS module to run the deployments:

{ config, lib, pkgs, ... }:
with lib;
let

  cfg = config.services.sqitch;

  deploymentOptions = {
    options = {
      bundle = mkOption {
        type = types.package;
        example = "pkgs.mybundle";
        description = ''
          PostgreSQL Sqitch bundle to deploy.
        '';
      };

      targetURI = mkOption {
        type = types.str;
        example = "db:pg://postgres@localhost:5432/postgres";
        description = ''
          Database database connection URI of the target. Only PostgreSQL targets are supported.
        '';
      };

      user = mkOption {
        type = types.str;
        default = "postgres";
        description = ''
          User running the Sqitch deployment.
        '';
      };

      group = mkOption {
        type = types.str;
        default = "postgres";
        description = ''
          Group running the Sqitch deployment.
        '';
      };

      requires = mkOption {
        default = [];
        type = types.listOf types.str;
        description = ''
          systemd `Requires=` dependencies to add to the Sqitch deployment service.
        '';
      };

      after = mkOption {
        default = [];
        type = types.listOf types.str;
        description = ''
          systemd `After=` dependencies to add to the Sqitch deployment service.
        '';
      };
    };
  };

in

{
  options = {

    services.sqitch = {

      enable = mkOption {
        type = types.bool;
        default = true;
        description = ''
          Whether to run Sqitch deployments.
        '';
      };

      deployments = mkOption {
        default = { };
        type = with types; attrsOf (submodule deploymentOptions);
        description = ''
          Attribute set of deployments to run. Creates
          <literal>sqitch-''${deployment}.service</literal> systemd units for
          each deployments defined here. Other services can add dependencies
          to those services if they rely on the deployments.
        '';
      };
    };
  };

  config = mkIf (cfg.enable && cfg.deployments != { }) {

    assertions = let
      deployments = (mapAttrsToList (deployment: options: options) cfg.deployments);
    in [
      {
        assertion = all (deployment: hasPrefix "db:pg://" deployment.targetURI) deployments;
        message = ''
          Options `service.sqitch.deployments.<name>.targetURI` must start with "db:pg://" because only PostgreSQL is supported.
        '';
      }
    ];

    environment.systemPackages = [
      pkgs.sqitchPg
    ];

    systemd.services =
    let
      toServicePair = deployment: options:
        let
          deploymentService = {
            description = "Sqitch deployment for ${deployment}";

            requires = options.requires;
            after = options.after;

            path = [
              config.services.postgresql.package
              pkgs.sqitchPg
            ];

            script = ''
              sqitch --cd ${options.bundle} deploy --target ${options.targetURI}
            '';

            serviceConfig = {

              Type = "oneshot";

              RemainAfterExit = true;

              User = options.user;
              Group = options.group;

              PrivateTmp = true;

              ProtectSystem = "strict";
              ProtectHome = true;

              ProtectKernelTunables = true;
              ProtectKernelModules = true;
              ProtectControlGroups = true;

              SyslogIdentifier = "sqitch-${deployment}";
            };
          };
        in {
          name = "sqitch-${deployment}";
          value = deploymentService;
        };
      toServices = deployments: listToAttrs (mapAttrsToList toServicePair deployments);
    in
      toServices cfg.deployments;
  };
}

@aanderse
Copy link
Member Author

@datafoo thanks for sharing that. I'm sure some people would find that very useful.

The ensureDatabase options continue to grow into more and more of a problem. Now a zabbix update is being held back as the default ensureDatabase encoding isn't compatible (for mysql). The ensureDatabase options aren't powerful enough, but if we extend them any further they aren't reproducible. I really want to proceed with this PR, but feel like more feedback, or at least people nodding in agreement (or even just giving this comment 👍) is required.

Anyone?

@jtojnar
Copy link
Contributor

jtojnar commented May 23, 2020

I agree that the options are currently problematic and I ended up creating a custom postStart solution that handles the functionality I needed (just the initial creation of the databases with extensions, apps will handle schema migrations).

I have minor preference for statements with proper disclaimers over insufficient ensure options.

By the way, there was a PostgreSQL patch trying to add CREATE DATABASE … IF NOT EXISTS but it seems to have fizzled out: https://www.postgresql.org/message-id/flat/CAFcNs%2BpQZ4QzGF-mBy%3D6pkyw__E62inX9DubWu7y2goahvT9sA%40mail.gmail.com

@aanderse
Copy link
Member Author

@jtojnar thanks for your comments. My thought would be to eliminate and replace the ensureDatabase and ensureUsers options from both mysql and postgresql, as opposed to supplementing them. I think the ensure* options are fundamentally a less than favourable match for NixOS.

@peterhoeg
Copy link
Member

👍 for getting rid of ensureDatabases and friends. I have also had to do custom scripts for most things anyway.

@aanderse
Copy link
Member Author

Thanks @peterhoeg! Would you see any value in a statements option, or would you encourage users to append to the service postStart option?

@peterhoeg
Copy link
Member

peterhoeg commented May 26, 2020 via email

@aanderse
Copy link
Member Author

@peterhoeg if we're looking to cover simple cases and give a reasonable amount of flexibility to users, with warning around breaking reproducibility, I think the statement option has some value... but I agree for a number of use cases something custom still might be required.

So is the statements option worth it in your opinion, even if not something you would personally use?

@aanderse
Copy link
Member Author

ping @peterhoeg

So is the statements option worth it in your opinion, even if not something you would personally use?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants