Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hydron: init at 2018-07-11 #43308

Merged
merged 2 commits into from Jul 18, 2018
Merged

hydron: init at 2018-07-11 #43308

merged 2 commits into from Jul 18, 2018

Conversation

Chiiruno
Copy link
Contributor

@Chiiruno Chiiruno commented Jul 10, 2018

Motivation for this change

A nice and performant image tagger and sorter.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

@Chiiruno
Copy link
Contributor Author

Don't merge yet, depends on #43362

@Chiiruno
Copy link
Contributor Author

Ready for merging again. Having a service makes this package much more useful.
cc @Mic92 @matthewbauer

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These options seem a bit arbitrary, do a lot of users of hydron regularly add/remove certain tags and remove ids?

It might be better to just provide a single option that allows the user to easily execute other commands before start. Although that can be done by using systemd.services.hydron.script = mkBefore "some command"; already iirc.


src = fetchgit {
inherit rev;
url = "https://github.com/bakape/hydron";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use fetchFromGitHub instead


serviceConfig = {
PermissionsStartOnly = true;
Type = "simple";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"simple" is the default, can be removed

options.services.hydron = {
enable = mkEnableOption "hydron";

baseDir = mkOption {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more common to use dataDir

listenAddress = mkOption {
type = types.nullOr types.str;
default = null;
description = "Listen on a specific IP address and port.";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an explanation of which format this is expected to be in? <ip>:<port> I assume?

@Chiiruno
Copy link
Contributor Author

I would like to keep the granularity of the options currently present, if possible.
I feel it makes for easier and more "clear" configuration.

@infinisil
Copy link
Member

So you're currently using such a config? I'm asking because I really don't know how these options could be useful in the long run. They seem to be something one would run once in a terminal, not every time the service starts.

addTags and removeTags seem to be tag white and blacklist, which seems to be needed because setting fetchTags = true will override your local db with the fetched one? This should be handled at the application level and not by a NixOS module. I suspect that eventually this will be implemented in hydron.

import and importPaths seem weird, because they only import when the service starts. This should really be either some directory watcher service (in hydron itself optimally), or a systemd timer that runs regularly.

removeIds seems really weird and something that should definitely be handled by just running a command. I wouldn't want to rebuild my whole system just to remove an id from my library.

In my opinion, above mentioned options should be removed, they don't really fit into a declarative configuration. Note that users can still have such functionality by customizing the service as described above.

A systemd timer that regularly imports files from specified directories seems fine though, if you want to add that.

@Chiiruno
Copy link
Contributor Author

They seem to be something one would run once in a terminal, not every time the service starts.

I know, and it's something I would like to fix upstream, but unfortunately because hydron doesn't check for new or changed images, so it needs to check every start via import, or perhaps like you suggest, on a timer so it doesn't slow down the boot process.

addTags and removeTags I can probably remove without a problem. fetchTags just fetches tags for images their hash from gelbooru. fetching tags should be done after every (re)import.

directory watcher service

As I mentioned, that's an upstream issue and I'll remove the periodic imports once it is. For now, let's go with a systemd timer.

removeIDs seems really weird...

Good point, thanks for the tip. I'll remove it.

@Chiiruno
Copy link
Contributor Author

@infinisil How does this look?

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that looks a lot better!

Hydron sounds very interesting, it's exactly what I need, will probably run it myself later :O

};

interval = mkOption {
type = types.string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be types.str


importPaths = mkOption {
type = types.nonEmptyListOf types.path;
default = [ (cfg.dataDir + "/images") ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to use types.listOf types.path, default = [], and always try to import dataDir + "/images". if the user doesn't want to use it, they can just not put pictures in there, so this option is reserved for user specified directories.


preStart = ''
# Ensure folder exists and permissions are correct
mkdir -p ${cfg.dataDir}/images ${concatStringsSep " " cfg.importPaths}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with the importPaths suggestion, this should be just mkdir -p "${cfg.dataDir}/images", the user managed directories shouldn't be touched.

# Ensure folder exists and permissions are correct
mkdir -p ${cfg.dataDir}/images ${concatStringsSep " " cfg.importPaths}
chmod 750 ${cfg.dataDir}
chown -R hydron:hydron ${cfg.dataDir}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: " around ${cfg.dataDir}, will make bash happy with some weird directory names.

description = "Import paths into hydron and fetch tags";

script = ''
${pkgs.hydron}/bin/hydron import ${concatStringsSep " " cfg.importPaths}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with the importPaths suggestion, you can add "${dataDir}/imports" here. Also: You can use lib.escapeShellArgs cfg.importPaths here instead to handle the escaping. (lib.escapeShellArg could be used as well for the mkdir above, maybe a bit excessive though)


script = ''
${pkgs.hydron}/bin/hydron import ${concatStringsSep " " cfg.importPaths}
'' + optionalString (cfg.fetchTags) ''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Parens aren't needed


groups.hydron = {
gid = config.ids.gids.hydron;
members = [ "hydron" ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should work without specifying this, since the user is declared to have hydron as the primary group already above.

goDeps = ./deps.nix;

src = fetchFromGitHub {
inherit rev;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Place the rev definition here, because it's not needed as a drv attribute.

@Chiiruno
Copy link
Contributor Author

@infinisil Glad to see you like hydron, I've been using it and it's great. Especially being able to drag and drop images.
Should be ready for building and merging again.

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Only thing to do is squash all commits into one for the package and one for the module

@bakape
Copy link

bakape commented Jul 17, 2018

@infinisil

addTags and removeTags seem to be tag white and blacklist, which seems to be needed because setting fetchTags = true will override your local db with the fetched one?

No, addTags and removeTags only affects user-specified tags. Hydron maintains a separate set of tags depending on who set them (user, Gelbooru, Hydrus, etc.). fetch_tags only affects gelbooru-sourced tags. All tags are intersected on data request. Only the tag source can modify tags set by it.

import

That's a one time copy-based CLI commnad. It copies the matched file(s) to an internal database on request and populates metadata for them.

@Chiiruno

fetching tags should be done after every (re)import.

It really should not. You should import with the -f flag. Import + fetch_tags only makes sense on the initial import, which tends to be very large.

@Chiiruno
Copy link
Contributor Author

Chiiruno commented Jul 17, 2018

@bakape

-f    Fetch tags from gelbooru.com for imported files.
        NB: This will notably slow down importing large amounts of files.
        Consider using import, followed by fetch_tags!

I thought it was the opposite?
I'll fix it, then.

@bakape
Copy link

bakape commented Jul 17, 2018

large amounts of files

That means large amounts of files that are not imported (thousands). Files that you try to import, but are already imported, will be hash-compared and ignored.

@Chiiruno
Copy link
Contributor Author

Okay, that should do it.

@infinisil
Copy link
Member

Hmm, but I mean, if I were to try out hydron, I'd first start the service, it would create the dataDir/images, and touch the once file (in your current implementation). And then I will put my 2000 files in there, it imports it via -f and is slow because of that.

So this current solution certainly isn't very good. Why does it support 2 ways of doing that anyways (import -f and import; fetch seem to do the same)? If it's really the case that the user has to decide what to use based on amount of imports, maybe it would be best to choose one of them based on how many new files the import directory contains?

@Chiiruno
Copy link
Contributor Author

Chiiruno commented Jul 17, 2018

The current solution has hydron check if the once file exists, and then if it doesn't, it uses fetch_tags and creates the once file, it it does, it incrementally fetches tags via the -f flag.

@Chiiruno
Copy link
Contributor Author

Chiiruno commented Jul 17, 2018

Also, the hydron-fetch service runs every hour, or per specified interval, so that's not a problem since we're not having the main hydron service do this.
If you want to import and fetch tags right after adding your importPaths, but before the hour chimes, just start hydron-fetch.

@infinisil
Copy link
Member

Yeah, but the folder hydron uses to import only exists once it has started, so the chance of the once file technique actually working as intended is rather small, since it probably always exists once the user actually imports files. And when the user later adds his own directories with importPaths, they won't get imported via import; fetch (the fast way?) because the once file already exists.

@Chiiruno
Copy link
Contributor Author

As @bakape said, the fetch_tags is really only for the first time, after that, -f is fine.

@bakape
Copy link

bakape commented Jul 17, 2018

@infinisil
fetch_tags fetches tags for ALL of the images in your DB. -f only does this for the freshly imported ones.

@bakape
Copy link

bakape commented Jul 17, 2018

@Chiiruno
Honestly, just always use -f. import+fetch_tags is an optimisation that can be made at the user's discretion and should not be done automatically, imo.

hydron-fetch service runs every hour

Why the fuck? Why would you ever DoS gelbooru like that? Imagine you have 10k images. Why would you refetch tags for all of them every hour?

@Chiiruno
Copy link
Contributor Author

DoS

Well, that wasn't really my intention. Perhaps daily instead?

@Chiiruno
Copy link
Contributor Author

Also why is this a problem if we're using -f?

@infinisil
Copy link
Member

infinisil commented Jul 17, 2018

But I mean for initial runs, -f is the same as fetch, because your database was empty, so added pictures = all pictures.

So I also feel like always using -f should be fine, will make the service simpler as well.

Edit: And less stateful and more predictable

@Chiiruno
Copy link
Contributor Author

Yeah, using -f would be preferred.
What about the "DoS" problem? Is this still a problem if we're using -f? @bakape

@infinisil
Copy link
Member

Imo that shouldn't be part of our problem: hydron should expect users to import a lot of files in one swoop and handle (if it has that) gelboorus rate limits, or implement rate limiting itself. E.g. nix-index handles this by limiting the number of active requests to be always <100.

@bakape
Copy link

bakape commented Jul 17, 2018

@Chiiruno

Well, that wasn't really my intention. Perhaps daily instead?

Weekly would be more apt.

Also why is this a problem if we're using -f?

It's not a problem, if you use -f. -f only fetches for freshly imported files.

@infinisil
Yes, it is. The only difference is that your imports block the limited amount of workers on the HTTP fetch from Gelbooru. This makes importing not only bound to CPU and disk I/O, but also to the Gelbooru JSON API. It could make a huge difference, if you have many thousands of images to import.

@bakape
Copy link

bakape commented Jul 17, 2018

Imo that shouldn't be part of our problem: hydron should expect users to import a lot of files in one swoop and handle (if it has that) gelboorus rate limits, or implement rate limiting itself. E.g. nix-index handles this by limiting the number of active requests to be always <100.

That is not exactly what I meant. Hydron already limits to 8 concurrent API requests. What I meant is, if I'm the Gelbooru admin and see that every hour some faggot makes 10k API requests, I will block him. I might even have scripts to flag this as abuse and block him automatically. The end user should not be aware of this possibility.

@infinisil
Copy link
Member

@bakape I see, but that won't be a problem with -f, right? If so, I think always using -f (even on the initial run) should be preferred.

@bakape
Copy link

bakape commented Jul 17, 2018

@infinisil
Yes, that is my stance too. If the user wants to, they can optimise manually.

@Chiiruno
Copy link
Contributor Author

@infinisil Ready again.

@infinisil
Copy link
Member

Looking good to me! One last thing: Commit message for the module should be "nixos/hydron: init"

@bakape Also looking good to you?

@bakape
Copy link

bakape commented Jul 18, 2018 via email

@infinisil
Copy link
Member

Classic ;)

@Chiiruno
Copy link
Contributor Author

@infinisil Am I finally free?

@infinisil
Copy link
Member

You are finally free, thanks a lot!

@infinisil infinisil merged commit 810f91f into NixOS:master Jul 18, 2018
@Chiiruno Chiiruno deleted the dev/hydron branch July 18, 2018 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants