New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
search: limit results to 50, default to 10 #853
Conversation
This search query is pretty heavy. Defaulting to 500 has caused Hydra's web UI to appear to be down. Since 500 can take it down, users probably shouldn't be allowed t ask for that many.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I think 50 is a reasonable amount. Ideally we would have a way to make the search faster but that isn't going to happen as long as we search for '%$query%'.
Thanks, @andir, I agree. Maybe there is a way we can make that faster if we were more precise about what the user was searching for. For example, don't search drvpath with ILIKE with a flexible prefix, since they presumably know the hash or they don't. If they don't a searc on nixname would cover the rest, I think? |
Not sure how helpful it can be, but the IRC logger does fast searches using some PostgreSQL feature I don't really grok. It's quick, it runs on the lowest-tier vultr VPS. Though results are not always precise. Maybe there are other similar PostgreSQL features we could use to index? |
It might be useful, but I think we would need to go a bit further. Right now the fulltext search features don't work very well on our inputs:
|
|
Experimenting a bit, I created an index like so:
and it yielded results for the following query in 4ms: SELECT me.id, me.finished, me.timestamp, me.project, me.jobset, me.jobset_id, me.job, me.nixname, me.description, me.drvpath, me.system, me.license, me.homepage, me.maintainers, me.maxsilent, me.timeout, me.ischannel, me.iscurrent, me.priority, me.globalpriority, me.starttime, me.stoptime, me.iscachedbuild, me.buildstatus, me.size, me.closuresize, me.releasename, me.keep, me.notificationpendingsince FROM builds me LEFT JOIN buildoutputs buildoutputs ON buildoutputs.build = me.id WHERE ( to_tsvector('english', regexp_replace(buildoutputs.path, '[\./-]', ' ', 'g')) @@ websearch_to_tsquery('english', regexp_replace('pkgs.chromium', '[\./-]', ' ', 'g')) ) ORDER BY id desc LIMIT 10; However note it yielded no results, since I then tried a fragment of a store path:
and this is taking a very long time. I'll report back when it finishes. |
I canceled it after a while. Interestingly, "poppler 0.16" is producing good results too:
that query took about 400ms:
|
I'm playing with this diff: diff --git a/src/lib/Hydra/Controller/Root.pm b/src/lib/Hydra/Controller/Root.pm
index 66aba9e5..42a3ea15 100644
--- a/src/lib/Hydra/Controller/Root.pm
+++ b/src/lib/Hydra/Controller/Root.pm
@@ -444,6 +444,7 @@ sub search :Local Args(0) {
$c->model('DB')->schema->txn_do(sub {
$c->model('DB')->schema->storage->dbh->do("SET LOCAL statement_timeout = 20000");
+
$c->stash->{projects} = [ $c->model('DB::Projects')->search(
{ -and =>
[ { -or => [ name => { ilike => "%$query%" }, displayName => { ilike => "%$query%" }, description => { ilike => "%$query%" } ] }
@@ -461,7 +462,14 @@ sub search :Local Args(0) {
{ order_by => ["project", "name"], join => ["project"] } ) ];
$c->stash->{jobs} = [ $c->model('DB::Builds')->search(
- { "job" => { ilike => "%$query%" }
+ {
+ -and => [
+ \[
+ "to_tsvector('english', regexp_replace(job, '[\./-]', ' ', 'g'))"
+ . " @@ websearch_to_tsquery('english', regexp_replace('?', '[\./-]', ' ', 'g'))",
+ $query,
+ ],
+ ]
, "project.hidden" => 0
, "jobset.hidden" => 0
, iscurrent => 1
@@ -471,16 +479,35 @@ sub search :Local Args(0) {
} )
];
+ # CREATE INDEX expIndexBuildOutputsPathGin ON BuildOutputs USING GIN(to_tsvector('english', regexp_replace(path, '[\./-]', ' ', 'g')));
+ # CREATE INDEX expIndexBuildsJobGin ON Builds USING GIN(to_tsvector('english', regexp_replace(job, '[\./-]', ' ', 'g')));
+ # CREATE INDEX expIndexBuildsDrvPathGin ON Builds USING GIN(to_tsvector('english', regexp_replace(drvpath, '[\./-]', ' ', 'g')));
+
+
# Perform build search in separate queries to prevent seq scan on buildoutputs table.
$c->stash->{builds} = [ $c->model('DB::Builds')->search(
- { "buildoutputs.path" => { ilike => "%$query%" } },
- { order_by => ["id desc"], join => ["buildoutputs"]
+ {
+ -and => [
+ \[
+ "to_tsvector('english', regexp_replace(buildoutputs.path, '[\./-]', ' ', 'g'))"
+ . " @@ websearch_to_tsquery('english', regexp_replace('?', '[\./-]', ' ', 'g'))",
+ $query,
+ ],
+ ]
+ , order_by => ["id desc"], join => ["buildoutputs"]
, rows => $c->stash->{limit}
} ) ];
$c->stash->{buildsdrv} = [ $c->model('DB::Builds')->search(
- { "drvpath" => { ilike => "%$query%" } },
- { order_by => ["id desc"]
+ {
+ -and => [
+ \[
+ "to_tsvector('english', regexp_replace(drvpath, '[\./-]', ' ', 'g'))"
+ . " @@ websearch_to_tsquery('english', regexp_replace('?', '[\./-]', ' ', 'g'))",
+ $query,
+ ],
+ ]
+ , order_by => ["id desc"]
, rows => $c->stash->{limit}
} ) ]; |
This search query is pretty heavy. Defaulting to 500 has caused
Hydra's web UI to appear to be down. Since 500 can take it down, users
probably shouldn't be allowed t ask for that many.