Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search: limit results to 50, default to 10 #853

Merged
merged 1 commit into from Jan 30, 2021
Merged

Conversation

grahamc
Copy link
Member

@grahamc grahamc commented Jan 30, 2021

This search query is pretty heavy. Defaulting to 500 has caused
Hydra's web UI to appear to be down. Since 500 can take it down, users
probably shouldn't be allowed t ask for that many.

This search query is pretty heavy. Defaulting to 500 has caused
Hydra's web UI to appear to be down. Since 500 can take it down, users
probably shouldn't be allowed t ask for that many.
Copy link
Member

@andir andir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I think 50 is a reasonable amount. Ideally we would have a way to make the search faster but that isn't going to happen as long as we search for '%$query%'.

@grahamc
Copy link
Member Author

grahamc commented Jan 30, 2021

Thanks, @andir, I agree. Maybe there is a way we can make that faster if we were more precise about what the user was searching for. For example, don't search drvpath with ILIKE with a flexible prefix, since they presumably know the hash or they don't. If they don't a searc on nixname would cover the rest, I think?

@grahamc grahamc merged commit 72e237f into master Jan 30, 2021
@grahamc grahamc deleted the search-limit-reqs branch January 30, 2021 13:58
@samueldr
Copy link
Member

Not sure how helpful it can be, but the IRC logger does fast searches using some PostgreSQL feature I don't really grok.

It's quick, it runs on the lowest-tier vultr VPS. Though results are not always precise. Maybe there are other similar PostgreSQL features we could use to index?

@grahamc
Copy link
Member Author

grahamc commented Jan 30, 2021

It might be useful, but I think we would need to go a bit further. Right now the fulltext search features don't work very well on our inputs:

hydra=> SELECT to_tsvector('english', '/nix/store/aoeusthaoeutho-foo-1.2.3.drv');
-[ RECORD 1 ]--------------------------------------------
to_tsvector | '/nix/store/aoeusthaoeutho-foo-1.2.3.drv':1

hydra=> SELECT to_tsvector('english', 'pkgs.chromium');
-[ RECORD 1 ]------------------
to_tsvector | 'pkgs.chromium':1

@grahamc
Copy link
Member Author

grahamc commented Jan 30, 2021

hydra=> SELECT * from to_tsvector('english', regexp_replace('/nix/store/aoeusthaoeutho-foo-1.2.3.drv pkgs.chromium', '[\./-]', ' ', 'g'));
-[ RECORD 1 ]----------------------------------------------------------------------------------------------
to_tsvector | '1':5 '2':6 '3':7 'aoeusthaoeutho':3 'chromium':10 'drv':8 'foo':4 'nix':1 'pkgs':9 'store':2

@grahamc
Copy link
Member Author

grahamc commented Jan 30, 2021

Experimenting a bit, I created an index like so:

2021-01-30 22:36:10.039 UTC [16668] LOG:  duration: 5280553.983 ms  statement: CREATE INDEX expIndexBuildOutputsPathGin ON BuildOutputs USING GIN(to_tsvector('english', regexp_replace(path, '[\./-]', ' ', 'g')));

and it yielded results for the following query in 4ms:

SELECT me.id, me.finished, me.timestamp, me.project, me.jobset, me.jobset_id, me.job, me.nixname, me.description, me.drvpath, me.system, me.license, me.homepage, me.maintainers, me.maxsilent, me.timeout, me.ischannel, me.iscurrent, me.priority, me.globalpriority, me.starttime, me.stoptime, me.iscachedbuild, me.buildstatus, me.size, me.closuresize, me.releasename, me.keep, me.notificationpendingsince FROM builds me LEFT JOIN buildoutputs buildoutputs ON buildoutputs.build = me.id WHERE ( to_tsvector('english', regexp_replace(buildoutputs.path, '[\./-]', ' ', 'g')) @@  websearch_to_tsquery('english', regexp_replace('pkgs.chromium', '[\./-]', ' ', 'g')) ) ORDER BY id desc LIMIT 10;

However note it yielded no results, since pkgs.chromium becomes pkgs & chromium which isn't a thing.

I then tried a fragment of a store path:

hydra=> select * from buildoutputs limit 1;
  build  | name |                                                    path                                                    
---------+------+------------------------------------------------------------------------------------------------------------
 2521878 | out  | /nix/store/19x3icywkiagmmvjvkv5afisvdy1alg3-poppler-0.18.4
(1 row)

hydra=>  SELECT me.id, me.finished, me.timestamp, me.project, me.jobset, me.jobset_id, me.job, me.nixname, me.description, me.drvpath, me.system, me.license, me.homepage, me.maintainers, me.maxsilent, me.timeout, me.ischannel, me.iscurrent, me.priority, me.globalpriority, me.starttime, me.stoptime, me.iscachedbuild, me.buildstatus, me.size, me.closuresize, me.releasename, me.keep, me.notificationpendingsince FROM builds me LEFT JOIN buildoutputs buildoutputs ON buildoutputs.build = me.id WHERE ( to_tsvector('english', regexp_replace(buildoutputs.path, '[\./-]', ' ', 'g')) @@  websearch_to_tsquery('english', regexp_replace('ywkiagmm', '[\./-]', ' ', 'g')) ) ORDER BY id desc LIMIT 10;

and this is taking a very long time. I'll report back when it finishes.

@grahamc
Copy link
Member Author

grahamc commented Jan 30, 2021

I canceled it after a while. Interestingly, "poppler 0.16" is producing good results too:

hydra=>  SELECT me.id, me.finished, me.timestamp, me.project, me.jobset, me.jobset_id, me.job, me.nixname, me.description, me.drvpath, me.system, me.license, me.homepage, me.maintainers, me.maxsilent, me.timeout, me.ischannel, me.iscurrent, me.priority, me.globalpriority, me.starttime, me.stoptime, me.iscachedbuild, me.buildstatus, me.size, me.closuresize, me.releasename, me.keep, me.notificationpendingsince FROM builds me LEFT JOIN buildoutputs buildoutputs ON buildoutputs.build = me.id WHERE ( to_tsvector('english', regexp_replace(buildoutputs.path, '[\./-]', ' ', 'g')) @@  websearch_to_tsquery('english', regexp_replace('poppler 0.16', '[\./-]', ' ', 'g')) ) ORDER BY id desc LIMIT 7;

   id    | finished | timestamp  | project | jobset | jobset_id |    job     |    nixname     |           description            |                            drvpath                             |    system     | license |            homepage             | maintainers | maxsilent | timeout | ischannel | iscurrent | priority | globalpriority | starttime  |  stoptime  | iscachedbuild | buildstatus | size | closuresize | releasename | keep | notificationpendingsince 
---------+----------+------------+---------+--------+-----------+------------+----------------+----------------------------------+----------------------------------------------------------------+---------------+---------+---------------------------------+-------------+-----------+---------+-----------+-----------+----------+----------------+------------+------------+---------------+-------------+------+-------------+-------------+------+--------------------------
 1595210 |        1 | 1323082410 | nixpkgs | trunk  |       355 | poppler    | poppler-0.16.7 | Poppler, a PDF rendering library | /nix/store/7zd5y4dm46xirls26fcnsyag88f2vzra-poppler-0.16.7.drv | x86_64-darwin | GPLv2   | http://poppler.freedesktop.org/ |             |      3600 |   36000 |         0 |         0 |        0 |              0 | 1323082409 | 1323082410 |             0 |           2 |    0 |           0 |             |    0 |                         
 1594567 |        1 | 1323080495 | nixpkgs | trunk  |       355 | poppler    | poppler-0.16.7 | Poppler, a PDF rendering library | /nix/store/as75h6fd4szdr062d5sm3kzlvyzkk1sv-poppler-0.16.7.drv | x86_64-linux  | GPLv2   | http://poppler.freedesktop.org/ |             |      3600 |   36000 |         0 |         0 |        0 |              0 | 1323080495 | 1323080495 |             1 |           0 |    0 |           0 |             |    0 |                         
 1593648 |        1 | 1323080385 | nixpkgs | trunk  |       355 | popplerQt4 | poppler-0.16.7 | Poppler, a PDF rendering library | /nix/store/vbzfrs561vdn6qpiah296nr467iqx160-poppler-0.16.7.drv | x86_64-linux  | GPLv2   | http://poppler.freedesktop.org/ |             |      3600 |   36000 |         0 |         0 |        0 |              0 | 1323080385 | 1323080385 |             1 |           0 |    0 |           0 |             |    0 |                         
 1593438 |        1 | 1323250084 | nixpkgs | trunk  |       355 | poppler    | poppler-0.16.7 | Poppler, a PDF rendering library | /nix/store/mk75vpv6xd44xwfmn51wbfy7f4y0jgm3-poppler-0.16.7.drv | i686-freebsd  | GPLv2   | http://poppler.freedesktop.org/ |             |      3600 |   36000 |         0 |         0 |        0 |              0 | 1323250084 | 1323250084 |             0 |           2 |    0 |           0 |             |    0 |                         
 1587816 |        1 | 1323079693 | nixpkgs | trunk  |       355 | poppler    | poppler-0.16.7 | Poppler, a PDF rendering library | /nix/store/6rqmkj2c5pljlphy4dbnl1hp6w56j1wi-poppler-0.16.7.drv | i686-linux    | GPLv2   | http://poppler.freedesktop.org/ |             |      3600 |   36000 |         0 |         0 |        0 |              0 | 1323079693 | 1323079693 |             1 |           0 |    0 |           0 |             |    0 |                         
 1587604 |        1 | 1323084567 | nixpkgs | trunk  |       355 | poppler    | poppler-0.16.7 | Poppler, a PDF rendering library | /nix/store/67q95wwpmk1a6q2pzqyc1qxl0y2vpp6k-poppler-0.16.7.drv | i686-cygwin   | GPLv2   | http://poppler.freedesktop.org/ |             |      3600 |   36000 |         0 |         0 |        0 |              0 | 1323084399 | 1323084567 |             0 |           2 |    0 |           0 |             |    0 |                         
 1584306 |        1 | 1323079196 | nixpkgs | trunk  |       355 | popplerQt4 | poppler-0.16.7 | Poppler, a PDF rendering library | /nix/store/dmb41x262xpkdk89zk5digm1522rl4z9-poppler-0.16.7.drv | i686-linux    | GPLv2   | http://poppler.freedesktop.org/ |             |      3600 |   36000 |         0 |         0 |        0 |              0 | 1323079196 | 1323079196 |             1 |           0 |    0 |           0 |             |    0 |                         
(7 rows)

that query took about 400ms:

 Limit  (cost=370.22..370.24 rows=7 width=381) (actual time=357.151..357.154 rows=7 loops=1)
   Output: me.id, me.finished, me."timestamp", me.project, me.jobset, me.jobset_id, me.job, me.nixname, me.description, me.drvpath, me.system, me.license, me.homepage, me.maintainers, me.maxsilent, me.timeout, me.ischannel, me.iscurrent, me.priority, me.globalpriority, me.starttime, me.stoptime, me.iscachedbuild, me.buildstatus, me.size, me.closuresize, me.releasename, me.keep, me.notificationpendingsince
   ->  Sort  (cost=370.22..370.27 rows=21 width=381) (actual time=357.151..357.152 rows=7 loops=1)
         Output: me.id, me.finished, me."timestamp", me.project, me.jobset, me.jobset_id, me.job, me.nixname, me.description, me.drvpath, me.system, me.license, me.homepage, me.maintainers, me.maxsilent, me.timeout, me.ischannel, me.iscurrent, me.priority, me.globalpriority, me.starttime, me.stoptime, me.iscachedbuild, me.buildstatus, me.size, me.closuresize, me.releasename, me.keep, me.notificationpendingsince
         Sort Key: me.id DESC
         Sort Method: top-N heapsort  Memory: 32kB
         ->  Nested Loop  (cost=100.73..369.82 rows=21 width=381) (actual time=355.785..357.017 rows=231 loops=1)
               Output: me.id, me.finished, me."timestamp", me.project, me.jobset, me.jobset_id, me.job, me.nixname, me.description, me.drvpath, me.system, me.license, me.homepage, me.maintainers, me.maxsilent, me.timeout, me.ischannel, me.iscurrent, me.priority, me.globalpriority, me.starttime, me.stoptime, me.iscachedbuild, me.buildstatus, me.size, me.closuresize, me.releasename, me.keep, me.notificationpendingsince
               Inner Unique: true
               ->  Bitmap Heap Scan on public.buildoutputs  (cost=100.16..189.53 rows=21 width=4) (actual time=355.763..356.027 rows=231 loops=1)
                     Output: buildoutputs.build, buildoutputs.name, buildoutputs.path
                     Recheck Cond: (to_tsvector('english'::regconfig, regexp_replace(buildoutputs.path, '[\./-]'::text, ' '::text, 'g'::text)) @@ '''poppler'' & ''0'' & ''16'''::tsquery)
                     Heap Blocks: exact=210
                     ->  Bitmap Index Scan on expindexbuildoutputspathgin  (cost=0.00..100.16 rows=21 width=0) (actual time=355.732..355.732 rows=231 loops=1)
                           Index Cond: (to_tsvector('english'::regconfig, regexp_replace(buildoutputs.path, '[\./-]'::text, ' '::text, 'g'::text)) @@ '''poppler'' & ''0'' & ''16'''::tsquery)
               ->  Index Scan using builds_pkey on public.builds me  (cost=0.57..8.59 rows=1 width=381) (actual time=0.004..0.004 rows=1 loops=231)
                     Output: me.id, me.finished, me."timestamp", me.project, me.jobset, me.jobset_id, me.job, me.nixname, me.description, me.drvpath, me.system, me.license, me.homepage, me.maintainers, me.maxsilent, me.timeout, me.ischannel, me.iscurrent, me.priority, me.globalpriority, me.starttime, me.stoptime, me.iscachedbuild, me.buildstatus, me.size, me.closuresize, me.releasename, me.keep, me.notificationpendingsince
                     Index Cond: (me.id = buildoutputs.build)
 Planning Time: 0.393 ms
 Execution Time: 357.188 ms

@grahamc
Copy link
Member Author

grahamc commented Jan 30, 2021

I'm playing with this diff:

diff --git a/src/lib/Hydra/Controller/Root.pm b/src/lib/Hydra/Controller/Root.pm
index 66aba9e5..42a3ea15 100644
--- a/src/lib/Hydra/Controller/Root.pm
+++ b/src/lib/Hydra/Controller/Root.pm
@@ -444,6 +444,7 @@ sub search :Local Args(0) {
 
     $c->model('DB')->schema->txn_do(sub {
         $c->model('DB')->schema->storage->dbh->do("SET LOCAL statement_timeout = 20000");
+
         $c->stash->{projects} = [ $c->model('DB::Projects')->search(
             { -and =>
                 [ { -or => [ name => { ilike => "%$query%" }, displayName => { ilike => "%$query%" }, description => { ilike => "%$query%" } ] }
@@ -461,7 +462,14 @@ sub search :Local Args(0) {
             { order_by => ["project", "name"], join => ["project"] } ) ];
 
         $c->stash->{jobs} = [ $c->model('DB::Builds')->search(
-            { "job" => { ilike => "%$query%" }
+            {
+            -and => [
+                \[
+                    "to_tsvector('english', regexp_replace(job, '[\./-]', ' ', 'g'))"
+                    . " @@ websearch_to_tsquery('english', regexp_replace('?', '[\./-]', ' ', 'g'))",
+                    $query,
+                ],
+            ]
             , "project.hidden" => 0
             , "jobset.hidden" => 0
             , iscurrent => 1
@@ -471,16 +479,35 @@ sub search :Local Args(0) {
             } )
         ];
 
+        # CREATE INDEX expIndexBuildOutputsPathGin ON BuildOutputs USING GIN(to_tsvector('english', regexp_replace(path, '[\./-]', ' ', 'g')));
+        # CREATE INDEX expIndexBuildsJobGin ON Builds USING GIN(to_tsvector('english', regexp_replace(job, '[\./-]', ' ', 'g')));
+        # CREATE INDEX expIndexBuildsDrvPathGin ON Builds USING GIN(to_tsvector('english', regexp_replace(drvpath, '[\./-]', ' ', 'g')));
+
+
         # Perform build search in separate queries to prevent seq scan on buildoutputs table.
         $c->stash->{builds} = [ $c->model('DB::Builds')->search(
-            { "buildoutputs.path" => { ilike => "%$query%" } },
-            { order_by => ["id desc"], join => ["buildoutputs"]
+            {
+            -and => [
+                \[
+                    "to_tsvector('english', regexp_replace(buildoutputs.path, '[\./-]', ' ', 'g'))"
+                    . " @@ websearch_to_tsquery('english', regexp_replace('?', '[\./-]', ' ', 'g'))",
+                    $query,
+                ],
+            ]
+            , order_by => ["id desc"], join => ["buildoutputs"]
             , rows => $c->stash->{limit}
             } ) ];
 
         $c->stash->{buildsdrv} = [ $c->model('DB::Builds')->search(
-            { "drvpath" => { ilike => "%$query%" } },
-            { order_by => ["id desc"]
+            {
+            -and => [
+                \[
+                    "to_tsvector('english', regexp_replace(drvpath, '[\./-]', ' ', 'g'))"
+                    . " @@ websearch_to_tsquery('english', regexp_replace('?', '[\./-]', ' ', 'g'))",
+                    $query,
+                ],
+            ]
+            , order_by => ["id desc"]
             , rows => $c->stash->{limit}
             } ) ];

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants