Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Taskcluster] Only schedule the sink-task for pull requests #24511

Merged
merged 8 commits into from Jul 9, 2020

Conversation

stephenmcgruer
Copy link
Contributor

We don't need the sink-task for master pushes/etc, and this is a quick
workaround for the fact that Taskcluster doesn't support >100 data
dependencies (and we now schedule more than that for daily epoch runs!)

We don't need the sink-task for master pushes/etc, and this is a quick
workaround for the fact that Taskcluster doesn't support >100 data
dependencies (and we now schedule more than that for daily epoch runs!)
@Hexcles
Copy link
Member

Hexcles commented Jul 8, 2020

The 100 limit sounds so familiar somehow...

sink_task['command'] += ' {0}'.format(' '.join(depends_on_ids))
task_id_map['sink-task'] = create_tc_task(event, sink_task, taskgroup_id, depends_on_ids)
is_pr, _ = get_triggers(event)
if is_pr:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can specify this in the yml using

use:
  - trigger-pr

and then just make this conditional on the sink task existing.

@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-24511 July 8, 2020 14:23 Inactive
@jgraham
Copy link
Contributor

jgraham commented Jul 8, 2020

is if name != "sink-task" on line 286 (i.e. before calling add_task) enough to fix this?

tools/ci/tc/decision.py Outdated Show resolved Hide resolved
tools/ci/tc/decision.py Outdated Show resolved Hide resolved
@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-24511 July 8, 2020 15:30 Inactive
@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-24511 July 8, 2020 16:59 Inactive
@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-24511 July 8, 2020 17:22 Inactive
@jgraham
Copy link
Contributor

jgraham commented Jul 8, 2020

Restating my comment from IRC; I'm not a huge fan of preserving order here. The model I had was that you define a set of tasks and the code figures out the right ordering to build the graph (roughly a topological sort). Preserving the order changes this model somewhat and I feel like allowing the code to rely on order makes things more fragile, not less.

@Hexcles
Copy link
Member

Hexcles commented Jul 8, 2020

@jgraham first of all, the fix itself is not relying on the order (it special cases on the name "sink-task" and adds it at the end), which is to say the fix itself isn't fragile.

Now I decided to keep the ordering throughout for two reasons:

  1. The direct cause: it makes testing much easier. Surely I could just add another assertion to check that the sink task is at the end without caring about the order of other tasks. However, since the task creation process is fundamentally synchronous and sequential, I'd argue that it makes sense to check the ordering in tests, too. As you said, the important part here is to ensure a topological ordering, but I think it's even better to make sure the behaviour is deterministic and it's always the same topological ordering.
  2. More philosophically: before this change, there was a mix of ordered and unordered things. The YAML file itself is clearly a list (you could argue that the semantics are not, but I'm not sure how easy you could document that to avoid the wrong impression); the final output is explicitly an ordered dict and scanning the code I saw lots of simple for loops; yet it turned out some steps did not keep the ordering (e.g. $map with uses of set and dict). As someone who only had a vague idea of this code, I looked at the input and output and immediately assumed the ordering was (or should be) kept throughout, until I ran into issues in unit tests. I strongly prefer consistency and intuitiveness. And back to the $map operation, a somewhat relevant example is how ECMAScript decided to preserve the order in Array.map and Map.

I have an alternative: we could explicitly sort all the tasks except the sink task alphabetically in the final step, and assert that order in the tests. This has the benefits of being deterministic and making it clear that the ordering in YAML will not be kept. WDYT?

@jgraham
Copy link
Contributor

jgraham commented Jul 8, 2020

I don't see how the final sort works when there are other tasks with dependents; tasks need to be scheduled after their dependents. The sink task is only special in the sense that we don't explictly specify that it depends on all other scheduled tasks, we implement that with code.

My general feeling is that the input is accidentially ordered of the medium in which it's expressed (in the same way that a spreadsheet has an ordered list of rows and a haskell file is an ordered list of lines of code) and the output is necessarily ordered (because if A depends on B it has to be scheduled after B). So I see how that can lead to the impression that the whole thing is ordered. But it isn't intended to be the actual model here.

Anyway I don't feel super-strongly about this but I think it would be more correct to make it explcit in the code that the input order isn't relevant rather than trying to preserve that through more of the code.

@Hexcles
Copy link
Member

Hexcles commented Jul 8, 2020

I don't see how the final sort works when there are other tasks with dependents; tasks need to be scheduled after their dependents. The sink task is only special in the sense that we don't explictly specify that it depends on all other scheduled tasks, we implement that with code.

Sorry. I meant to say sorting each layer of the DAG alphabetically, a deterministic behaviour which I'd still strongly favour over allowing non-deterministic ordering and writing a topological traversal in unit tests.

Anyway I don't feel super-strongly about this but I think it would be more correct to make it explcit in the code that the input order isn't relevant rather than trying to preserve that through more of the code.

I agree with that though. Again, I do NOT intend to fix the issue by relying on having the sink task at the end of the YAML file; preserving the input order isn't for that purpose. However, I'm not sure how we can make it explicit in code that the input order isn't critical but only incidental (e.g. when it comes to unit tests); of course we can add some comments in the YAML file.

@jgraham
Copy link
Contributor

jgraham commented Jul 8, 2020

Well we can also add comments in the code to make explicit what the invariants are and aren't. Or convert to sets where possible rather than lists.

If you want to make the unittests more robust that's fine.

@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-24511 July 8, 2020 21:56 Inactive
@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-24511 July 9, 2020 14:51 Inactive
Copy link
Contributor

@jgraham jgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I"m still not really convinced by changing more things into ordered dicts internally where the order explictly doesn't matter, but OK.

@Hexcles Hexcles merged commit 821f177 into master Jul 9, 2020
@Hexcles Hexcles deleted the smcgruer/sink-only-prs branch July 9, 2020 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants