New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Taskcluster] Only schedule the sink-task for pull requests #24511
Conversation
We don't need the sink-task for master pushes/etc, and this is a quick workaround for the fact that Taskcluster doesn't support >100 data dependencies (and we now schedule more than that for daily epoch runs!)
The 100 limit sounds so familiar somehow... |
tools/ci/tc/decision.py
Outdated
sink_task['command'] += ' {0}'.format(' '.join(depends_on_ids)) | ||
task_id_map['sink-task'] = create_tc_task(event, sink_task, taskgroup_id, depends_on_ids) | ||
is_pr, _ = get_triggers(event) | ||
if is_pr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can specify this in the yml using
use:
- trigger-pr
and then just make this conditional on the sink task existing.
is |
Restating my comment from IRC; I'm not a huge fan of preserving order here. The model I had was that you define a set of tasks and the code figures out the right ordering to build the graph (roughly a topological sort). Preserving the order changes this model somewhat and I feel like allowing the code to rely on order makes things more fragile, not less. |
@jgraham first of all, the fix itself is not relying on the order (it special cases on the name "sink-task" and adds it at the end), which is to say the fix itself isn't fragile. Now I decided to keep the ordering throughout for two reasons:
I have an alternative: we could explicitly sort all the tasks except the sink task alphabetically in the final step, and assert that order in the tests. This has the benefits of being deterministic and making it clear that the ordering in YAML will not be kept. WDYT? |
I don't see how the final sort works when there are other tasks with dependents; tasks need to be scheduled after their dependents. The sink task is only special in the sense that we don't explictly specify that it depends on all other scheduled tasks, we implement that with code. My general feeling is that the input is accidentially ordered of the medium in which it's expressed (in the same way that a spreadsheet has an ordered list of rows and a haskell file is an ordered list of lines of code) and the output is necessarily ordered (because if A depends on B it has to be scheduled after B). So I see how that can lead to the impression that the whole thing is ordered. But it isn't intended to be the actual model here. Anyway I don't feel super-strongly about this but I think it would be more correct to make it explcit in the code that the input order isn't relevant rather than trying to preserve that through more of the code. |
Sorry. I meant to say sorting each layer of the DAG alphabetically, a deterministic behaviour which I'd still strongly favour over allowing non-deterministic ordering and writing a topological traversal in unit tests.
I agree with that though. Again, I do NOT intend to fix the issue by relying on having the sink task at the end of the YAML file; preserving the input order isn't for that purpose. However, I'm not sure how we can make it explicit in code that the input order isn't critical but only incidental (e.g. when it comes to unit tests); of course we can add some comments in the YAML file. |
Well we can also add comments in the code to make explicit what the invariants are and aren't. Or convert to sets where possible rather than lists. If you want to make the unittests more robust that's fine. |
d7be323
to
1442697
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I"m still not really convinced by changing more things into ordered dicts internally where the order explictly doesn't matter, but OK.
We don't need the sink-task for master pushes/etc, and this is a quick
workaround for the fact that Taskcluster doesn't support >100 data
dependencies (and we now schedule more than that for daily epoch runs!)