[RFC 0085] NixOS Release Stabilization: ZHF on master, new timeline #85

Merged
merged 13 commits into from Apr 6, 2021

@jonringer
Contributor

jonringer commented Jan 26, 2021

In a similar spirit to #80, this RFC tries to change the way we conduct the stabilization of a NixOS release. Without desktop managers being a major risk (due to #80), the release should be free to focus on stabilizing package builds. This should enable us to perform ZHF mostly on master with minor adjustments to staging and a change to policies regarding certain "critical" or "high-risk" packages.

Overall, the goal of this RFC is to reduce the pain associated with ZHF (release stabilization) at a slight expense to the staging branch workflow during the stabilization window.

I alluded to this RFC in #80 (comment)

"Freeze" may be a little strong, as non-breaking changes are allowed, but I didn't want to use a term like "half-frozen" or "semifrozen". "Restricted" has been chosen as the term to describe disallowing breaking change updates.

Related discussions:
- https://discourse.nixos.org/t/20-09-release-retrospective/9851
- https://discourse.nixos.org/t/what-should-stable-nixos-prioritize/9646

cc @worldofpeace @FRidh @ryantm

rendered: https://github.com/NixOS/rfcs/blob/master/rfcs/0085-nixos-release-stablization.md

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/20-09-release-retrospective/9851/31

@jonringer
Contributor Author

Added definitions section, and improved timeline clarity.

Updated new render

Member

@Ma27 Ma27 left a comment

I like the idea 👍


Normal staging development is disrupted for two weeks. (1 normal iteration)
Changes to "critical packages" may have to wait for a month to be merged into staging.
Master development will be uninterrupted, assuming all changes don't introduce new regressions.
Member

I think this assumption is too optimistic. However it's IMHO not such a big deal if regressions don't stem from critical packages (such as the ones you've listed above).

Contributor Author

This is mostly from my observation that the week prior to the 20.09 scheduled release date saw ZHF PRs slow down significantly. The "easy to fix" packages usually have a quick turnaround, and the more involved package fixes were specific to the 20.09 branch, so none of the more active contributors really wanted to "pick up" the work.

Contributor Author

Also, in practice, the branches diverge quite a bit if a particular package (or package set) had a series of updates after branch-off. It was sometimes not obvious which series of changes allowed unstable to work while the beta branch was still broken.


Actual release schedule may adjust dates slightly due to unforeseen events.

Release team will retain the right to refine what constitutes "critical packages", as this may
Member

What does "Release team" mean? I know that a few people (including you IIRC) plan to "define" such a team rather than having two RMs with most of the workload, but I'm not sure if it's clear who's part of this at the moment.

Contributor Author

Currently it means the release managers.

Contributor Author

I would like to eventually create a "release team", as many people have expressed interest in being a part of a release, but it's a much bigger ask to be "the release manager".

- "Critical packages"
  - Important packages which have many dimensions of build or runtime behavior.
  - Generally these will be packages which are a part of `stdenv`, or bootstrapping a system (e.g. systemd).
  - Initially these will be: `stdenv.cc`, `binutils`, and `systemd` for this RFC.
Member

(nit) I guess at least glibc should be listed here as well.

Member

  • Generally these will be packages which are a part of stdenv, or bootstrapping a system (e.g. systemd).

Separate building/bootstrapping stdenv from that of a (NixOS) system? I'm of the opinion all stdenv deps, including stdenv itself should freeze at this point.

Contributor Author

@jonringer jonringer Jan 26, 2021

It was implied that this would include everything in stdenv.cc, but I could make it more explicit.

```
nix-repl> stdenv.cc.libc.name
"glibc-2.32-25"

nix-repl> stdenv.cc.bintools.name
"binutils-wrapper-2.35.1"
```

EDIT: wording

[definitions]: #definition

- "Freezing" a branch or package.
  - Disallow breaking changes to branch or package. In SemVer terminology, disallow major version bumps.
Member

I'm wondering if we should define an exception here: breaking changes can be allowed if those are necessary to fix security issues or fix a critical bug (i.e. something that corrupts your data or crashes your system).

Note: while backporting single patches is the preferred way in that situation, those can be "breaking" as well.

Contributor Author

Yea, and that's the exception I make for the release channel. But only if patching isn't a reasonable alternative.


- "Freezing" a branch or package.
  - Disallow breaking changes to branch or package. In SemVer terminology, disallow major version bumps.
  - A release branch can be thought of being permanently "frozen" in this regard. Stable may be a better term, but it's already overloaded with a previous release branch or channel.
Member

I'm not sure if we should define such a detail here: from my observation this isn't the status quo and IMHO also a fairly debatable, but out-of-scope topic.

Contributor Author

@jonringer jonringer Jan 26, 2021

This was more meant to help convey that freezing only applied to breaking changes. No one expects the stable channel to receive 0 updates, but people do expect the stable channel to rarely cause upgrade issues.

@Ericson2314
Member

I have historically been skeptical of slowing down master/staging/staging-next. But I now think the painfulness of the release process might justify it. Perhaps we need to get good at making the deadlines and not killing ourselves doing things this way, hopefully improve CI so there's less "fixing things after they break" in general, and then we re-earn the privilege of not needing to slow down master/staging/staging-next.

@jonringer
Contributor Author

hopefully improve CI so there's less "fixing things after they break" in general,

I agree with this in general, but there are also issues like what we encounter with Plasma, which will be really hard to construct. This was a big factor for me in pushing DM packaging so far from the release date; we really do need the time on unstable for those scenarios to be stabilized.

@ryantm
Member

ryantm commented Feb 4, 2021

I nominate myself to shepherd.

@garbas
Member

garbas commented Feb 6, 2021

I nominate myself to shepherd as well.

@jonringer
Contributor Author

Just a heads-up on the timeline, in a few weeks gnome should be releasing the gnome-40 rc, which would be the beginning of this schedule.

The only "risk" would be systemd 248 or gcc 11 (+binutils) being released during the stabilization period. However, those PRs usually take several weeks to months to complete; so unless either package gets a new release in the next week or two, the stabilization needed for those updates to just target master would likely go past the current branch-off date. The biggest change would be that they couldn't be crammed in right before branch-off.

The real ask is to delay the critical updates for 4 weeks (likely not a concern for this 21.05 release), and staging-targeted breaking changes for 2 weeks.

Either way, with all the issues that we have had with previous releases, I think we should attempt this new more "aggressive" schedule. In the worst case scenario, we may have to delay the release a week to backport a fix. In the best case scenario, most people won't notice a change to unstable workflow.

@Mic92
Member

Mic92 commented Feb 18, 2021

I nominate myself to shepherd as well.

@Mic92
Member

Mic92 commented Feb 18, 2021

@ryantm do you want to be shepherd leader?

@alyssais
Member

alyssais commented Feb 18, 2021 via email

@7c6f434c
Member

@alyssais

why can't we just revert problematic updates after branch-off

As someone who also does not use the releases, I would assume that reverting on-branch means that some ZHF fixes have to be different on the release branch compared to the mainline, and so you cannot run a unified ZHF sprint without a mess of some things happening only on one branch or something like that.

@jonringer
Contributor Author

jonringer commented Feb 18, 2021

First (and maybe this is very naive of me), why can't we just revert
problematic updates after branch-off, like the problematic systemd bump
from last time?

It can be very hard to capture all the changes that are related to a given bump.
For staging updates especially, where there may be many leaf packages that
are broken, it's sometimes not obvious how to capture all of the changes. Just
reverting a bump is not usually sufficient. Also the size of most systemd update
PRs is pretty large, so reverting the 30-100 commits can also be very error-prone
and tedious.

It's not clear to me what this means. Is a git bump (for example) a
breaking update or a non-breaking update? Maybe this could be added to
the terminology section.

"Disallow breaking changes to branch or package. In SemVer terminology, disallow major version bumps."
If the change is git: 2.30.0 -> 2.30.1 or git: 2.30.0 -> 2.31.0, then it's fine. Just worried about
git: 2.30 -> 3.0.0, in which there's likely to be some breaking changes which
would need a longer stabilization time to fully fix everything.
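
As a rough illustration of the rule above, here is a minimal Python sketch that flags a version bump as breaking only when the major component increases. The helper names `parse_version` and `is_breaking_bump` are hypothetical, and the sketch assumes purely numeric, dot-separated versions from projects that actually follow SemVer; many packages do not, so a real check would still need human judgment.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as "2.30.1" into a tuple of ints.

    Assumption: every component is a plain integer (no "rc1", no date suffixes).
    """
    return tuple(int(part) for part in version.split("."))


def is_breaking_bump(old: str, new: str) -> bool:
    """Return True when the major (first) component increases.

    This mirrors "in SemVer terminology, disallow major version bumps";
    minor and patch bumps are treated as non-breaking.
    """
    return parse_version(new)[0] > parse_version(old)[0]


# The three git bumps mentioned in the comment above.
for old, new in [("2.30.0", "2.30.1"), ("2.30.0", "2.31.0"), ("2.30", "3.0.0")]:
    status = "breaking, needs longer stabilization" if is_breaking_bump(old, new) else "fine"
    print(f"git: {old} -> {new}: {status}")
```

Under these assumptions the two 2.30.x bumps are reported as fine and only the 2.30 -> 3.0.0 bump is flagged.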

@ryantm
Member

ryantm commented Feb 18, 2021

@ryantm do you want to be shepherd leader?

Sure, I can be leader.

@alyssais
Member

alyssais commented Feb 18, 2021 via email

@andir
Member

andir commented Feb 18, 2021

I am not entirely sold on the idea yet. If it solves the stable problems I am all for it. Whether it works in practice remains to be tested.

The one change that should probably be made to this RFC is to remove any listing of packages outside of the Motivation section. Those should be solely defined in the specific instance of ZHF. We could make a different RFC specifying a common set of packages with reasoning for each of them, but this should really just specify the concept of ZHF on master without going too much into the direction of which packages are actually "critical" and which are not.

Another worry I have is: Communication
How do we inform all the maintainers/committers in the project about the ZHF and the current policies towards what they are allowed/not allowed to merge?
Other projects have a ML for all the developers where each and everyone is subscribed for announcements. The best we have is discourse which is pretty much mixed content and noisy for such information.

@jonringer
Contributor Author

Maybe "breaking change" should be a top-level "definitions" entry rather than defined under "frozen". :)

The best term would have been stable, but as I also mentioned, that term is already overloaded in nixpkgs, as it generally refers to the current release branch, not the upcoming release. For ZHF, I will probably use the phrase "frozen for breaking changes" to help emphasize that minor changes and new packages are still welcome.

If someone can think of a better term, I will gladly change it.

Whether it works in practice remains to be tested.

Agreed, let's test it.

remove any listing of packages outside of the Motivation section. Those should be solely defined in the specific instance of ZHF.

I disagree. Sure, not every "critical" package may be applicable to a release. But I would also like to avoid the situation of, "some people got in a room together and came up with this list without significant community involvement". I think a more appropriate time to revisit the list would be the release retrospective. This at least allows the community to participate, and major stakeholders can express their opinions. I tried to do one for 20.09, of which this RFC was one of the action items. Hopefully the next release will have more participation, and I still want to thank those who showed up.

I could also argue that creating an RFC to facilitate changes to the release process is only to involve the community more. However, just saying, "I think this would be better" without any input isn't a healthy way to do process changes. And this particular change should have community involvement since I would like to alter some merge criteria on unstable leading up to the release.

We could make a different RFC specifying a common set of packages with reasoning for each of them but this should really just specify the concept of ZHF on master without going too much into the direction of which packages are actually "critical" and which are not.

I would like to avoid RFCs as much as possible. The process is slow (for good reason), and takes a fair amount of time and energy for the RFC committee to process. I think a retrospective would be a better format as the pain of the release would still likely be fresh, and certain packages would be easier to identify as being problematic. The list of critical packages can just be maintained in the release wiki.

How do we inform all the maintainers/committers in the project about the ZHF and the current policies towards what they are allowed/not allowed to merge?
Other projects have a ML for all the developers where each and everyone is subscribed for announcements. The best we have is discourse which is pretty much mixed content and noisy for such information.

I mention in future work that I will be creating a discourse thread. And will likely ping someone to pin it for a while. I was probably also going to do a reddit post and discord announcement just referring to the discourse thread as well, but since reddit or discord isn't an official platform, I didn't mention them.

But NixOS communication problems aren't unique to this RFC, and slightly off-topic.

For enforceability, I will be quite active (and hopefully the other RM, still TBD) and able to enforce it for most PRs. Most staging-targeted PRs take a while to review anyway, so I'm not too concerned that something will be merged while I was sleeping or away. I mentioned here that it's still somewhat of a desire to have a "release team", which would help distribute these responsibilities and own certain release issues.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/21-05-call-for-release-manager/10204/16

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/21-05-call-for-release-manager-and-release-members/10204/17

staging. In addition, I also propose to stabilize the release on master
instead of the release branch, thus avoiding the need to backport every pull
request, and take advantage of the stabilization that unstable normally
receives.
Member

@Mic92 Mic92 Feb 23, 2021

This should also describe what this means for new packages/modules.

Contributor Author

New packages are welcome. Changes to modules are welcome, assuming it doesn't cause regressions, and the module changes will probably need to have documentation so it's not additional work trying to curate the release notes.

I will modify the definitions section to say that "freezing" will still allow new package and module additions.

@Mic92
Member

Mic92 commented Feb 23, 2021

@ryantm do you want to be shepherd leader?

Sure, I can be leader.

@ryantm can you set up a meeting?

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/rfc-0085-fcp-nixos-release-stabilization-zhf-on-master-new-timeline/12105/1

Member

@davidak davidak left a comment

Looks good

| Weeks from Release | Branches Affected | Events |
| --- | --- | --- |
| -8 Weeks | | Gnome and Plasma(YY.11) packaging begins |
| -6 Weeks | `staging-next`, `staging` | Restrict breaking changes to Release Critical Packages |
Member

"Restrict breaking changes to Release Critical Packages" is phrased ambiguously and can be parsed in two opposing ways:

  • "restrict X to Y" in the sense of "limit application of X, so that it's only applied to Y", i.e., exclude everything from X except Y
    ⇒ Only Release Critical Packages may receive breaking changes. (Other packages must not.)
  • "changes to Y" in the sense of "changes that are made to Y"
    ⇒ Release Critical Packages must not receive breaking changes. (Other packages may.)

The phrasing should be changed so that it's clear which one is intended (probably the latter). Note that the definition of adjective/participle "restricted" above only halfway helps here, because the current phrasing uses the verb "(to) restrict" instead.

Member

Since we defined Restricted above, I think we can just say:

Suggested change
| -6 Weeks | `staging-next`, `staging` | Restrict breaking changes to Release Critical Packages |
| -6 Weeks | `staging-next`, `staging` | Release Critical Packages are Restricted |
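
To make the week offsets in the timeline concrete, the following Python sketch converts them into calendar dates. Only the two rows quoted in this thread are included, the names `MILESTONES` and `milestone_dates` are hypothetical, and the release date passed in is a placeholder chosen for illustration, not a date taken from the RFC.

```python
from datetime import date, timedelta
from typing import Dict

# Week offsets relative to the release, copied from the two timeline rows
# quoted in this review thread. The full RFC table has more rows.
MILESTONES: Dict[int, str] = {
    -8: "Gnome and Plasma packaging begins",
    -6: "Release Critical Packages are Restricted",
}


def milestone_dates(release_day: date) -> Dict[date, str]:
    """Map each milestone to a calendar date, given a planned release day."""
    return {
        release_day + timedelta(weeks=offset): event
        for offset, event in MILESTONES.items()
    }


# Hypothetical release day, used only to demonstrate the arithmetic.
planned_release = date(2021, 5, 31)
for day, event in sorted(milestone_dates(planned_release).items()):
    print(day.isoformat(), event)
```

As the quoted RFC text notes, the actual schedule may shift slightly, so dates computed this way are a starting point rather than a commitment.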

@jonringer
Contributor Author

jonringer commented Apr 2, 2021

I have historically been skeptical of slowing down master/staging/staging-next. But I now think the painfulness of the release process might justify it. Perhaps we need to get good at making the deadlines and not killing ourselves doing things this way, hopefully improve CI so there's less "fixing things after they break" in general, and then we re-earn the privilege of not needing to slow down master/staging/staging-next.

For clarification, this has no implication on master. The changes to master should be able to be captured for each PR, and we can effectively move "green to green". These restrictions are more in regard to staging, where for most PRs we may verify by building just a downstream package and/or running a test or two to verify it's not "absolutely broken"; but these often cause other packages to break.

Also breakages introduced through staging PRs will likely need a staging PR to fix them, which makes the iteration for correcting such breakages much longer than PRs which target master, in which "follow-up" PRs will also likely target master and the fixes can be merged "immediately".

@ryantm
Member

ryantm commented Apr 4, 2021

The FCP comment period has ended. There has been some discussion and changes but I believe they are immaterial to the original intent/spirit of the RFC, so I do not think it is necessary for @garbas and @Mic92 to vote again on accepting the RFC.

We offer this RFC to @NixOS/rfc-steering-committee for acceptance and merging!

@Mic92 Mic92 merged commit aada139 into NixOS:master Apr 6, 2021
@jonringer jonringer deleted the 85-ZHF-on-master branch April 6, 2021 15:09
@jonringer
Contributor Author

Awesome, I'll do some announcements :)

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/03-nixos-weekly/12391/1

@jtrakk

jtrakk commented Apr 10, 2021

Rendered link is 404

@jonringer
Contributor Author

Rendered link is 404

Updated now that it has been merged

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/21-05-has-been-released/13407/1

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/what-should-stable-nixos-prioritize/9646/81

@shlevy
Member

shlevy commented May 18, 2022

The nixpkgs manual should be updated in light of this. https://nixos.org/manual/nixpkgs/stable/#submitting-changes-staging-branch and https://nixos.org/manual/nixpkgs/stable/#submitting-changes-staging-next-branch do not indicate any freezes relating to the release cycle.

KAction pushed a commit to KAction/rfcs that referenced this pull request Apr 13, 2024
…ixOS#85)

Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
Co-authored-by: Ryan Mulligan <ryan@ryantm.com>