[RFC 0020] Security On Call #20

Closed
wants to merge 6 commits into from


grahamc
Member

@grahamc grahamc commented Oct 30, 2017

@grahamc grahamc changed the title . [RFC 0020] Secucrity On Call Oct 30, 2017
@grahamc grahamc changed the title [RFC 0020] Secucrity On Call [RFC 0020] Security On Call Oct 30, 2017
@grahamc
Member Author

grahamc commented Oct 30, 2017

Updated: RFC-0020 Security On Call

- debian
- redhat
- suse
- gentoo
Member

We should also add Arch here. As a rolling distro, they are the only ones that prefer upstream releases over patches. Following them should also give us notifications for new upstream releases with security fixes.

Member Author

Updated!

@tilpner
Member

tilpner commented Oct 30, 2017

A shared mailbox can lead to "I thought you were handling it" situations, where someone accidentally marks messages as read without feeling responsible for them. A system that requires more action to claim an issue would be better, ideally displaying who claimed it.

That sounds similar to GH issues, like the alternatives mention. It shouldn't be too hard to create an issue per mail on a separate GH repo, and it would nicely integrate into nixos/nixpkgs issue tracking.

A public repo would not work for patching in secret, though, like oss-security demands...

@grahamc
Member Author

grahamc commented Oct 30, 2017

Perhaps we could use pagerduty's (or a similar service's or tool's) email-to-page trigger for ensuring every email is assigned and handled.

@grahamc
Member Author

grahamc commented Oct 30, 2017

cc @peti, who I'd love to have as a co-author :)

@shlevy
Member

shlevy commented Oct 30, 2017

Overall looks good to me, though I agree that a shared mailbox by itself isn't enough and every incoming message should be assigned in some visible way to whoever is on call at the time it comes in. Pagerduty seems like a reasonable way to address that.
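
To make "assigned in some visible way" concrete, here is a minimal sketch of what such tracking might record. It assumes nothing about PagerDuty's actual API; the `Report` class, its fields, and `unacknowledged` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Report:
    """One incoming security mail (hypothetical record, not a real API)."""
    subject: str
    assignee: str                          # whoever was on call when it arrived
    acknowledged_by: Optional[str] = None  # stays None until someone claims it

    def acknowledge(self, who: str) -> None:
        # Claiming is an explicit action, so reading a mail never counts as handling it.
        if self.acknowledged_by is not None:
            raise ValueError(f"already acknowledged by {self.acknowledged_by}")
        self.acknowledged_by = who


def unacknowledged(reports: List[Report]) -> List[Report]:
    """Return the reports nobody has claimed yet."""
    return [r for r in reports if r.acknowledged_by is None]
```

The point of the explicit `acknowledge` step is that the list returned by `unacknowledged` shows exactly which mails nobody has taken responsibility for.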

@zimbatm
Member

zimbatm commented Oct 30, 2017

The overall proposal looks reasonable to me. The security team should be fairly autonomous and be able to organize itself. If the majority of the team members agree with the RFC then I think that it can be merged.

Regarding the on-call rotation, it might be better to figure out everybody's timezone and create a pager rotation in such a way that all members can contribute when they are awake. Maybe we can get a free plan from pagerduty or opsgenie as OSS contributors? (EDIT: just asked both companies, waiting on response)

@grahamc
Member Author

grahamc commented Oct 30, 2017

@zimbatm I think having 24hr shifts is better, so it is less frequent. Issues don't need to be addressed immediately (no waking people up) but within, say, 24hrs of it being assigned.
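
A rough sketch of how a 24-hour rotation with a 24-hour response window could be computed. The roster, the start date, and the function names are placeholders made up for illustration; none of them come from the RFC.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical roster and rotation start; not part of the RFC.
ROSTER = ["alice", "bob", "carol"]
ROTATION_START = datetime(2017, 11, 1, tzinfo=timezone.utc)
SHIFT = timedelta(hours=24)
RESPONSE_WINDOW = timedelta(hours=24)


def on_call(at: datetime) -> str:
    """Return who holds the 24-hour shift that contains `at`."""
    shifts_elapsed = (at - ROTATION_START) // SHIFT  # whole shifts since the start
    return ROSTER[shifts_elapsed % len(ROSTER)]      # simple round-robin


def respond_by(assigned_at: datetime) -> datetime:
    """Deadline for a first response: 24 hours after assignment."""
    return assigned_at + RESPONSE_WINDOW
```

With three people, each person would be on call every third day, and an issue assigned at any point in a shift still gets a full 24 hours before it counts as late.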

@7c6f434c
Member

(repeating from IRC)

@grahamc maybe some kind of indication of expected reaction times is a nice idea. And these should be divided by situation: «A post to oss-sec from a well-known researcher with a 1-line patch acknowledged by upstream» is not the same as «Someone reported a few AFL-generated crashes».

And of course, 24h is not always enough, as for large rebuilds there may be a need to check later for failures in reverse dependencies.

And I guess there should be a procedure for long-term tracking of an issue: it is not that rare that there are a few fuzzer-generated crashing inputs, and neither reporter nor upstream nor major distros nor us have any idea how to localise where the actual bug lives. It would be nice to patch such things once there is a patch, but that requires a separate long-term process; on-call handling of such an issue ends with ensuring that the long-term process has started.

@zimbatm
Member

zimbatm commented Oct 30, 2017

PagerDuty: We currently do not have a free version, or open source project, of PagerDuty. However, I can pass along your feedback to our Product team.

@grahamc
Member Author

grahamc commented Oct 31, 2017

All the on-call management tools I've found start at $15/mo, and I'm not aware of any suitable open source projects. Looking further into it.

@7c6f434c
Member

@grahamc Maybe you could list the feature subset that you care about most (it might be a plugin for Redmine that just happens to not be called on-call, after all)?

I guess it is not surprising that commercial tools start at $15 per month if they include automated SMS or phone calls. An open source solution could do that with adb (or gnokii for old phones), but with a global team it won't necessarily be cheaper, given phone subscriptions in different countries (although that depends on the incident count per month).

@shlevy
Member

shlevy commented Nov 1, 2017

Do we really even need phone calls?

@grahamc
Member Author

grahamc commented Nov 1, 2017 via email

## 1. Security Posture, Prompt Patching, and Documented Process

Our process for security issues is currently fairly well executed, but
porly defined. How to participate and how to do the job is very

porly -> poorly

1. Monitor a well defined list of mailing lists for new issues.
2. Ensure each issue is triaged and addressed if needed.

#2 is a bit vaguely worded, as the person is not required to

"# 2" ==> "Responsibility # 2" to not trigger a markdown header.

#2 is a bit vaguely worded, as the person is not required to
_actually_ fix the issue. They are allowed to delegate the patching to
other people. However, they _are_ responsible for ensuring the issue
is _fixed_.

Once we have a MAINTAINERS file, we could delegate security issues in non-critical packages (let's say not in nixos-small) to the respective maintainers by default.

@moretea

moretea commented Nov 1, 2017

I agree with having one e-mail address to receive those messages on.

As a workflow, what do you think about using Bugzilla plus its email ingestion script to manage incoming advisories? This would also allow us to mark different security updates as duplicates of each other in Bugzilla, and to have just one relevant security issue in the nixpkgs repo on GitHub.

The advantages of this setup compared to doing it on GitHub are that:

  1. We will not have a continuous stream of potentially irrelevant issues on GitHub, lots of duplicates etc.
  2. It would allow us to experiment with a private issue tracker for getting access to the distro security list.

@moretea

moretea commented Nov 1, 2017

At my $JOB, we're using a simple Google Drive Spreadsheet to track who is providing support for one of our clients. Might be good enough to start with, until practicing this for a while clarifies what we want an automated solution to actually do for us.

@moretea

moretea commented Nov 1, 2017

Once we have a maintainers file, we could delegate security issues in non-critical packages (let's say not in nixos-small + set of commonly used internet facing tools) to the respective maintainers by default, potentially via a bot on the issue tracker.
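
A small sketch of the routing rule such a bot might apply. The package names, the `CRITICAL` set, and the `MAINTAINERS` mapping are invented stand-ins; the real data would come from nixos-small and the maintainers file once it exists.

```python
from typing import Dict, List, Set

# Hypothetical stand-ins for "nixos-small + commonly used internet-facing tools"
# and for the future maintainers file.
CRITICAL: Set[str] = {"openssl", "openssh", "nginx"}
MAINTAINERS: Dict[str, List[str]] = {
    "hello": ["alice"],
    "openssl": ["bob"],
}


def route(package: str, on_call: str) -> List[str]:
    """Decide who gets a security issue for `package` by default."""
    if package in CRITICAL:
        return [on_call]  # critical packages stay with the on-call person
    # Non-critical: delegate to maintainers, falling back to on-call if none are listed.
    return MAINTAINERS.get(package) or [on_call]
```

Falling back to the on-call person for unmaintained packages keeps the rule that no issue ends up without an owner.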


The patching team will not handle issues under embargo.

## Patching Team
Member

@peti peti Nov 2, 2017

I would not refer to that group of people as "patching team". I'd rather call them "incident managers" or something like that. The way I see it, the primary role of those volunteers is not to fix the actual security issue. Their role is to ensure that:

  • no security issue is overlooked,
  • everything is recorded properly in some bug tracking system, so that others can check the current state of affairs there,
  • the process of fixing the issue is delegated to the package's maintainer,
  • maintainers are notified (and reminded) of these issues.

While I think it's fine for members of that on-call team to fix security issues themselves, I believe that the documentation part of the job is equally or even more important.

2. A review process for advisories
3. Guidelines for backporting vs. separate patches when fixing a
package for Stable
4. A tool for handling the On Call schedule asignments and
Member

All team members should be on a shared channel (mailing list, IRC, wire.com, Hangouts, whatever), and at the end of their shift the currently responsible team member should ping their successor and remind them about the upcoming shift.

peti and others added 2 commits November 2, 2017 19:29
Extend the section about the team's responsibilities
@grahamc
Member Author

grahamc commented Jan 10, 2019

While I do think this is an important step we will eventually take, I'm not ready yet to take this forward and finish it. At the previous NixCon we made progress on some alternative plans, and I look forward to exploring those further. We can reopen this later, but closing for now until there is reason to progress on it.

@grahamc grahamc closed this Jan 10, 2019
@timokau
Member

timokau commented Jan 10, 2019

Is there anything online on those alternative plans?

infinisil referenced this pull request in nixpkgs-architecture/rfc-140 Jan 30, 2023
9 participants