-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Nasty sketch] Stop crashing High Sierra #1617
Conversation
This compiles so ship it, right? 😄 |
vector<struct kinfo_proc> proc_list(length); | ||
|
||
newlength = length; | ||
err = sysctl((int *)name, 4, (void *)proc_list.data(), &newlength, NULL, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally going to check if length == newlength
but then couldn't see a reason to...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use a const_cast
here with a comment reassuring us that the name
argument won't be modified based on some manpage or whatever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yessir!
This is a nasty approach and I don't like it, but given that the other `kill(-1, SIGKILL)` approach crashes macOS 10.13 and we don't know when Apple will fix their kernel, we're left with shitty options.
bd4370a
to
e69a111
Compare
int err = 0; | ||
size_t length = 0, newlength = 0; | ||
|
||
// XXX: Should this be KERN_PROC_RUID? EUID vs. real UID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should be identical for normal programs run by the user, so whichever one is set to the nixbld user for a setuid program, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// @grahamc points out https://utcc.utoronto.ca/~cks/space/blog/unix/ProcessKillingTrick | ||
for (auto proc : proc_list) { | ||
kill(proc.kp_proc.p_pid, SIGSTOP); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible that between listing the processes and the SIGSTOP, the process dies and the PID is reused by another user. After the SIGSTOP, we can take a look to see if the PID is owned by the build user we're wanting to kill. If it is our builder, SIGKILL it. If not, SIGCONT it.
@dezgeg was kind enough to point out SIGCONT to a SIGSTOP'd process, which was SIGSTOP'd before we got to it is rude... which is true, but jeeze, that means the processes exited, the pid recycled, started, and SIGSTOP'd within such a brief period of time. Still rude, but we tried (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should do the kill
's as the nixbld user instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then if it fails due to EPERM, we just continue on our way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copumpkin Probably this whole loop should just be done as the nixbld
user (excluding the current process, of course)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be really helpful if we could SIGKILL -1 and only kill things we own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll run as subprocess again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But actually, if I run as subprocess, we need to start worrying about our siblings killing us again...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if our siblings kill us we know there's a rogue
tries--; | ||
|
||
// First call with NULL to get number of processes. Yeah yeah TOCTOU is fun, but we check | ||
// that it hasn't changed later and keep going until it's 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't check that it hasn't changed later, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't. Should I? My first thought was that I'd need to, but then I thought some more and convinced myself I wouldn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, shouldn't be necessary, just need to update the comment
if (err) | ||
throw SysError("calling sysctl to get process list"); | ||
|
||
// @grahamc points out https://utcc.utoronto.ca/~cks/space/blog/unix/ProcessKillingTrick |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A brief description of why this helps inline would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than my minor nits, 🚢
Thanks for all the feedback! Will fix when train wifi wakes up again or tomorrow |
I think you’re also going to need to handle this SIGKILL: Line 1013 in 1dd29d7
|
@matthewbauer I was never able to get it to die when killing a process group. Were you? I left it running for a while but only ever saw crashes from |
@copumpkin Ok maybe not... I guess I was thinking more had to be done based on 3cc1b57 still leading to crashes for some people. That was before we saw the issue with plain old |
@copumpkin I've got this branch installed and running on High Sierra. So far so good 💯. Will report back if I get a system lock again. |
Nice! |
|
||
// First call with NULL to get number of processes. Yeah yeah TOCTOU is fun, but we check | ||
// that it hasn't changed later and keep going until it's 0 | ||
err = sysctl((int *)name, 4, NULL, &length, NULL, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work for you @copumpkin ?
Trying this little program is giving really big length values:
#include <sys/sysctl.h>
#include <stdio.h>
#include <string.h>
#define UID 30001
int main() {
size_t length = 0;
int errno = 0;
static const int name[4] = { CTL_KERN, KERN_PROC, KERN_PROC_UID, UID };
errno = sysctl((int *)name, 4, NULL, &length, NULL, 0);
printf("uid:%i length:%zu err:%s\n", UID, length, strerror(errno));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I screwed it up! I haven't actually run this yet 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha that's fine. I'll go through it and see whats works. Before now I had hoped apple would just fix the issue but 10.13.1 still has the issue. Google has the same line here though so maybe my c program is what's wrong:
https://chromium.googlesource.com/chromium/src/base/+/master/process/process_iterator_mac.cc#27
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well each process entry is actually a struct so that might be fairly large. How big are the values you're seeing? Also, now that I think about it, my vector might be too big, since I should probably divide it by the size of the struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that looks more like it (ceacd3e):
Although it looks like it's the same regardless of the user:
euid:501 uid:501 length:344 err:Undefined error: 0
euid:501 uid:30001 length:344 err:Undefined error: 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, maybe we'll just need to filter by hand based on the struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe try KERN_PROC_RUID
Hey @copumpkin, So far so good. Going to use it for a little while until I say it's worth merging but haven't had the hanging yet. (lot's of weird High Sierra stuff though- but like I've said I'd blame it on High Sierra's bugginess first). Testing directions
|
This is a nasty approach and I don't like it, but given that the other
kill(-1, SIGKILL)
approach crashes macOS 10.13 and we don't know when Apple will fix their kernel, we're left with shitty options.I threw this together rather hastily to get people thinking about it. It's untested (I don't have a High Sierra or even a multi-user install handy right now) and I'm on a train and can't do much, but I figured I'd throw up the sketch and see what y'all thought of the approach.
cc @shlevy @edolstra @LnL7 @grahamc @matthewbauer