Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Nasty sketch] Stop crashing High Sierra #1617

Closed
wants to merge 1 commit into from

Conversation

copumpkin
Copy link
Member

This is a nasty approach and I don't like it, but given that the other kill(-1, SIGKILL) approach crashes macOS 10.13 and we don't know when Apple will fix their kernel, we're left with shitty options.

I threw this together rather hastily to get people thinking about it. It's untested (I don't have a High Sierra or even a multi-user install handy right now) and I'm on a train and can't do much, but I figured I'd throw up the sketch and see what y'all thought of the approach.

cc @shlevy @edolstra @LnL7 @grahamc @matthewbauer

@copumpkin
Copy link
Member Author

This compiles so ship it, right? 😄

vector<struct kinfo_proc> proc_list(length);

newlength = length;
err = sysctl((int *)name, 4, (void *)proc_list.data(), &newlength, NULL, 0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally going to check if length == newlength but then couldn't see a reason to...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a const_cast here with a comment reassuring us that the name argument won't be modified based on some manpage or whatever?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yessir!

This is a nasty approach and I don't like it, but given that the other
`kill(-1, SIGKILL)` approach crashes macOS 10.13 and we don't know when
Apple will fix their kernel, we're left with shitty options.
int err = 0;
size_t length = 0, newlength = 0;

// XXX: Should this be KERN_PROC_RUID? EUID vs. real UID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be identical for normal programs run by the user, so whichever one is set to the nixbld user for a setuid program, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// @grahamc points out https://utcc.utoronto.ca/~cks/space/blog/unix/ProcessKillingTrick
for (auto proc : proc_list) {
kill(proc.kp_proc.p_pid, SIGSTOP);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that between listing the processes and the SIGSTOP, the process dies and the PID is reused by another user. After the SIGSTOP, we can take a look to see if the PID is owned by the build user we're wanting to kill. If it is our builder, SIGKILL it. If not, SIGCONT it.

@dezgeg was kind enough to point out SIGCONT to a SIGSTOP'd process, which was SIGSTOP'd before we got to it is rude... which is true, but jeeze, that means the processes exited, the pid recycled, started, and SIGSTOP'd within such a brief period of time. Still rude, but we tried (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should do the kill's as the nixbld user instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then if it fails due to EPERM, we just continue on our way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copumpkin Probably this whole loop should just be done as the nixbld user (excluding the current process, of course)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be really helpful if we could SIGKILL -1 and only kill things we own.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll run as subprocess again

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But actually, if I run as subprocess, we need to start worrying about our siblings killing us again...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if our siblings kill us we know there's a rogue

tries--;

// First call with NULL to get number of processes. Yeah yeah TOCTOU is fun, but we check
// that it hasn't changed later and keep going until it's 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't check that it hasn't changed later, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't. Should I? My first thought was that I'd need to, but then I thought some more and convinced myself I wouldn't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, shouldn't be necessary, just need to update the comment

if (err)
throw SysError("calling sysctl to get process list");

// @grahamc points out https://utcc.utoronto.ca/~cks/space/blog/unix/ProcessKillingTrick
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A brief description of why this helps inline would be nice.

Copy link
Member

@shlevy shlevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than my minor nits, 🚢

@copumpkin
Copy link
Member Author

Thanks for all the feedback! Will fix when train wifi wakes up again or tomorrow

@matthewbauer
Copy link
Member

I think you’re also going to need to handle this SIGKILL:

::kill(-pid, SIGKILL); /* ignore the result */

@copumpkin
Copy link
Member Author

@matthewbauer I was never able to get it to die when killing a process group. Were you? I left it running for a while but only ever saw crashes from kill(-1, ...)

@matthewbauer
Copy link
Member

matthewbauer commented Oct 18, 2017

@copumpkin Ok maybe not... I guess I was thinking more had to be done based on 3cc1b57 still leading to crashes for some people. That was before we saw the issue with plain old kill(-1, SIGKILL) though.

@matthewbauer
Copy link
Member

matthewbauer commented Nov 3, 2017

@copumpkin I've got this branch installed and running on High Sierra. So far so good 💯. Will report back if I get a system lock again.

@copumpkin
Copy link
Member Author

Nice!


// First call with NULL to get number of processes. Yeah yeah TOCTOU is fun, but we check
// that it hasn't changed later and keep going until it's 0
err = sysctl((int *)name, 4, NULL, &length, NULL, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work for you @copumpkin ?

Trying this little program is giving really big length values:

#include <sys/sysctl.h>
#include <stdio.h>
#include <string.h>

#define UID 30001

int main() {
        size_t length = 0;
        int errno = 0;
        static const int name[4] = { CTL_KERN, KERN_PROC, KERN_PROC_UID, UID };
        errno = sysctl((int *)name, 4, NULL, &length, NULL, 0);
        printf("uid:%i length:%zu err:%s\n", UID, length, strerror(errno));
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I screwed it up! I haven't actually run this yet 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha that's fine. I'll go through it and see whats works. Before now I had hoped apple would just fix the issue but 10.13.1 still has the issue. Google has the same line here though so maybe my c program is what's wrong:

https://chromium.googlesource.com/chromium/src/base/+/master/process/process_iterator_mac.cc#27

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well each process entry is actually a struct so that might be fairly large. How big are the values you're seeing? Also, now that I think about it, my vector might be too big, since I should probably divide it by the size of the struct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that looks more like it (ceacd3e):

Although it looks like it's the same regardless of the user:

euid:501 uid:501 length:344 err:Undefined error: 0
euid:501 uid:30001 length:344 err:Undefined error: 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, maybe we'll just need to filter by hand based on the struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe try KERN_PROC_RUID

@matthewbauer
Copy link
Member

matthewbauer commented Nov 5, 2017

Hey @copumpkin,
I've updated a few things here: https://github.com/matthewbauer/nix/tree/shitty-kill-processes

So far so good. Going to use it for a little while until I say it's worth merging but haven't had the hanging yet. (lot's of weird High Sierra stuff though- but like I've said I'd blame it on High Sierra's bugginess first).

Testing directions

$ git clone https://github.com/matthewbauer/nix
$ git checkout shitty-kill-processes
$ nix-build release.nix -A build.x86_64-darwin
$ sudo su
# nix-env -i ./result
# reboot

@copumpkin copumpkin closed this Dec 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants