-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of Concept: detect stack overflows (x86_64-linux-gnu target only) #6376
Proof of Concept: detect stack overflows (x86_64-linux-gnu target only) #6376
Conversation
Or why not checking if the signal ( def method
method
end
method then I always get signal 11. |
@r00ster91 because you also get it if you do |
@r00ster91 SIGSEGV (signal 11) can have a number of reasons, one of them can be a stack overflow. More details at https://en.wikipedia.org/wiki/Segmentation_fault |
stack = Pointer(Void).new(Fiber.current.@stack_bottom.address - stack_size) | ||
end | ||
|
||
if stack <= addr < Fiber.current.@stack_top |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we've come from a segfault, wouldn't @stack_top
be out of date? What about detecting if the fault is anywhere inside @stack_bottom - stack_size <= addr <= @stack_bottom
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that's a typo: it should be stack_bottom
instead of stack_top
. It still works thought, since stack_top
was a valid stack pointer, but stack_bottom
would be better, yeah.
Yet we must check with stack
which is the far limit of the stack:
<- stack grows down
+-------x--+----------+---------------------------------+
^ ^ ^ ^ ^
| | | | |
+ stack | + guard + top + bottom
|
+ invalid memory access: stack overflow
@ysbaddaden ping, this seems like a couple characters fix then ready to merge. And doing the extra |
@ysbaddaden are you interested in continuing this PR? It'd solve a long-standing problem in crystal and the code looks good |
If someone is willing to add the missing |
@RX14 , @ysbaddaden: On macOS it turns out that the SEGV always happens at an address lower than the top of the stack, but near the top of the stack. The actual amount depends on the size of the stack frames being allocated in recursion. So the check being done here never detects stack overflow. Even on x86_64-linux-gnu, the SEGV isn't always within the body of the stack. E.g., in some calls of a recursive function that allocates 4kb on the stack, sometimes I see something like
where
which does not. I think the thing to do is
I.e. keep the original message (after all, although unlikely, it might have just been a vanilla bad pointer dereference). But add on a clause that it looked like a stack overrun to help the diagnosis. The aim isn't to be perfect, just helpful much of the time. |
About those bindings, I'd offer to help but I'm a little intimidated. Presumably the linux versions will all be the same, and the darwin and BSD versions should be the same. But that's a lot of hardware to test and verify on (esp. arm-linux-gnueabihf). What have you guys done in the past? Oh, and Windows. Windows doesn't have the POSIX |
I'd say the last page in the stack and then the page after the stack. |
We use |
@damaxwell mentioned functions which allocate 4k on the stack, which could bypass a whole page. |
I took a look and I don't think there is a call to I just checked, and for spawned fibers on macOS, the seg faults are indeed within the stack for small recursive functions, just like Linux. So the difference seems to be that the two OS's protect the end of the main stack differently, one within and one outside. I think the easiest thing to do is just expand the check zone a page or two regardless. This will address the macOS main thread, and the extra buffer will capture some other cases where stack allocation leads to a SIGSEGV beyond the end of the stack. |
Oh. We can't call Since the program segfaulted, and if that happened within the stack or the next 4KB below it, but we can safely say "Potential stack overflow (e.g., infinite or very deep recursion)", yes. |
I've got the |
@damaxwell you can try https://github.com/multiarch/alpine |
With #6928 merged, this can be closed. |
I noticed that memory accesses that segfaults are near the bottom of stack, that is within the stack guard allocated by the kernel —I initially expected the guard to be after the stack, but it's actually within it. Checking if
addr
in the segfault handler (the faulting memory address) is within the stack of the current fiber stack is enough to detect a stack overflow.Since the main fiber doesn't have a
@stack
, I usegetrlimit(2)
to get the default stack size and determine the main stack limit. I only added bindings for x86_64-linux-gnu, but they should be the same for all Linux libc.I don't know how other POSIX systems (Darwin, OpenBSD, FreeBSD) handle their stack guard.
NOTE: there could a
Crystal::System.within_current_stack?(addr)
orstack_overflow?
method.refs #271