-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trivial memory protection unit (stack overflow protection) #544
Comments
Isn't stack probing slow? And if it can do that, can't it check the stack pointer in software? |
I'm not sure what leads you to this conclusion, but no, it isn't. In any case probing per se only happens on frames with stack objects larger than 4kb (and a few other specific circumstances), in almost all cases the normal stack accesses are enough to ensure a crash on overflow.
This is not only significantly slower but also much more complex, because the machinery that does this depends on thread local storage to store the stack pointer. This machinery is not available without an OS, or in static executables, or in the OR1K LLVM backend, or in Rust (anymore), and in the case that it would be implemented for all of the above, it would have to involve a function call in every single function prologue. On top of all this it's not compatible with C code, so if e.g. a Rust thread calls into lwip and lwip overflows stack, that would result in a crash as usual. |
What about this threading library you are porting?
If the C function allocates a large buffer on the stack, it can still overflow even with your proposed MPU. |
Can stack probing be done with a read and not a write? This way we would not have to quash the bus transaction on error (which would increase memory latency by one cycle). |
It relies on stack probing, of course.
Yes, but I don't think lwip does that.
Sure, the way it's implemented is that the backend inserts a call to |
My question was about thread local storage, which I am surprised it does not provide. |
Thread-local storage is a pretty annoying topic; if you are running in a hosted environment (and the library is designed to work in one also), then there's no real way to integrate with the standard thread-local storage that is done via ELF sections. In effect, compiler-provided thread-locals aren't usable with any usermode scheduling libraries. In addition to that, this is almost completely missing from both the OR1K LLVM backend and the linker, and since there is no published specification for how this should be implemented for OR1K I don't even know where to start. Of course, you could always define a global variable yourself, but then you have to maintain a fork of LLVM forever, because the upstream split-stack machinery (which was (ab)used for stack overflow checking) is not flexible enough to support that out of the box and is not supposed to do this anyway. Basically, all other methods are much more work than stack probes, create problems with upstreaming and most of them are slower on top of that. |
Is a bus error exception appropriate on protected memory access? |
Sure.
Sure. |
This is now in MiSoC, it needs to be tested and hooked up in ARTIQ. |
mor1kx ignores wb.err on read cycles, so looks like this will not work so easily. |
Removing from 3.0 because of #544 (comment). |
Moving from #569; this will also need stack probe fixes in LLVM (https://reviews.llvm.org/D9653) to work correctly. |
Stack probes were merged a good while ago in LLVM. There'll need to be some minimal work for OR1K for them to fully function. |
Funded by Duke. |
Adding a Physical Memory Protection (PMP) unit on the RISC-V core should be able to do that. Accessing the protected memory region without necessary privilege/access right will cause an exception from the CPU. Just like a guard page.
We can perform an environment call ( |
It is probably better to align the thread stack on 4k boundary as well. Then, we make the stack 4k larger as well when spawning a thread and put the lowest 4k into a PMP region as the stack guard. The problem is Also, I might be able to remove this hack. It was to fix a heap corruption issue in |
On both kernel and comms CPUs, we mostly use memory-safe code (or will use, once Rust on comms CPU ships), so it is largely unnecessary to have a full-blown MPU. However, memory-safe code relies on detection of stack overflows, and currently that would not happen--indeed, overflows on both CPUs would silently corrupt data and crash some time later.
I propose a trivial MPU to alleviate this problem. It would work as follows:
n would be configurable; on kernel CPU we need one, on comms CPU we need the same n as number of threads.
It is only necessary to trap on a region 4k long because LLVM and Rust support stack probing, intended to be safe in presence of guard pages; when allocating a stack frame, LLVM will write a dummy value to stack every 4k bytes.
The text was updated successfully, but these errors were encountered: