-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Fiber#cancel #6450
Add Fiber#cancel #6450
Conversation
7acecfb
to
0bdfcbe
Compare
How is the performance of https://github.com/crystal-lang/crystal/blob/master/samples/channel_primes.cr affected, replacing 100.times with a bigger number (I think 1000 or 10000 is good for a benchmark) |
4842cac
to
03e503b
Compare
This is the mean over 5 runs of
1% doesn't sound too bad considering the example is constantly switching fibers back and forth. In a real-world application it'll barely be noticeable. The performance loss could probably be avoided with a customized |
03e503b
to
ccf8c83
Compare
With a bit of work an uncatchable exception could be added. But it wouldnt be very useful since I don't think this mechanism is robust enough to add. |
I'd rather have a different implementation which keeps track of the explicit resources the fiber holds that can deadlock on a seperate stack, and use that to clean up. Is it only mutexes or is there stuff around IO scheduling that can break? If its only mutexes it shouldnt be too hard. |
I don't think this is only about avoiding deadlocks with mutexes or anything but to properly clean up a fiber's stack, which means to execute all
What do you mean with that? |
@straight-shoota what do you mean clean up the stack? You can just delete the stack to clean it up. The only issue is you can get deadlocks when you never unlock a mutex or so. One way to do that is to only execute ensures, but you don't have to go that route. |
I mean to unwind the stack and doing all the clean up in the process, such es executing Cancelling a fiber is not huge error (it's not an error at all) but rather a simple way of canceling a task when it is no longer needed. But in my opinion, it should not just try to do the bare minimum to avoid deadlocks, but properly unwind and do clean up. I guess this is similar to I'd like to have a way to gracefully stop a fiber (likely while it is waiting for IO) which can't be implemented in a nice way in user code. A typical use case for this would be something like the typical algorithm for establishing a TCP connection, which starts multiple connection attempts more or less simultaneously in fibers and as soon as one of them succeeds, the others can be stopped. But they should properly unwind and release their resources (for example sockets). |
2 exceptions, 2 stack unwinding (very costly) and an additional check for each and every fiber resume :-/ Isn't killing a thread considered bad practice? I don't see why canceling a fiber would be better. There are better patterns to stop a fiber with proper cleanup than a hard cancel. |
Well, the inner exception is optional. I thought it might be useful to know where a fiber was cancelled, but in practice it's probably not that important. So it could be only one exception and unwinding. And I don't think there is an alternative to that if you want to execute The additional check for each
First of all, it's not a hard cancel. The fiber can rescue from the exception if necessary. What would be a better pattern, say for my example use case? |
if we do this we should do it with custom codegen so it's actually impossible to rescue. It shouldn't be too hard. |
What bugs me is two fold:
|
Also, it only cancels the fiber if and when it's resumed by the scheduler. If it never enters the scheduler it will never be cancelled. Still, cancelling will delayed to later on. Is it really better than to send a message through a channel telling it to stop? Is it really worth to introduce a "custom codegen"? |
Yes, in some cases it is better to use a channel or a simple loop condition to let the fiber stop itself. The problem is that this doesn't work when the fiber is waiting for an event like IO or a timeout. The only way this can be implemented is to insert a check for a cancel request after each expression that could include a context switch. As this is not practical, this logic should be in I don't think this needs custom codegen. It is fine if a
|
@straight-shoota the problem is that without a larger change to the compiler, an ensure block that raises an exception will raise a standard exception which will be cause by |
Actually the only change needed would be to introduce an intermediate exception and make But I don't know about cancelling a fiber. There must be a reason no other language allows such thing. |
@asterite I just described the problem with that approach and I think it's the kind of ugly problem that will raise it's head in production and make a mess. |
This proposed behaviour is actually implemented very similar to Python's @RX14 What exactly is the problem with the solution I proposed in the original post? It shouldn't be very difficult to insert a new exception type above |
It also happens that Ruby has some exceptions that aren't caught by default, and the same is true for a Java (the Error class). In C# there's a ThreadAbortExcrption that can be caught but it's automatically rethrown. It seems the proposed solution here is similar to other languages, I don't see a problem with that. Maybe it would be nice if @waj could comment whether gracefully stopping a fiber is something bad. |
@straight-shoota I just described how exactly what you describe fails!!! Please read again |
Oh, seems like I had overlooked that comment. I'm sorry. I'm not sure this is really a big issue though. An |
@straight-shoota yes it usually is a severe error that won't allow continuation of the task. But so are most exceptions, and raising from Such a common thing shouldn't abort the whole cancelling of a fiber. That's why it'd need seperate codegen, to automatically silently ignore exceptions in ensure blocks when you were unwinding to close. And given that complexity, I really don't like the idea of unwinding at all. I'd like to try the simple solution first, which is not to unwind and to identify the few resources which can deadlock and handle them specially. |
And not to mention, even having the possibility that a fiber can "fail to cancel" and keep running is confusing for the programmer, invites edge cases by it's very nature, and requires more careful design and more code. |
I don't think it is an issue if a cancellation might not work out. It is after all a way to try to gracefully stop a fiber. Concurrency in Crystal is cooperative, so it always depends on how other components behave. Being able to cancel a fiber gracefully with a chance of not succeeding is an improvement over not being able to cancel at all. It could very well be used in a way that a fiber is given some time to cancel itself and after that, it might just get hard killed (removed from the Scheduler). Cancellation through an exception is a very simple implementation and IMHO it is sufficient for most use cases, even if it can't guarantee the |
@straight-shoota the problem is not that cancel can fail to gracefully stop a fiber, it's that it can actually accidentally resume a fiber instead. That's the whole point of what i'm saying, a cancel which can fail resumed from a "paused" state (not in the scheduler queue) is useless. It's not an improvement on the current situation at all. |
I'm not sure why this is a problem. The entire point is to not let a waiting fiber hang around until some timeout kicks in - or even indefinitely. When the fiber is resumed it can deal with the cancellation, even if the specific exception doesn't go through. |
If you call |
I don't understand the problem with ensure and that exception. |
@RX14 That's how Pythons cancel implementation works. And I don't think it is wrong:
If |
@asterite |
ccf8c83
to
ab111af
Compare
ab111af
to
0ed9ae1
Compare
What's the state of The initial design looks good. I'll read more carefully later. |
@vinipsmaker you might want to take a look at #6468 |
Context switch is where the fat goes anyway. You want to do progress as much as you can before doing a context-switch. Checking on a bool on context-switch seems rather cheap to me. You might enjoy reading this: https://lwn.net/Articles/763056/
Recently I've implemented a I guess the “bad practice” that you mention may refer to APIs that don't match POSIX semantics enough like Windows' and Java's. In my case, a fiber may not always wake-up on the same thread and there were some kinds of races I had to deal with. My implementation is also more complete with features such as |
Don't. Always do |
Wrong. If there is clean-up code to be executed, you're already scheduling some task to run (i.e. the |
The “it can either try again or do something else” part worries me. It just complicates design (like Java's state machine on interruption API). |
What should happen in this case instead? |
Thanks for your great input on this topic, btw! |
FWIW I don't love this option, it's a great tool for beginners but can introduce subtle bugs (like exceptions being raised inside finalize handlers so they basically only half cleanup, etc...it's a can of worms, and can cause unexpected behaviors, similar to how Timeout.timeout in ruby feels error prone :| |
@rdp A finalizer can only raise if it contains a potential fiber switch. This would most likely be some kind of IO operation which could already raise for a number of other reasons. The alternative to hard-kill a fiber doesn't even give a chance to run any finalizers. |
I'm not sure what you mean? |
Some IO operation, |
Oh, I think my brain did an incorrect autocorrect. @rdp do you mean object finalizers or |
I was referring to ensure blocks (which can be nested, call other methods,
etc.), sorry to not be clear there.
…On Wed, Dec 19, 2018 at 10:14 AM Chris Hobbs ***@***.***> wrote:
Oh, I think my brain did an incorrect autocorrect. @rdp
<https://github.com/rdp> do you mean object finalizers or ensure blocks?
Because the former is already a problem.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#6450 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAw0CC-YZSjWG4aI50dDcjN-I8tsC3-ks5u6nQGgaJpZM4ViY-V>
.
|
To proceed with the discussion, we have to establish some goals. What are we trying to achieve and what are the alternatives? A misalignment of expectations will blur the discussion. Thread cancellation API was designed to avoid stalling threads in IO requests. Threads build a composition of operations and threads run independently of each other. Thread T' does not know the state of thread T''. It's important to keep this point in mind. We do not want to require threads to know about each other state that much as it introduces co-dependence between them (i.e. it would require that both threads are coded with each other's code in mind). As an example of co-dependence that I faced recently, I had 2 fibers spawned from one parent fiber per client. Each child fiber accessed data on the stack of the parent fiber. And here I'm using C++, so no GC over values of the closures. Parent fiber just had to wait for each child fiber to finish or else they would access invalid memory leading to crash on best scenario. By that time I hadn't finished my The dependency information applies to the design of When you cancel a thread, you send an async request to cancel the thread. The Linux's programmer manual does a good job at explaining the the function, so I'll just paste relevant parts here:
— pthread_cancel(3) If you want to interrupt a thread, it's a fire-and-forget operation. Just send a cancellation request and later
—
— As for Java, it uses a design where a thread might be interrupted, non-interrupted, and it can go from one state to the other without knowledge of the interrupter thread. The interrupter thread must somehow know implementation details from the other thread (co-dependency again) so it can react to proper ignored cancellation requests and possibly send them again. No sane mind would think this is good design. Now, going back to POSIX, there is more valuable information on Linux's Programmer's Manual:
— pthread_setcanceltype(3) This quote by itself is enough to understand the main problems with cancelability and why And we're running out of POSIX here. POSIX also means C here. And C doesn't have exceptions nor RAII. So they had to come up with their own clean-up procedures to act during a thread cancellation. The whole thing is, as my colleague puts, “a manually implemented RAII” (i.e. destructors placed to be called deterministically at the end of each scope). When you leave the C realm and look at other APIs, this step is usually translated to an exception that is thrown and provokes stack unwinding cleaning any resource. In Java, this exception is As for my design of fibers in C++, As for the alternative, the main alternative I see for Crystal is Go's:
Anyways, what's the most complete documentation circa Crystal's exception model so I can take my time and come back to the discussion later? [1] https://linux.die.net/man/3/pthread_barrier_init |
@vinipsmaker crystal's current exception model is that everything inherits The suggestion to only cancel fibers on suspention points is great! |
That's actually the only feasible way for cancelling fibers. There is no way to preempt a fiber, so it can't be interrupted at any time ("asynchronous" in pthread lingo). The currently scheduling model only allows for "deferred". |
Regarding adding an uncatchable exception, erlang has this via the exit function if the reason given is Also haskell includes a function throwTo that throws an exception to a green thread (the name Just thought i'd mention this, studying other solutions to a problem is the #1 way to get up to speed fast :) |
uncatchable exception I presume still runs ensure blocks? In my head it's still just too much of a danger to developers, it opens a can of worms if you let people get by without having graceful stop points, FWIW... |
Closing stale PR. |
This PR adds a method
Fiber#cancel
which allows to stop a fiber as described in #3561The implementation is pretty simple: Calling
#cancel
sets a flag (@cancel_request
) and enqueues the fiber to resume at the next rescheduling.Fiber#resume
checks (after the stack switch) if the now resuming fiber has been requested to be cancelled. If so, it raisesFiber::CancelledException
which unwinds the stack like any other exception until it reaches therescue
handler in#run
.Resuming the fiber-to-be-cancelled could already happen in
#cancel
itself. It is better to delay the unwinding until the next rescheduling, in order to allow cancelling multiple fibers at once without any unforeseeable context switches while unwinding the stack.Apart from the included specs, I have also tested this with #6385 to resolve the issue described in #6357 (comment) and a proof of concept for a structured concurrency feature (RFC will follow).
An issue worth mentioning is that
CancelledException
is just an ordinary exception and thus it can be rescued from. It would unintentionally be caught by anyrescue
without an explicit type restriction. This is bad because rescuing it somewhere on the unwinding stack can prevents the purpose of cancelling the fiber.In order to solve this,
CancelledException
should not be matched by arescue
without a type restriction. I would recommend a change to the exception hierarchy, inserting a new exception type (for exampleBaseException
) as parent of bothException
andCancelledException
. Untypedrescue
would still catch allException
types, but notCancelledException
. But you can still rescue fromCancelledException
(orBaseException
) if explicitly declared.This is similar to the exception hierarchies of Ruby and Python, both have a type for user exceptions (
StandardError
/Exception
) and above that a base type (Exception
/BaseException
) which is also the parent type of system exceptions (for exampleSystemExit
in both).Fixes #3561