New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Handle build_wait_timeout >= 0 consistently #2141
[WIP] Handle build_wait_timeout >= 0 consistently #2141
Conversation
By analyzing the blame information on this pull request, we identified @tomprince, @djmitche and @rutsky to be potential reviewers |
I don't think this a correct fix. I think the intent of |
Hmm, that's definitely not what I want then. Since it's very important for us that we only have single use workers. Would a |
Current coverage is
|
I'm not sure. The code here is somewhat subtle and I suspect has a number of undocumented dependencies and interactions. I don't think I would be comfortable with any changes that don't explain why the change is correct. It certainly isn't clear to me that this is any better (or worse) than the existing code. |
K, I'm taking a slightly different approach which might be a little more clear and direct, which is basically to ensure that a worker is disconnected after calling d.insubstantiate(). For some reason, our EC2 workers aren't disconnecting properly with the code that's in master as of 20cac3f To reproduce our issue:
|
@@ -781,6 +781,8 @@ def buildFinished(self, sb): | |||
if not self.building: | |||
if self.build_wait_timeout == 0: | |||
d = self.insubstantiate() | |||
d.addCallback(lambda _: | |||
AbstractWorker.disconnect(self)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would believe insubstantiate is supposed to do the diconnection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that something insubstantiate should always do, or should it be a kwarg defaulting to False?
I didn't put it in insubstantiate, because I wasn't sure on the intended usage of the method and didn't want to break something else unintentionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't clear that anybody actually knows what any of this is supposed to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense to me to be added to insubstantiate, just not 100% sure if it should be kwarg flagged or not.
Also, I noticed insubstantiate() is calling self.botmaster.maybeStartBuildsForWorker(self.name)
at the end, and we turn around and call that method again here. If we move the disconnect logic to insubstantiate can we drop the call to self.botmaster.maybeStartBuildsForWorker(self.workername)
from this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense to me to be added to insubstantiate, [...]
It might. But I'd be inclined to reject any change to this code that is "I think this makes sense ..." instead of "This makes sense because a) ... b) ... c) ...". I suspect the code is in the state that it is largely because of changes like the former.
9f0d091
to
96c4fbb
Compare
For what it's worth, I've updated this pull request with what we're currently using in our CI environment. Without this change build_wait_timeout=0 is causing instances to terminate but not disconnect or ever relaunch so our server just hangs. |
@aelsabbahy How can the instance terminate with the tcp connection being closed as well? |
@aelsabbahy ping |
Sorry for the late response, I think this PR can be closed out all together. I believe some of the changes @tomprince made conflicted with the changes here. There were some initial discussions and @tomprince had some thoughts on how to handle this use-case properly/thoroughly, but I'm not 100% sure if any action was taken since then. |
WIP, looking for feedback on approach.
Fixes #3528