rate limit concurrent peer dials #1802
Conversation
@@ -44,6 +44,9 @@ var (
// add loop back in Dial(.)
const dialAttempts = 1

// dial at most 12 peers concurrently
const concurrentPeerDials = 12
I chose 12 here because 12 * 10 (concurrent dials per peer) * 2 (reuseport) is around 1/4 of the available file descriptors on Linux, and is still less than the 256 limit on OSX. Although maybe it's a little too close to the OSX limit. Maybe we should set it at runtime with an OS check.
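As a hedged sketch of that runtime OS check (nothing below is from the PR; the function name and the darwin value are hypothetical):

```go
package swarm

import "runtime"

// defaultConcurrentPeerDials picks the peer-dial limit at startup based on
// the OS, following the 12 * 10 * 2 FD arithmetic above: keep 12 on Linux,
// and back off further below OSX's default 256-FD ceiling on darwin.
func defaultConcurrentPeerDials() int {
    if runtime.GOOS == "darwin" {
        return 8 // hypothetical value, chosen only for illustration
    }
    return 12
}
```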
we can probably bump the 10 factor to 5. I worry about putting it lower than that, as otherwise we won't have the chance to prefer local addresses. Once we have good estimators, maybe.
and, 12 max concurrent dials is a bit rough. Maybe this should only be a limit for socket-consuming transports?
That's really hard to tell at this point. The lock doesn't specify what's going to be dialed, just who.

How about dropping the per-peer concurrency to 6 and raising the number of peers to 15? That puts us at a limit of 180 FDs (6 * 15 dials, doubled by reuseport), at least until we fix epoll.
LGTM, though maybe:

Also, I believe that if you fix the reuseport epoll + kqueue FD screwup, that halves the used FDs.
Force-pushed from 82084fe to 2787ba0
Rate limiting FD-per-conn dials will be a lot of work, and we don't have any such transports yet. I think it's best to push this until a little later.
The point is that we don't need to rate limit utp and we do need to rate limit tcp.
Yeah, but
Yep, that's what I meant by harder :)
(Think we should do this -- the limit will hurt)
Good side effect: utp addresses are much more likely to be chosen.
Force-pushed from 2787ba0 to 5e5967e
default:
    if conn.IsTcpMultiaddr(remoteAddrs[i]) {
        // For addresses that require an FD to dial
        select {
this is ugly because you can't really do conditional select cases...
Here, this fixes it all for you:
func (s *Swarm) addrDialRateLimit(a ma.Multiaddr) chan struct{} {
    if isFDCostlyTransport(a) {
        return s.fdRateLimit
    }

    // do not rate limit it at all
    return make(chan struct{}, 1)
}

// and in here:
dialLoop:
for _, i := range rand.Perm(len(remoteAddrs)) {
    select {
    case <-foundConn: // if one of them succeeded already
        break dialLoop
    case <-worker.Closing(): // our context was cancelled
        break dialLoop
    default:
    }

    workerAddr := remoteAddrs[i] // shadow variable to avoid race

    // we have to do the waiting concurrently because there are addrs
    // that SHOULD NOT be rate limited (utp), nor blocked by other
    // rate limited addrs (tcp).
    // we need to call limiter.Go as required by goproc/limiter semantics.
    // note: limiter.Go is not limiter.LimitedGo.
    limiter.Go(func(p process.Process) {
        // returns whatever rate limiting is acceptable for workerAddr.
        // may not rate limit at all.
        rl := s.addrDialRateLimit(workerAddr)
        rl <- struct{}{}
        limiter.LimitedGo(func(worker process.Process) {
            dialSingleAddr(workerAddr)
        })
        <-rl
    })
}
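The pattern above uses a buffered channel as a counting semaphore: sending acquires a slot (blocking when the buffer is full), receiving releases it, and addresses that aren't FD-costly get a throwaway single-slot channel so they never contend with TCP dials. A minimal, self-contained sketch of that semaphore pattern (the capacity of 3 and the "dialing" stand-in are illustrative, not from the PR):

```go
package main

import (
    "fmt"
    "sync"
)

func main() {
    // buffered channel used as a counting semaphore, like fdRateLimit above
    sem := make(chan struct{}, 3) // at most 3 "dials" in flight; illustrative

    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            sem <- struct{}{}        // acquire: blocks while all 3 slots are taken
            defer func() { <-sem }() // release the slot when done
            fmt.Println("dialing", n)
        }(i)
    }
    wg.Wait()
}
```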
@jbenet alright, we do the thing now.

RFCR/M
func IsTcpMultiaddr(a ma.Multiaddr) bool {
    p := a.Protocols()
    return len(p) == 2 && (p[0].Name == "ip4" || p[0].Name == "ip6") && p[1].Name == "tcp"
}
probably should have:

func isFDCostlyTransport(a ma.Multiaddr) bool {
    return isTCPMultiaddr(a)
}
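For context, a hedged sketch of the address shapes this check distinguishes, assuming the go-multiaddr package (the import path and example addresses are illustrative, not from the PR):

```go
package main

import (
    "fmt"

    ma "github.com/multiformats/go-multiaddr"
)

func main() {
    tcp, _ := ma.NewMultiaddr("/ip4/127.0.0.1/tcp/4001")
    utp, _ := ma.NewMultiaddr("/ip4/127.0.0.1/udp/4001/utp")

    // The TCP address has exactly two protocols (ip4, tcp) and costs an FD
    // per dial; the utp address stacks a third protocol, so the check above
    // returns false for it and it is not rate limited.
    fmt.Println(len(tcp.Protocols()), len(utp.Protocols())) // 2 3
}
```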
This is very weird: https://gist.github.com/jbenet/d79f4b9cfcfeea48363e
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
Force-pushed from 5e5967e to 1c2223d
@whyrusleeping I pushed here -- fixed it for you, I think (1c2223d), except for whatever is going on in #1802 (comment)
@jbenet what changed in that commit?
Also, that gist is quite weird. But I doubt that "what we've been seeing isn't too many FDs"; I've reproduced the error many times and confirmed the number of FDs on the system (yay for filesystem notifications on the /proc/pid/fd directories). What did you repro that on? Your laptop? Does that happen every time?
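As an aside, counting entries in /proc/&lt;pid&gt;/fd is the standard way to confirm FD usage on Linux; a minimal sketch (self-observation via os.Getpid is illustrative):

```go
package main

import (
    "fmt"
    "os"
)

func main() {
    // on Linux, each entry in /proc/<pid>/fd is one open file descriptor
    fds, err := os.ReadDir(fmt.Sprintf("/proc/%d/fd", os.Getpid()))
    if err != nil {
        panic(err)
    }
    fmt.Println("open fds:", len(fds))
}
```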
@jbenet I can't repro the thing you posted. Was it a fluke? Or can you repro reliably?
Made my roommate test this on OSX for me; looks like it's an OSX issue. It also happens on master.
There were the following issues with your Pull Request
Guidelines and a script are available to help. Your feedback on GitCop is welcome on this issue. This message was auto-generated by https://gitcop.com
@GitCop killjoy.
License: MIT
Signed-off-by: Jeromy Johnson <why@ipfs.io>
Force-pushed from f1f068d to c1c9a74
@jbenet that should fix your issue.
rate limit concurrent peer dials
This should prevent the crashes when adding large files and directories.

Addresses #1654 and other similar issues.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>