
dont create a new ticker each loop #1180

Merged 1 commit into master from fix/ticker-waste on May 2, 2015
Conversation

whyrusleeping
Member

Well, it seems that my perf kick is turning up more bugs than perf improvements... sigh

But this one is a pretty ugly and subtle one, which I believe is the cause of #1036.

A ticker starts a timer that continuously sends on the given channel; every loop iteration, a new one was being created.

I wrote the following program and let it run for a few minutes, watching the number of context switches start out around an idle 600/s and slowly rise to over 10k/s. When I killed the program, the context switches dropped back to the normal 600/s.

package main

import (
    "fmt"
    "time"
)

func main() {
    a := make(chan int)
    for {
        select {
        case <-time.Tick(time.Second / 10):
            fmt.Println("LOOP")
        case <-a:
        }
    }
}

I searched the codebase for any other similar mistakes and found none.
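
For contrast, here is a corrected variant of the same demo (a sketch of mine, not code from this PR) that creates the ticker once, outside the loop, so only a single runtime timer ever exists:

package main

import (
    "fmt"
    "time"
)

func main() {
    a := make(chan int)
    // Create the ticker channel once, outside the loop, instead of on every iteration.
    tick := time.Tick(time.Second / 10)
    for {
        select {
        case <-tick:
            fmt.Println("LOOP")
        case <-a:
        }
    }
}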

whyrusleeping added the kind/bug, topic/perf, and status/in-progress labels on May 2, 2015
@okket

okket commented May 2, 2015

Looks good, no ramp after 9h running with this patch (and the oom fix).

[graph: interrupts-day]

@whyrusleeping
Member Author

@okket thank you for verifying!


for {
    select {
    case <-time.Tick(10 * time.Second):
Member

Interesting. We should grep the codebase and make sure this doesn't happen elsewhere. Was it just not getting GC-ed, or is there some releasing we need to do?

Member Author

I searched the codebase for any other similar mistakes and found none.

time.Tick creates a channel intended to be used repeatedly; it does some magic runtime stuff and inserts a system timer callback to send on the channel, so it doesn't show up as a goroutine.

Member

We should call http://golang.org/pkg/time/#Ticker.Stop everywhere
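
A rough sketch of that pattern (my illustration, not code from this PR; worker and done are hypothetical names, and it assumes the time package is imported):

func worker(done <-chan struct{}) {
    tick := time.NewTicker(10 * time.Second)
    // Stop releases the underlying runtime timer when the worker returns.
    defer tick.Stop()
    for {
        select {
        case <-tick.C:
            // periodic work here
        case <-done:
            return
        }
    }
}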

jbenet added a commit that referenced this pull request on May 2, 2015: dont create a new ticker each loop
jbenet merged commit 2d3fb84 into master on May 2, 2015
jbenet removed the status/in-progress label on May 2, 2015
jbenet deleted the fix/ticker-waste branch on May 2, 2015 at 18:59
@jbenet
Member

jbenet commented May 2, 2015

thanks @okket @whyrusleeping -- this is great

@@ -182,10 +182,11 @@ func (bs *Bitswap) rebroadcastWorker(parent context.Context) {
    defer cancel()

    broadcastSignal := time.After(rebroadcastDelay.Get())
    tick := time.Tick(10 * time.Second)
Member

maybe:

tick := time.NewTicker(10 * time.Second)
defer tick.Stop()
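
(A follow-on note of mine, not from the review: if this suggestion were adopted, the select in the loop would also need to receive from the ticker's channel field, i.e. case <-tick.C:, since time.NewTicker returns a *time.Ticker rather than the bare channel that time.Tick returns.)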

Member Author

Eh, not really needed. This is a single goroutine that lasts the entire lifetime of the executable.

Member

Today.

And not if we start many nodes and tear them down in one binary. Found a good bug. We should finish the fix. Do we lose anything by adding a stop?



@okket

okket commented May 3, 2015

Looks like 9h were not enough, the ramp is back:

[graph: interrupts-day]

Are any of these potential candidates?

https://github.com/ipfs/go-ipfs/search?utf8=✓&q=ticker

@whyrusleeping
Member Author

@okket that's weird... is the left side of that graph the run you did to verify the bugfix for me? Did you bring the daemon down in between?

@okket

okket commented May 3, 2015

I thought I double-checked everything, but the timestamp on the binary looks suspicious. I'll try another build and run. Maybe I did something wrong (likely).

@okket

okket commented May 4, 2015

Looks like I messed up yesterday. After 14h with https://github.com/ipfs/go-ipfs/tree/fab5e91a93fa83594796478fb7864c8e5b5f0000 no ramp in sight:

[graph: interrupts-day]

@whyrusleeping
Member Author

@okket I'm glad that you were wrong this time, lol 👍

@okket

okket commented May 4, 2015

Yeah, sorry for the false alarm. Good that this is finally fixed, thank you for your hard work. Now I can let the node run without getting annoying graphs 😀

@whyrusleeping
Member Author

@okket glad to hear it 😄

Definitely let me know if you experience any other weird issues! I wanna see how long we can keep a node online

@jbenet
Member

jbenet commented May 4, 2015

Yay, no ramp!

1.5k ctx switches per second is prob still something to reduce.


