This repository has been archived by the owner on Feb 1, 2021. It is now read-only.

Moving event handling for external subscribers to use go-events #2747

Merged — 4 commits merged into docker-archive:master from the events-restructuring branch on Jun 27, 2017

Conversation

@nishanttotla (Contributor) commented on Jun 22, 2017:

Fix #2718

This PR starts to move event handling for external subscribers (API users) to use the battle-tested github.com/docker/go-events repo instead of cluster.EventHandlers defined internally in Swarm. To be more precise, it uses the Queue abstraction defined in the watch package in github.com/docker/swarmkit.

For now, event listening by the watchdog (for rescheduling) continues to be done by the old code.
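For reviewers unfamiliar with the package, here is a minimal sketch of the watch.Queue pattern from github.com/docker/swarmkit/watch that this PR adopts (illustrative only, not the PR's exact code; signatures may differ slightly depending on the vendored version):

q := watch.NewQueue() // the API layer keeps one queue for the whole cluster
defer q.Close()

// Each external subscriber gets its own channel plus a cancel function.
eventsChan, cancelFunc := q.Watch()
defer cancelFunc()

// The cluster's event handler publishes each engine event; events.Event is an
// empty interface in go-events, so a *cluster.Event can be published as-is.
q.Publish(e)

// A subscriber receives the value and type-asserts it back to *cluster.Event
// before normalizing it and writing it to the HTTP response.
if ev, ok := (<-eventsChan).(*cluster.Event); ok {
        _ = ev // normalizeEvent(ev) and the write happen here in the real handler
}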

@nishanttotla (Contributor Author) commented on Jun 22, 2017:

@aaronlehmann @alexmavr this PR shows the rough pattern of the changes. Let me know if this is a reasonable way to use the queue. I have left the old event handler in place because the rescheduling code uses it, but will work on removing it in a later follow-up.

Still remaining:

  • Close the queue. Thinking about the right place to do it (it should happen wherever the API server on the primary manager is shut down)
  • Handle --until
  • Fix race condition
  • Update listener counter
  • Update Mesos integration code to use an events Queue

api/handlers.go Outdated
f.Flush()
}
if err != nil {
log.Debugf("failed to write event to output stream %s", err.Error())


If there was an error writing to the socket, this code should return from the function. Currently this loop will keep receiving events and trying to write them forever.
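A minimal sketch of the suggested fix (the surrounding loop shape is assumed here, not taken from the PR):

if _, err := w.Write(data); err != nil {
        log.Debugf("failed to write event to output stream: %s", err.Error())
        return // give up on this subscriber instead of looping forever
}
f.Flush()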

@aaronlehmann commented:

This looks like the right approach to me.

@nishanttotla force-pushed the events-restructuring branch 11 times, most recently from 69e0eca to cfcf142, on June 23, 2017 at 01:39
@nishanttotla added this to the 1.2.7 milestone on Jun 23, 2017
@nishanttotla force-pushed the events-restructuring branch 2 times, most recently from ef239b5 to 26d10bd, on June 23, 2017 at 17:30
@nishanttotla changed the title from "[WIP] Moving event handling for external subscribers to use go-events" to "Moving event handling for external subscribers to use go-events" on Jun 23, 2017
@nishanttotla (Contributor Author) commented:

@aaronlehmann @alexmavr I've updated the PR and fixed all outstanding issues, barring one: getting an accurate counter for the number of listeners, which at this point returns an underestimate.

For reference and help with reviewing, the main event handler code is in handlers.go: https://github.com/docker/swarm/pull/2747/files#diff-379151c15066f365334797e959e6ea94R893

The watch queue itself and related functions are part of cluster.go:
https://github.com/docker/swarm/pull/2747/files#diff-423777b5183d89a8b517694ebcebfd6a

The rest should hopefully be straightforward. Also pinging @dhiltgen @wsong @aluzzardi.

api/events.go Outdated
// remove this hack once 1.10 is broadly adopted
from := e.From
e.From = e.From + " node:" + e.Engine.Name

// Lock access to the attributes map
e.Lock()


Shouldn't this operate on a copy of e, so that the event itself (which is shared between all the subscribers) is not modified by each subscriber?

@nishanttotla (Contributor Author) replied:

@aaronlehmann yeah, I intended to do that, but must have forgotten. I think doing a deepcopy of the event is the way to go.
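A rough sketch of that fix; copyEvent is a hypothetical deep-copy helper, not a function that exists in this PR:

ev := copyEvent(e) // deep copy, so other subscribers still see the unmodified event
ev.From = ev.From + " node:" + ev.Engine.Name
// normalize and send ev instead of e from here on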

api/handlers.go Outdated
defer cancelFunc()

// create timer for --until
timer := time.NewTimer(0)


We shouldn't create the timer in the case where --until is not specified. It would fire immediately, and the select would always be able to take the <-timer.C branch, hogging CPU.

How about something like this:

var (
        timer *time.Timer
        timerCh <-chan time.Time
)
if until > 0 {
        dur := time.Unix(until, 0).Sub(time.Now())
        timer = time.NewTimer(dur)
        timerCh = timer.C
}

Then have a case <-timerCh: in the select below.
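For illustration, the loop might then look roughly like this (shape assumed); receiving from a nil channel blocks forever, so the timer case simply never fires when --until is unset:

for {
        select {
        case e := <-eventsChan:
                _ = e // normalizeEvent(e) and the write to the client go here
        case <-timerCh:
                return // --until reached
        case <-closeNotify:
                return // client disconnected
        }
}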

@nishanttotla (Contributor Author) replied:

This is a better idea.

api/handlers.go Outdated
if e, ok := eChan.(*cluster.Event); ok {
if data, err := normalizeEvent(e); err != nil {
return
} else {


Don't need an else because of the return above.
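i.e. roughly (a sketch, not necessarily the final code):

if e, ok := eChan.(*cluster.Event); ok {
        data, err := normalizeEvent(e)
        if err != nil {
                return
        }
        if _, err := w.Write(data); err != nil {
                return
        }
        f.Flush()
}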

api/handlers.go Outdated

for {
select {
case eChan := <-eventsChan:


Is eChan not an event?

@nishanttotla (Contributor Author) replied:

eChan has an interface type, which we cast to *cluster.Event below.

api/handlers.go Outdated
if err != nil {
return
}
_, err = fmt.Fprint(w, string(data))


Why not write directly here, rather than reallocate the data as a string? w.Write(data) should be sufficient.

}
var closeNotify <-chan bool
if closeNotifier, ok := w.(http.CloseNotifier); ok {
closeNotify = closeNotifier.CloseNotify()


Depending on go version, you can now use r.Context().Done() to listen to this cancellation.

@nishanttotla (Contributor Author) replied:

I'm not sure the request context is set correctly in all cases in Swarm; some of the code predates the introduction of contexts in Go. I'll check.


This should be from net/http, so should be wired to the CloseNotifier.
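For reference, a sketch of the context-based alternative (assumes Go 1.8+, where the net/http server cancels a request's context when the client disconnects):

done := r.Context().Done() // closed when the client goes away or the request is cancelled
for {
        select {
        case <-done:
                return
        case e := <-eventsChan:
                _ = e // handle the event as before
        }
}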

api/handlers.go Outdated

for {
select {
case eChan := <-eventsChan:


@aaronlehmann Does this need "closed" detection? For example, will the watch ever shut down this channel?


Currently it will not, but moby/swarmkit#2285 adds this as an option. So it probably makes sense to add the check here in anticipation of using that later.
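A sketch of what the closed-channel check could look like in the receive case (control flow assumed):

case ev, ok := <-eventsChan:
        if !ok {
                // the queue closed its outgoing channel upstream; stop streaming
                return
        }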

@@ -168,12 +170,20 @@ func NewCluster(scheduler *scheduler.Scheduler, TLSConfig *tls.Config, master st

// Handle callbacks for the events
func (c *Cluster) Handle(e *cluster.Event) error {
// publish event to the watchQueue for external subscribers
c.watchQueue.Publish(e)


Why don't we adapt this to the eventHandlers list? I am not sure that making this dependent on the watch queue makes sense.

@nishanttotla (Contributor Author) replied:

@stevvooe I agree, but I also think that the queue we eventually use will be moved upstream to go-events. This is supposed to be a temporary fix.

eventHandlers will actually go away, and all events should be handled using the watchQueue. For now, though, some internal rescheduling functionality uses events that I did not want to touch in this PR; that's why we have this case.


@nishanttotla That still doesn't explain why this isn't just a handler. You can do this with a closure to adapt Publish to Handler.

@nishanttotla (Contributor Author) replied:

@stevvooe not sure I follow exactly here. What do you envision this handler to look like?


Not sure what you mean: just make an event handler. I am not sure how this got merged in its current state. It just doesn't make any sense to have both a handler set and a publish set. I'd recommend cleaning this up before it gets too far. It is going to be a pain to remove later and will confuse future contributors.
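A rough illustration of the adapter being suggested; the EventHandler interface and a RegisterEventHandler method are assumed to look roughly like this, so the names are illustrative rather than Swarm's exact API:

// handlerFunc lets an ordinary function satisfy the cluster EventHandler interface.
type handlerFunc func(e *cluster.Event) error

func (f handlerFunc) Handle(e *cluster.Event) error { return f(e) }

// Register the watch queue as just another event handler instead of
// special-casing it inside Cluster.Handle:
c.RegisterEventHandler(handlerFunc(func(e *cluster.Event) error {
        c.watchQueue.Publish(e)
        return nil
}))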

@nishanttotla (Contributor Author) replied:

I will clean that up soon. The handler as it exists today will go away entirely, and only the publish set will remain.

@nishanttotla force-pushed the events-restructuring branch 3 times, most recently from c326fc7 to 1ca4e7d, on June 23, 2017 at 22:48
api/primary.go Outdated
// eventsQueue uses the watch package from SwarmKit, which is based on
// the go-events package. See https://github.com/docker/swarm/issues/2718
// for context
eventsQueue := watch.NewQueue()

@alexmavr (Contributor) commented:

Now that moby/swarmkit#2285 is merged, let's revendor swarmkit and change this constructor to:

NewQueue(WithTimeout(10 * time.Second), WithLimit(10000), WithCloseOutChan())

We probably want to define these values as config vars and plumb them through the standard config model used by classic Swarm.
Assuming 1KB per event (a gross overestimate), the 10000-event limit means each listener's queue should occupy no more than 10MB of memory. In the edge case of 100 simultaneously slow listeners, that adds up to roughly 1GB of additional memory.
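For instance, the constructor in api/primary.go might then become something like this (the config variable names are illustrative, not the PR's):

eventsQueue := watch.NewQueue(
        watch.WithTimeout(eventQueueTimeout), // e.g. 10 * time.Second
        watch.WithLimit(eventQueueLimit),     // e.g. 10000 queued events per subscriber
        watch.WithCloseOutChan(),             // close subscriber channels when the queue shuts down
)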

@nishanttotla (Contributor Author) replied:

@alexmavr done. PTAL.

@nishanttotla force-pushed the events-restructuring branch 2 times, most recently from a04af50 to 0284f9a, on June 24, 2017 at 00:48
…0a8ca812

Signed-off-by: Nishant Totla <nishanttotla@gmail.com>
@nishanttotla force-pushed the events-restructuring branch 2 times, most recently from 377d88c to 810b74b, on June 24, 2017 at 00:51
if !ok {
break
}
data, err := normalizeEvent(e)

@alexmavr (Contributor) commented:

Should we consider event filtering here?

@nishanttotla (Contributor Author) replied:

@alexmavr maybe we could do it as a follow up?

37d35add5005832485c0225ec870121b78fcff1c

Signed-off-by: Nishant Totla <nishanttotla@gmail.com>
…er/go-events

Signed-off-by: Nishant Totla <nishanttotla@gmail.com>
@nishanttotla force-pushed the events-restructuring branch 3 times, most recently from 89897a1 to ba7fd94, on June 26, 2017 at 22:58
Signed-off-by: Nishant Totla <nishanttotla@gmail.com>
@nishanttotla (Contributor Author) commented:

@alexmavr I've added a counter to track the number of active listeners, PTAL.

@aluzzardi (Contributor) commented:

/cc @wsong for review

@nishanttotla (Contributor Author) commented:

Merging this now, since it's been reviewed and it worked in testing, both manual and docker-e2e. Any downstream issues can be fixed in a follow-up.

@nishanttotla merged commit c1b0b7b into docker-archive:master on Jun 27, 2017
@nishanttotla deleted the events-restructuring branch on June 27, 2017 at 05:54
@stevvooe commented:

@nishanttotla I'd recommend taking the time to clean up the double handler logic. It will cause a bug in the future if someone adds something to the task queue but not to the handler set (or vice versa).
