
experimental concurrent add #1324

Closed

Conversation

@barnacs barnacs commented Jun 2, 2015

This PR is an experimental concurrent pipeline for "ipfs add".
The idea is to decouple the various CPU/IO-intensive parts of storing a block into separate stages, with a configurable number of workers for each stage.

I would appreciate if some people with different os/fs/physical storage could benchmark this by adding a single large file (~500-1000 MB) and time it against master.
If it turns out to generally improve performance, the implementation could be cleaned up and extended to handle each add operation rather than individual files.

My results on linux/ext4/single hdd (626MB):
master:

% time ipfs add archlinux-2015.05.01-dual.iso 
added Qmb1cWYzczBZz4d8HoSmxhU9GY8QuVq7sUvD6ZCQBkvmU2 archlinux-2015.05.01-dual.iso
ipfs add archlinux-2015.05.01-dual.iso  0.62s user 1.10s system 0% cpu 3:08.60 total

concurrent:

% time ipfs add archlinux-2015.05.01-dual.iso
added Qmb1cWYzczBZz4d8HoSmxhU9GY8QuVq7sUvD6ZCQBkvmU2 archlinux-2015.05.01-dual.iso
ipfs add archlinux-2015.05.01-dual.iso  0.34s user 0.80s system 4% cpu 26.576 total
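
For anyone curious what the staged approach looks like in code, here is a minimal, self-contained Go sketch of that kind of pipeline: a hash stage feeding a store stage, each with its own configurable worker count. The names and stage bodies are placeholders for illustration, not the code in this PR.

package main

import (
  "fmt"
  "sync"
)

type block struct{ data []byte }

// startStage runs n workers that apply fn to every block read from in and
// forward the result to out; out is closed once all workers have finished.
func startStage(n int, fn func(*block) *block, in <-chan *block, out chan<- *block) {
  var wg sync.WaitGroup
  wg.Add(n)
  for i := 0; i < n; i++ {
    go func() {
      defer wg.Done()
      for b := range in {
        out <- fn(b)
      }
    }()
  }
  go func() {
    wg.Wait()
    close(out)
  }()
}

func main() {
  const hashWorkers, storeWorkers = 1, 8 // tunable per stage

  raw := make(chan *block)
  hashed := make(chan *block)
  stored := make(chan *block)

  startStage(hashWorkers, func(b *block) *block { return b /* hash it */ }, raw, hashed)
  startStage(storeWorkers, func(b *block) *block { return b /* write it to disk */ }, hashed, stored)

  go func() {
    for i := 0; i < 4; i++ {
      raw <- &block{data: []byte{byte(i)}}
    }
    close(raw)
  }()

  for range stored {
    fmt.Println("block stored")
  }
}

Each stage only closes its output once all of its workers are done, so the final consumer knows when the whole add has finished.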

@jbenet jbenet added the backlog label Jun 2, 2015
@whyrusleeping
Member

@barnacs I'll try it out in a sec, but do you think you could try re-applying these changes on our latest master (git rebase)? There have been a few other perf changes to the dagbuilderhelper code in the meantime.

@whyrusleeping
Member

Tried it out, that's really fast, good work!

@barnacs barnacs force-pushed the experimental/add-pipeline branch from 2f86e79 to 4c415bd on June 3, 2015 at 01:39
@jbenet
Member

jbenet commented Jun 3, 2015

+1 to this PR.

> I would appreciate if some people with different os/fs/physical storage could benchmark this by adding a single large file (~500-1000 MB) and time it against master.

I would love to have a benchmark suite and a bunch of workers like on http://build.golang.org/ that can test it out across different "platform configs" (where that here includes the choice of fs / devices).

  pipeline: make(chan *dag.Node),
  Error:    make(chan error),
}
dbh.init()
Member

instead of this, move the entire construction into a

return NewDagBuilderHelper(dbp.Dagserv, in, dbp.Maxlinks, ncb)

where

func NewDagBuilderHelper(...) {
  db := &DagBuilderHelper{
    dserv:    dbp.Dagserv,
    in:       in,
    maxlinks: dbp.Maxlinks,
    ncb:      ncb,
    pipeline: make(chan *dag.Node),
    Error:    make(chan error),
  }

  hash := hasher()
  store := storeWorker(db.dserv)
  pin := pinner(db.ncb)

  hashedch := make(chan *dag.Node)
  storedch := make(chan *dag.Node)

  startWorkers(hash, hashWorkers, db.pipeline, hashedch, db.Error)
  startWorkers(store, storeWorkers, hashedch, storedch, db.Error)
  startWorkers(pin, pinWorkers, storedch, nil, db.Error)

  return db
}
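
(startWorkers itself isn't shown in this thread; purely as an illustration of the shape it implies, with a placeholder Node type standing in for *dag.Node and a guessed func(*Node) error stage signature, it might look something like this.)

package pipeline

import "sync"

// Node stands in for *dag.Node here; none of this is go-ipfs code.
type Node struct{}

// startWorkers runs n goroutines that apply fn to every node received on in.
// Nodes that process cleanly are forwarded to out (nil for the final stage);
// failures are reported on errs.
func startWorkers(fn func(*Node) error, n int, in <-chan *Node, out chan<- *Node, errs chan<- error) {
  var wg sync.WaitGroup
  wg.Add(n)
  for i := 0; i < n; i++ {
    go func() {
      defer wg.Done()
      for nd := range in {
        if err := fn(nd); err != nil {
          errs <- err
          continue
        }
        if out != nil {
          out <- nd
        }
      }
    }()
  }
  // Close the downstream channel once every worker has drained its input.
  go func() {
    wg.Wait()
    if out != nil {
      close(out)
    }
  }()
}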

Member

then we might as well not have the params struct.

@jbenet
Member

jbenet commented Jun 3, 2015

@barnacs this is great! I think lots of go-ipfs can have this sort of pipeline-- we haven't really optimized lots of the codebase.

Made some comments above-- pls take a look

  • NewDagBuilderHelper
  • early return from pipeline
  • shut down other workers on error
  • pipeline/err select



const (
  hashWorkers  = 1
  storeWorkers = 50
Member

I'm willing to bet that if we keep all of these at 1, we will still see significant perf improvements, and we will get slightly cleaner code by not having to worry about multiple different workers.

@whyrusleeping
Member

A few of the go tests fail. Looks to be related.

@whyrusleeping
Member

Alright, a little more detail on my benchmarks:

Adding a file of size 1000MB, entirely random, no duplicate chunks, on a fresh .ipfs folder each run. Times are in seconds.

on current master (18297e2dc9367e43cae9fff187d8c1c78c6020e6):
51.53/52.87/51.79, av = 52.06

on your branch (cb2e4ae57da0d1d59c9f0b577f806f29f7a10ca1)
53.10/62.55/44.09, av = 53.24

on your branch with storeWorkers = 1
49.14/44.98/47.64, av = 47.25

I ran a few tests with storeWorkers in the range of 2-10, and they all had considerably worse performance. Not sure why that happened. Also, running your branch vanilla, I did get a bunch of other numbers that I threw out because they appeared to be outliers: 26 seconds, 33 seconds, 95 seconds, and another run took a whopping three minutes before I killed the operation.

This was run on a MacBook Pro with a PCIe SSD (R/W speeds are generally 700MB/s), and my filesystem is ext4.

One other note: on your branch, with storeWorkers equal to the default 50, the first 400-500MB of the file flew by in about a second. The next 400MB or so took about five seconds, and the last 100MB took quite a while.

@barnacs
Author

barnacs commented Jun 3, 2015

@jbenet Thanks for your comments.
I guess I should have emphasized more that this is not a decent implementation at this point (breaks trickle dag building, errors might leak goroutines, cause deadlocks, kill your cat, etc.). I was primarily looking for performance feedback to see if it's worth pursuing at all, and since people weren't very responsive on IRC I made it a PR.

@barnacs
Author

barnacs commented Jun 3, 2015

@whyrusleeping that's interesting. I like the 1000MB random file approach; my results using that (linux/ext4/SATA 3 hdd):

master
4:59.75 / 4:27.19 / 4:24.74

this branch with storeWorkers = 1
5:02.48 / 5:02.08 / 5:03.67

this branch with storeWorkers = 50
30.187 / 32.616 / 31.187

this branch with storeWorkers = 500
19.290 / 20.076 / 19.729

@jbenet
Member

jbenet commented Jun 3, 2015

@barnacs yeah, PRs are good, sorry for the lack of responsiveness on IRC. GitHub is always better for collab things, and it's totally fine to PR as suggestion/feedback/discussion.

@jbenet
Member

jbenet commented Jun 12, 2015

@barnacs would love to see this type of thing land. Do you have any time soon?

@barnacs
Author

barnacs commented Jun 22, 2015

Oops, sorry, I forgot about this.

Actually, the performance boost here seems to be about throwing a lot of concurrent writes at the kernel. I guess my setup happened to optimize them, but as @whyrusleeping's results show, it really depends on your setup and may even slow things down.

I guess the only viable option is then to reconsider durability guarantees and ease up on sync() calls. Maybe expose a sync call explicitly at the lowest data storage level, but then do you propagate that all the way up to dag services and change all call sites?
Or maybe the file system storage implementation could simply hide it by calling sync a few milliseconds after the last write (further writes could reset the timer).
Or let the kernel sync whenever? And potentially lose the data after returning successfully from an add operation?

Either way, I don't think the approach of this PR is the right one, and it would result in erratic behaviour, so feel free to just close it.
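
For what it's worth, the "sync a few milliseconds after the last write" option could look roughly like the sketch below: every write resets a timer, and a single sync runs once writes go quiet. Illustrative only, not go-ipfs code, and a real version would also need to handle the timer firing while a write is still in flight.

package main

import (
  "fmt"
  "sync"
  "time"
)

// debouncedSyncer delays the actual sync until writes have been quiet for
// `delay`; every write resets the countdown.
type debouncedSyncer struct {
  mu     sync.Mutex
  timer  *time.Timer
  delay  time.Duration
  doSync func() // the real fsync/flush call would go here
}

// touch records that a write just happened and (re)schedules the sync.
func (d *debouncedSyncer) touch() {
  d.mu.Lock()
  defer d.mu.Unlock()
  if d.timer == nil {
    d.timer = time.AfterFunc(d.delay, d.doSync)
    return
  }
  d.timer.Reset(d.delay)
}

func main() {
  s := &debouncedSyncer{
    delay:  10 * time.Millisecond,
    doSync: func() { fmt.Println("sync()") },
  }
  for i := 0; i < 100; i++ {
    s.touch() // pretend each call is a block write
  }
  time.Sleep(50 * time.Millisecond) // a single sync fires after the burst
}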

@whyrusleeping
Member

@barnacs thanks a bunch for bringing this up, it gave a good view into our performance under various loads. I do think that getting the adds out of the main path will be good, maybe having a single extra worker to do the dagservice.Add in its own goroutine.
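
A minimal sketch of that single-extra-worker idea: nodes are pushed onto a channel and one background goroutine makes the Add calls, so the main add path never blocks on storage. The NodeAdder interface and names below are placeholders, not go-ipfs types.

package main

import "fmt"

type Node struct{ name string }

type NodeAdder interface {
  Add(*Node) error
}

// asyncAdder funnels all Add calls through a single background worker.
type asyncAdder struct {
  ch   chan *Node
  done chan error
}

func newAsyncAdder(ds NodeAdder) *asyncAdder {
  a := &asyncAdder{ch: make(chan *Node, 16), done: make(chan error, 1)}
  go func() {
    for nd := range a.ch {
      if err := ds.Add(nd); err != nil {
        a.done <- err // a fuller version would keep draining the channel
        return
      }
    }
    a.done <- nil
  }()
  return a
}

func (a *asyncAdder) Add(nd *Node) { a.ch <- nd }

// Close waits for the worker to finish and returns its first error, if any.
func (a *asyncAdder) Close() error {
  close(a.ch)
  return <-a.done
}

type printStore struct{}

func (printStore) Add(nd *Node) error { fmt.Println("added", nd.name); return nil }

func main() {
  a := newAsyncAdder(printStore{})
  a.Add(&Node{name: "block-1"})
  a.Add(&Node{name: "block-2"})
  if err := a.Close(); err != nil {
    fmt.Println("add failed:", err)
  }
}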

@davidar davidar mentioned this pull request Aug 25, 2015
@davidar
Member

davidar commented Aug 27, 2015

> I guess the only viable option is then to reconsider durability guarantees and ease up on sync() calls.

In the meantime, would it be possible to add a --nosync option to ipfs add that just disables sync calls? You could add a warning that using it may result in inconsistencies in the blockstore, so people know the risks.

@whyrusleeping
Member

@davidar that would work for offline adds, but if you're doing it with the daemon online, you would have to start the daemon with such an option. I think the ideal stopgap would be to add a nosync field to the config for the datastore in .ipfs/config.

@whyrusleeping
Member

@davidar do you mind filing a bug as a feature request for this?

@davidar davidar mentioned this pull request Aug 28, 2015
@whyrusleeping
Member

Closing, old PR cleanup time.
