Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: separate private and public blocks #1538

Closed
wants to merge 56 commits into from
Closed

Conversation

whyrusleeping
Copy link
Member

Use a different datastore prefix to keep private and public blocks separate. Abstract this upwards into a private blockstore and a private dagservice.

License: MIT
Signed-off-by: Jeromy jeromyj@gmail.com

@jbenet jbenet added the status/in-progress In progress label Jul 29, 2015
@jbenet
Copy link
Member

jbenet commented Jul 29, 2015

08:19 <•whyrusleeping> jbenet: thats the gist of it
08:19 <•whyrusleeping> doing this also makes me *really* want to work on unifying daemon construction
08:24 <•jbenet> yeah, to be totally blunt... do not want. it's going to be a mess knowing where to put what, and carrying 2x of each everywhere. let alone that this is just binary, it has no way of growing towards the middle gray areas of serving specific things to specific authed groups. i think coloring (marking things that are allowed or denied) is a much simpler
08:24 <•jbenet> approach. it can be made fast in the exact same way (you can use a datastore to get O(1) access to the perms on any given object)
08:25 → pfraze joined (~pfraze@2605:6000:e900:d200:a8eb:d04e:3359:967e)
08:25 <•whyrusleeping> thats going to suck...
08:28 <•jbenet> .... because?
08:29 <•whyrusleeping> then we have to change all the interfaces to support putting permissions with each block stored
08:29 <•whyrusleeping> the possibility of making a mistake goes up much more
08:29 <•whyrusleeping> and things become a lot slower because now you have 2x the disk hits and latency for each block
08:29 <•jbenet> no, not at all, why? permissions are only relevant on a small set of cases: when an object is externalized
08:29 <•whyrusleeping> every write is now two writes
08:30 <•whyrusleeping> every read is now two reads
08:30 → niran joined (sid3808@gateway/web/irccloud.com/x-oexerhbkbrozumye)
08:30 <•jbenet> "every write is now two writes" not at all. writing perms and writing objects is correlated once.
08:31 <•jbenet> caps are small, can likely be kept in memory (or lots of it)
08:32 <•jbenet> "the possibility of making a mistake goes up much more" this applies just as much to jugging two dagservs/blockstores/.... all over the place. these are different approaches with different things to watch out for. 
08:33 <•jbenet> i think instead of being tied to specific solutions, we should be picking based on the problems we need to solve, which include authentication/capability sets.
08:34 <•jbenet> in fact, can just use caps for everything, store sensitive blocks encrypted, whoever presents the right cap can get something, whoever has the right decrypt cap can decrypt.
08:35  → voxelot joined  
08:35 <•jbenet> zero bookkeeping, as the cap itself used to retrieve the object is the "permission".
08:44 <•jbenet> anyway, let's keep discussing on gh. this is not pressing for 0.3.6

@whyrusleeping
Copy link
Member Author

So would you want something like this?

dagserv.Add(node, flags.Private)

or what would the interface for what youre thinking look like?

@whyrusleeping
Copy link
Member Author

I'm thinking that the actual put call by the blockstore can write a small header before the blocks, including a permissions/capabilities info, as well as a timestamp. Then consumers of the blockstore can get a handle to the blockstore with a given capabilities set, and can only read blocks that match their caps, and all blocks they write would be written with their caps, querying that blockstore handle would only return blocks with matching caps, etc.

The timestamp allows us to ignore recently added blocks during a GC, which is going to be very important to avoid randomly breaking user operations with a rouge GC. The downside here is that each subsequent write of a given block would have to write the entire block again to update the timestamp (since the datastore interface doesnt support partial writes).

@whyrusleeping
Copy link
Member Author

@jbenet re: our discussion yesterday, you wanna CR this?

Blocks *bserv.BlockService // the block service, get/add blocks.

DAG merkledag.DAGService // the merkle dag service, get/add objects.
PrivDAG merkledag.DAGService // the private merkledag service for local objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe:

NetworkDAG
NodeStateDAG

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataDAG
StateDAG

@jbenet
Copy link
Member

jbenet commented Aug 13, 2015

I'm not certain that this works for pinning + gc. can you outline the logic flow of how a GC operation would work now, given the pinset and the objects are in two different dagservices with two different GCBlockstores?

@whyrusleeping
Copy link
Member Author

We're just going to have to run two separate GC operations, one on the normal blockstore, and one on the private blockstore.

@whyrusleeping
Copy link
Member Author

@jbenet i'm trying to determine the best places to take the private blockstores GC locks now.

@whyrusleeping
Copy link
Member Author

@jbenet I think the simplest thing to do would be to always lock both blockstores. at which point we might as well just have one lock. This is tricky.

@jbenet jbenet mentioned this pull request Aug 22, 2015
16 tasks
@jbenet
Copy link
Member

jbenet commented Aug 23, 2015

@jbenet I think the simplest thing to do would be to always lock both blockstores. at which point we might as well just have one lock. This is tricky.

Yeah, this is the sort of trickiness i worried about.

@@ -21,6 +21,8 @@ var log = eventlog.Logger("blockstore")
// BlockPrefix namespaces blockstore datastores
var BlockPrefix = ds.NewKey("blocks")

var PrivateBlockPrefix = ds.NewKey("private")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's rename these:

StateBlocks -- the state of the node
DataBlocks -- the data of the node

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(fine to keep DataBlocks at /blocks to avoid a migration there

@jbenet
Copy link
Member

jbenet commented Aug 23, 2015

@whyrusleeping confirm this is how this is supposed to work:

  • the pinset is stored in the StateDAG, not in the DataDAG.
  • the pin leaves point to objects either in the StateDAG or in the DataDAG.
  • the GC function is called for both DAGs
  • the GC function calls ColoredSet(pinSet, dagToGC) -- https://github.com/ipfs/go-ipfs/blob/private-blocks/pin/gc/gc.go#L35
  • ColoredSet calls Descendants(dag, keyset, roots) -- https://github.com/ipfs/go-ipfs/blob/private-blocks/pin/gc/gc.go#L97
  • Descendants adds all the descendants of given roots present in the given dag (i.e. if not present, nothing happens?)
  • thus, the ColoredSet for either DAG consists only of nodes from that DAG which have roots in the pinset (which may overlap).
  • things not in the ColoredSet are removed, as usual.

Seems to work. Need to test the hell out of this.

need one test that:

  • generates a bunch of dags,
  • puts them in either dagset randomly,
  • pins some nodes recursively at random (without discriminating where they are)
  • calls GarbageCollect() (GC both)
  • ensures the right objects are still in the right DAGs.

(this will likely need to be in Go)

@jbenet
Copy link
Member

jbenet commented Aug 23, 2015

Ok, i think this will work, but

  • needs confirmation above
  • needs srs tests
  • needs rename (state and data)

Sorry, something went wrong.

@whyrusleeping
Copy link
Member Author

@jbenet i think this is ready for another look over (when you have time)

This used to lead to large refcount numbers, causing Flush to create a
lot of IPFS objects, and merkledag to consume tens of gigabytes of
RAM.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
OS X sed is documented as "-i SUFFIX", GNU sed as "-iSUFFIX". The one
consistent case seems to be "-iSUFFIX", where suffix cannot empty (or
OS X will parse the next argument as the suffix).

This used to leave around files named `refsout=` on Linux, and was
just confusing.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
Without this, all entries will have nlink==0, which confuses a bunch
of tools. Most dramatically, systemd-nspawn enters a busy loop in its
lock utility function.

License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
This used to cause files e.g. being edited with `vi` to become 0-size.

License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
Callers assume this is safe to call whenever, let's make it so.

License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>

run GC over internal blockstore as well

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>

rename blockstores to Data and State

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>

fix up garbage collection and add a test

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>

separate flatfs

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
@whyrusleeping
Copy link
Member Author

@jbenet CR when you can.

@jbenet
Copy link
Member

jbenet commented Sep 16, 2015

at first pass it looks fine. i'll do a deeper look later today.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
u "github.com/ipfs/go-ipfs/util"
)

func TestBasicSeparation(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the sort of guarantees we expect from this separation, we'll probably want some sharness tests too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

theres not really any way of knowing any of the pinners internal hashes through the API, but we can check 'ipfs refs local' to make sure that the only entries there are actual data. sound alright?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some blocks should be internal only, like for example keys. could add the private key or something.

sure that's fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we migrate keys to the blockstore we can have a test like that, but we arent there yet.

@jbenet
Copy link
Member

jbenet commented Sep 18, 2015

  • needs sharness tests verifying the expected security properties.
  • also tests with 2 nodes, with one node trying to get private objects from the other.

I don't feel confident in this. I'm not sure that this is actually going to do what we expect it to. It seems right to me, but i'm not sure about all the corner cases in which expectations may be incorrect. Up to @whyrusleeping

Sorry, something went wrong.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
@whyrusleeping
Copy link
Member Author

@jbenet I'm really not sure how I could possibly write a sharness test for this, outside of the code, theres no way to know what the keys of the 'private objects' are, they wont show up in 'ipfs refs local'. The tests I have in core/corerepo/gc_test.go construct a node the same way a daemon would, and they work as expected.

@jbenet
Copy link
Member

jbenet commented Sep 22, 2015

Fine let's wait until keystore is implemented 


Sent from Mailbox

On Mon, Sep 21, 2015 at 6:25 PM, Jeromy Johnson notifications@github.com
wrote:

@jbenet I'm really not sure how I could possibly write a sharness test for this, outside of the code, theres no way to know what the keys of the 'private objects' are, they wont show up in 'ipfs refs local'. The tests I have in core/corerepo/gc_test.go construct a node the same way a daemon would, and they work as expected.

Reply to this email directly or view it on GitHub:
#1538 (comment)

@RichardLitt RichardLitt mentioned this pull request Sep 23, 2015
65 tasks
@whyrusleeping whyrusleeping force-pushed the dev0.4.0 branch 3 times, most recently from 764bef9 to 1bbc472 Compare November 11, 2015 18:04
@ghost ghost added the topic/encryption Topic encryption label Dec 22, 2015
@whyrusleeping whyrusleeping force-pushed the dev0.4.0 branch 2 times, most recently from 68b9745 to b0a8591 Compare December 28, 2015 14:36
@Kubuxu Kubuxu closed this May 13, 2016
@Kubuxu Kubuxu deleted the private-blocks branch February 27, 2017 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/in-progress In progress topic/encryption Topic encryption
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants