
RFC: Add IPFS to Nix #1167

Closed
wants to merge 4 commits into from

Conversation


@mguentner mguentner commented Dec 29, 2016

This adds IPFS support to Nix. 🚀
It also adds .nar files to IPFS, writes the resulting hash to the .narinfo, and signs the result.
When the .narinfo is accessed, the .nar can be fetched from IPFS instead of over HTTP.

Please have a look at a nixops hydra setup where this is explained in more detail.
https://github.com/mguentner/nix-ipfs

This is a proof of concept. More code will follow once the design is approved and finished.

Ref: #859
Ref: ipfs/notes#51

Adds support to 'cat' and 'add' .nar files to/from IPFS. If IPFS should be used to fetch .nar files without using the API interface, a gateway can be used as well. Adding files through a gateway is not possible.

Signed-off-by: Maximilian Güntner <code@klandest.in>

wscott commented Jan 1, 2017

Is the .nar file compressed? If not, you might want to enable the rabin-fingerprint chunker when writing the .nar files. This will allow deduplication of identical files inside multiple archives. I don't think IPFS uses rabin by default yet.

From the command line that is done with 'ipfs add --chunker rabin FILE'.
You might also want to tweak the params to use a large blocksize.

Even compressed archives can work if you use an 'rsync-able' compression algorithm.
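For reference, a rough sketch of that invocation against go-ipfs (the rabin-min-avg-max block sizes below are purely illustrative):

    # add an uncompressed .nar with content-defined (rabin) chunking;
    # the min/avg/max block sizes are just an example of a larger-than-default blocksize
    ipfs add --chunker=rabin-262144-524288-1048576 hello-2.10.nar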

@mguentner
Author

@wscott: The .nar files are compressed according to the compression option, so in order to get the best results this should be set to compression=none when creating the binary cache / .nar files.
The --chunker option looks interesting, I will look into it.

@@ -4,14 +4,48 @@ libstore_NAME = libnixstore

libstore_DIR := $(d)

libstore_SOURCES := $(wildcard $(d)/*.cc)
libstore_SOURCES := \
Member

Why is wildcard not used?

Author

@mguentner mguentner Jan 2, 2017

Since the additional IPFS sources are the second config-dependent input for libstore, I wanted a clean solution instead of following s3-binary-cache-store.cc (the first one) and adding a lot of #if ENABLE guards to the source files. That way only the sources the configuration requires are compiled and linked. It makes the build process a bit cleaner and easier to debug.
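A rough sketch of that approach in local.mk terms (the ENABLE_IPFS variable and the ipfs-accessor.cc file name are assumptions for illustration):

    # list the sources explicitly instead of using $(wildcard)
    libstore_SOURCES := \
      $(d)/binary-cache-store.cc \
      $(d)/local-store.cc \
      $(d)/store-api.cc

    # compile and link the IPFS sources only when the build is configured with IPFS support
    ifeq ($(ENABLE_IPFS), 1)
      libstore_SOURCES += $(d)/ipfs-accessor.cc
    endif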

Signed-off-by: Maximilian Güntner <code@klandest.in>
@mguentner
Author

Publishing to IPFS is now optional (default off/disabled). An example of how to generate an uncompressed binary cache looks like this:

nix copy --to file:///var/www/example.org/cache?secret-key=/etc/nix/hydra.example.org-1/secret\&compression=none\&publish-to-ipfs=1 -r /nix/store/wkhdf9jinag5750mqlax6z2zbwhqb76n-hello-2.10/

@mguentner
Author

Added for future reference.

The .narinfo is fingerprinted and signed. The fingerprint currently includes the following information:

std::string ValidPathInfo::fingerprint() const
{
    if (narSize == 0 || !narHash)
        throw Error(format("cannot calculate fingerprint of path ‘%s’ because its size/hash is not known")
            % path);
    return
        "1;" + path + ";"
        + printHashType(narHash.type) + ":" + printHash32(narHash) + ";"
        + std::to_string(narSize) + ";"
        + concatStringsSep(",", references);
}

[1]

So the IPFSHash is signed indirectly through the narHash, since the narHash will be compared to the hash of the result of the IPFS download (i.e. the downloaded data is not validated until it has been fetched completely). However, the current design assumes that the .narinfo is fetched from a trusted source (i.e. cache.nixos.org using TLS).

[1] From: https://github.com/NixOS/nix/blob/master/src/libstore/store-api.cc#L523
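For illustration, a .narinfo carrying the new field might look roughly like this (the values are placeholders and the hashes are reused from unrelated examples in this thread):

    StorePath: /nix/store/wkhdf9jinag5750mqlax6z2zbwhqb76n-hello-2.10
    URL: nar/1cvgji7mk3q68f257fmwlqvz8rhfdla6y0lxwqq8nwxagy3w34cx.nar
    Compression: none
    NarHash: sha256:...
    NarSize: ...
    References: ...
    IPFSHash: QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT
    Sig: cache.nixos.org-1:...

Note that IPFSHash itself is not part of the fingerprint above; only the path, narHash, narSize and the references are.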

@copumpkin
Member

cc @edolstra @shlevy

@shlevy
Member

shlevy commented Jan 10, 2017

Can you write up a quick summary of what IPFS is and why I should care?

@Mic92
Member

Mic92 commented Jan 18, 2017

@shlevy I cannot speak for @mguentner, but @vcunat provided some motivation for IPFS.

@veprbl
Member

veprbl commented Jan 18, 2017

I haven't been carefully tracking IPFS threads, but since nobody else answered I will pile my thoughts here, and people will correct me.

As far as I understand, IPFS can provide a global storage for NARs, so people can choose to host their own builds or cache builds from others. This could potentially offload some bandwidth from the Hydra S3 cache.

Some enthusiasts could build and share things that are not currently being built by Hydra, like Python packages. This is already possible now, but it requires doing two things:

  1. Establishing a source (url of "nix-serve" instance)
  2. Establishing trust (NAR signing, SSL)

IPFS could eliminate the first step, since the namespace becomes global and NARs could probably be discovered through IPLD. The current implementation doesn't do that, because it requires the IPFS address to be served with the nar-info. But distributed NAR hosting should work already.

There was also a discussion about implementing file or chunk deduplication over IPFS, which could reduce sizes. Is this supposed to apply to the download size or the size on disk? I don't know.

Anything written above might be wrong. I don't claim extensive knowledge of the topic discussed. Please don't get angry :)

@mguentner
Author

mguentner commented Jan 19, 2017

I wrote an article which explains why IPFS could be useful for NixOS.
That gives some answers to @shlevy's question:
https://sourcediver.org/blog/2017/01/18/distributing-nixos-with-ipfs-part-1/

Also I think that you are quite right @veprbl 👍

@shlevy
Member

shlevy commented Jan 19, 2017

Cool, thanks! Awesome idea.

else if (cmd == "cat_gw")
return "/ipfs/" + arg;
else
throw "No such command";
Member

I don't think we catch strings anywhere, so this should be throw Error("...").
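A minimal sketch of that fix, following the format-based error style used elsewhere in libstore (the message text is illustrative):

    else if (cmd == "cat_gw")
        return "/ipfs/" + arg;
    else
        throw Error(format("unknown IPFS command ‘%1%’") % cmd);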

}
ipfsHash = IPFSAccessor::addFile(narPath, *narCompressed);

if (!ipfsHash.empty()) {
Member

Why would ipfsHash be empty here?

Member

In any case, it shouldn't -- an IpfsNode should emit the hash + name to the client when the process completes without an error.

Author

The file is uploaded through the HTTP API and a lot can go wrong there.
As for the C++ part, this is the relevant code:
https://github.com/vasild/cpp-ipfs-api/blob/master/src/client.cc#L164

I have tested the code in this PR with more paths after posting this RFC, and quite a few requests failed silently, since the function is void and does not raise anything. As this is unacceptable, the next iteration of the implementation needs to include error handling when adding files, if this feature is included at all.
Reason:
My research into IPFS revealed that one needs to pay attention to a lot of things (it's not FTP after all). These include trivial things like selecting a chunker and rather complex tasks like collecting garbage after n ipfs adds while not throwing away unpinned content (race condition).
From a design perspective, adding NARs / Nix store paths to IPFS should be handled by a separate tool, as this is too much complexity to put into Nix (following Ken Thompson's philosophy here).
(Also part of the reason for #1167 (comment))
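A hedged sketch of what that error handling might look like around the addFile call shown above (treating an empty hash as a hard failure; whether to abort or merely warn is left open here):

    // the IPFS HTTP API upload can fail silently, so treat an empty hash as an
    // error instead of writing a .narinfo without a usable IPFSHash field
    std::string ipfsHash = IPFSAccessor::addFile(narPath, *narCompressed);
    if (ipfsHash.empty())
        throw Error(format("adding ‘%1%’ to IPFS failed") % narPath);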

@@ -41,6 +41,8 @@ NarInfo::NarInfo(const Store & store, const std::string & s, const std::string &
compression = value;
else if (name == "FileHash")
fileHash = parseHashField(value);
else if (name == "IPFSHash")
Member

Should we call this IpfsNarHash or something for future compatibility with computing NARs on the fly from some nicer format? Or am I jumping the gun for a file called nar-info.cc after all :).

@@ -290,8 +310,23 @@ bool BinaryCacheStore::isValidPathUncached(const Path & storePath)
void BinaryCacheStore::narFromPath(const Path & storePath, Sink & sink)
{
auto info = queryPathInfo(storePath).cast<const NarInfo>();
std::shared_ptr<std::string> nar;

#if ENABLE_IPFS
Member

As commented, downloading from IPFS can open the door to new attacks. I would recommend adding a download-from-ipfs flag as well, which is disabled by default.
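A rough sketch of such a gate inside narFromPath (the use-ipfs parameter name and the getFile helper are assumptions for illustration, not part of this PR):

    #if ENABLE_IPFS
        // only fetch over IPFS when the user has opted in, e.g. via a store URI
        // parameter such as ?use-ipfs=1 (the name is hypothetical)
        if (useIpfs && !info->ipfsHash.empty())
            nar = IPFSAccessor::getFile(info->ipfsHash);
    #endif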

@mguentner
Author

Thank you for the reviews / comments. I am currently rewriting the implementation, as injecting IPFS directly into BinaryCacheStore no longer seems like the best way forward -
instead I am writing a dedicated IPFSCacheStore that resolves /ipfs/ paths.

Stay tuned.

@edolstra
Member

@mguentner How would that work, and how would it handle narinfo files?

The present approach seems reasonable to me. It just needs a flag to enable/disable IPFS (e.g. as part of the store URI, like https://cache.nixos.org/?ipfs=true).

Another possibility: rather than add an IPFSHash field to narinfo, we could turn the URL field into a list, allowing the binary cache to announce multiple ways to get the file:

URL: nar/1cvgji7mk3q68f257fmwlqvz8rhfdla6y0lxwqq8nwxagy3w34cx.nar.xz # i.e. HTTP, relative URI
URL: ipfs:/ipfs/QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT

This would have the advantage of not putting transport-specific info in the NarInfo data structure.

@Ericson2314
Member

Ericson2314 commented Jan 20, 2017

@mguentner I like that proposed code structure. From just how you described it (I haven't read the interface you're implementing), I think having two separate implementations segues into a NAR-less world nicely.

@mguentner
Author

The binary cache implementation is finished:
https://github.com/NixIPFS/nix/tree/ipfs_binary_cache

Have a look at the code - there are some small TODOs in there but

    binaryCaches = [
      "/ipns/nix.ipfs.sourcediver.org/binary_cache"
    ];

just works 🎉 and can be merged soonish.

If you want to test it (easy: nixops VirtualBox config), have a look at
https://github.com/NixIPFS/notes/tree/master/examples/binary_cache

Currently there is only release-16.09-small but release-16.09 has been tested as well and will follow once we have the infrastructure in place (have a look at https://github.com/NixIPFS/infrastructure/issues) 👍

@mguentner
Author

@edolstra Addressing your question:
The .narinfo files are available just as with https://cache.nixos.org, with their signature from cache.nixos.org-1.
Example:
https://ipfs.io/ipfs/QmR9NabMW7E3XLksJTdtpWYsro9EZpy1NxcbtEtw8Cr8Sq/binary_cache/00n5n3g1jlffq11d4mq7hy1d6yr3x91p.narinfo

I like your idea of adding multiple URL: fields, but NarInfo files are currently a map/dict and you are suggesting to make them a multimap, which I think adds too much complexity.
I still think that adding &ipfs=true to any binary cache is a nice-to-have, but it currently only makes sense for the HTTP cache, right? So we would add code to BinaryCache that is only usable for one BinaryCache, namely the HTTPBinaryCache. In my opinion, the right way to go forward is to use an IPFS binary cache until another form of binary distribution is found (see #1006 + #859), or we will create way too much technical debt.

@matthiasbeyer
Contributor

matthiasbeyer commented Jan 30, 2017

This might be a stupid question, but I'll ask it anyway:

If I package a piece of software, I define the nix expression for it. How do I add it to IPFS without changing the nix expression? I mean... I write down the default.nix, build it... and now things get pushed to IPFS, I get a hash. Now I need to add this hash to the default.nix - so, I need to change it...

or am I getting something wrong here?


Forget the above. I guess I get it...

@mguentner mguentner closed this Feb 4, 2017
@nmikhailov

@mguentner So what is the status of this? Why was it closed?

@mguentner
Author

@nmikhailov

Not much has changed since #1167 (comment).
I am trying to get everything organised over at https://github.com/NixIPFS - the core infrastructure for the initial distribution needs to be set up, and in parallel the binary cache needs to be merged into Nix.
If you have spare servers/storage for the initial distribution, have a look at NixIPFS/infrastructure#1

Again, you can try the current status using this nixops config:
https://github.com/NixIPFS/notes/tree/master/examples/binary_cache

(Once the machines are setup, do nixops ssh -d name bc_user and realise a path, e.g. nix-store --realise /nix/store/005zk8a10js00kbhgcbq48h4cv5im1qn-yelp-3.22.0)

@matthiasbeyer
Contributor

@mguentner Is there/will there be a way to partially mirror a channel? I have machines, but not enough storage for a complete channel...

@mguentner
Author

You will be able to run a local gateway that serves content from the binary cache and then caches/redistributes the content until it is garbage collected (LRU), depending on how much storage you allocate for this. If you want to warm your cache with a partial channel, you need to write a script / NixOS test that requests all the hashes you are interested in storing.
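A minimal sketch of such a warming script, assuming a local gateway on the default port and a pre-built list of store hashes (both are assumptions, nothing like this exists in the repos yet):

    #!/bin/sh
    # request each .narinfo through the local gateway so it gets cached there;
    # the corresponding .nar paths could be fetched the same way
    gateway=http://127.0.0.1:8080
    while read -r hash; do
        curl -fsS "$gateway/ipns/nix.ipfs.sourcediver.org/binary_cache/$hash.narinfo" > /dev/null
    done < hashes-to-keep.txt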
