mnist: init at 2018-11-16 #50448

Merged
c0bw3b merged 1 commit into NixOS:master on Dec 3, 2018

@CMCDragonkai CMCDragonkai commented Nov 16, 2018

Motivation for this change

This adds the MNIST dataset. It is useful for any ML application that depends on this dataset, and because the dataset is small, tests can be run against it.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

@CMCDragonkai
Member Author

CMCDragonkai commented Nov 16, 2018

I set the version to today's date because the package doesn't have versions, unless you count the 1998 paper as the "version". I'm not even sure whether this is the right paper: LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278–2324.

@CMCDragonkai
Member Author

Related: #46922

@CMCDragonkai
Member Author

I'm also wondering whether this is the right way to do multi-source derivations. I've just copied the style from the OCaml derivations. I also wonder about mv vs cp operations in the install phase. The files are not meant to be unpacked, as MNIST readers expect the data to be compressed.

@CMCDragonkai
Member Author

CMCDragonkai commented Nov 19, 2018

I've discovered another way to create this that is actually more minimal. The current derivation leaves the "srcs" in /nix/store and then copies the data on top of that.

The expression below just creates symlinks to the individual data files.

  # Arguments are assumed to be supplied by callPackage.
  { fetchurl, linkFarm }:

  let
    # The four MNIST archives, fetched as-is (they stay gzip-compressed).
    srcs = {
      train-images = fetchurl {
        url = "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz";
        sha256 = "029na81z5a1c9l1a8472dgshami6f2iixs3m2ji6ym6cffzwl3s4";
      };
      train-labels = fetchurl {
        url = "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz";
        sha256 = "0p152200wwx0w65sqb65grb3v8ncjp230aykmvbbx2sm19556lim";
      };
      test-images = fetchurl {
        url = "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz";
        sha256 = "1rn4vfigaxn2ms24bf4jwzzflgp3hvz0gksvb8j7j70w19xjqhld";
      };
      test-labels = fetchurl {
        url = "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz";
        sha256 = "1imf0i194ndjxzxdx87zlgn728xx3p1qhq1ssbmnvv005vwn1bpp";
      };
    };
  in
    # linkFarm builds a store directory containing one symlink per entry.
    linkFarm "mnist-2018-11-16" [
      { name = srcs.train-images.name; path = srcs.train-images; }
      { name = srcs.train-labels.name; path = srcs.train-labels; }
      { name = srcs.test-images.name; path = srcs.test-images; }
      { name = srcs.test-labels.name; path = srcs.test-labels; }
    ]

However, I suspect this does not fit the conventions, because it carries neither maintainer nor version information. Is there something equivalently simple? Or should I just swap the cp for ln?
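One way to keep version and meta information while still symlinking is an ordinary derivation whose install phase links each fetched file into $out. The following is only a minimal sketch and not necessarily what was merged: it assumes `lib` and `stdenvNoCC` are available as package arguments, reuses the URLs and hashes from the expression above, and the `meta` values are illustrative placeholders.

```nix
{ lib, stdenvNoCC, fetchurl }:

let
  # Same four archives as in the linkFarm expression above.
  srcs = {
    train-images = fetchurl {
      url = "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz";
      sha256 = "029na81z5a1c9l1a8472dgshami6f2iixs3m2ji6ym6cffzwl3s4";
    };
    train-labels = fetchurl {
      url = "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz";
      sha256 = "0p152200wwx0w65sqb65grb3v8ncjp230aykmvbbx2sm19556lim";
    };
    test-images = fetchurl {
      url = "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz";
      sha256 = "1rn4vfigaxn2ms24bf4jwzzflgp3hvz0gksvb8j7j70w19xjqhld";
    };
    test-labels = fetchurl {
      url = "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz";
      sha256 = "1imf0i194ndjxzxdx87zlgn728xx3p1qhq1ssbmnvv005vwn1bpp";
    };
  };
in
stdenvNoCC.mkDerivation rec {
  name = "mnist-${version}";
  version = "2018-11-16";

  # Only run the install phase: there is nothing to unpack or build,
  # and the .gz files must stay compressed.
  phases = [ "installPhase" ];

  # Symlink instead of copying, so the data is stored only once.
  installPhase = ''
    mkdir -p "$out"
    ln -s "${srcs.train-images}" "$out/${srcs.train-images.name}"
    ln -s "${srcs.train-labels}" "$out/${srcs.train-labels.name}"
    ln -s "${srcs.test-images}" "$out/${srcs.test-images.name}"
    ln -s "${srcs.test-labels}" "$out/${srcs.test-labels.name}"
  '';

  meta = {
    # Placeholder metadata (assumed, not taken from the PR).
    description = "MNIST database of handwritten digits";
    homepage = "http://yann.lecun.com/exdb/mnist/";
    platforms = lib.platforms.all;
  };
}
```

Compared with linkFarm, this costs a few more lines but gives the package a `version` and a `meta` attribute set, where maintainers could also be listed.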

@CMCDragonkai
Member Author

@c0bw3b What do you think of using ln vs cp?

@c0bw3b
Contributor

c0bw3b commented Nov 28, 2018

I think ln is preferable to cp, since fetchurl is already going to put the sources in the Nix store.

The only other approach I can think of would be to use fetchzip instead of fetchurl, and then run install -D -t $out <theFlatFilesWeWant> in your install phase.
You may need to add unpackPhase to your list of phases for this to work. I have not tested this locally.
The migmix font package seems to be close enough to what I have in mind.

@CMCDragonkai
Member Author

The files should not be decompressed.

@c0bw3b
Contributor

c0bw3b commented Nov 28, 2018

Oh.. Then I guess your current derivation is the better approach.

@CMCDragonkai
Member Author

@c0bw3b ready to merge?

@veprbl
Member

veprbl commented Dec 3, 2018

Would it make sense to mark it with preferLocalBuild = true?
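For context, preferLocalBuild is a derivation attribute that tells Nix to prefer building on the local machine rather than handing the job to a remote builder, which suits derivations that only create a few symlinks. A minimal sketch of where the attribute would go, assuming a runCommand-based wrapper with a hypothetical name (`mnist-example`) and a single source for brevity:

```nix
{ runCommand, fetchurl }:

let
  # One of the MNIST archives; the others would be handled the same way.
  trainImages = fetchurl {
    url = "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz";
    sha256 = "029na81z5a1c9l1a8472dgshami6f2iixs3m2ji6ym6cffzwl3s4";
  };
in
# "mnist-example" is a hypothetical name used only for this sketch.
runCommand "mnist-example" {
  # Creating a symlink is cheaper than shipping the job to a remote builder.
  preferLocalBuild = true;
} ''
  mkdir -p "$out"
  ln -s "${trainImages}" "$out/${trainImages.name}"
''
```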

@CMCDragonkai
Member Author

@veprbl why would we do that when none of the data packages do that?

Contributor

@c0bw3b c0bw3b left a comment

Tested locally. LGTM.

@c0bw3b c0bw3b merged commit 4cde69a into NixOS:master Dec 3, 2018
@CMCDragonkai CMCDragonkai deleted the mnist branch January 10, 2019 05:13