Add Zip::File, Zip::Reader and Zip::Writer #3901

asterite · 2017-01-15T20:00:53Z

Please check the docs for Zip, Zip::File, Zip::Reader and Zip::Writer. The docs explain why there are two types for reading a zip file.

I needed to add File#read_at and IO::Memory#read_at for this. This isn't strictly necessary, but provides a way to read multiple files from a zip file (stored in a File or IO::Memory) concurrently without problems, so you could for example extract multiple files concurrently. For File I used pread, and for IO::Memory I use a sub-view. Go also uses pread for reading from a zip file.
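A minimal sketch of the sub-view behavior described above, assuming the block form of `IO::Memory#read_at(offset, bytesize)` as found in current Crystal releases. The sample string and offsets are made up for illustration; `File#read_at` is meant to be used the same way on a file.

```crystal
# Each read_at call yields an independent, read-only view over a section
# of the buffer, so reading one section does not disturb another one.
io = IO::Memory.new("hello world")

first = ""
second = ""

io.read_at(0, 5) { |view| first = view.gets_to_end }
io.read_at(6, 5) { |view| second = view.gets_to_end }

puts first
puts second
```

Because every section gets its own view, neither read depends on a shared cursor in the underlying buffer.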

bcardiff · 2017-01-16T14:58:30Z

I think we are missing specs for zip files with entries nested in directories.
From this SO thread, we should be able to write with / but both \ and / should be checked when parsing.

asterite · 2017-01-16T15:05:25Z

@bcardiff Zip entries have a filename, for example "foo/bar/baz.txt". There's no notion of "directories" inside a zip file, you just add entries to it. A directory entry is assumed to have a trailing "/" and no content, but that's not enforced by the zip spec. What would the missing spec look like?

bcardiff · 2017-01-16T18:32:45Z

I checked with a .zip created on Windows and it does use forward slashes as the path separator. This SO thread had better references.

Regarding missing specs, I thought entries were traversed in a tree fashion. Maybe a spec could be added to clarify that.

asterite · 2017-01-16T18:50:21Z

Entries are stored sequentially inside a zip file. I must admit that I learned everything about the zip file format for implementing this functionality and I also didn't know how zip file entries were stored. I don't know how much of the spec we should repeat in the docs, though... For example, in the spec it says:

4.3 General Format of a .ZIP file
---------------------------------

 4.3.3 Files MAY be stored in arbitrary order within a ZIP file.

So you can put first a file named "foo/bar/baz.txt", then a directory named "foo/", then a directory named "foo/bar/" and so on, without any order that relates to a tree in the filesystem.

There's also the fact that Zip::Reader reads zip entries sequentially from a stream, so the order comes from the zip file and can't be that of a tree. For Zip::File we could do something "smarter", but maybe people expect the entries to be in a specified order, because a zip file maintains an order. If at one point I compress a file and put file "a.txt" at the beginning, and at another point in the program I decompress it, I can be very sure that the first entry will be "a.txt".
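The ordering point above can be sketched as follows. This assumes the `Compress::Zip` namespace used by current Crystal releases (in this PR the same types are named `Zip::Writer` and `Zip::Reader`), and the entry names are just the ones from the example above.

```crystal
require "compress/zip"

io = IO::Memory.new

# Entries are written in an order that does not follow the directory tree.
Compress::Zip::Writer.open(io) do |zip|
  zip.add("foo/bar/baz.txt", "contents")
  zip.add_dir("foo/")
  zip.add_dir("foo/bar/")
end

io.rewind

names = [] of String
Compress::Zip::Reader.open(io) do |zip|
  zip.each_entry do |entry|
    entry.io.gets_to_end # consume the entry's data before moving on
    names << entry.filename
  end
end

p names
```

The reader yields the entries in the order they were written, with the "directories" appearing after the file they would contain.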

Sija · 2017-01-17T17:00:54Z

Does it support ZIP64 archives?

asterite · 2017-01-17T17:08:12Z

@Sija No, ZIP64 is not supported, there's a note I left in the doc comment of the Zip module. It can be implemented later in a separate PR. ZIP64 allows you to compress files bigger than 4GB, which shouldn't be terribly common.

Sija · 2017-01-17T17:29:51Z

@asterite Roger that, thx for clarifying.

Sija · 2017-01-17T17:31:42Z

src/zip/zip.cr

+#
+# `Zip::File` is the preferred method to read zip files if you have
+# can provide a `File`, because it's a bit more flexible and provides
+# more complete information for zip entries (such as comments).


[...] if you have can provide [...] — there's either too much or too lil' of sth ;)

spalladino

@asterite I've added a few comments: most of them questions or suggestions. There is only one small thing I think could be a bug, but maybe I'm wrong.

spalladino · 2017-01-17T18:24:00Z

src/file.cr

+
+    if self_bytesize < 0
+      raise ArgumentError.new("negative bytesize")
+    end


Shouldn't this check be on bytesize instead of self_bytesize?

asterite · 2017-01-17

You are right! I'll fix it.
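For illustration only, the kind of argument check being discussed could look like the sketch below. `check_read_range` and its parameter names are hypothetical and are not the code in src/file.cr; the point is that the sign check belongs on the requested bytesize, not on the total size.

```crystal
# Hypothetical helper, not the code from this PR.
def check_read_range(offset : Int, bytesize : Int, total_bytesize : Int) : Nil
  unless 0 <= offset <= total_bytesize
    raise ArgumentError.new("offset out of bounds")
  end

  # The requested bytesize needs the sign check, not the total size
  # of the underlying file or buffer.
  if bytesize < 0
    raise ArgumentError.new("negative bytesize")
  end

  unless offset + bytesize <= total_bytesize
    raise ArgumentError.new("bytesize out of bounds")
  end
end

check_read_range(6, 5, 11) # valid range, nothing is raised

message = nil
begin
  check_read_range(0, -1, 11)
rescue ex : ArgumentError
  message = ex.message
end

puts message
```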

spalladino · 2017-01-17T18:26:21Z

spec/std/file_spec.cr

+      io.gets_to_end.should eq(File.read(filename))
+    end
+  end
+


I'd add a spec on the invalid arguments cases: negative offset and offset plus size out of bounds

Actually, I see there are such specs for memory. Maybe add them to file as well?

asterite · 2017-01-17

And I'll add a spec too :-)

spalladino · 2017-01-17T18:29:08Z

src/file/preader.cr

+
+  def unbuffered_write(slice : Bytes)
+    raise IO::Error.new("can't flush read-only IO")
+  end


flush => write?

spalladino · 2017-01-17T18:34:26Z

spec/std/zip/zip_file_spec.cr

+  end
+
+  typeof(Zip::File.new("file.zip"))
+  typeof(Zip::File.open("file.zip") { })


What are these two typeof calls for here?

asterite · 2017-01-17

They check that these calls compile, regardless of what they do. Tests for these should be almost the same as the previous tests, but I want to make sure that their bodies compile fine.
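A small sketch of why a bare `typeof` works as a compile-only check. `read_missing_file` is a hypothetical method that would raise if it were actually called.

```crystal
# Hypothetical method: calling it would raise because the file does not exist.
def read_missing_file : String
  File.read("this-file-does-not-exist.txt")
end

# typeof only type-checks the expression; the call is never executed,
# so this line compiles and runs without touching the filesystem.
puts typeof(read_missing_file)
```

The method body still has to type-check, which is what the two calls in the spec rely on.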

spalladino · 2017-01-17T18:38:58Z

spec/std/zip/zip_spec.cr

+  typeof(Zip::Reader.open("file.zip") { })
+
+  typeof(Zip::Writer.new("file.zip"))
+  typeof(Zip::Writer.open("file.zip") { })


Same here, maybe I'm missing some standard practice in the specs?

spalladino · 2017-01-17T18:44:39Z

src/zip/checksum_reader.cr

+  # Computes a CRC32 while reading from an underlying IO,
+  # optionally verifying the computed value against an
+  # expected one.
+  private class ChecksumReader


Do we want this class to be private and scoped in the Zip module? As far as I understand, CRC32 is quite useful in many other applications, so maybe we could move it to Zlib and make it public. What do you think?

asterite · 2017-01-17

We could do that, maybe in a later PR. I checked some other implementations: Go, for example, has something similar and it's private to the zip package, so maybe it's not that useful in other contexts.
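To make the discussion concrete, here is a rough sketch of what a standalone checksum-computing reader could look like. `Crc32Reader` is a hypothetical name, not the private `ChecksumReader` from this PR, and the sketch assumes the `Digest::CRC32` module from current Crystal releases.

```crystal
require "digest/crc32"

# Hypothetical wrapper: forwards reads to an underlying IO and keeps a
# running CRC32 of every byte that passes through.
class Crc32Reader < IO
  getter crc32 : UInt32 = Digest::CRC32.initial

  def initialize(@io : IO)
  end

  def read(slice : Bytes) : Int32
    read_bytes = @io.read(slice)
    @crc32 = Digest::CRC32.update(slice[0, read_bytes], @crc32)
    read_bytes
  end

  def write(slice : Bytes) : Nil
    raise IO::Error.new("can't write to read-only IO")
  end
end

reader = Crc32Reader.new(IO::Memory.new("hello world"))
content = reader.gets_to_end

puts content
puts reader.crc32 == Digest::CRC32.checksum("hello world")
```

Verifying against an expected value, as the doc comment above mentions, would then be a comparison once the underlying IO is exhausted.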

spalladino · 2017-01-17T18:45:00Z

src/zip/checksum_writer.cr

+module Zip
+  # Counts written bytes and optionally computes a CRC32
+  # checksum while writing to an underlying IO.
+  private class ChecksumWriter


Same as with Reader

spalladino · 2017-01-17T18:49:33Z

src/zip/file.cr

+    directory_end_offset = find_directory_end_offset
+    entries_size, directory_offset = read_directory_end(directory_end_offset)
+    @entries = Array(Entry).new(entries_size)
+    @entries_by_filename = {} of String => Entry


Zip allows duplicate entries, though it seems to be a VERY rare use case, and can still be handled by simply iterating the @entries. If we want to support it, then @entries_by_filename should be a hash from string to Array(Entry), which I don't particularly like since it makes the API more awkward. What do you think?

asterite · 2017-01-17

I took inspiration from Java's API. Apparently C# has it too. They don't mention the possibility of duplicated entries. Maybe it's so uncommon that it isn't worth considering, and a nicer API is better.
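The trade-off can be sketched with plain data structures. `DemoEntry` is a hypothetical stand-in for a zip entry, and letting the last duplicate win in the hash is an assumption of this sketch, not necessarily what Zip::File does.

```crystal
# Hypothetical stand-in for a zip entry.
record DemoEntry, filename : String, comment : String

entries = [
  DemoEntry.new("a.txt", "first"),
  DemoEntry.new("b.txt", "only"),
  DemoEntry.new("a.txt", "second"),
]

# A filename => entry hash can hold only one entry per name.
entries_by_filename = {} of String => DemoEntry
entries.each { |entry| entries_by_filename[entry.filename] = entry }

puts entries.count { |entry| entry.filename == "a.txt" } # 2
puts entries_by_filename.size                            # 2
puts entries_by_filename["a.txt"].comment                # second
```

Duplicates stay reachable by iterating the array, while lookup by name keeps the simpler API.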

spalladino · 2017-01-17T18:54:26Z

src/zip/file.cr

+        io.skip(22)
+        filename_length = read(io, UInt16)
+        extra_length = read(io, UInt16)
+        @offset + 30 + filename_length + extra_length


I'm gonna go ahead and trust you in the 22 and 30 there :-P

asterite · 2017-01-17

It's basically skipping 22 bytes from the header instead of reading them one by one. 30 is the size of the header without counting the filename length and extra length.

RX14 · 2017-01-19

Can we get a code comment about those constants?

asterite · 2017-01-19

@RX14 @spalladino here you go: 6952e76
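For readers who want to see where the two constants come from, here is a sketch that parses the first local file header of a freshly written archive by hand. It assumes the 22 bytes are skipped right after the 4-byte signature, and it uses the `Compress::Zip` namespace of current Crystal releases (`Zip` in this PR).

```crystal
require "compress/zip"

io = IO::Memory.new
Compress::Zip::Writer.open(io) { |zip| zip.add("one.txt", "One") }
io.rewind

le = IO::ByteFormat::LittleEndian

# Local file header: 4-byte signature, then 22 bytes of fixed fields
# (version, flags, method, time, date, CRC32, sizes), then two UInt16
# lengths. 4 + 22 + 2 + 2 = 30 bytes of fixed-size header.
signature = io.read_bytes(UInt32, le)
io.skip(22)
filename_length = io.read_bytes(UInt16, le)
extra_length = io.read_bytes(UInt16, le)
filename = io.read_string(filename_length)

# The entry's data starts after the fixed header, the filename and the extra field.
data_offset = 30 + filename_length + extra_length

puts signature == 0x04034b50
puts filename
puts data_offset
```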

spalladino · 2017-01-17T18:58:42Z

spec/std/zip/zip_file_spec.cr

+      zip["one.txt"].open(&.gets_to_end).should eq("One")
+      zip["two.txt"].open(&.gets_to_end).should eq("Two")
+    end
+  end


How was the test.zip generated? I was going to suggest adding a spec to read a zip file generated with a standard tool (otherwise we are just checking that we can read the zip files that we have written ourselves), but if this file was created somehow else, then that covers it.

However, the other way around would also apply. Can we rely on having a zip command-line tool on the CI (at least for the major architectures), so we can test that it can open zip files generated from Crystal?

asterite · 2017-01-17

I generated test.zip by creating a zip file on a Mac, using right-click + compress. In fact, when I did that I found some bugs in the original code I had, which is why I later decided to add a zip file that wasn't generated by the same Zip module.

I don't know if we have a zip command-line tool on the CI, but it's definitely a possibility (though maybe for the future).

RX14 · 2017-01-17

Adding zip to CI shouldn't be hard in the future.

Additionally add File#read_at and IO::Memory#read_at

asterite added kind:feature pr:needs-review status:draft topic:stdlib labels Jan 15, 2017

asterite force-pushed the feature/zip branch from da52583 to 029543b Compare January 15, 2017 20:31

spalladino self-requested a review January 17, 2017 15:31

Sija reviewed Jan 17, 2017

asterite force-pushed the feature/zip branch from 029543b to eda3576 Compare January 17, 2017 17:41

spalladino reviewed Jan 17, 2017

Add Zip::File, Zip::Reader and Zip::Writer

06bbe30

Additionally add File#read_at and IO::Memory#read_at

asterite force-pushed the feature/zip branch from eda3576 to 06bbe30 Compare January 17, 2017 19:53

spalladino merged commit 4e7c5c7 into master Jan 19, 2017

spalladino added this to the Next milestone Jan 19, 2017

spalladino removed pr:needs-review status:draft labels Jan 19, 2017

asterite deleted the feature/zip branch January 19, 2017 17:57

Sija mentioned this pull request Sep 22, 2017

Update to Crystal 0.20.4 pablotron/libzip-crystal#1

Closed
