Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Truffle] Unable to unzip large gzip file #4133

Closed
bjfish opened this issue Sep 3, 2016 · 5 comments
Closed

[Truffle] Unable to unzip large gzip file #4133

bjfish opened this issue Sep 3, 2016 · 5 comments
Assignees
Milestone

Comments

@bjfish
Copy link
Contributor

bjfish commented Sep 3, 2016

Environment

require "zlib"

#path = "/Users/brandonfish/Documents/jruby-patches/gzipped_versions"
path = "./gzipped_versions"

content = IO.binread(path)
result = Zlib::GzipReader.new(StringIO.new(content)).read

puts result
puts path
puts ""

gzipped_versions.gz

Found while trying to do bundle install

Expected Behavior

The script prints a bunch of text.

Actual Behavior

Fails with Out of Memory Error

@chrisseaton
Copy link
Contributor

chrisseaton commented Oct 30, 2016

I can run this example with a gzipped file of 5 MB (so about the same size) in 90 seconds. The file attached to this issue doesn't seem to ever finish and gradually consumes more memory.

screen shot 2016-10-30 at 14 15 25

A heap dump shows it's all ASCII leaf ropes.

screen shot 2016-10-30 at 14 18 21

A majority of the ropes seem to be 8,216 or 25 bytes long. Are these some buffer sizes somewhere?

Presumably we're creating pathologically deep ropes here and need a new flattening heuristic. I don't understand why a real gzipped file behaves differently to a random one, but the random one is very slow anyway.

@bjfish
Copy link
Contributor Author

bjfish commented Oct 30, 2016

@chrisseaton My current understanding is that append to rope is worst case because every append will add two nodes which consumes a lot of memory. @nirvdrum said he has some ideas on how to fix this.

The shape looks something like this:

str = "a"
10.times {
  str << "a"
}
Truffle::Ropes.debug_print_rope(str)
Legend: 
BN = Bytes Null? (byte[] not yet populated)
BL = Byte Length
CL = Character Length
CR = Code Range
O = Offset (SubstringRope only)
T = Times (RepeatingRope only)
D = Depth
LD = Left Depth (ConcatRope only)
RD = Right Depth (ConcatRope only)
aaaaaaaaaaa (ConcatRope; BN: true; BL: 11; CL: 11; CR: CR_7BIT; D: 11; LD: 10; RD: 1)
|  aaaaaaaaaa (ConcatRope; BN: true; BL: 10; CL: 10; CR: CR_7BIT; D: 10; LD: 9; RD: 1)
|  |  aaaaaaaaa (ConcatRope; BN: true; BL: 9; CL: 9; CR: CR_7BIT; D: 9; LD: 8; RD: 1)
|  |  |  aaaaaaaa (ConcatRope; BN: true; BL: 8; CL: 8; CR: CR_7BIT; D: 8; LD: 7; RD: 1)
|  |  |  |  aaaaaaa (ConcatRope; BN: true; BL: 7; CL: 7; CR: CR_7BIT; D: 7; LD: 6; RD: 1)
|  |  |  |  |  aaaaaa (ConcatRope; BN: true; BL: 6; CL: 6; CR: CR_7BIT; D: 6; LD: 5; RD: 1)
|  |  |  |  |  |  aaaaa (ConcatRope; BN: true; BL: 5; CL: 5; CR: CR_7BIT; D: 5; LD: 4; RD: 1)
|  |  |  |  |  |  |  aaaa (ConcatRope; BN: true; BL: 4; CL: 4; CR: CR_7BIT; D: 4; LD: 3; RD: 1)
|  |  |  |  |  |  |  |  aaa (ConcatRope; BN: true; BL: 3; CL: 3; CR: CR_7BIT; D: 3; LD: 2; RD: 1)
|  |  |  |  |  |  |  |  |  aa (ConcatRope; BN: true; BL: 2; CL: 2; CR: CR_7BIT; D: 2; LD: 1; RD: 1)
|  |  |  |  |  |  |  |  |  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  |  |  |  |  |  |  |  |  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  |  |  |  |  |  |  |  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  |  |  |  |  |  |  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  |  |  |  |  |  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  |  |  |  |  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  |  |  |  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  |  |  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  |  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  |  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)
|  a (AsciiOnlyLeafRope; BN: false; BL: 1; CL: 1; CR: CR_7BIT; D: 1)

@chrisseaton
Copy link
Contributor

Are you saying that's definitely how the deflate code works or are you just making a good guess? Can you point me at it the right place in the source?

@bjfish
Copy link
Contributor Author

bjfish commented Oct 30, 2016

@chrisseaton I'm pretty sure it uses a String/StringIO by default for buffers and a deflate destination. Internally, the ZStream uses a Bytef (

) string buffer output by default. It appears it may have some support for using array for output (my experiments with the array output had marginal success, because I think strings were still used somewhere in zlib). I think this is where the buffer is appended:
https://github.com/jruby/jruby/blob/master/lib/ruby/truffle/pr-zlib/lib/pr/zlib.rb#L138

Also, it looks like StringIO intends to use strings with a lot of appending with its write method.

@bjfish
Copy link
Contributor Author

bjfish commented Jan 8, 2017

I'm going to close this since it appears to give the expected results now.

I took some timings that shows there some room for improvement:
MRI

real	0m1.971s
user	0m0.213s
sys	0m0.159s

jruby+truffle

real	4m14.588s
user	4m35.363s
sys	0m3.570s

@bjfish bjfish closed this as completed Jan 8, 2017
@enebo enebo added this to the truffle-dev milestone Jan 10, 2017
@enebo enebo added this to the Non-Release milestone Dec 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants