Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra padding for ByteArray after encoding #3157

Merged
merged 2 commits into from
Oct 17, 2014

Conversation

ryoqun
Copy link
Member

@ryoqun ryoqun commented Oct 5, 2014

Currently String#encode produces an ill-state string in some cases.

In all other places, String allocates its internal ByteArray buffer with size + 1. Follow the rule here too.

This bug causes the following mysterious error (was hard to track down…):

The reproducible code

require 'json'

(s = "70g・1パック".encode(Encoding::UTF_16LE).force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8));
p [s, s.bytesize, s.instance_variable_get(:@data)]
s.to_json

Before

bash-3.2$ ./bin/rbx ~/rubinius/bad-encode.rb 
["70g・1パック", 16, #<Rubinius::ByteArray:0x288 16 bytes>]
An exception occurred running /Users/ryo-onodera/rubinius/bad-encode.rb:
source sequence is illegal/malformed utf-8 (JSON::GeneratorError)

Backtrace:

                 Object#__script__ at /Users/ryo-onodera/rubinius/bad-encode.rb:5
  Rubinius::CodeLoader#load_script at kernel/delta/code_loader.rb:66
  Rubinius::CodeLoader.load_script at kernel/delta/code_loader.rb:152
           Rubinius::Loader#script at kernel/loader.rb:645
             Rubinius::Loader#main at kernel/loader.rb:799

After

bash-3.2$ ./bin/rbx ~/rubinius/bad-encode.rb 
["70g・1パック", 16, #<Rubinius::ByteArray:0x160 24 bytes>]

(*The final ByteArray size is aligned by ObjectHeader::align)

@yorickpeterse
Copy link
Contributor

@ryantm Is there some sort of test you can add for this? That way we can ensure this doesn't break again in the future.

@ryantm
Copy link
Contributor

ryantm commented Oct 6, 2014

@yorickpeterse Did you mean to mention @ryoqun?

@yorickpeterse
Copy link
Contributor

Woops, my bad. I indeed meant @ryoqun

@ryoqun
Copy link
Member Author

ryoqun commented Oct 10, 2014

@yorickpeterse I created a rather weird spec. Because this bug only causes subtle internal implementation-dependant state corruption of String, this is hard to spec only using external API... Instead should I rather create a Rubinius-specific spec under spec/core/string? Btw, this bug also depends on machine's pointer size. ;)

@yorickpeterse
Copy link
Contributor

@ryoqun If the test uses (or tests) things that are specific to Rbx then yes, it should probably be located in spec/core/string.

@@ -495,7 +495,7 @@ VALUE string_spec_RSTRING_PTR_iterate(VALUE self, VALUE str) {

ptr = RSTRING_PTR(str);
for(i = 0; i < RSTRING_LEN(str); i++) {
rb_yield(INT2FIX(ptr[i]));
rb_yield(CHR2FIX(ptr[i]));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to prevent chars larger than 127 from being negative at the Ruby side.

@brixen brixen merged commit 4dd7279 into rubinius:master Oct 17, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants