Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash has bug to set encoding into key string wrongly when the key string is used once with different encoding #3405

Closed
tagomoris opened this issue Oct 21, 2015 · 7 comments

Comments

@tagomoris
Copy link

On JRuby 9.0.1.0, Hash sets wrong encoding only when the key string is used as Hash key with different encoding.

hash1 = {}
hash1['str'.force_encoding('ASCII-8BIT')] = 1
p hash1.keys.first.encoding # ASCII-8BIT

hash2 = {}
hash2['str'.force_encoding('UTF-8')] = 1
p hash2.keys.first.encoding # ASCII-8BIT !? (expected: UTF-8)

Script to show situations to reproduce bugs:

# encoding: ascii-8bit

str = 'hello'
obj1 = {'hello' => 1}
obj2 = {str => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'hello'.force_encoding('UTF-8')
obj1 = {'hello'.force_encoding('UTF-8') => 1}
obj2 = {str => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'hello'.force_encoding('UTF-8')
obj1 = {}
obj1[str] = 1
obj2 = {}
obj2[str] = 2
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'world'.force_encoding('UTF-8')
obj1 = {'world'.force_encoding('UTF-8') => 1}
obj2 = {str => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'unused'
obj1 = {'unused'.force_encoding('UTF-8') => 1}
obj2 = {str.force_encoding('UTF-8') => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

Expected result is:

  • First line: all keys are encoded in ASCII-8BIT
  • Rest: all keys are encoded in UTF-8

Actual result in JRuby 9.0.1.0 is:

  • First line: all keys are encoded in ASCII-8BIT
  • All hello as hash keys are encoded in ASCII-8BIT
@tagomoris
Copy link
Author

JRuby 1.7.x and MRI works as expected, and only JRuby 9.0.1.0 works in wrong way.

All outputs from script above in JRuby 9.0.1.0, JRuby 1.7.22 and MRI 2.2.2:

# $ ruby -v ; ruby ~/encoding_test.rb
# jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 25.31-b07 on 1.8.0_31-b13 +jit [darwin-x86_64]
# {:str=>"hello", :enc=>#<Encoding:ASCII-8BIT>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"hello", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"world", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"unused", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}

# $ ruby -v ; ruby ~/encoding_test.rb
# jruby 1.7.22 (1.9.3p551) 2015-08-20 c28f492 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 +jit [darwin-x86_64]
# {:str=>"hello", :enc=>#<Encoding:ASCII-8BIT>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"hello", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"world", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"unused", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}

# $ ruby -v ; ruby ~/encoding_test.rb
# ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
# {:str=>"hello", :enc=>#<Encoding:ASCII-8BIT>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"hello", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"world", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"unused", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}

@headius
Copy link
Member

headius commented Oct 23, 2015

Huh, wacky.

@headius
Copy link
Member

headius commented Oct 23, 2015

Ok, I think this is due to our Hash logic using the new "string".freeze optimization for String keys. They are getting deduplicated, but improperly not considering encoding in the dedup store.

Options:

  • Don't dedup keys except for literal hashes with literal string keys.
  • Fix dedup to consider encoding as well.

@enebo enebo modified the milestones: JRuby 9.0.5.0, JRuby 9.0.4.0 Nov 12, 2015
@headius
Copy link
Member

headius commented Nov 19, 2015

I will fix this today.

@headius
Copy link
Member

headius commented Nov 19, 2015

I have chosen the second option for now. The proper way for us to dedup hash keys is to only do it for literal hashes with literal string keys, but that will require a bit more work in IR compilation. The short term fix of only using the cached string if the encoding matches will work for now (and may be necessary anyway).

@headius
Copy link
Member

headius commented Nov 19, 2015

I'm not sure the best way to spec this out, since it depends on literal strings in hashes having specific encodings; this is not easy to spec in a single file. I've filed ruby/spec#162 to discuss that, and we'll proceed for now without a spec or test to go with my fix.

@headius
Copy link
Member

headius commented Nov 19, 2015

I've committed a fix for the dedup cache to consider encoding. It's a trivial change, preventing new strings with the same contents but different encodings from updating the cache or picking up a wrong-encoding string.

I'll file a separate issue to fix dedup behavior wrt Hash keys, and we'll work on specs separately in ruby/spec#162.

headius added a commit to ruby/spec that referenced this issue Nov 20, 2015
Specs for #162.

I confirmed it does test behavior from jruby/jruby#3405:

jruby 9.0.4.0 (2.2.2) 2015-11-12 b9fb7aa Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +jit [darwin-x86_64]
..................F

1)
Hash literal does not change encoding of literal string keys during creation FAILED
Expected #<Encoding:ASCII-8BIT>
 to equal #<Encoding:UTF-8>

/Users/headius/projects/rubyspec/language/hash_spec.rb:145:in `block in (root)'
org/jruby/RubyBasicObject.java:1633:in `instance_eval'
org/jruby/RubyEnumerable.java:1552:in `all?'
org/jruby/RubyFixnum.java:301:in `times'
org/jruby/RubyArray.java:1560:in `each'
/Users/headius/projects/rubyspec/language/hash_spec.rb:6:in `<top>'
org/jruby/RubyKernel.java:957:in `load'
org/jruby/RubyBasicObject.java:1633:in `instance_eval'
org/jruby/RubyArray.java:1560:in `each'

Finished in 0.091000 seconds

1 file, 19 examples, 44 expectations, 1 failure, 0 errors

jruby 9.0.5.0-SNAPSHOT (2.3.0) 2015-11-20 a8eab95 Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +jit [darwin-x86_64]
...................

Finished in 0.101000 seconds

1 file, 19 examples, 46 expectations, 0 failures, 0 errors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants