Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid hash value for nested arrays #3884

Closed
flash-gordon opened this issue May 12, 2016 · 21 comments
Closed

Invalid hash value for nested arrays #3884

flash-gordon opened this issue May 12, 2016 · 21 comments

Comments

@flash-gordon
Copy link

Environment

It seems that every version of jruby starting from at least 9k-pre1 has the following behavior.

Expected Behavior

2.3.1 :001 > [1, [1, []]].hash
 => 2397343324588010773
2.3.1 :002 > [2, [2, []]].hash
 => 4238421788679554352

Actual Behavior

jruby-9.1.0.0 :086 > [1, [1, []]].hash
 => 0
jruby-9.1.0.0 :087 > [2, [2, []]].hash
 => 0

We expect things to work in dry-validation because we use caching based on a hash of AST which in order is represented by a deeply nested array. The issue leads to returning the same result based on different outputs which surprises a lot.

Original values that you can use for testing a fix:

jruby-9.1.0.0 :001 > [[:and, [[:val, [:predicate, [:key?, [:age]]]], [:key, [:age, [:predicate, [:lt?, [23]]]]]]]].hash
 => 8481884717
jruby-9.1.0.0 :002 > [[:and, [[:val, [:predicate, [:key?, [:foo]]]], [:key, [:foo, [:predicate, [:lt?, [23]]]]]]]].hash
 => 8481884717

I don't know what exactly is going wrong but I tried to provide as minimal failing case as I could and it would supercool to have this fixed because you guys are doing a great job and having a support for jruby really matters for us, thank you!

@headius
Copy link
Member

headius commented May 12, 2016

That is very strange.

I will point out that just using a hash of the AST has potential to collide. In the past I have used SHA1 or similar to reduce that risk, but they're obviously much more expensive than a simple hash calculation.

@flash-gordon
Copy link
Author

Thanks! I will do the math but I guess a memory leak we introduce by caching is more likely than having an issue with hash collisions :) Anyway, I'll check the probability tomorrow, just need to recover some science from my mind places 🔬

headius added a commit that referenced this issue May 12, 2016
Relates to #3884, where JRuby
was always returning the same hash for certain combinations of
nested array with an innermost empty array.
@headius headius added this to the JRuby 9.1.1.0 milestone May 12, 2016
@headius
Copy link
Member

headius commented May 12, 2016

It turns out that the main issue was that we always hashed an empty array as 0, which had cascading effects for the rest of the Array#hash calculation. I went ahead and re-ported MRI's logic with some JRuby-specific tweaks.

@flash-gordon
Copy link
Author

@headius you're awesome, thank you! Will it be backported and which versions are affected? In other words could you tell me what versions of jruby we should test against (starting from 9k)?

@flash-gordon
Copy link
Author

btw I tested the code and it works now!

@headius
Copy link
Member

headius commented May 12, 2016

It will be in 9.1.1.0. If 1.7 exhibits the same problem we could backport there, but we only maintain one 9.x release at a time (i.e. there will never be a 9.0.6.0 now that 9.1 is out).

@flash-gordon
Copy link
Author

@headius OK. We don't support jruby 1.7, only 9k, so it's not an issue for us, feel free to decide about backporting.

@headius headius reopened this May 13, 2016
@headius
Copy link
Member

headius commented May 13, 2016

My ported hash logic appears to have broken some recursive detection, so I'm backing off that change and trying to just do the [] hash fix. I think that will be enough, but we might need you to reverify.

@flash-gordon
Copy link
Author

@headius yeah, sure. A few minutes ago I found that something is not quite right too :)

@headius
Copy link
Member

headius commented May 13, 2016

I have a patch to master that gets specs passing again (silly bit width mistake on my part):

diff --git a/core/src/main/java/org/jruby/RubyArray.java b/core/src/main/java/org/jruby/RubyArray.java
index 6a0bb4c..1dd924f 100644
--- a/core/src/main/java/org/jruby/RubyArray.java
+++ b/core/src/main/java/org/jruby/RubyArray.java
@@ -678,7 +678,7 @@ public class RubyArray extends RubyObject implements List, RandomAccess {
     public RubyFixnum hash(ThreadContext context) {
         final Ruby runtime = context.runtime;
         int begin = RubyArray.this.begin;
-        long h = (realLength << 32) & System.identityHashCode(RubyArray.class);
+        long h = (((long) realLength) << 32) & System.identityHashCode(RubyArray.class);
         for (int i = begin; i < begin + realLength; i++) {
             h = (h << 1) | (h < 0 ? 1 : 0);
             final IRubyObject value = safeArrayRef(runtime, values, i);

Doing further testing on it locally before pushing.

@flash-gordon
Copy link
Author

OK I'll be able to run the specs again In a couple of hours
On Пт, 13 мая 2016 г. at 20:54, Charles Oliver Nutter <
notifications@github.com> wrote:

I have a patch to master that gets specs passing again (silly bit width
mistake on my part):

diff --git a/core/src/main/java/org/jruby/RubyArray.java b/core/src/main/java/org/jruby/RubyArray.java
index 6a0bb4c..1dd924f 100644--- a/core/src/main/java/org/jruby/RubyArray.java+++ b/core/src/main/java/org/jruby/RubyArray.java@@ -678,7 +678,7 @@ public class RubyArray extends RubyObject implements List, RandomAccess {
public RubyFixnum hash(ThreadContext context) {
final Ruby runtime = context.runtime;
int begin = RubyArray.this.begin;- long h = (realLength << 32) & System.identityHashCode(RubyArray.class);+ long h = (((long) realLength) << 32) & System.identityHashCode(RubyArray.class);
for (int i = begin; i < begin + realLength; i++) {
h = (h << 1) | (h < 0 ? 1 : 0);
final IRubyObject value = safeArrayRef(runtime, values, i);

Doing further testing on it locally before pushing.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#3884 (comment)

@headius
Copy link
Member

headius commented May 13, 2016

Sorry...I was wrong, it doesn't pass specs. I'll report back here when I have everything passing.

@headius
Copy link
Member

headius commented May 13, 2016

Yeah for various reasons I think this is going to require a completely new recursion detection. I have a prototype that appears to be passing specs.

The new code should address #3887 too. May be a little too risky for 9.1.1.0...@enebo?

headius added a commit that referenced this issue May 14, 2016
The old recursion guard (ported from MRI years ago) had a number
of flaws:

* It forced an object ID for every object encountered.
* It created transient objects for every recursive stack.
* It was not using identity hashing (#3887).
* It used a RubyHash internally, which has much more overhead than
  a typical JDK Map.

The new implementation largely follows the pattern of the original
but fixes all the above items. It passes all untagged specs.

See also #3884, which started this whole thing.
headius added a commit that referenced this issue May 14, 2016
The fix here is as follows:

* Always provide a non-zero hash for any array, including empty.
* Better hash combination function with less collision.

An improved recursion guard will go into the next release.
@headius
Copy link
Member

headius commented May 14, 2016

Ok, I've merged in a simpler fix for 9.1.1.0, which you could verify. You might also be interested in trying out the test-new-recursion branch, which has the improved logic.

headius added a commit that referenced this issue May 14, 2016
headius added a commit that referenced this issue May 14, 2016
headius added a commit that referenced this issue May 14, 2016
The old recursion guard (ported from MRI years ago) had a number
of flaws:

* It forced an object ID for every object encountered.
* It created transient objects for every recursive stack.
* It was not using identity hashing (#3887).
* It used a RubyHash internally, which has much more overhead than
  a typical JDK Map.

The new implementation largely follows the pattern of the original
but fixes all the above items. It passes all untagged specs.

See also #3884, which started this whole thing.
@flash-gordon
Copy link
Author

Thanks! I tested both solutions and here is what I got:

  1. Smaller fix f1e056b works fine, all specs (2K+) are green with several runs & random order.
  2. test-new-recursion fails the following example:
irb(main):017:0> [{a: 1}].hash
=> 5019126971421186946
irb(main):018:0> [{b: 1}].hash
=> 5019126971421186946

@headius
Copy link
Member

headius commented May 16, 2016

@flash-gordon I do not see that on the current branch, but I believe I had fixes go in after you tested.

[] ~/projects/jruby $ jruby -e 'p [{a: 1}].hash'
1994458792184350789

[] ~/projects/jruby $ jruby -e 'p [{b: 1}].hash'
4863057760227230173

I do have one MRI test failure to investigate.

@flash-gordon
Copy link
Author

@headius you should test it in the same process

gordon@mac ~/dev/jruby [test-new-recursion] $ ./bin/jruby -e 'p [{a: 1}].hash; p [{b: 1}].hash;'
6962820890589519712
6962820890589519712

@headius
Copy link
Member

headius commented May 16, 2016

Yeah I realized that afterward, and it turns out the MRI failure was the same issue. It's now fixed on the branch and will be merged into 9.1.2.0. Feel free to verify again if you like :-)

@flash-gordon
Copy link
Author

Sure. I ran specs (again, 2.2K specs) ten times with random order and had no issues, so it seems to work fine and it's very awesome, thank you!

@flash-gordon
Copy link
Author

Btw, when do you plan to release 9.1.1.0. I mean rough estimation of course.

@headius
Copy link
Member

headius commented May 16, 2016

Within the next couple days. Blocked on #3867 and related issues right now (problems activating gems inside bundle exec.

flash-gordon added a commit to dry-rb/dry-validation that referenced this issue May 16, 2016
* See jruby/jruby#3884 for retails
* I'll replace it with add 9.1.1.0 release when it is available
headius added a commit that referenced this issue May 17, 2016
The old recursion guard (ported from MRI years ago) had a number
of flaws:

* It forced an object ID for every object encountered.
* It created transient objects for every recursive stack.
* It was not using identity hashing (#3887).
* It used a RubyHash internally, which has much more overhead than
  a typical JDK Map.

The new implementation largely follows the pattern of the original
but fixes all the above items. It passes all untagged specs.

See also #3884, which started this whole thing.
headius added a commit that referenced this issue May 18, 2016
The old recursion guard (ported from MRI years ago) had a number
of flaws:

* It forced an object ID for every object encountered.
* It created transient objects for every recursive stack.
* It was not using identity hashing (#3887).
* It used a RubyHash internally, which has much more overhead than
  a typical JDK Map.

The new implementation largely follows the pattern of the original
but fixes all the above items. It passes all untagged specs.

See also #3884, which started this whole thing.
eregon pushed a commit to ruby/spec that referenced this issue May 29, 2016
Relates to jruby/jruby#3884, where JRuby
was always returning the same hash for certain combinations of
nested array with an innermost empty array.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants