Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML.load significant performance degradation when JSON string is included #5098

Closed
vassilios opened this issue Mar 21, 2018 · 29 comments
Closed

Comments

@vassilios
Copy link

Environment

  • JRuby version
    jruby 9.1.16.0 (2.3.3) 2018-02-21 8f3f95a OpenJDK 64-Bit Server VM 25.72-b15 on 1.8.0_72-internal-b15 +jit [linux-x86_64]
  • Operating system and platform
    Linux ip-172-30-3-223 3.13.0-74-generic Bump compilation heap size for jruby-1_6 #118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Expected Behavior

On jruby 9.1.5.0, 9.1.6.0 and 9.1.7.0, the following gist test_yaml.rb completes in a few milliseconds.

$ jruby test_yaml.rb 
9.1.5.0
  0.240000   0.020000   0.260000 (  0.070586)

$ jruby test_yaml.rb 
9.1.6.0
  0.260000   0.010000   0.270000 (  0.075033)

$ jruby test_yaml.rb 
9.1.7.0
  0.280000   0.000000   0.280000 (  0.080087)

Actual Behavior

The same gist executed with jruby 9.1.8.0 takes more than 10 seconds.

$ jruby test_yaml.rb 
9.1.8.0
 11.570000   0.010000  11.580000 ( 11.411209)

Same thing happens for jruby 9.1.16.0

$ jruby test_yaml.rb 
9.1.16.0
 11.830000   0.010000  11.840000 ( 11.437966)
@headius
Copy link
Member

headius commented Mar 26, 2018

Good find! Investigating.

@headius
Copy link
Member

headius commented Mar 26, 2018

SnakeYAML was updated in October, so if that's the culprit it would have first manifested in JRuby 9.1.14.

A quick sampled profile shows only one big hot spot, a method in SnakeYAML: ScannerImpl.scanFlowScalarNonSpaces. On this profile, 650 out of 867 ticks were during this method's execution.

@headius
Copy link
Member

headius commented Mar 26, 2018

Ah I see you mention it being slow in 9.1.8.0. I'll start looking there for a possible culprit.

@headius
Copy link
Member

headius commented Mar 26, 2018

I traced it to the Psych update in b0487b4. Continuing.

@headius
Copy link
Member

headius commented Mar 26, 2018

The only substantive commit in Psych 2.2.3 was ruby/psych@fc55b92, which updated SnakeYAML from 1.17 to 1.18. So we're back to that theory again.

@headius
Copy link
Member

headius commented Mar 26, 2018

Manually downgrading SnakeYAML from 1.18 to 1.17 does not resolve the issue, so we're back to Psych being the culprit.

@headius
Copy link
Member

headius commented Mar 26, 2018

Manually downgrading Psych to 2.2.2 does appear to remedy the problem, but I'm still confused because there's really nothing different between 2.2.2 and 2.2.3 except for the SnakeYAML update.

@vassilios
Copy link
Author

@headius thanks for looking into this and keeping me up-to-date. Hope it gets resolved.

@headius
Copy link
Member

headius commented Mar 26, 2018

Ok ignore what I said earlier...after changing all references to "1.18" and manually downgrading SnakeYAML, performance does return to normal. I will check if latest SnakeYAML fixes it and if not I'll file an issue with them.

@headius
Copy link
Member

headius commented Mar 26, 2018

It appears this regression was fixed in a later version of SnakeYAML. Updating Psych to a build of 3.0.2 with SnakeYAML 1.20 returns performance to the original range. I will proceed to update Psych.

@vassilios
Copy link
Author

great, thank you @headius

@headius
Copy link
Member

headius commented Mar 26, 2018

The Psych update is on hold until we can get a fix for https://bitbucket.org/asomov/snakeyaml/issues/401/jrubys-psych-library-uses-markgetindex

Hopefully it won't take long.

@headius
Copy link
Member

headius commented Mar 26, 2018

Ok, so the bad news is that SnakeYAML 1.19 does not appear to fix the issue, so we'll have to wait for 1.21 that fixes the issue I filed so we can release a new Psych.

@headius
Copy link
Member

headius commented Mar 26, 2018

Short term workaround: you may be able to install an older Psych and use it. If you are using Bundler, adding an older Psych version to e.g. Gemfile may do the trick.

For a more permanent solution, you can replace lib/ruby/stdlib/psych* with the contents of the gem's lib dir. There may be some oddities matching versions with the gemspec, so that may need to be replaced under lib/ruby/gems/shared/specifications/default.

@vassilios
Copy link
Author

I can confirm that the workaround of downgrading the psych gem works.
Here's what I did.

I added psych version 2.2.2 in my Gemfile.

source 'https://rubygems.org'
gem 'psych', '2.2.2'

And added those two lines at the top of the gist.

require 'rubygems'
require 'bundler/setup'

The output after those changes showed no degradation:

$ jruby -S test_yaml.rb 
9.1.16.0
  0.220000   0.010000   0.230000 (  0.062216)

@asomov
Copy link

asomov commented Apr 4, 2018

If you test SnakeYAML 1.21-SNAPSHOT and it is quick enough, we can release it immediately, otherwise it will be released in August 2018.
The performance penalty is most probably caused by more complex Unicode processing. SnakeYAML now properly works with characters above 65536 (Java chars were designed for 2 bytes per char)

@vassilios
Copy link
Author

@asomov thank you. @headius would you be testing the SnakeYAML 1.21-SNAPSHOT ? thx

@headius
Copy link
Member

headius commented Apr 11, 2018

@asomov @vassilios Testing now!

@headius
Copy link
Member

headius commented Apr 12, 2018

@asomov @vassilios The 1.21 version of SnakeYAML does indeed fix the performance issue, and requires only minor tweaks to the Psych code. Ship it! 👍

@vassilios
Copy link
Author

@headius thank you for testing it!

@asomov
Copy link

asomov commented Apr 13, 2018

1.21 released.
Please be aware that now plain scalars are allowed in the flow context.
{foo:bar} -> foo:bar -> null

@headius
Copy link
Member

headius commented Apr 16, 2018

@asomov Thank you!

@asomov
Copy link

asomov commented Apr 16, 2018

Is not yet solved ?

@headius
Copy link
Member

headius commented Apr 16, 2018

@asomov This issue should not be closed until we have an updated Psych pushed for JRuby 9.1.17. It closed automatically (incorrectly) when I made my commit to Psych.

@headius
Copy link
Member

headius commented Apr 16, 2018

This is going to get pushed post-9.1.17 unless we can address the failures from updating Psych and its tests. I got both Psych test failures and YAML failures in the existing ruby specs.

@headius headius modified the milestones: JRuby 9.1.17.0, JRuby 9.2.0.0 Apr 16, 2018
@headius
Copy link
Member

headius commented Apr 16, 2018

@asomov Your comment about plain scalars...that may be the cause of some failures. I'm pushing a JRuby branch shortly for #5142 that should show all the new failures. Some appear to be related to scalars.

@headius
Copy link
Member

headius commented Apr 16, 2018

I have pushed branch psych_3_update which updates to Psych 3.0.3 that includes the above fixes. We'll need to address the failures before we commit this to a release.

@headius
Copy link
Member

headius commented Sep 19, 2018

#365 is merged and I believe everything is in place for a Psych release. Waiting to hear from @hsbt about that.

@hsbt
Copy link

hsbt commented Sep 27, 2018

I will release it in this weekend.

@headius headius closed this as completed Oct 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants