Skip to content

Date parsing (still) noticeably slower than MRI #5255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kares opened this issue Jul 19, 2018 · 8 comments
Closed

Date parsing (still) noticeably slower than MRI #5255

kares opened this issue Jul 19, 2018 · 8 comments

Comments

@kares
Copy link
Member

kares commented Jul 19, 2018

JRuby 9.2 moved most of date.rb into native, a lot of date/format.rb pieces are left.

these have been cleaned up with some parts moved into .java as well (in MRI its all C since 2.x)

still performance is worse than MRIs :

Rehearsal ---------------------------------------------------------------------------------------------------------
Date._strptime('2001-02-03', '%Y-%m-%d') [3000000x]                     2.850000   0.130000   2.980000 (  1.585740)
Date._strptime('2017-05-20 18:20:10', '%Y-%m-%d %H:%M:%S') [3000000x]   2.520000   0.010000   2.530000 (  1.888403)
Date.parse('2018-07-17', false) [3000000x]                             60.960000   0.080000  61.040000 ( 57.224618)
Date._parse('2018-07-17 21:20:55') [3000000x]                          76.980000   0.110000  77.090000 ( 73.975601)
Date.iso8601('1999-12-31T00:00:00') [3000000x]                         12.860000   0.040000  12.900000 ( 11.075091)
DateTime.iso8601('1999-12-31T19:20:06') [3000000x]                     39.950000   0.080000  40.030000 ( 34.704867)
---------------------------------------------------------------------------------------------- total: 196.570000sec

                                                                            user     system      total        real
Date._strptime('2001-02-03', '%Y-%m-%d') [3000000x]                     1.290000   0.000000   1.290000 (  1.220147)
Date._strptime('2017-05-20 18:20:10', '%Y-%m-%d %H:%M:%S') [3000000x]   1.780000   0.000000   1.780000 (  1.689286)
Date.parse('2018-07-17', false) [3000000x]                             56.580000   0.060000  56.640000 ( 55.751624)
Date._parse('2018-07-17 21:20:55') [3000000x]                          74.200000   0.190000  74.390000 ( 72.432050)
Date.iso8601('1999-12-31T00:00:00') [3000000x]                         10.870000   0.020000  10.890000 ( 10.519134)
DateTime.iso8601('1999-12-31T19:20:06') [3000000x]                     34.040000   0.060000  34.100000 ( 33.126543)

MRI 2.5 :

                                                                            user     system      total        real
Date._strptime('2001-02-03', '%Y-%m-%d') [3000000x]                     2.099370   0.000000   2.099370 (  2.099385)
Date._strptime('2017-05-20 18:20:10', '%Y-%m-%d %H:%M:%S') [3000000x]   3.028066   0.000000   3.028066 (  3.028100)
Date.parse('2018-07-17', false) [3000000x]                             12.222756   0.000000  12.222756 ( 12.222796)
Date._parse('2018-07-17 21:20:55') [3000000x]                          29.789987   0.000000  29.789987 ( 29.790399)
Date.iso8601('1999-12-31T00:00:00') [3000000x]                          9.213094   0.000000   9.213094 (  9.213815)
DateTime.iso8601('1999-12-31T19:20:06') [3000000x]                      9.899020   0.000000   9.899020 (  9.899302)

... since some pieces are 3-4x slower this might be worth investigating.
code for benchmark was chosen based on patterns/methods used in Rails

NOTE: haven't looked into those, but there's a lot of 'crazy' regexp matching, maybe MRI avoids some of those. as I did some C porting I noticed the format.rb parts are no longer an exact match to MRI's date.c

@enebo
Copy link
Member

enebo commented Jul 19, 2018

@kares can you link or paste the benchmark so I don't need to write it?

@kares
Copy link
Member Author

kares commented Jul 19, 2018

@enebo was planning on commit-ing, but you're fast thus here you go :

require 'benchmark'

TIMES = 3 * 1_000_000

require 'date'

Benchmark.bmbm do |x|

  # x.report("Date._strptime('2001-02-03', '%Y-%m-%d') [#{TIMES}x]") do
  #   TIMES.times do
  #     Date._strptime('2001-02-03', '%Y-%m-%d')
  #   end
  # end
  #
  # x.report("Date._strptime('2017-05-20 18:20:10', '%Y-%m-%d %H:%M:%S') [#{TIMES}x]") do
  #   str = '2017-05-20 18:20:10'; fmt = '%Y-%m-%d %H:%M:%S'
  #   TIMES.times { Date._strptime(str, fmt) }
  # end

  x.report("Date.parse('2018-07-17', false) [#{TIMES}x]") do
    str = '2018-07-17'
    TIMES.times { Date.parse(str, false) }
  end

  x.report("Date._parse('2018-07-17 21:20:55') [#{TIMES}x]") do
    str = '2018-07-17 21:20:55'
    TIMES.times { Date._parse(str) }
  end

  x.report("Date.iso8601('1999-12-31T00:00:00') [#{TIMES}x]") do
    str = '1999-12-31T00:00:00'
    TIMES.times { Date.iso8601(str) }
  end

  x.report("DateTime.iso8601('1999-12-31T19:20:06') [#{TIMES}x]") do
    str = '1999-12-31T19:20:06'
    TIMES.times { DateTime.iso8601(str) }
  end

end

@enebo
Copy link
Member

enebo commented Jul 19, 2018

This is a goofy thing to play with but I was wondering how expensive the last iso8601 bench would perform if I eliminate two two array operations in complete_frags:

    g = COMPLETE_FRAGS[3]#COMPLETE_FRAGS.max_by { |_, fields| fields.count { |field| elem.key? field } }
    c = 3#g[1].count { |field| elem.key? field }

Obviously this is just to eliminate the cost and see what happens but wow! we dropped from 30s to about 17s. So this logic is a significant bottleneck. Obviously we could make it native and it would reduce the cost of the algorthim but I wonder if this complexity is really needed. The density of this function is pretty fascinating. It seems to order preference while also calculating whether other values need to be added.

@enebo
Copy link
Member

enebo commented Jul 19, 2018

I hate to add that a big benefit MRI got out of doing native was not setting $~ and just using oni directly. We would also get a gain in at least moving the regexps out of Ruby since then most of these methods would not even construct a frame.

@enebo
Copy link
Member

enebo commented Jul 19, 2018

Just also noticed date_core.c added a fast path for civil before calling this logic....So they are not calling all that list counting stuff for this last case at all.

@enebo
Copy link
Member

enebo commented Jul 19, 2018

Ok added it. I think this is right...I inverted logic from MRI to call complete_frags instead of calling civil date method directly. It is possible they cut out more work that we could as well:

diff --git a/lib/ruby/stdlib/date.rb b/lib/ruby/stdlib/date.rb
index b8571ef154..ceb34ec9b8 100644
--- a/lib/ruby/stdlib/date.rb
+++ b/lib/ruby/stdlib/date.rb
@@ -685,7 +685,11 @@ class DateTime < Date
     raise ArgumentError, 'invalid date' unless elem
     
     elem = rewrite_frags(elem)
-    elem = complete_frags(elem)
+
+    # More work to do if not :civil
+    if elem[:jd] || elem[:yday] || !elem[:year] || !elem[:mon] || !elem[:mday]
+      elem = complete_frags(elem)
+    end
     unless (jd = valid_date_frags?(elem, sg)) &&
            (fr = valid_time_frags?(elem))
       raise ArgumentError, 'invalid date'

Before:

Rehearsal --------------------------------------------------------------------------------------
Date.parse('2018-07-17', false) [3000000x]          52.130000   0.080000  52.210000 ( 47.959873)
Date._parse('2018-07-17 21:20:55') [3000000x]       63.590000   0.050000  63.640000 ( 61.122145)
Date.iso8601('1999-12-31T00:00:00') [3000000x]       9.350000   0.010000   9.360000 (  7.877006)
DateTime.iso8601('1999-12-31T19:20:06') [3000000x]  37.080000   0.070000  37.150000 ( 32.807544)
--------------------------------------------------------------------------- total: 162.360000sec

                                                         user     system      total        real
Date.parse('2018-07-17', false) [3000000x]          47.420000   0.020000  47.440000 ( 46.808937)
Date._parse('2018-07-17 21:20:55') [3000000x]       61.770000   0.020000  61.790000 ( 60.806602)
Date.iso8601('1999-12-31T00:00:00') [3000000x]       7.910000   0.020000   7.930000 (  7.597194)
DateTime.iso8601('1999-12-31T19:20:06') [3000000x]  33.520000   0.020000  33.540000 ( 32.346390)

After:

Rehearsal --------------------------------------------------------------------------------------
Date.parse('2018-07-17', false) [3000000x]          51.910000   0.080000  51.990000 ( 48.323558)
Date._parse('2018-07-17 21:20:55') [3000000x]       63.960000   0.040000  64.000000 ( 61.711276)
Date.iso8601('1999-12-31T00:00:00') [3000000x]       9.320000   0.020000   9.340000 (  8.011551)
DateTime.iso8601('1999-12-31T19:20:06') [3000000x]  19.050000   0.040000  19.090000 ( 16.621278)
--------------------------------------------------------------------------- total: 144.420000sec

                                                         user     system      total        real
Date.parse('2018-07-17', false) [3000000x]          48.270000   0.030000  48.300000 ( 47.561174)
Date._parse('2018-07-17 21:20:55') [3000000x]       62.900000   0.020000  62.920000 ( 61.811325)
Date.iso8601('1999-12-31T00:00:00') [3000000x]       8.060000   0.010000   8.070000 (  7.802881)
DateTime.iso8601('1999-12-31T19:20:06') [3000000x]  16.990000   0.040000  17.030000 ( 16.340034)

So iso8601 last case doubled perf by bypassing that logic. I will land this if we pass specs. This particular location may not be the best but it is an improvement and as we learn more we may find a better way of doing this.

@enebo
Copy link
Member

enebo commented Jul 19, 2018

Ah I misread this and this opt already exists for Time but cannot be used for DateTime but cext appears to do something different in DatetimeCase...Learning is fun

@enebo
Copy link
Member

enebo commented Jul 19, 2018

Commit a2bc9a3 does similar optimization that was done to Date to DateTime. Commit message on what changed, but results:

before:

DateTime.iso8601('1999-12-31T19:20:06') [3000000x]  23.470000   0.040000  23.510000 ( 23.257737)

after:

DateTime.iso8601('1999-12-31T19:20:06') [3000000x]  14.640000   0.050000  14.690000 ( 14.543591)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants