-
-
Notifications
You must be signed in to change notification settings - Fork 924
Date parsing (still) noticeably slower than MRI #5255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@kares can you link or paste the benchmark so I don't need to write it? |
@enebo was planning on commit-ing, but you're fast thus here you go : require 'benchmark'
TIMES = 3 * 1_000_000
require 'date'
Benchmark.bmbm do |x|
# x.report("Date._strptime('2001-02-03', '%Y-%m-%d') [#{TIMES}x]") do
# TIMES.times do
# Date._strptime('2001-02-03', '%Y-%m-%d')
# end
# end
#
# x.report("Date._strptime('2017-05-20 18:20:10', '%Y-%m-%d %H:%M:%S') [#{TIMES}x]") do
# str = '2017-05-20 18:20:10'; fmt = '%Y-%m-%d %H:%M:%S'
# TIMES.times { Date._strptime(str, fmt) }
# end
x.report("Date.parse('2018-07-17', false) [#{TIMES}x]") do
str = '2018-07-17'
TIMES.times { Date.parse(str, false) }
end
x.report("Date._parse('2018-07-17 21:20:55') [#{TIMES}x]") do
str = '2018-07-17 21:20:55'
TIMES.times { Date._parse(str) }
end
x.report("Date.iso8601('1999-12-31T00:00:00') [#{TIMES}x]") do
str = '1999-12-31T00:00:00'
TIMES.times { Date.iso8601(str) }
end
x.report("DateTime.iso8601('1999-12-31T19:20:06') [#{TIMES}x]") do
str = '1999-12-31T19:20:06'
TIMES.times { DateTime.iso8601(str) }
end
end |
This is a goofy thing to play with but I was wondering how expensive the last iso8601 bench would perform if I eliminate two two array operations in complete_frags: g = COMPLETE_FRAGS[3]#COMPLETE_FRAGS.max_by { |_, fields| fields.count { |field| elem.key? field } }
c = 3#g[1].count { |field| elem.key? field } Obviously this is just to eliminate the cost and see what happens but wow! we dropped from 30s to about 17s. So this logic is a significant bottleneck. Obviously we could make it native and it would reduce the cost of the algorthim but I wonder if this complexity is really needed. The density of this function is pretty fascinating. It seems to order preference while also calculating whether other values need to be added. |
I hate to add that a big benefit MRI got out of doing native was not setting $~ and just using oni directly. We would also get a gain in at least moving the regexps out of Ruby since then most of these methods would not even construct a frame. |
Just also noticed date_core.c added a fast path for civil before calling this logic....So they are not calling all that list counting stuff for this last case at all. |
Ok added it. I think this is right...I inverted logic from MRI to call complete_frags instead of calling civil date method directly. It is possible they cut out more work that we could as well: diff --git a/lib/ruby/stdlib/date.rb b/lib/ruby/stdlib/date.rb
index b8571ef154..ceb34ec9b8 100644
--- a/lib/ruby/stdlib/date.rb
+++ b/lib/ruby/stdlib/date.rb
@@ -685,7 +685,11 @@ class DateTime < Date
raise ArgumentError, 'invalid date' unless elem
elem = rewrite_frags(elem)
- elem = complete_frags(elem)
+
+ # More work to do if not :civil
+ if elem[:jd] || elem[:yday] || !elem[:year] || !elem[:mon] || !elem[:mday]
+ elem = complete_frags(elem)
+ end
unless (jd = valid_date_frags?(elem, sg)) &&
(fr = valid_time_frags?(elem))
raise ArgumentError, 'invalid date' Before:
After:
So iso8601 last case doubled perf by bypassing that logic. I will land this if we pass specs. This particular location may not be the best but it is an improvement and as we learn more we may find a better way of doing this. |
Ah I misread this and this opt already exists for Time but cannot be used for DateTime but cext appears to do something different in DatetimeCase...Learning is fun |
Commit a2bc9a3 does similar optimization that was done to Date to DateTime. Commit message on what changed, but results: before:
after:
|
JRuby 9.2 moved most of date.rb into native, a lot of date/format.rb pieces are left.
these have been cleaned up with some parts moved into .java as well (in MRI its all C since 2.x)
still performance is worse than MRIs :
MRI 2.5 :
... since some pieces are 3-4x slower this might be worth investigating.
code for benchmark was chosen based on patterns/methods used in Rails
NOTE: haven't looked into those, but there's a lot of 'crazy' regexp matching, maybe MRI avoids some of those. as I did some C porting I noticed the format.rb parts are no longer an exact match to MRI's date.c
The text was updated successfully, but these errors were encountered: