Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Store token list of strptime format string in a cache so we do not co…
…ntinually re-parse the same strings. For this very very simple single format string removing this parsing ended up speeding us up by 8x. More analysis below: BEFORE: ```text jruby ../snippets/date_bench.rb Warming up -------------------------------------- _strptime 29.320k i/100ms Calculating ------------------------------------- _strptime 353.076k (± 2.9%) i/s - 3.548M in 10.057641s ``` AFTER: ```text system ~/work/jruby master * 1571% jruby ../snippets/date_bench.rb Warming up -------------------------------------- _strptime 124.491k i/100ms Calculating ------------------------------------- _strptime 2.696M (± 5.3%) i/s - 26.890M in 10.007182s ``` Removing the cost of lexing the format string is the entirety of the speed up. I moved the lexer from being a field to a local so we did not have to keep re-initializing the object (surprisingly expensive in it's own right). After spending some time with this I think JFlex is a bit heavy for this parsing. A stateless (e.g. static recursive descent parser passing all values as params) lexer/parser would have probably made the cost of lexing lower (which also includes converting from String to ByteList as well). With that said... Having a format cache eliminates needing the lexer to be fast. I doubt the cache will ever grow large and if it does we can just make it bounded by LRU. At this point I think the performance is ok. Follow up work items if anyone wants more would be: 1. Use ByteList and not String 2. Eliminate Bag and just use RubyHash directly 3. Add dyn dispatch for all math (this is not performance but a disparity with MRI itself). I did try to use bytelist last week for fun and saw no jump in perf but it was completely obscured by parsing the format string. It may give us another perf bump if we try it.