-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement RubyDateParser in Java #4635
Conversation
8551d09
to
745b70f
Compare
@muga The failures in Travis appear to be related to your changes. Can you look into them? To reproduce the MRI failure, run the associated file like this:
The failure in the "-Ptest" suite doesn't appear to be related. |
Sure, I will take a look later. |
I found a bug in our strptime implementation. Will fix it. |
Since cwyear, year, sec_fraction, seconds and _cent are variable-length integers, their data types need to be changed with |
I need to care of taint mark in objects. |
Don't sweat the tainting too much, but if we regressed we should at least try. I would be surprised if tainting and trusting survive into Ruby 3. |
… FormatBag with BigInteger
2a4e3fb
to
886deb6
Compare
I did |
this should now pass 💚 those |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Can you add some documentation -- perhaps in the .flex file -- explaining how to generate the lexer?
@headius if this is documented for this or incorporated into the Maven part of the buid then we should also hook up:
I can tell at one point these were in our build (see evidence of jflex.bin but no references). Obviously we can manually generate these like our main parser but jflex is something I am positive maven can generate. |
@enebo I don't have any problem with generating the source every time, except that if we want these sources in javadoc or available for browsing online it would be preferable to regenerate them infrequently. I doubt anyone cares about the javadoc or source for these though. It would, however, be very nice if we at least had the option to generate them as a build target or tool script. |
@headius yeah I don't care if it is hidden behind a -P or generated every time. I guess we have two other lexers that use jflex so +1 to hooking it up in maven somewhere... |
Is there any benchmarking on this yet? |
Removed "[WIP]" from the title of this PR but, too late.. |
@enebo I did check the performance of this Java strptime by micro bench based on https://github.com/macournoyer/tinyrb. This Java implementation is basically faster than the current version (9.1.9.0) of strptime. But, it seems that it depends on the complexity of the format specified by an user. If you have any questions, please let me know.
The raw data:
|
The speed up is great. Do you know why MRI is so much faster though? Even with changing this to Java and we are an order of magnitude slower? |
@enebo Since MRI's
The raw data: https://gist.github.com/muga/7b61e9284841c222275f9636dd9b7808 |
In general, I very very rarely find C an order of a magnitude faster than Java. Usually they are comparable (notable exception is penalty of transcoding internally to/from UTF16-LE in Java which then can drop performance in 1/2 or a little more). Actually, I did not review this so I did not notice it uses java.lang.String for parsing/lexing. That is one 2+x speed issue so we can entertain using ByteList and a Stream over a Reader. I just realized you are opening up some more work on this but I will also look at some profiling and see if we can get some more wins out of this. Part of me anticipates being able to beat MRI for this...I am optimistic! :) |
Ah just a comment since I did not realize this last night...this does actually perform Ruby per call whereas MRI does do this all in C. So that will be some penalty as well. |
@enebo @kares Thank you for improving it. I built 65e1495 and measured the performance with my benchmark again.
|
This PR introduces
RubyDateParser
in Java to improve the performance for strptime. We discussed about this implementation on #4591.