-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
csv.rb uses postfix exceptions when converters are specified #1816
Comments
Link to the converters, where the postfix rescues live: https://github.com/jruby/jruby/blob/master/lib/ruby/2.1/csv.rb#L947 Link to the convert_fields method: https://github.com/jruby/jruby/blob/master/lib/ruby/2.1/csv.rb#L2165 Note that if working on this, JRuby master is currently a little unstable...better to use JRuby 1.7.x |
Admittedly, a JRuby peformance bug, but we also do not invoke lambdas the same speed as ordinary methods; So if anyone also wants to change converters to simple classes with a convert method it might be interesting to see the perf difference from that activity too (but postfix exception is the bigger target obviously). |
Copying @JEG2. Modifying the Integer converter to use to_i and a nil check (as opposed to the postfix rescue) is around 25% faster in MRI too. |
But >> Integer("bug")
ArgumentError: invalid value for Integer(): "bug"
from (irb):1:in `Integer'
from (irb):1
from /Users/jeg2/.rubies/ruby-2.1.2/bin/irb:11:in `<main>' |
@JEG2 I believe @tenderlove and I came up with a reasonable no-exception "maybe convert" logic for Psych, but I don't recall exactly where it was. I believe it involved some simple regexp matches to filter out strings we know won't parse.
|
So do we need to change this in CSV to make JRuby perform better? |
Looks like changing CSV makes MRI faster: diff --git a/lib/csv.rb b/lib/csv.rb
index 60b22e7..af9c918 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -945,7 +945,13 @@ class CSV
# can be nested with other combo fields.
#
Converters = { integer: lambda { |f|
- Integer(f.encode(ConverterEncoding)) rescue f
+ begin
+ num = f.encode(ConverterEncoding)
+ return f unless num =~ /^[-+]?\d+$/
+ Integer(num)
+ rescue
+ f
+ end
},
float: lambda { |f|
Float(f.encode(ConverterEncoding)) rescue f Before:
After:
Should make all rubies faster AFAICT. |
@JEG2 @tenderlove Yes, as mentioned in #1816 (comment) fixing it makes MRI faster too. About 25% for MRI, and about 100x faster for JRuby. Seems like a win. |
Ideally the other converters would not be so prone to throw-away exceptions too. |
I'll be honest and say that I'm less inclined to view this as a win.
However, if we need this, I guess we need it. I'm happy to apply a patch if someone wants to try and work out the equivalent code. |
Just in case I wasn't very clear there, I feel we're trading simpler, easier to maintain code for faster code with this change. I've never felt the pain of slow CSV conversions, so it's hard for me to justify doing that. I will though, if needed. |
For me, it's two wins: speed and easier to use debugging output. Check out the output with Before the patch:
After:
|
@tenderlove You win for best example of why rescue nil here is a problem. I always end up fixating on perf, but that output is completely unusable. |
@JEG2 The problem here is that Integer does something else we don't want: it raises an exception if it can't handle the input. Your use of If Integer returned false or nil when it can't parse, your logic would be simplest of all: Perhaps there's a feature request possible here, like a form of Integer that takes an option, or a more robust String#to_i. That's not in scope for me :-) |
As I said, I'll apply a patch. 😄 |
@JEG2 Great, thank you very much! I'll leave this open until we can import the modified version from MRI. |
Marking to fix for 1.7.23 (and 9.0.2.0). |
I am closing this. We did indirectly address performance issue itself on 9k already without applying the better error reporting logic which was mentioned above. I think the train for improving general performance on 1.7 should be spent on 9k at this point. It's dead... |
See code in csv.rb, in CSV#convert_fields and the lambda-based converters around line 947.
Simple benchmark to show the devastating effect:
Without converters, we perform about like MRI.
The text was updated successfully, but these errors were encountered: