csv.rb: automatic row separator discovery may break a Unicode character #1386
Comments
Is there a sample input to reproduce this error? Does MRI exhibit the same problem?
The call @io.gets(nil, 1024) might split a valid 16-bit Unicode character; if that happens, it raises the exception.
If it does, is it not a problem with MRI, too?
I just tried it under ruby-2.1.0 and got no exception with the input that failed under jruby-1.7.0/1.7.9. It seems RubyRegexp.java in JRuby is a little less robust than the MRI version.
Ah, great! You have an example that shows this problem. Can you share this?
@joey-he8x Thanks for the example. Still a problem on master:
Sorry @joey-he8x, your example seems to have disappeared, so it's hard to reproduce this. Can you try JRuby master/9k and/or provide the reproduction again?
I didn't back up the example, so I'm afraid it is not easy to provide it again or to verify whether the problem still exists in master.
No data and not enough time to spelunk here. Hopefully it is actually fixed in 9k, since we have not had any reported m17n bugs in quite a while and 1.7.x is reaching EOL.
lib/ruby/1.9/csv.rb
line 2051: sample = @io.gets(nil, 1024)
may split a valid Unicode character and raise
ArgumentError: invalid byte sequence in UTF-8 at line 2058.
This happens when executing CSV.open.
A possible workaround is to add "rescue ArgumentError" at line 2078, at the tail of the rescue list.
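Since the original sample input is not available, here is a minimal sketch of the failure mode in plain Ruby. It does not call csv.rb or @io.gets at all: it uses String#byteslice to emulate a 1024-byte read that ends in the middle of a two-byte UTF-8 character, then applies a separator regexp similar to what the auto-discovery code is assumed to use. The sample text and the helper detect_row_sep are hypothetical, made up for illustration only.

```ruby
# Hypothetical data: one ASCII byte followed by 600 two-byte characters,
# so the 1024-byte boundary falls between the two bytes of a character.
text = "a" + "\u00e9" * 600

# Emulate a read limited to 1024 bytes that does not respect character
# boundaries (what the issue reports for @io.gets(nil, 1024) on JRuby 1.7).
chunk = text.byteslice(0, 1024)

# The truncated chunk ends with half of a character, so it is not valid UTF-8.
chunk_valid = chunk.valid_encoding?

# Matching a regexp against the broken chunk raises ArgumentError.
error = begin
  chunk =~ /\r\n?|\n/
  nil
rescue ArgumentError => e
  e
end

# Sketch of the proposed workaround: treat the ArgumentError as
# "no separator found" instead of letting it escape.
# detect_row_sep is a hypothetical helper, not part of csv.rb.
def detect_row_sep(sample)
  sample =~ /\r\n?|\n/ ? $& : nil
rescue ArgumentError
  nil
end
```

With the rescue in place, detect_row_sep(chunk) returns nil for the broken chunk, so a caller could fall back to a default separator. Whether swallowing the error is the right fix for csv.rb itself is a separate question; reading a few more bytes until the sample is valid would be another option.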