Automatic row separator discovery in csv.rb may split a Unicode character #1386

Closed
joey-he8x opened this issue Jan 8, 2014 · 10 comments

Comments

@joey-he8x

lib/ruby/1.9/csv.rb
line 2051: sample = @io.gets(nil, 1024)
may split a valid Unicode character and raise an ArgumentError: invalid byte sequence in UTF-8 at line 2058.

This happens when executing CSV.open.

A possible workaround is to add "rescue ArgumentError" at line 2078, at the end of the rescue list.
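
The failure mode described here can be sketched in plain Ruby without csv.rb. This is a minimal illustration, not the library code: `byte_limited_sample`, `detect_row_sep` and `detect_row_sep_or_default` are hypothetical helpers, `byteslice` stands in for a read that cuts at a byte boundary, and the `"\n"` fallback is only an assumption about what the suggested rescue is meant to achieve.

```ruby
# Hypothetical stand-in for a byte-limited read that ignores character
# boundaries; this is not the csv.rb or JRuby implementation.
def byte_limited_sample(data, limit = 1024)
  data.byteslice(0, limit)
end

# Matches the sample against a row-separator pattern and returns the match.
def detect_row_sep(sample)
  return $& if sample =~ /\r\n?|\n/
  nil
end

# Sketch of the suggested workaround: treat ArgumentError like the other
# rescued errors and fall back to a default separator (assumed to be "\n").
def detect_row_sep_or_default(sample, default = "\n")
  detect_row_sep(sample) || default
rescue ArgumentError
  default
end

# 1023 ASCII bytes followed by a 3-byte character, so a 1024-byte cut
# lands inside that character.
TEXT_WITH_STRADDLING_CHAR = ("a" * 1023) + "\u4e2d,b\r\n"

sample = byte_limited_sample(TEXT_WITH_STRADDLING_CHAR)
puts sample.valid_encoding?  # false

begin
  detect_row_sep(sample)
rescue ArgumentError => e
  puts e.message
end

puts detect_row_sep_or_default(sample).inspect
```

Note that falling back to a default hides the broken sample rather than repairing it, so the detected separator may be wrong for files that use "\r\n".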

@BanzaiMan
Member

Is there a sample input to reproduce this error? Does MRI exhibit the same problem?

@joey-he8x
Author

This @io.gets(nil, 1024) call might split a valid multi-byte Unicode character. If that happens, it raises the exception.
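
One way to avoid cutting a character in half is to extend the sample until it is valid in the target encoding. The sketch below is an illustration under stated assumptions, not the fix used by csv.rb, MRI or JRuby: `read_whole_char_sample` is a hypothetical helper, the input is assumed to be UTF-8, and a `StringIO` stands in for the real file.

```ruby
require 'stringio'

# Reads up to `limit` bytes, then reads a few more bytes one at a time
# until the sample is valid in `encoding`. A UTF-8 character is at most
# 4 bytes long, so at most 3 extra bytes are ever needed.
def read_whole_char_sample(io, limit = 1024, encoding = Encoding::UTF_8)
  sample = io.read(limit)
  return nil if sample.nil?

  sample.force_encoding(encoding)
  3.times do
    break if sample.valid_encoding?
    extra = io.read(1)
    break if extra.nil?
    sample << extra.force_encoding(encoding)
  end
  sample
end

# 1023 ASCII bytes followed by a 3-byte character that straddles byte 1024.
io = StringIO.new(("a" * 1023) + "\u4e2d,b\r\n")
sample = read_whole_char_sample(io)
puts sample.valid_encoding?
puts sample.bytesize
```

If the input is genuinely malformed, the loop gives up after three extra bytes and returns the still-invalid sample, so the caller would still need to handle that case.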

@BanzaiMan
Member

If it does, is it not a problem with MRI, too?

@joey-he8x
Author

I just tried it under ruby-2.1.0 and got no exception, whereas it failed under jruby-1.7.0/1.7.9. It seems RubyRegexp.java in JRuby is a little less robust than the MRI version.
Here is the full stack trace:

jruby-1.7.9 :001 > require 'csv'
jruby-1.7.9 :002 > f=CSV.open 'test.csv','r'
ArgumentError: invalid byte sequence in UTF-8
    from org/jruby/RubyRegexp.java:1674:in `=~'
    from org/jruby/RubyString.java:1697:in `=~'
    from /usr/local/rvm/rubies/jruby-1.7.9/lib/ruby/1.9/csv.rb:2058:in `init_separators'
    from /usr/local/rvm/rubies/jruby-1.7.9/lib/ruby/1.9/csv.rb:1590:in `initialize'
    from /usr/local/rvm/rubies/jruby-1.7.9/lib/ruby/1.9/csv.rb:1349:in `open'
    from (irb):2:in `evaluate'
    from org/jruby/RubyKernel.java:1119:in `eval'
    from org/jruby/RubyKernel.java:1519:in `loop'
    from org/jruby/RubyKernel.java:1282:in `catch'
    from org/jruby/RubyKernel.java:1282:in `catch'
    from /usr/local/rvm/rubies/jruby-1.7.9/bin/irb:13:in `(root)'

@BanzaiMan
Member

Ah, great! You have an example that shows this problem. Can you share this test.csv file?

@joey-he8x
Author

test csv

It needs to be renamed to test.csv.

@BanzaiMan
Member

@joey-he8x Thanks for the example.

Still a problem on master:

$ jruby -v -rcsv -e "f=CSV.open 'test.csv','r'"
jruby 9000.dev (2.1.0.dev) 2014-01-10 a3d98f8 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18 [darwin-x86_64]
ArgumentError: invalid byte sequence in UTF-8
               =~ at org/jruby/RubyRegexp.java:1442
               =~ at org/jruby/RubyString.java:1590
  init_separators at /Users/asari/Development/src/jruby/lib/ruby/2.1/csv.rb:1988
       initialize at /Users/asari/Development/src/jruby/lib/ruby/2.1/csv.rb:1513
             open at /Users/asari/Development/src/jruby/lib/ruby/2.1/csv.rb:1263
           (root) at -e:1

@headius
Member

headius commented Nov 12, 2014

Sorry @joey-he8x, your example seems to have disappeared, so it's hard to reproduce this. Can you try JRuby master/9k and/or provide the reproduction again?

@joey-he8x
Author

I didn't back up the example, so I'm afraid it is not easy to provide it again or to verify whether the problem still exists in master.
But the problem seems clear to me if you read the specific source line that I pointed out in the original post.
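
A candidate reproduction file can be generated from that description. This is a sketch only: it has not been confirmed to match the original test.csv or to trigger the error on any particular JRuby version. `straddling_csv_text` and `probe_csv` are hypothetical helpers, and the `'r:UTF-8'` mode is an assumption added to make the encoding explicit (the original report used plain `'r'`).

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical test data: 1023 ASCII bytes followed by a 3-byte character,
# so the 1024th byte of the file is the first byte of that character.
def straddling_csv_text
  ("a" * 1023) + "\u4e2d,b\n" + "c,d\n"
end

# Opens the file and reads one row; reports the outcome instead of raising.
def probe_csv(path)
  CSV.open(path, 'r:UTF-8') { |csv| csv.shift }
  "ok"
rescue StandardError => e
  "#{e.class}: #{e.message}"
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.csv')
  File.binwrite(path, straddling_csv_text)
  puts probe_csv(path)
end
```

Running the same script under both MRI and the JRuby version in question would show whether the two still differ.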

@enebo added this to the Invalid or Duplicate milestone Feb 17, 2017
@enebo
Member

enebo commented Feb 17, 2017

No data and not enough time to spelunk here. Hopefully it is actually fixed in 9k, since we have not had any reported m17n bugs in quite a while and 1.7.x is reaching EOL.

@enebo closed this as completed Feb 17, 2017