active_support inflector (4.1.0.beta1) raises ConverterNotFoundError in jruby #1472

Closed
jcoyne opened this issue Feb 2, 2014 · 3 comments

@jcoyne

jcoyne commented Feb 2, 2014

Same as https://jira.codehaus.org/browse/JRUBY-7194

> Encoding::Converter.new(Encoding::UTF_8, Encoding::UTF_8_MAC)
Encoding::ConverterNotFoundError: code converter not found (UTF-8 to UTF8-MAC)
    from org/jruby/RubyConverter.java:162:in `initialize'
    from org/jruby/RubyConverter.java:135:in `initialize'

It was added in this commit:
rails/rails@738dbc0#commitcomment-5248924
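
For anyone who needs to guard against this at runtime, here is a minimal sketch of a probe that checks whether a converter exists before relying on it. `converter_available?` is a hypothetical helper name, not part of Ruby, Rails, or JRuby; it only relies on `Encoding::Converter.new` raising `Encoding::ConverterNotFoundError`, as in the trace above.

```ruby
# Hypothetical helper: returns true if this Ruby implementation can build
# a converter between the two encodings, false otherwise.
def converter_available?(source, destination)
  Encoding::Converter.new(source, destination)
  true
rescue Encoding::ConverterNotFoundError
  false
end

# On a runtime without the UTF8-MAC transcoder this should print false
# instead of raising.
puts converter_available?(Encoding::UTF_8, Encoding::UTF_8_MAC)
```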

jcoyne referenced this issue in rails/rails Feb 2, 2014
The previous implementation was quite slow. This leverages some of the
transcoding abilities built into Ruby 1.9 instead. It is roughly 96%
faster.

The roundtrip through UTF_8_MAC here is because ruby won't let you
transcode from UTF_8 to UTF_8. I chose the closest encoding I could
find as an intermediate.
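
To illustrate the round-trip idea described in that commit message, here is a sketch that uses UTF-16LE as the intermediate encoding instead of UTF_8_MAC. This is an assumption made for illustration, not what the Rails commit does, and `replace_invalid_via_utf16` is a hypothetical name. It relies on `String#encode` with `invalid: :replace`, which substitutes U+FFFD when the destination is a Unicode encoding.

```ruby
# Hypothetical helper, not the code from the Rails commit.
# Transcodes to UTF-16LE and back so that invalid bytes in a UTF-8 string
# are replaced with U+FFFD along the way.
def replace_invalid_via_utf16(str)
  str.encode(Encoding::UTF_16LE, invalid: :replace)
     .encode(Encoding::UTF_8)
end

cleaned = replace_invalid_via_utf16("abc\xFFdef")
puts cleaned.valid_encoding?
```

Unlike UTF_8_MAC, a UTF-16 intermediate does not decompose characters, so valid input should come back unchanged.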
@jcoyne
Author

jcoyne commented Feb 2, 2014

It appears this transcoding is defined here: https://github.com/ruby/ruby/blob/trunk/enc/trans/utf8_mac-tbl.rb

@headius
Member

headius commented Feb 20, 2014

In order to support UTF_8_MAC we'll need to port the whole transcoding subsystem. Currently we're using Java's Charset logic to transcode, and it does not support UTF_8_MAC.

My understanding of UTF_8_MAC is that it prefers to use combining characters rather than single codepoints, so UTF_8 to UTF_8_MAC and back is not likely to round-trip in all cases.
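
A rough sketch of why decomposing and recomposing may not round-trip. It uses standard Unicode NFD/NFC via `String#unicode_normalize` (Ruby 2.2+) as a stand-in for UTF_8_MAC; that substitution is an assumption, since UTF_8_MAC's decomposition rules are not identical to plain NFD. `decompose_then_recompose` is a hypothetical name.

```ruby
# Hypothetical helper: decompose to combining characters (NFD), then
# recompose (NFC). Standard normalization stands in for UTF_8_MAC here.
def decompose_then_recompose(str)
  str.unicode_normalize(:nfd).unicode_normalize(:nfc)
end

# U+00E9 (e with acute) decomposes to "e" plus a combining acute accent.
p "\u00E9".unicode_normalize(:nfd).codepoints

# U+212B (ANGSTROM SIGN) comes back as U+00C5, not the original codepoint.
p decompose_then_recompose("\u212B") == "\u212B"
```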

I would suggest that instead of this hack, Rails should use some version of the pure-Ruby String#scrub I implemented (and I think @yorickpeterse improved) from this issue: rubinius/rubinius#2912
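
For reference, a minimal sketch of what a pure-Ruby scrub can look like. This is not the implementation from the Rubinius issue, just an illustration of the idea, and `scrub_sketch` is a hypothetical name. It walks the string character by character and swaps out anything that is not valid in the string's encoding, so it needs no transcoder at all. Ruby 2.1 and later also ship a built-in `String#scrub`.

```ruby
# Hypothetical helper, not the String#scrub from rubinius/rubinius#2912.
# Replaces each invalid character with the replacement string and
# leaves valid characters untouched.
def scrub_sketch(str, replacement = "\uFFFD")
  return str.dup if str.valid_encoding?

  str.each_char.map { |char| char.valid_encoding? ? char : replacement }.join
end

puts scrub_sketch("abc\xFFdef").valid_encoding?
```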

Note that this version does not successfully handle all bad characters on JRuby due to incompatibilities in the Charset-based transcoding pipeline (#1459), but for strings with malformed input or no errors, it will work fine and not have the error above.

I will mark this as a bug for JRuby 9k, since by then we should have a proper port of MRI's transcoding logic.

/cc @burke

@headius headius added this to the JRuby 9000 milestone Feb 20, 2014
@headius
Member

headius commented Nov 12, 2014

UTF-8 MAC encoding is now in 9k.

@headius headius closed this as completed Nov 12, 2014
@headius headius self-assigned this Nov 12, 2014