Skip to content

Commit

Permalink
Improve transcoding of Java String to bytes using joni.
Browse files Browse the repository at this point in the history
Previously, if the target encoding was not supported by the JDK,
we would be unable to encode the string and would just make it
UTF-8. Now if there's no JDK support we will fall back on joni
transcoding.

Part of #3877 work.
headius committed May 12, 2016
1 parent 3b03bdf commit d025e43
Showing 2 changed files with 42 additions and 2 deletions.
6 changes: 4 additions & 2 deletions core/src/main/java/org/jruby/RubyString.java
Original file line number Diff line number Diff line change
@@ -5463,8 +5463,10 @@ public static ByteList encodeBytelist(CharSequence value, Encoding encoding) {

Charset charset = encoding.getCharset();

// if null charset, fall back on Java default charset
if (charset == null) charset = Charset.defaultCharset();
// if null charset, let our transcoder handle it
if (charset == null) {
return EncodingUtils.transcodeString(value.toString(), encoding, 0);

This comment has been minimized.

Copy link
@ahorek

ahorek May 12, 2016

Contributor

@headius could you check?

jruby -S irb
ArgumentError: string contains null byte
            exist? at org/jruby/RubyFileTest.java:125
      default_path at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/defaults.rb:94
      default_path at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/path_support.rb:73
    split_gem_path at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/path_support.rb:65
        initialize at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/path_support.rb:32
             paths at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems.rb:356
              path at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems.rb:402
              dirs at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/defaults/jruby.rb:68
         stubs_for at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/specification.rb:850
    matching_specs at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/dependency.rb:279
          to_specs at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/dependency.rb:300
           to_spec at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/dependency.rb:320
               gem at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_gem.rb:65
           require at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:47
             <top> at C:/jruby_repo/jruby/lib/ruby/stdlib/readline.rb:3
           require at org/jruby/RubyKernel.java:944
            (root) at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1
  <module:require> at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55
             <top> at C:/jruby_repo/jruby/lib/ruby/stdlib/irb/input-method.rb:130
           require at org/jruby/RubyKernel.java:944
            (root) at C:/jruby_repo/jruby/lib/ruby/stdlib/irb/input-method.rb:15
             <top> at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1
           require at org/jruby/RubyKernel.java:944
           require at C:/jruby_repo/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55
             <top> at C:/jruby_repo/jruby/bin/jirb:10

This comment has been minimized.

Copy link
@headius

headius May 12, 2016

Author Member

Hmm, I don't get this. Run with -Xbacktrace.style=full and gist that somewhere.

This comment has been minimized.

Copy link
@ahorek

This comment has been minimized.

Copy link
@headius

headius May 12, 2016

Author Member

Can you print out the string that's getting passed to exist? there? Seems like we've got a string floating around with null bytes, and it could be due to my patch.

This comment has been minimized.

Copy link
@headius

headius May 12, 2016

Author Member

Here's a patch that might fix it:

diff --git a/core/src/main/java/org/jruby/util/io/EncodingUtils.java b/core/src/main/java/org/jruby/util/io/EncodingUtils.java
index 6f343cd..47ac11f 100644
--- a/core/src/main/java/org/jruby/util/io/EncodingUtils.java
+++ b/core/src/main/java/org/jruby/util/io/EncodingUtils.java
@@ -1338,6 +1338,8 @@ public class EncodingUtils {

             ec.close();

+            destination.setRealSize(outPos.p);
+
             return true;
         }
     }

This comment has been minimized.

Copy link
@ahorek

ahorek May 12, 2016

Contributor

it was my homepath C:/Users/Pavel Rosický (Windows-1250)
your patch helped, thanks

This comment has been minimized.

Copy link
@headius

headius May 12, 2016

Author Member

Managed to reproduce with your path and added an additional ruby/spec to check this. Thanks!

}

byte[] bytes;
if (charset == RubyEncoding.UTF8) {
38 changes: 38 additions & 0 deletions core/src/main/java/org/jruby/util/io/EncodingUtils.java
Original file line number Diff line number Diff line change
@@ -1259,6 +1259,44 @@ public static void transcodeLoop(ThreadContext context, byte[] inBytes, Ptr inPo
}
}

/**
* A version of transcodeLoop for working without any Ruby runtime available.
*
* MRI: transcode_loop with no fallback and java.lang.String input
*/
public static ByteList transcodeString(String string, Encoding toEncoding, int ecflags) {
Encoding encoding;

// This may be inefficient if we aren't matching endianness right
if (Platform.BYTE_ORDER == Platform.LITTLE_ENDIAN) {
encoding = UTF16LEEncoding.INSTANCE;
} else {
encoding = UTF16BEEncoding.INSTANCE;
}

EConv ec = TranscoderDB.open(encoding.getName(), toEncoding.getName(), ecflags);

byte[] inBytes = string.getBytes(encoding.getCharset());
Ptr inPos = new Ptr(0);

int inStop = inBytes.length;
// most encodings will be shorter than UTF-16 for typical input
int outStop = (int)((double) inBytes.length / 1.5 + 1);

byte[] outBytes = new byte[outStop];
Ptr outPos = new Ptr(0);

ByteList destination = new ByteList(outBytes, toEncoding, false);

boolean success = transcodeLoop(ec, null, null, null, inBytes, inPos, outBytes, outPos, inStop, outStop, destination, strTranscodingResize);

if (!success) {
// TODO: anything?
}

return destination;
}

/**
* Perform the inner transcoding loop.
*

0 comments on commit d025e43

Please sign in to comment.