Encoding negotiation in chomp not matching 2.3 behavior #3692

Open
headius opened this issue Feb 22, 2016 · 1 comment

headius commented Feb 22, 2016

Environment

JRuby 9.1 HEAD

Expected Behavior

The following test should succeed (from MRI's test_m17n_comb.rb):

  def test_str_smart_chomp
    bug10893 = '[ruby-core:68258] [Bug #10893]'
    encodings = Encoding.list.select {|enc| !enc.dummy?}
    combination(encodings, encodings) do |e1, e2|
      expected = "abc".encode(e1)
      combination(["abc\n", "abc\r\n"], ["", "\n"]) do |str, rs|
        assert_equal(expected, str.encode(e1).chomp(rs.encode(e2)), bug10893)
      end
    end
  end

MRI fixed this under https://bugs.ruby-lang.org/issues/10893, but porting that change does not appear to fix the failure for us.

Actual Behavior

Here's the failure I'll be excluding from test runs until we can sort this out:

[1/1] TestM17NComb#test_str_smart_chomp = 0.11 s
  1) Error:
TestM17NComb#test_str_smart_chomp:
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-16BE
    java/lang/Thread.java:1552:in `getStackTrace'
    org/jruby/runtime/backtrace/TraceType.java:215:in `getBacktraceData'
    org/jruby/runtime/backtrace/TraceType.java:47:in `getBacktrace'
    org/jruby/RubyException.java:232:in `prepareBacktrace'
    org/jruby/exceptions/RaiseException.java:229:in `preRaise'
    org/jruby/exceptions/RaiseException.java:196:in `preRaise'
    org/jruby/exceptions/RaiseException.java:111:in `<init>'
    org/jruby/Ruby.java:4116:in `newRaiseException'
    org/jruby/Ruby.java:4091:in `newEncodingCompatibilityError'
    org/jruby/RubyString.java:309:in `checkEncoding'
    org/jruby/RubyString.java:296:in `checkEncoding'
    org/jruby/RubyString.java:4290:in `chompBangCommon19'
    org/jruby/RubyString.java:4245:in `chomp_bang19'
    org/jruby/RubyString.java:4225:in `chomp19'
    org/jruby/runtime/callsite/CachingCallSite.java:161:in `call'
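
A reduced case may be easier to debug than the full combination test. The sketch below assumes the failing pair is the one named in the error message (receiver in ASCII-8BIT, separator in UTF-16BE) and that the separator is "\n" rather than "", since the empty separator appears to take the paragraph-mode branch before any encoding check. Neither assumption has been confirmed as the only failing combination.

```ruby
# Reduced reproduction sketch for the failure reported above.
# Assumption: receiver is ASCII-8BIT and the separator is a UTF-16BE "\n".
str = "abc\n".encode(Encoding::ASCII_8BIT)
rs  = "\n".encode(Encoding::UTF_16BE)

begin
  result = str.chomp(rs)
  puts "chomp returned #{result.inspect} (#{result.encoding})"
rescue Encoding::CompatibilityError => e
  # An affected build is expected to end up here.
  result = e
  puts "chomp raised: #{e.message}"
end
```

Per the MRI test, the result should equal "abc" in the receiver's encoding; an affected JRuby build should report the CompatibilityError instead.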

headius commented Feb 22, 2016

Our version of the patch from the related MRI issue, which did not fix the failure for us:

diff --git a/core/src/main/java/org/jruby/RubyString.java b/core/src/main/java/org/jruby/RubyString.java
index bce90ab..430eda3 100644
--- a/core/src/main/java/org/jruby/RubyString.java
+++ b/core/src/main/java/org/jruby/RubyString.java
@@ -4254,11 +4254,28 @@ public class RubyString extends RubyObject implements EncodingCapable, MarshalEn
         int end = p + len;
         byte[] bytes = value.getUnsafeBytes();

+        Encoding enc = getEncoding();
         int rslen = rs.value.getRealSize();
         if (rslen == 0) {
-            while (len > 0 && bytes[p + len - 1] == (byte)'\n') {
-                len--;
-                if (len > 0 && bytes[p + len - 1] == (byte)'\r') len--;
+            if (enc.minLength() > 1) {
+                while (end > p) {
+                    int pp = enc.leftAdjustCharHead(bytes, p, end - enc.minLength(), end);
+                    if (!enc.isNewLine(bytes, pp, end)) break;
+                    end = pp;
+                    pp -= enc.minLength();
+                    if (pp >= p) {
+                        pp = enc.leftAdjustCharHead(bytes, p, pp, end);
+                        if (EncodingUtils.encAscget(bytes, pp, end, null, enc) == '\r') {
+                            end = pp;
+                        }
+                    }
+                }
+                len = end - p;
+            } else {
+                while (len > 0 && bytes[p + len - 1] == (byte) '\n') {
+                    len--;
+                    if (len > 0 && bytes[p + len - 1] == (byte) '\r') len--;
+                }
             }
             if (len < value.getRealSize()) {
                 keepCodeRange();
@@ -4269,12 +4286,13 @@ public class RubyString extends RubyObject implements EncodingCapable, MarshalEn
         }

         if (rslen > len) return runtime.getNil();
-        byte newline = rs.value.getUnsafeBytes()[rslen - 1];
-        if (rslen == 1 && newline == (byte)'\n') return smartChopBangCommon19(runtime);

-        Encoding enc = checkEncoding(rs);
+        enc = checkEncoding(rs);
         if (rs.scanForCodeRange() == CR_BROKEN) return runtime.getNil();

+        byte newline = rs.value.getUnsafeBytes()[rslen - 1];
+        if (rslen == 1 && newline == (byte)'\n') return smartChopBangCommon19(runtime);
+
         int pp = end - rslen;
         if (bytes[p + len - 1] == newline && rslen <= 1 || value.endsWith(rs.value)) {
             if (enc.leftAdjustCharHead(bytes, p, pp, end) != pp) return runtime.getNil();
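
One possible reason the patch above does not help: in the second hunk the single-newline shortcut still tests `rslen == 1`, and it now runs after `checkEncoding(rs)`. A UTF-16BE "\n" is two bytes long and is not compatible with an ASCII-8BIT receiver, so it would raise before the shortcut is reached and would not match it anyway. The sketch below only illustrates those properties of the separator; `single_newline?` is a hypothetical helper for illustration, not something in JRuby or MRI.

```ruby
# Sketch: why a byte-level "rslen == 1" test misses a wide-encoding newline.
str = "abc\n".encode(Encoding::ASCII_8BIT)
rs  = "\n".encode(Encoding::UTF_16BE)

rs_bytes   = rs.bytes                       # two bytes, so rslen is 2, not 1
compatible = Encoding.compatible?(str, rs)  # nil: no common encoding

# Hypothetical helper: treat the separator as a lone newline by character,
# not by byte, so it works for encodings whose minimum length exceeds 1.
def single_newline?(rs)
  rs.length == 1 && rs.ord == 0x0A
end

puts "separator bytes: #{rs_bytes.inspect}"
puts "compatible encoding: #{compatible.inspect}"
puts "single newline by character: #{single_newline?(rs)}"
```

If this reading is right, a character-level newline test on the separator, done before the compatibility check, would route this case to the smart chomp path.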
