IO::Read is changing encoding from ASCII-8BIT(binary) to UTF-8 #4986

Closed
nightsurge opened this issue Jan 17, 2018 · 2 comments

Comments

nightsurge commented Jan 17, 2018

Environment

Provide at least:

  • JRuby version 9.1.15.0
  • JRUBY_OPTS ='-J-Xmx2048m -J-Xmn512m --server'
  • MacOS 10.12.6

Other relevant info you may wish to add:

  • HTTP.rb (http gem v 3.0.0)
  • Rails 4.1

Expected Behavior

  • When reading from a file into a binary (ASCII-8BIT) buffer, I expect the buffer's encoding to stay binary/ASCII-8BIT
require "stringio"

# Provides IO interface
class IOTest
  # IO object
  def initialize(io)
    @buffer = String.new
    puts "@buffer: #{@buffer.encoding} in initialize\n\n"
    @io = if io.is_a?(String)
            StringIO.new(io)
          elsif io.respond_to?(:read)
            io
          else
            raise ArgumentError,
              "#{io.inspect} is neither a String nor an IO object"
          end
  end

  # @param [Integer] length Number of bytes to retrieve
  # @param [String] outbuf String to be replaced with retrieved data
  #
  # @return [String, nil]
  def read(length = nil, outbuf = nil)
    outbuf = outbuf.to_s.clear

    puts "Outbuf: #{outbuf.encoding} setup"
    puts "@buffer: #{@buffer.encoding} setup"
    @io.read(length, @buffer)

    puts "Outbuf: #{outbuf.encoding} after read"
    puts "@buffer: #{@buffer.encoding} after read"
    outbuf << @buffer

    puts "Outbuf: #{outbuf.encoding} after outbuf append"
    puts "@buffer: #{@buffer.encoding} after outbuf append"

    if length
      length -= @buffer.length
      break if length.zero?
    end

    outbuf unless length && outbuf.empty?
  end

  def read_force_outbuf(length = nil, outbuf = nil)
    outbuf = outbuf.to_s.clear
    outbuf.force_encoding(Encoding::BINARY)

    puts "Outbuf: #{outbuf.encoding} setup"
    puts "@buffer: #{@buffer.encoding} setup"
    @io.read(length, @buffer)

    puts "Outbuf: #{outbuf.encoding} after read"
    puts "@buffer: #{@buffer.encoding} after read"
    outbuf << @buffer

    puts "Outbuf: #{outbuf.encoding} after outbuf append"
    puts "@buffer: #{@buffer.encoding} after outbuf append"

    if length
      length -= @buffer.length
      break if length.zero?
    end

    outbuf unless length && outbuf.empty?
  end

  def read_force_both(length = nil, outbuf = nil)
    outbuf = outbuf.to_s.clear
    outbuf.force_encoding(Encoding::BINARY)

    puts "Outbuf: #{outbuf.encoding} setup"
    puts "@buffer: #{@buffer.encoding} setup"
    @io.read(length, @buffer)
    outbuf << @buffer.force_encoding(Encoding::BINARY)

    puts "Outbuf: #{outbuf.encoding} after read"
    puts "@buffer: #{@buffer.encoding} after read"
    outbuf << @buffer

    puts "Outbuf: #{outbuf.encoding} after outbuf append"
    puts "@buffer: #{@buffer.encoding} after outbuf append"

    if length
      length -= @buffer.length
      break if length.zero?
    end

    outbuf unless length && outbuf.empty?
  end
end

the_test = IOTest.new(File.new('/some_image_file.jpg'))
puts "**************** read without force_encoding"
the_test.read
puts "**************** END read without force_encoding\n\n"


the_test_2 = IOTest.new(File.new('/some_image_file.jpg'))
puts "**************** read with outbuf force_encoding"
the_test_2.read_force_outbuf
puts "**************** END read with outbuf force_encoding\n\n"

the_test_3 = IOTest.new(File.new('/some_image_file.jpg'))
puts "**************** read with both outbuf/@buffer force_encoding"
the_test_3.read_force_both
puts "**************** END read with both outbuf/@buffer force_encoding"

Actual Behavior

  • Reading from a file with IO#read changes the buffer's encoding to UTF-8, which breaks downstream file upload processing
@buffer: ASCII-8BIT in initialize

**************** read without force_encoding
Outbuf: US-ASCII setup
@buffer: ASCII-8BIT setup
Outbuf: US-ASCII after read
@buffer: UTF-8 after read
Outbuf: UTF-8 after outbuf append
@buffer: UTF-8 after outbuf append
**************** END read without force_encoding

@buffer: ASCII-8BIT in initialize

**************** read with outbuf force_encoding
Outbuf: ASCII-8BIT setup
@buffer: ASCII-8BIT setup
Outbuf: ASCII-8BIT after read
@buffer: UTF-8 after read
Outbuf: UTF-8 after outbuf append
@buffer: UTF-8 after outbuf append
**************** END read with outbuf force_encoding

@buffer: ASCII-8BIT in initialize

**************** read with both outbuf/@buffer force_encoding
Outbuf: ASCII-8BIT setup
@buffer: ASCII-8BIT setup
Outbuf: ASCII-8BIT after read
@buffer: ASCII-8BIT after read
Outbuf: ASCII-8BIT after outbuf append
@buffer: ASCII-8BIT after outbuf append
**************** END read with both outbuf/@buffer force_encoding
nightsurge changed the title from "IO::Read is changing encoding from ASCII-8BIT to UTF-8" to "IO::Read is changing encoding from ASCII-8BIT(binary) to UTF-8" on Jan 17, 2018
enebo added this to the Invalid or Duplicate milestone on Jan 18, 2018
Member

enebo commented Jan 18, 2018

@nightsurge this may be a bug (I did not really study your code very much), but this particular behavior matches MRI's, and we generally follow MRI bug-for-bug. Your script did point out a bug in JRuby for sure, though: #4990.

My suggestion is that you re-open this issue at https://bugs.ruby-lang.org/ and see what they say. IO has a ton of coverage and there might be a specific reason it behaves the way it does. M17n is a complicated beast in Ruby :)

I am closing this. If they say this is a bug, we can re-open and target it for whichever versions of Ruby they decide to fix it in.
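
For readers following along: one plausible explanation, offered here as an assumption rather than something confirmed in this thread, is the documented difference between IO#read with and without a length. With a length, the data is read as raw bytes. Without one, the read runs to EOF and the result is tagged with the IO's external encoding. The sketch below writes four JPEG-like bytes to a temporary file and records the buffer encoding for three cases. The helper name read_encodings and the hash keys are made up for illustration.

```ruby
require "tempfile"

# Hypothetical helper: returns the buffer encoding seen for three ways of reading.
def read_encodings
  Tempfile.create("encoding-demo") do |tmp|
    tmp.binmode
    tmp.write("\xFF\xD8\xFF\xE0".b) # first bytes of a typical JPEG
    tmp.flush

    results = {}

    # Text mode, no length: the buffer is tagged with the external encoding.
    File.open(tmp.path, "r:UTF-8") do |f|
      buf = String.new # String.new with no arguments is ASCII-8BIT
      f.read(nil, buf)
      results[:text_mode_no_length] = buf.encoding
    end

    # Text mode, explicit length: bytes are read without conversion.
    File.open(tmp.path, "r:UTF-8") do |f|
      buf = String.new
      f.read(4, buf)
      results[:text_mode_with_length] = buf.encoding
    end

    # Binary mode, no length: the external encoding is ASCII-8BIT.
    File.open(tmp.path, "rb") do |f|
      buf = String.new
      f.read(nil, buf)
      results[:binary_mode_no_length] = buf.encoding
    end

    results
  end
end

read_encodings.each { |mode, enc| puts "#{mode}: #{enc}" }
```

If that explanation holds, opening the file with "rb" (or calling binmode on it) before handing it to the reader would keep the buffer binary without any force_encoding calls.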

enebo closed this as completed on Jan 18, 2018
Member

enebo commented Jan 18, 2018

Also I just changed the 'break' to 'return' to make the script run on MRI.
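
As a rough illustration of that change combined with the reporter's read_force_both workaround, here is a minimal sketch of a reader that uses return instead of break and always hands back binary data. The class name BinaryReader is hypothetical and is not part of http.rb or JRuby. The sketch also avoids nil.to_s.clear, since nil.to_s returns a frozen string on Ruby 2.7 and later.

```ruby
require "stringio"

# Hypothetical wrapper: reads from a String or IO and always returns binary data.
class BinaryReader
  def initialize(io)
    @io = io.is_a?(String) ? StringIO.new(io) : io
    @buffer = String.new # String.new with no arguments is ASCII-8BIT
  end

  # length: number of bytes to read, or nil to read to EOF
  # outbuf: optional String that receives the data
  def read(length = nil, outbuf = nil)
    # Avoid nil.to_s.clear: nil.to_s is frozen on Ruby 2.7 and later.
    outbuf = (outbuf || String.new).clear.force_encoding(Encoding::BINARY)

    @io.read(length, @buffer)

    # The read may re-tag @buffer with the IO's encoding; force_encoding
    # changes only the tag, never the bytes.
    outbuf << @buffer.force_encoding(Encoding::BINARY)

    # `return`, not `break`: there is no enclosing loop in this method.
    return nil if length && outbuf.empty?

    outbuf
  end
end

reader = BinaryReader.new("h\u00e9llo")
data = reader.read
puts data.encoding
puts data.bytesize
```

Because force_encoding only relabels the string, the bytes that reach the upload code are the same as those in the file.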
