jruby 1.7.16.1 (1.9.3p392) 2014-10-28 4e93f31 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b18 +jit [Windows 8.1-amd64]
Original: < p >ěščřžýáíé< /p >
Encoding: UTF-8
Parsed: < p >?š??žýáíé< /p > ?????????
Encoding: UTF-8
from nokogiri 1.6.3.1 /lib/nokogiri/html/document_fragment.rb
module Nokogiri
  module HTML
    class DocumentFragment < Nokogiri::XML::DocumentFragment
      attr_accessor :errors

      ####
      # Create a Nokogiri::XML::DocumentFragment from +tags+, using +encoding+
      def self.parse tags, encoding = nil
        ####################################
        # tags.encoding => #<Encoding:UTF-8>
        ####################################
        doc = HTML::Document.new
        encoding ||= tags.respond_to?(:encoding) ? tags.encoding.name : 'UTF-8'
        doc.encoding = encoding
        new(doc, tags)
      end

      def initialize document, tags = nil, ctx = nil
        ###########################################
        # tags.encoding => #<Encoding:Windows-1252> ????????
        ###########################################
        return self unless tags

        if ctx
          preexisting_errors = document.errors.dup
          node_set = ctx.parse("<div>#{tags}</div>")
          node_set.first.children.each { |child| child.parent = self } unless node_set.empty?
          self.errors = document.errors - preexisting_errors
        else
          # This is a horrible hack, but I don't care
          if tags.strip =~ /^<body/i
            path = "/html/body"
          else
            path = "/html/body/node()"
          end

          temp_doc = HTML::Document.parse "<html><body>#{tags}", nil, document.encoding
          temp_doc.xpath(path).each { |child| child.parent = self }
          self.errors = temp_doc.errors
        end
        children
      end
    end
  end
end
Tested on Linux (Ubuntu): the parsed string was encoded correctly there, so this looks like a Windows platform issue or some kind of Java locale problem.
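For what it's worth, the garbled output matches what you get when the UTF-8 text is forced through Windows-1252 with unmappable characters replaced by "?": ě, č and ř have no Windows-1252 code point, while š, ž, ý, á, í and é do. The sketch below only illustrates that observation in plain Ruby. The helper name `lossy_cp1252_round_trip` is made up for this example, and nothing here shows that JRuby or Nokogiri performs exactly this conversion internally.

```ruby
# Hypothetical helper (not part of Nokogiri or JRuby): push a UTF-8 string
# through Windows-1252 and back, replacing unmappable characters with "?".
def lossy_cp1252_round_trip(text)
  text.encode("Windows-1252", undef: :replace, replace: "?")
      .encode("UTF-8")
end

original = "ěščřžýáíé"
round_tripped = lossy_cp1252_round_trip(original)

puts "Original: #{original}"
puts "Parsed:   #{round_tripped}"   # prints "?š??žýáíé"
```

If that is what is happening, the default platform charset on Windows would be the first thing to check.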
I tried freezing the original string first:
...
str = f.read
+ str.freeze
...
This has no effect on MRI (the string is unchanged), but JRuby raises an error:
chomp! at org/jruby/RubyString.java:5775
new at nokogiri/XmlDocumentFragment.java:93
parse at C:/jruby-1.7.16/lib/ruby/gems/shared/gems/nokogiri-1.6.5-java/lib/nokogiri/html/document_fragment.rb:14
(root) at tt.rb:13
open at org/jruby/RubyIO.java:1181
(root) at tt.rb:7
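The `chomp!` frame suggests the Java side modifies the input string in place, which is why a frozen string blows up. A minimal sketch of that failure mode in plain Ruby follows; `parse_in_place` is a hypothetical stand-in, not Nokogiri's actual code. It also shows that handing over a `dup` keeps the caller's string untouched.

```ruby
# Hypothetical stand-in for a parser that trims its input in place,
# as the chomp! frame in the trace suggests. Not Nokogiri's real code.
def parse_in_place(tags)
  tags.chomp!
  tags
end

frozen = "<p>ěščřžýáíé</p>\n".freeze

begin
  parse_in_place(frozen)
rescue RuntimeError => e
  # On Ruby >= 2.5 this is a FrozenError, which is a RuntimeError subclass.
  puts "in-place parse failed: #{e.class}"
end

# Passing a mutable copy avoids the error and leaves the original alone.
copy = frozen.dup
puts parse_in_place(copy)
puts frozen.end_with?("\n")
```

This only explains the exception; it says nothing about why the encoding gets lost.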
How to reproduce this:
Results:
ruby 2.1.3p242 (2014-09-19 revision 47630) [i386-mingw32]
Original: < p >ěščřžýáíé< /p >
Encoding: UTF-8
Parsed: < p >ěščřžýáíé< /p >
Encoding: UTF-8
jruby 1.7.16.1 (1.9.3p392) 2014-10-28 4e93f31 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b18 +jit [Windows 8.1-amd64]
Original: < p >ěščřžýáíé< /p >
Encoding: UTF-8
Parsed: < p >?š??žýáíé< /p > ?????????
Encoding: UTF-8