Extend XML::Reader with more LibXML methods (#5740)
* Extend XML::Reader with more LibXML methods

  This adds a couple of method bindings that come in handy when doing pull parsing or hybrid parsing (search with pull, then expand the node).

* Work around incomplete xmlTextReaderNextSibling()

  The current implementation of xmlTextReaderNextSibling() only works on preparsed documents, so we need to detect the error returned if the reader is not using a preparsed document and implement our own next sibling by looking at reader internals.

* Fix XML::Reader#name/#value when not on node

  This avoids segfaults when those methods are called before the first or after the last read.

* Add XML::Type::NONE for XML::Reader#node_type

  This fixes a problem where XML::Reader#node_type would return zero before the first or after the last read, which previously had no mapping in the XML::Type enum, so the value couldn't be checked.

* Document all XML::Reader methods

* Add specs for all XML::Reader methods

* Avoid Nil for XML::Reader string getters

  Instead, return an empty string if the methods are called in an invalid reader state (before the first or after the last read). A special case is the #value method, which could also return nil if called on a node without a text value, like `<tag>`, but an empty string also makes sense there.

* Use explicit type for XML::Reader attribute methods

* Rename XML::Reader#attribute to #[]/#[]? and implement behavior similar to XML::Node.
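The hybrid parsing mentioned in the commit message (search with pull, then expand the node) might look roughly like the sketch below. It is not taken from the commit: the sample document and the `doc`, `names`, and `more` variables are made up for illustration, and it assumes `#read`, `#next`, `#name`, and `#expand` behave as the specs in this commit describe.

```crystal
require "xml"

# Illustrative input; not part of the commit.
doc = <<-XML
  <people>
    <person id="1"><name>John</name></person>
    <person id="2"><name>Peter</name></person>
  </people>
  XML

names = [] of String
reader = XML::Reader.new(doc)

# Pull phase: advance node by node until a <person> element is reached.
more = reader.read
while more
  if reader.name == "person"
    # Expand phase: materialize the subtree as an XML::Node and query it.
    # The expanded node is only usable until the reader advances.
    if node = reader.expand
      if name = node.xpath_node("name")
        names << "#{node.attributes["id"].content}: #{name.content}"
      end
    end
    # Skip the subtree that was just handled.
    more = reader.next
  else
    more = reader.read
  end
end

puts names.join(", ")
```

Calling `#next` right after `#expand` keeps the reader from walking the subtree a second time, which is the main reason to combine the two styles.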
1 parent 89b3867, commit 3696bb1
Showing 4 changed files with 551 additions and 2 deletions.
@@ -0,0 +1,453 @@
require "spec"
require "xml"

private def xml
  <<-XML
    <?xml version="1.0" encoding="UTF-8"?>
    <people>
      <person id="1">
        <name>John</name>
      </person>
      <person id="2">
        <name>Peter</name>
      </person>
    </people>
    XML
end

module XML
  describe Reader do
    describe ".new" do
      it "can be initialized from a string" do
        reader = Reader.new(xml)
        reader.should be_a(XML::Reader)
        reader.read.should be_true
        reader.name.should eq("people")
      end

      it "can be initialized from an io" do
        io = IO::Memory.new(xml)
        reader = Reader.new(io)
        reader.should be_a(XML::Reader)
        reader.read.should be_true
        reader.name.should eq("people")
      end
    end

    describe "#read" do
      it "reads all nodes" do
        reader = Reader.new(xml)
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("people")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("person")
        reader["id"].should eq("1")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("name")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::TEXT_NODE)
        reader.name.should eq("#text")
        reader.value.should eq("John")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_DECL)
        reader.name.should eq("name")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_DECL)
        reader.name.should eq("person")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("person")
        reader["id"].should eq("2")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("name")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::TEXT_NODE)
        reader.name.should eq("#text")
        reader.value.should eq("Peter")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_DECL)
        reader.name.should eq("name")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_DECL)
        reader.name.should eq("person")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.read.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_DECL)
        reader.name.should eq("people")
        reader.read.should be_false
      end
    end

    describe "#next" do
      it "reads next node in doc order, skipping subtrees" do
        reader = Reader.new(xml)
        while reader.read
          break if reader.depth == 2
        end
        reader.next.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("name")
        reader.next.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.next.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_DECL)
        reader.name.should eq("person")
        reader["id"].should eq("1")
        reader.next.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.next.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("person")
        reader["id"].should eq("2")
        reader.next.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.next.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_DECL)
        reader.name.should eq("people")
        reader.next.should be_false
      end
    end

    describe "#next_sibling" do
      it "reads next sibling node in doc order, skipping subtrees" do
        reader = Reader.new(xml)
        while reader.read
          break if reader.depth == 1
        end
        reader.next_sibling.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("person")
        reader["id"].should eq("1")
        reader.next_sibling.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.next_sibling.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("person")
        reader["id"].should eq("2")
        reader.next_sibling.should be_true
        reader.node_type.should eq(XML::Type::DTD_NODE)
        reader.name.should eq("#text")
        reader.next_sibling.should be_false
      end
    end

    describe "#node_type" do
      it "returns the node type" do
        reader = Reader.new("<root/>")
        reader.node_type.should eq(XML::Type::NONE)
        reader.read
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
      end
    end

    describe "#name" do
      it "reads node name" do
        reader = Reader.new("<root/>")
        reader.name.should eq("")
        reader.read
        reader.name.should eq("root")
      end
    end

    describe "#empty_element?" do
      it "checks if the node is empty" do
        reader = Reader.new("<root/>")
        reader.empty_element?.should be_false
        reader.read
        reader.empty_element?.should be_true
        reader = Reader.new("<root></root>")
        reader.read
        reader.empty_element?.should be_false
      end
    end

    describe "#has_attributes?" do
      it "checks if the node has attributes" do
        reader = Reader.new(%{<root id="1"><child/></root>})
        reader.has_attributes?.should be_false
        reader.read # <root id="1">
        reader.has_attributes?.should be_true
        reader.read # <child/>
        reader.has_attributes?.should be_false
        reader.read # </root>
        reader.has_attributes?.should be_true
      end
    end

    describe "#attributes_count" do
      it "returns the node's number of attributes" do
        reader = Reader.new(%{<root id="1"><child/></root>})
        reader.attributes_count.should eq(0)
        reader.read # <root id="1">
        reader.attributes_count.should eq(1)
        reader.read # <child/>
        reader.attributes_count.should eq(0)
        reader.read # </root>
        # This is weird, since has_attributes? will be true.
        reader.attributes_count.should eq(0)
      end
    end

    describe "#move_to_first_attribute" do
      it "moves to the first attribute of the node" do
        reader = Reader.new(%{<root id="1"><child/></root>})
        reader.move_to_first_attribute.should be_false
        reader.read # <root id="1">
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.move_to_first_attribute.should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id")
        reader.value.should eq("1")
        reader.read # <child/>
        reader.move_to_first_attribute.should be_false
        reader.read # </root>
        reader.move_to_first_attribute.should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id")
        reader.value.should eq("1")
        reader.read.should be_false
      end
    end

    describe "#move_to_next_attribute" do
      it "moves to the next attribute of the node" do
        reader = Reader.new(%{<root id="1" id2="2"><child/></root>})
        reader.move_to_next_attribute.should be_false
        reader.read # <root id="1" id2="2">
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.move_to_next_attribute.should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id")
        reader.value.should eq("1")
        reader.move_to_next_attribute.should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id2")
        reader.value.should eq("2")
        reader.move_to_next_attribute.should be_false
        reader.read # <child/>
        reader.move_to_next_attribute.should be_false
        reader.read # </root>
        reader.move_to_next_attribute.should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id")
        reader.value.should eq("1")
        reader.read.should be_false
      end
    end

    describe "#move_to_attribute" do
      it "moves to attribute with the specified name" do
        reader = Reader.new(%{<root id="1" id2="2"><child/></root>})
        reader.move_to_attribute("id2").should be_false
        reader.read # <root id="1" id2="2">
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.move_to_attribute("id2").should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id2")
        reader.value.should eq("2")
        reader.move_to_attribute("id").should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id")
        reader.value.should eq("1")
        reader.move_to_attribute("bogus").should be_false
        reader.read # <child/>
        reader.move_to_attribute("id2").should be_false
        reader.read # </root>
        reader.move_to_attribute("id2").should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id2")
        reader.value.should eq("2")
        reader.read.should be_false
      end
    end

    describe "#[]" do
      it "reads node attributes" do
        reader = Reader.new("<root/>")
        expect_raises(KeyError) { reader["id"] }
        reader.read
        expect_raises(KeyError) { reader["id"] }
        reader = Reader.new(%{<root id="1"/>})
        reader.read
        reader["id"].should eq("1")
        reader = Reader.new(%{<root id="1"><child/></root>})
        reader.read # <root id="1">
        reader["id"].should eq("1")
        reader.read # <child/>
        expect_raises(KeyError) { reader["id"] }
        reader.read # </root>
        reader["id"].should eq("1")
      end
    end

    describe "#[]?" do
      it "reads node attributes" do
        reader = Reader.new("<root/>")
        reader["id"]?.should be_nil
        reader.read
        reader["id"]?.should be_nil
        reader = Reader.new(%{<root id="1"/>})
        reader.read
        reader["id"]?.should eq("1")
        reader = Reader.new(%{<root id="1"><child/></root>})
        reader.read # <root id="1">
        reader["id"]?.should eq("1")
        reader.read # <child/>
        reader["id"]?.should be_nil
        reader.read # </root>
        reader["id"]?.should eq("1")
      end
    end

    describe "#move_to_element" do
      it "moves to the element node that contains the current attribute node" do
        reader = Reader.new(%{<root id="1"></root>})
        reader.move_to_element.should be_false
        reader.read # <root id="1">
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("root")
        reader.move_to_element.should be_false
        reader.move_to_first_attribute.should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id")
        reader.move_to_element.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_NODE)
        reader.name.should eq("root")
        reader.read # </root>
        reader.move_to_element.should be_false
        reader.move_to_first_attribute.should be_true
        reader.node_type.should eq(XML::Type::ATTRIBUTE_NODE)
        reader.name.should eq("id")
        reader.move_to_element.should be_true
        reader.node_type.should eq(XML::Type::ELEMENT_DECL)
        reader.name.should eq("root")
        reader.read.should be_false
      end
    end

    describe "#depth" do
      it "returns the depth of the node" do
        reader = Reader.new("<root><child/></root>")
        reader.depth.should eq(0)
        reader.read # <root>
        reader.depth.should eq(0)
        reader.read # <child/>
        reader.depth.should eq(1)
        reader.read # </root>
        reader.depth.should eq(0)
      end
    end

    describe "#read_inner_xml" do
      it "reads the contents of the node including child nodes and markup" do
        reader = Reader.new("<root>\n<child/>\n</root>\n")
        reader.read_inner_xml.should eq("")
        reader.read # <root>
        reader.read_inner_xml.should eq("\n<child/>\n")
        reader.read # \n
        reader.read_inner_xml.should eq("")
        reader.read # <child/>
        reader.read_inner_xml.should eq("")
        reader.read # \n
        reader.read_inner_xml.should eq("")
        reader.read # </root>
        reader.read_inner_xml.should eq("")
        reader.read.should be_false
      end
    end

    describe "#read_outer_xml" do
      it "reads the xml of the node including child nodes and markup" do
        reader = Reader.new("<root>\n<child/>\n</root>\n")
        reader.read_outer_xml.should eq("")
        reader.read # <root>
        reader.read_outer_xml.should eq("<root>\n<child/>\n</root>")
        reader.read # \n
        reader.read_outer_xml.should eq("\n")
        reader.read # <child/>
        reader.read_outer_xml.should eq("<child/>")
        reader.read # \n
        reader.read_outer_xml.should eq("\n")
        reader.read # </root>
        # Note that the closing element is transformed into a self-closing one.
        reader.read_outer_xml.should eq("<root/>")
        reader.read.should be_false
      end
    end

    describe "#expand" do
      it "parses the content of the node and subtree" do
        reader = Reader.new(%{<root id="1"><child/></root>})
        reader.expand.should be_nil
        reader.read # <root id="1">
        node = reader.expand
        node.should be_a(XML::Node)
        node.not_nil!.attributes["id"].content.should eq("1")
        node.not_nil!.xpath_node("child").should be_a(XML::Node)
      end

      it "is only available until the next read" do
        reader = Reader.new(%{<root><child><subchild/></child></root>})
        reader.read # <root>
        reader.read # <child>
        node = reader.expand
        node.should be_a(XML::Node)
        node.not_nil!.xpath_node("subchild").should be_a(XML::Node)
        reader.read # <subchild/>
        reader.read # </child>
        node.not_nil!.xpath_node("subchild").should be_nil
      end
    end

    describe "#value" do
      it "reads node text value" do
        reader = Reader.new(%{<root id="1">hello<!-- world --></root>})
        reader.value.should eq("")
        reader.read # <root>
        reader.value.should eq("")
        reader.read # hello
        reader.value.should eq("hello")
        reader.read # <!-- world -->
        reader.value.should eq(" world ")
        reader.read # </root>
        reader.move_to_first_attribute.should be_true
        reader.value.should eq("1")
      end
    end

    describe "#to_unsafe" do
      it "returns a pointer to the underlying LibXML::XMLTextReader" do
        reader = Reader.new("<root/>")
        reader.to_unsafe.should be_a(LibXML::XMLTextReader)
      end
    end
  end
end
@@ -1,4 +1,5 @@
enum XML::Type
  NONE = 0
  ELEMENT_NODE = 1
  ATTRIBUTE_NODE = 2
  TEXT_NODE = 3