Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

throw exception when predicate has illegal localname #954

Closed
wants to merge 5 commits into from

Conversation

nianma
Copy link
Contributor

@nianma nianma commented Nov 19, 2015

@@ -128,7 +128,10 @@ public static String getPropertyNameFromPredicate(final NamespaceRegistry namesp
}

final String rdfLocalname = predicate.getLocalName();

if (rdfLocalname.isEmpty()) {
LOGGER.info("predicate local name is null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inaccurate. The local name is empty, not null. How about "Predicate local name is empty!"

@ajs6f
Copy link
Contributor

ajs6f commented Nov 19, 2015

Does this actually resolve FCREPO-1799? That issue appears to have something to do with a localname that begins with an integer. Why are there no new tests associated with this PR?

@nianma
Copy link
Contributor Author

nianma commented Nov 19, 2015

Do we really need a new test for this one? since the solution is just throwing exception when this case happens.

@escowles
Copy link
Contributor

If we choose to throw an exception (which I'm fine with), we should add a test so we'll notice if Jena's namespace parsing rules change in the future. We should also document the decision in the wiki. Since there is a conflict between RDF/XML and Turtle over what is a valid localName, we can't support both. Choosing the stricter one seems reasonable to me, though I expect some users to be frustrated by the choice. So at the very least, we should document and justify the decision.

@escowles
Copy link
Contributor

@nianma, I see that FCREPO-1799 is now in review. But I don't know if you saw my comment yesterday. I am 👎 on this PR until there is a test that will let us know if the Jena behavior changes.

@ajs6f
Copy link
Contributor

ajs6f commented Nov 20, 2015

Also a 👎 from me without tests.

@nianma
Copy link
Contributor Author

nianma commented Nov 20, 2015

I will add a test soon.

@nianma
Copy link
Contributor Author

nianma commented Nov 20, 2015

Let me know if this test is OK. Just found out it is OK to have predicate local name starting from digit, but it cannot be all digits. Not so sure where to document it in wiki.

@Test(expected = RuntimeException.class)
public final void IllegalPredicateLocalname() throws RepositoryException {
final Property p = createProperty(mockUri, "1234");
final Map<String, String> nsMap = new HashMap<String, String>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just final Map<String, String> nsMap = ImmutableMap.of("example", mockUri), and maybe static import the of, or inline the whole value.

@nianma
Copy link
Contributor Author

nianma commented Nov 20, 2015

I should say, the actual local name will start from a letter and ignore all the previous digits, but won't give any exception.

@escowles
Copy link
Contributor

@nianma, when you say "the actual local name will start from a letter and ignore all the previous digits, but won't give any exception.", I'm not sure what you mean exactly. The original issue reported in https://jira.duraspace.org/browse/FCREPO-1799 was doing a SPARQL Update with a QName that started with a digit, which produced an error. That's still the case -- we're just throwing an error on insert instead of when you try to read the triples (where the error was happening before).

@nianma
Copy link
Contributor Author

nianma commented Nov 20, 2015

@escowles, yes, the issue of https://jira.duraspace.org/browse/FCREPO-1799 will happen for local name is "1234", but if you change the query to

PREFIX bf: http://bibframe.org/vocab/
PREFIX cc: http://catalog.coloradocollege.edu/
INSERT {
<> cc:1a234 "anything"
}
WHERE {}

It will actually create the node as below without exception.
ns001: a234
anything

@escowles
Copy link
Contributor

@nianma OK, that's surprising -- and going to result in a lot of strange namespaces being registered if people have many predicates with localNames that start with digits, but also contain letters.

I think the best thing to do would be:

  1. Add a test demonstrating that http://foo.com/1a234 is being interpreted as (http://foo.com/1)(a234).
  2. Add a note to the Wiki (maybe in the note after the Containers POST method?) saying that having all-numeric localNames will result in an error, and having a localName starting with a number will result in incorrectly-parsed namespaces.

@ajs6f
Copy link
Contributor

ajs6f commented Nov 20, 2015

WHOA! Either our interpretation is incorrect, or there is a deeper incorrectness in our code, or there is a huge bug in Jena (this is frankly untenable, a bug that big would have surfaced many years ago). I think we need to understand this behavior much better before we decide to just document it and move on.

@escowles
Copy link
Contributor

@ajs6f Because the Turtle spec allows QNames that aren't valid in XML, I'm not at all surprised that weird behavior is happening. And AFAICT, this is allowed in Turtle only as of RDF 1.1 (so early this year), and previous versions of Turtle didn't allow it.

I don't know why they would change Turtle to allow URIs that can't be legally convert to RDF/XML, but that seems to be exactly what they've done. Maybe this is part of the general trend toward de-emphasizing RDF/XML and not allowing XML's limitations to prevent things that don't pose problems for other serializations?

@ajs6f
Copy link
Contributor

ajs6f commented Nov 20, 2015

I know about the discrepancy to which you refer (and for my money, RDF/XML can't die soon enough) but it doesn't seem to me to explain why localname glyphs are getting absorbed into namespaces. That's a bug no matter how you play it.

@escowles
Copy link
Contributor

I agree it's a bug. One way to address it would be to stop using Jena's namespace parsing and just iterate through the registered namespaces to find one that's the longest substring of the URI in question.

Predicates that start with a digit would still generate invalid RDF/XML, but I'm guessing many people would be fine with that limit if they wanted to use those predicates.

@ajs6f
Copy link
Contributor

ajs6f commented Nov 20, 2015

Is it Jena's NS parsing that is doing this, or MODE's?

@ajs6f
Copy link
Contributor

ajs6f commented Nov 20, 2015

If we do think this is Jena's parsing (and I am willing to bet you a bomber of really good ale that it is not) then we owe them a ticket and better, a PR.

@escowles
Copy link
Contributor

@ajs6f I believe it's Jena. In PropertyConverter.getPropertyNameFromPredicate() we're calling Resource.getNameSpace(), which is actually Node_URI.getNameSpace(), which in turn calls Util. splitNamespaceXML().

@ajs6f
Copy link
Contributor

ajs6f commented Nov 20, 2015

If you think it is in Jena, please file a ticket there, in addition to doing whatever you think is right here. If Jena wants to consider those QNames illegal, fine, but this is not expected behavior. This is definitely not okay behavior from our system, whatever the cause. There is no reason that XML rules should have any effect on this at all. If we want to restrict to XML QNames, fine, but let's make sure that we are rejecting outside-the-bounds names properly.

@DiegoPino
Copy link

@escowles and @ajs6f, it's Jena and it's expected (old) behaviour for rdf 1.0 specs, so basically made for xml specs as you all already said. I have seen questions about the same problem since 2013? with always the same response, so i think, ticket to them maybe not worth. Newbie question (not need to answer if it's obvious), but why extract local name and namespace using Jena (and transforming the property first to a jena resource), instead directly from the native JCR property?

https://issues.jboss.org/browse/MODE-1855 implemented access local methods by casting item, or property in this case to Namespaced, so you get getLocalName() and getNamespaceURI().

Lastly, sorry for asking silly questions, i'm trying to get a better understanding of F4 (smile) and i don't get the whole code yet, not even partially.

@escowles
Copy link
Contributor

@DiegoPino, Yes, we are using the Namespaced utility for doing the JCR -> RDF mapping, and it works well. But we are using Jena to do the RDF -> JCR mapping. Jena provides roughly the same API, but has this strange behavior in URIs with localNames starting with digits.

I agree it's not worth filing a bug for Jena. I found a couple in their bug tracker already, and one mentioned frequent postings to the mailing list. So I think they're fully aware of this issue.

For us, I think it's hard to categorically say that any predicate is being mis-parsed, unless the localName is empty. To do that, we'd need to make some much stricter rule (e.g., namespaces must end with # or /) that would be too restrictive, and/or we'd need to implement our own parser and maintain it. I have no desire to do that.

I think this PR has the correct behavior: throw an error if the localName is empty, logging at WARN if the parsed namespace is new (maybe mis-parsed, maybe just a new namespace). There's a test to cover the case. So I'm now 👍

@ajs6f
Copy link
Contributor

ajs6f commented Nov 23, 2015

@DiegoPino @escowles Do you have links to those Jira issues?

@escowles
Copy link
Contributor

@ajs6f:

Though I should say that I tried parsing URIs with jena 3.0 and found the behavior unchanged. So there may be more to it, or we may need to adjust our usage of the Jena API to use the new parser.

@ajs6f
Copy link
Contributor

ajs6f commented Nov 23, 2015

That makes a lot more sense. My reading is that it was just plain fixed in 3.0.0. Are we not using 3.0.0?

@DiegoPino
Copy link

@ajs6f, Jena 3.0 fix basically renames a method called in Node.getLocalName to Util.splitNamespaceXML to denote XML specs are being followed, and adds a new class SplitIRI to play with Turtle.
apache/jena@6d3a74f and
apache/jena@a3907db

So, it's not fixed, you have to choose what to use depending on the desired serialisation?

Sesame's similar(equally named) methods are more forgiving, but the problem will be present always when serialising to rdf/xml.

@ajs6f
Copy link
Contributor

ajs6f commented Nov 23, 2015

I'd say that's fixed. Jena can't change the fact that the serializations disagree, the most it can do it provide you with the machinery to use both to the fullest.
My feeling would be that we choose for the more forgiving spec and let the RDF/XML serialization machinery blow out if people put cray-cray into it.

@escowles
Copy link
Contributor

I think it's perfectly acceptable to have the RDF/XML serialization blow up if you put predicates in that don't have valid XML QNames. That would let people use anything valid with Turtle, but people who wanted to use RDF/XML would have to use something more restrictive.

Logging these with WARN would be even better so people interested in using RDF/XML would be warned.

@ajs6f
Copy link
Contributor

ajs6f commented Nov 24, 2015

Right, just so, and WARN makes sense.

@@ -116,6 +116,13 @@ public final void shouldRejectFcrPredicates() throws RepositoryException {
PropertyConverter.getPropertyNameFromPredicate(mockNode, p, nsMap);
}

@Test(expected = RuntimeException.class)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please catch a RepositoryRuntimeException here.

@awoods
Copy link

awoods commented Nov 24, 2015

Is the consensus that this ticket will be ready to go once the logging has been changed to WARN?
...and my two other minor comments have been addressed?

@escowles
Copy link
Contributor

@awoods I think the consensus is that we should update to Jena 3.0 and use SplitIRI.localname/namespace, and WARN if the localname isn't valid in RDF/XML (starts with a digit).

@awoods
Copy link

awoods commented Nov 24, 2015

My suspicion is that upgrading to Jena 3.0 will have sweeping impacts throughout the core codebase as well as in the labs and exts projects.
Should we create a new JIRA ticket for that upgrade, then @nianma can finish this current PR once Jena 3.0 is in place?

@acoburn
Copy link
Contributor

acoburn commented Nov 24, 2015

There is an existing ticket for the Jena 3.x upgrade: https://jira.duraspace.org/browse/FCREPO-1672

@awoods
Copy link

awoods commented Nov 24, 2015

I have bumped the priority of https://jira.duraspace.org/browse/FCREPO-1672 from minor to major. Does anyone have an interest in moving that prerequisite forward in the short term?

@acoburn
Copy link
Contributor

acoburn commented Nov 24, 2015

I looked into that issue a few months ago and I'd be happy to revisit it.

@awoods
Copy link

awoods commented Nov 24, 2015

👍 @acoburn

@ajs6f
Copy link
Contributor

ajs6f commented Nov 24, 2015

👍 @acoburn and 👍 @escowles

@escowles
Copy link
Contributor

Also, I should mention that SplitIRI.namespaceTTL() is broken in Jena 3.0.0 -- I put in a (trivial) PR to fix this that's already been merged: apache/jena#102. So that should be fixed in 3.0.1. In the meantime, we can just use SplitIRI.namespace() and SplitIRI.localnameTTL().

@barmintor
Copy link
Contributor

If this PR is currently being used to hold place for another fix (Jena 3), please label it as wontfix.

@awoods awoods changed the base branch from master to main July 9, 2020 16:17
@Surfrdan
Copy link
Contributor

I think this can be closed. The upgrade to Jena 3.x was completed and that seemed to be all this was hanging on

@Surfrdan Surfrdan closed this Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants