New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
throw exception when predicate has illegal localname #954
Conversation
@@ -128,7 +128,10 @@ public static String getPropertyNameFromPredicate(final NamespaceRegistry namesp | |||
} | |||
|
|||
final String rdfLocalname = predicate.getLocalName(); | |||
|
|||
if (rdfLocalname.isEmpty()) { | |||
LOGGER.info("predicate local name is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is inaccurate. The local name is empty, not null. How about "Predicate local name is empty!"
Does this actually resolve FCREPO-1799? That issue appears to have something to do with a localname that begins with an integer. Why are there no new tests associated with this PR? |
Do we really need a new test for this one? since the solution is just throwing exception when this case happens. |
If we choose to throw an exception (which I'm fine with), we should add a test so we'll notice if Jena's namespace parsing rules change in the future. We should also document the decision in the wiki. Since there is a conflict between RDF/XML and Turtle over what is a valid localName, we can't support both. Choosing the stricter one seems reasonable to me, though I expect some users to be frustrated by the choice. So at the very least, we should document and justify the decision. |
@nianma, I see that FCREPO-1799 is now in review. But I don't know if you saw my comment yesterday. I am 👎 on this PR until there is a test that will let us know if the Jena behavior changes. |
Also a 👎 from me without tests. |
I will add a test soon. |
Let me know if this test is OK. Just found out it is OK to have predicate local name starting from digit, but it cannot be all digits. Not so sure where to document it in wiki. |
@Test(expected = RuntimeException.class) | ||
public final void IllegalPredicateLocalname() throws RepositoryException { | ||
final Property p = createProperty(mockUri, "1234"); | ||
final Map<String, String> nsMap = new HashMap<String, String>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just final Map<String, String> nsMap = ImmutableMap.of("example", mockUri)
, and maybe static import
the of
, or inline the whole value.
I should say, the actual local name will start from a letter and ignore all the previous digits, but won't give any exception. |
@nianma, when you say "the actual local name will start from a letter and ignore all the previous digits, but won't give any exception.", I'm not sure what you mean exactly. The original issue reported in https://jira.duraspace.org/browse/FCREPO-1799 was doing a SPARQL Update with a QName that started with a digit, which produced an error. That's still the case -- we're just throwing an error on insert instead of when you try to read the triples (where the error was happening before). |
@escowles, yes, the issue of https://jira.duraspace.org/browse/FCREPO-1799 will happen for local name is "1234", but if you change the query to PREFIX bf: http://bibframe.org/vocab/ It will actually create the node as below without exception. |
@nianma OK, that's surprising -- and going to result in a lot of strange namespaces being registered if people have many predicates with localNames that start with digits, but also contain letters. I think the best thing to do would be:
|
WHOA! Either our interpretation is incorrect, or there is a deeper incorrectness in our code, or there is a huge bug in Jena (this is frankly untenable, a bug that big would have surfaced many years ago). I think we need to understand this behavior much better before we decide to just document it and move on. |
@ajs6f Because the Turtle spec allows QNames that aren't valid in XML, I'm not at all surprised that weird behavior is happening. And AFAICT, this is allowed in Turtle only as of RDF 1.1 (so early this year), and previous versions of Turtle didn't allow it. I don't know why they would change Turtle to allow URIs that can't be legally convert to RDF/XML, but that seems to be exactly what they've done. Maybe this is part of the general trend toward de-emphasizing RDF/XML and not allowing XML's limitations to prevent things that don't pose problems for other serializations? |
I know about the discrepancy to which you refer (and for my money, RDF/XML can't die soon enough) but it doesn't seem to me to explain why localname glyphs are getting absorbed into namespaces. That's a bug no matter how you play it. |
I agree it's a bug. One way to address it would be to stop using Jena's namespace parsing and just iterate through the registered namespaces to find one that's the longest substring of the URI in question. Predicates that start with a digit would still generate invalid RDF/XML, but I'm guessing many people would be fine with that limit if they wanted to use those predicates. |
Is it Jena's NS parsing that is doing this, or MODE's? |
If we do think this is Jena's parsing (and I am willing to bet you a bomber of really good ale that it is not) then we owe them a ticket and better, a PR. |
@ajs6f I believe it's Jena. In PropertyConverter.getPropertyNameFromPredicate() we're calling Resource.getNameSpace(), which is actually Node_URI.getNameSpace(), which in turn calls Util. splitNamespaceXML(). |
If you think it is in Jena, please file a ticket there, in addition to doing whatever you think is right here. If Jena wants to consider those QNames illegal, fine, but this is not expected behavior. This is definitely not okay behavior from our system, whatever the cause. There is no reason that XML rules should have any effect on this at all. If we want to restrict to XML QNames, fine, but let's make sure that we are rejecting outside-the-bounds names properly. |
@escowles and @ajs6f, it's Jena and it's expected (old) behaviour for rdf 1.0 specs, so basically made for xml specs as you all already said. I have seen questions about the same problem since 2013? with always the same response, so i think, ticket to them maybe not worth. Newbie question (not need to answer if it's obvious), but why extract local name and namespace using Jena (and transforming the property first to a jena resource), instead directly from the native JCR property? https://issues.jboss.org/browse/MODE-1855 implemented access local methods by casting item, or property in this case to Namespaced, so you get getLocalName() and getNamespaceURI(). Lastly, sorry for asking silly questions, i'm trying to get a better understanding of F4 (smile) and i don't get the whole code yet, not even partially. |
@DiegoPino, Yes, we are using the Namespaced utility for doing the JCR -> RDF mapping, and it works well. But we are using Jena to do the RDF -> JCR mapping. Jena provides roughly the same API, but has this strange behavior in URIs with localNames starting with digits. I agree it's not worth filing a bug for Jena. I found a couple in their bug tracker already, and one mentioned frequent postings to the mailing list. So I think they're fully aware of this issue. For us, I think it's hard to categorically say that any predicate is being mis-parsed, unless the localName is empty. To do that, we'd need to make some much stricter rule (e.g., namespaces must end with # or /) that would be too restrictive, and/or we'd need to implement our own parser and maintain it. I have no desire to do that. I think this PR has the correct behavior: throw an error if the localName is empty, logging at WARN if the parsed namespace is new (maybe mis-parsed, maybe just a new namespace). There's a test to cover the case. So I'm now 👍 |
@DiegoPino @escowles Do you have links to those Jira issues? |
Though I should say that I tried parsing URIs with jena 3.0 and found the behavior unchanged. So there may be more to it, or we may need to adjust our usage of the Jena API to use the new parser. |
That makes a lot more sense. My reading is that it was just plain fixed in 3.0.0. Are we not using 3.0.0? |
@ajs6f, Jena 3.0 fix basically renames a method called in Node.getLocalName to Util.splitNamespaceXML to denote XML specs are being followed, and adds a new class SplitIRI to play with Turtle. So, it's not fixed, you have to choose what to use depending on the desired serialisation? Sesame's similar(equally named) methods are more forgiving, but the problem will be present always when serialising to rdf/xml. |
I'd say that's fixed. Jena can't change the fact that the serializations disagree, the most it can do it provide you with the machinery to use both to the fullest. |
I think it's perfectly acceptable to have the RDF/XML serialization blow up if you put predicates in that don't have valid XML QNames. That would let people use anything valid with Turtle, but people who wanted to use RDF/XML would have to use something more restrictive. Logging these with WARN would be even better so people interested in using RDF/XML would be warned. |
Right, just so, and WARN makes sense. |
@@ -116,6 +116,13 @@ public final void shouldRejectFcrPredicates() throws RepositoryException { | |||
PropertyConverter.getPropertyNameFromPredicate(mockNode, p, nsMap); | |||
} | |||
|
|||
@Test(expected = RuntimeException.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please catch a RepositoryRuntimeException
here.
Is the consensus that this ticket will be ready to go once the logging has been changed to WARN? |
@awoods I think the consensus is that we should update to Jena 3.0 and use SplitIRI.localname/namespace, and WARN if the localname isn't valid in RDF/XML (starts with a digit). |
My suspicion is that upgrading to Jena 3.0 will have sweeping impacts throughout the core codebase as well as in the |
There is an existing ticket for the Jena 3.x upgrade: https://jira.duraspace.org/browse/FCREPO-1672 |
I have bumped the priority of https://jira.duraspace.org/browse/FCREPO-1672 from |
I looked into that issue a few months ago and I'd be happy to revisit it. |
👍 @acoburn |
Also, I should mention that SplitIRI.namespaceTTL() is broken in Jena 3.0.0 -- I put in a (trivial) PR to fix this that's already been merged: apache/jena#102. So that should be fixed in 3.0.1. In the meantime, we can just use SplitIRI.namespace() and SplitIRI.localnameTTL(). |
If this PR is currently being used to hold place for another fix (Jena 3), please label it as wontfix. |
I think this can be closed. The upgrade to Jena 3.x was completed and that seemed to be all this was hanging on |
https://jira.duraspace.org/browse/FCREPO-1799