New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace Iterator-based RdfStream with Java 8 Streams #962
Conversation
* @return the corresponding RdfStreamContext | ||
*/ | ||
public static RdfStreamContext fromModel(final Model model) { | ||
return new RdfStreamContext(StreamSupport.stream(spliteratorUnknownSize(model.listStatements(), NONNULL), false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just Iter.asStream
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's present in jena 2.13.x. I know it's in 3.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we went to 3 a while ago?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope: https://jira.duraspace.org/browse/FCREPO-1672
I'm working on that, but I'm stuck on how string literals get mapped to/from the JCR layer w/r/t their Datatype. It must be a simple thing that I'm overlooking, but if you'd like to take over that ticket, you'd be welcome to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'm on "vacation" with my new second child. If you can point me to a specific test, I could try to make it pass, but I'm not up for an epic, saga, book-length sequence of cantos, or other long-form piece of poetry.
Eh. I guess this works. I don't like the fact that the graph is an internal member. I don't think we have a good understanding of the nature of identity or equality for this type. But at least it gets us moving forward. |
I also don't like that the stream is an internal member, but java8 streams don't carry state. And if we need things like In my opinion, the entire
As for identity, what exactly is identity in the context of a |
This is why I want a subtype of |
Updated based on suggestions. This still uses an internal |
I guess I like this more, now, but I still don't think it captures the notion. I would much rather see a subtype of
The |
@ajs6f |
👍 |
* @author acoburn | ||
* @since Dec 4, 2015 | ||
*/ | ||
public enum RdfContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a bad coupling in the type system? If I add a new algorithm for generating triples, do I have to add a class and a member here? Seems like we only need the class…
I don't understand the advantage of enumerating the algorithms for generating triples. |
The entire reason for creating this enum is so that the fcrepo-http-* modules don't reference fcrepo-modeshape classes directly. Otherwise, (and this is the current situation) it would not be possible to reuse the fcrepo-http-* modules with a non modeshape impl. On Feb 15, 2016, at 8:52 AM, A. Soroka <notifications@github.commailto:notifications@github.com> wrote: I don't understand the advantage of enumerating the algorithms for generating triples. — |
Can't we just get rid of the ability to select different triple generation algorithms? Just have a simple |
And how would the prefer headers work? Especially the inbound refs and/or child resources? Fetch all triples all the time and then filter? Wouldn't there be a significant performance penalty for that? On Feb 15, 2016, at 9:07 AM, A. Soroka <notifications@github.commailto:notifications@github.com> wrote: Can't we just get rid of the ability to select different triple generation algorithms? Just have a simple getTriples() without parameters? — |
If the concern is the |
} | ||
|
||
private final Function<Version, Iterator<Triple>> version2triples = new Function<Version, Iterator<Triple>> () { | ||
private final Function<Version, Stream<Triple>> version2triples = new Function<Version, Stream<Triple>> () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use lambda notation here?
Initial code review comments addressed |
* @author acoburn | ||
* @since Dec 4, 2015 | ||
*/ | ||
public enum RdfContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more what I would be hoping for. @awoods , @escowles , @barmintor can I get you to glance at this? Is this in line with our prospective moves towards developing versioning, fixity, etc. as extensions to LDP? I would think that anything that develops from a Prefer header value or as an extension based on LDP type would show up here in some reasonably explicit way.
I'm still not happy about having to expose this stuff using an enum
instead of polymorphism, but I don't want to drag us around and around with that right now. The question I want to be a little more sure about is whether we are capturing here the "flavors" of RDF that any Fedora impl would have to be able to supply, to the best of our current ability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we may need to iterate on the completeness of this enum
based on feedback from the spec-writing exercises and associated alt-F4-implementations, but that should not hold-up this solid initial cut.
I hope to finish my review of this entire ticket today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a lot of context, but 1) If this is supposed to be a complete list, shouldn't the minimal container Prefer value be here? 2) Doesn't using an enum here fairly sharply limit the reusability of the code? 3) Strictly speaking the Prefer header's 'include' and 'omit' params are multivalued lists, so I hope this enum is used in a way that supports that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@barmintor, what do you mean by # 2:
"Doesn't using an enum here fairly sharply limit the reusability of the code?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is all why I would rather not use the enum
here, although I appreciate that @acoburn is just trying to get information passed around in a reasonable way. The context is: @acoburn is redoing RDF generation mechanics to use the new Java 8 Stream API, which is the general-purpose and well-supported version of what @cbeer and I did when we first built this stuff. In order to do that, he wants (quite reasonably) to provide some means by which clients can indicate what kind of RDF they want. There is no point to generating RDF and then filtering it, and it could be very expensive for some implementations. That's where this comes in. We were passing around actual values for classes (Foo.class
) but that couples the client directly to impls (boo!). The ideal, to my mind, would be to develop a good set of interface types that actually model the RDF generation flavors we expect to support, and then pass those around as types. No coupling, extensible, no problem. But that admittedly would slow this down and be more work for @acoburn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that using an enum will prevent the kind of future extensibility we want to encourage, but I think it's fine to address that in another ticket and not hold this up for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. My preference is for incremental steps. @acoburn has done a great job of initial decoupling from fcrepo-kernel-modeshape with this PR. I would be inclined to move forward with the enum
approach for this increment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I ask the question another way? I don't grok how the enum part has
anything to do with the stream/iterator part; that seems like it should
have been another ticket, and another PR. I trust @acoburn's assessment
about whether that's possible or not, but this seems like a PR that is
trying to to do a little too much. Maybe that's a consequence of mvn
juggling?
On Wed, Feb 17, 2016 at 10:47 AM, Andrew Woods notifications@github.com
wrote:
In fcrepo-kernel-api/src/main/java/org/fcrepo/kernel/api/RdfContext.java
#962 (comment):
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- /
+package org.fcrepo.kernel.api;
+
+/*- * A collection of RDF contexts that can be used to extract triples from FedoraResources
- * @author acoburn
- * @SInCE Dec 4, 2015
- */
+public enum RdfContext {Agreed. My preference is for incremental steps. @acoburn
https://github.com/acoburn has done a great job of initial decoupling
from fcrepo-kernel-modeshape with this PR. I would be inclined to move
forward with the enum approach for this increment.—
Reply to this email directly or view it on GitHub
https://github.com/fcrepo4/fcrepo4/pull/962/files#r53182947.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@barmintor in one sense these are two separate tickets: 1) change RdfStream to be Stream-based instead of Iterator-based and 2) decouple the modeshape-based RdfStream implementations from the HTTP layers.
Approaching these separately, however, would mean completely rewriting the RdfStream implementations (all 18 of them), and then rewriting them again. IMO, this is like removing a band-aid: it is better to do it all at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm sorry to be a gadfly here when @acoburn has done so much work.
Like I said, I am out of context for this ticket.
On Wed, Feb 17, 2016 at 11:02 AM, Benjamin Armintor armintor@gmail.com
wrote:
Can I ask the question another way? I don't grok how the enum part has
anything to do with the stream/iterator part; that seems like it should
have been another ticket, and another PR. I trust @acoburn's assessment
about whether that's possible or not, but this seems like a PR that is
trying to to do a little too much. Maybe that's a consequence of mvn
juggling?On Wed, Feb 17, 2016 at 10:47 AM, Andrew Woods notifications@github.com
wrote:In fcrepo-kernel-api/src/main/java/org/fcrepo/kernel/api/RdfContext.java
#962 (comment):
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- /
+package org.fcrepo.kernel.api;
+
+/*- * A collection of RDF contexts that can be used to extract triples from FedoraResources
- * @author acoburn
- * @SInCE Dec 4, 2015
- */
+public enum RdfContext {Agreed. My preference is for incremental steps. @acoburn
https://github.com/acoburn has done a great job of initial decoupling
from fcrepo-kernel-modeshape with this PR. I would be inclined to move
forward with the enum approach for this increment.—
Reply to this email directly or view it on GitHub
https://github.com/fcrepo4/fcrepo4/pull/962/files#r53182947.
Done with review. |
Updates applied. |
After discussion today, it seems like merging is the right thing to do. |
I'm on it... |
Demonstrate webac failing source
Resolved with: 523953d |
This is an implementation of a context-bearing RdfStream class using the Java8 stream API. The intention here is that this will replace the existing
org.fcrepo.kernel.api.utils.iterators.RdfStream
class.Relates to: https://jira.duraspace.org/browse/FCREPO-1312
The design of this implementation addresses two related issues. First, to make
RdfStream
behave like a Java 8Stream
rather than anIterator
.And secondly, to make the classes that implement these RDF streams (
org.fcrepo.kernel.modeshape.rdf.impl.*
) isolated to thefcrepo-kernel-modeshape
module. The current architecture places a hard dependency onfcrepo-kernel-modeshape
from thefcrepo-http-*
modules, which would preclude any alternate implementation. This PR goes most of the way toward severing that direct dependency. It does this by declaring an RdfContext enum in thefcrepo-kernel-api
module. The values defined there can be passed into theFedoraResource::getTriples
method, which are handled by the implementing (e.g. modeshape) module.