Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Iterator-based RdfStream with Java 8 Streams #962

Closed
wants to merge 6 commits into from
Closed

Replace Iterator-based RdfStream with Java 8 Streams #962

wants to merge 6 commits into from

Conversation

acoburn
Copy link
Contributor

@acoburn acoburn commented Dec 4, 2015

This is an implementation of a context-bearing RdfStream class using the Java8 stream API. The intention here is that this will replace the existing org.fcrepo.kernel.api.utils.iterators.RdfStream class.

Relates to: https://jira.duraspace.org/browse/FCREPO-1312

The design of this implementation addresses two related issues. First, to make RdfStream behave like a Java 8 Stream rather than an Iterator.

And secondly, to make the classes that implement these RDF streams (org.fcrepo.kernel.modeshape.rdf.impl.*) isolated to the fcrepo-kernel-modeshape module. The current architecture places a hard dependency on fcrepo-kernel-modeshape from the fcrepo-http-* modules, which would preclude any alternate implementation. This PR goes most of the way toward severing that direct dependency. It does this by declaring an RdfContext enum in the fcrepo-kernel-api module. The values defined there can be passed into the FedoraResource::getTriples method, which are handled by the implementing (e.g. modeshape) module.

* @return the corresponding RdfStreamContext
*/
public static RdfStreamContext fromModel(final Model model) {
return new RdfStreamContext(StreamSupport.stream(spliteratorUnknownSize(model.listStatements(), NONNULL), false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just Iter.asStream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's present in jena 2.13.x. I know it's in 3.x.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we went to 3 a while ago?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope: https://jira.duraspace.org/browse/FCREPO-1672

I'm working on that, but I'm stuck on how string literals get mapped to/from the JCR layer w/r/t their Datatype. It must be a simple thing that I'm overlooking, but if you'd like to take over that ticket, you'd be welcome to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm on "vacation" with my new second child. If you can point me to a specific test, I could try to make it pass, but I'm not up for an epic, saga, book-length sequence of cantos, or other long-form piece of poetry.

@ajs6f
Copy link
Contributor

ajs6f commented Dec 4, 2015

Eh. I guess this works. I don't like the fact that the graph is an internal member. I don't think we have a good understanding of the nature of identity or equality for this type. But at least it gets us moving forward.

@acoburn
Copy link
Contributor Author

acoburn commented Dec 4, 2015

I also don't like that the stream is an internal member, but java8 streams don't carry state. And if we need things like topic() and namespaces(), then the class will need to carry state with it and push the stream into a private member.

In my opinion, the entire RdfStream type would be as simple as:

interface RdfStream extends Stream<Triple> { }

As for identity, what exactly is identity in the context of a Stream? I certainly don't think there is any reasonable way to express that.

@ajs6f
Copy link
Contributor

ajs6f commented Dec 4, 2015

This is why I want a subtype of Stream<Triple>, which could carry state. ::namespaces I would rather see gone and ::session is a impl detail which should go, but ::topic is not incidental. It is crucial.

@acoburn
Copy link
Contributor Author

acoburn commented Dec 4, 2015

Updated based on suggestions. This still uses an internal Stream, and the Node topic is now a necessary part of the constructor.

@ajs6f
Copy link
Contributor

ajs6f commented Dec 4, 2015

I guess I like this more, now, but I still don't think it captures the notion. I would much rather see a subtype of Stream<Triple>. There is some grossness here:

RdfStreamContext triples = ...;
out.println("The first triple is: " + triples.stream().findFirst().orElse("not there!"));
Stream<Triple> stillThere = triples.stream();
assert stillThere.count() == 0;
// explode!

The findFirst is terminal on the internal stream, but leaves behind a "shell" RdfStreamContext. What exactly is that shell? For all practical purposes, it's the same thing as a "post-terminal" Stream. I don't like that. It's correct behavior given the Streams API, but it couldn't happen if RdfStreamContext < Stream<Triple>.

@acoburn
Copy link
Contributor Author

acoburn commented Dec 4, 2015

@ajs6f RdfStream is now defined as an interface with a context-bearing member. I think this captures what we both want. (of course, there is no implementation yet, but it's a start)

@ajs6f
Copy link
Contributor

ajs6f commented Dec 4, 2015

👍

@acoburn acoburn changed the title DO NOT MERGE Replace Iterator-based RdfStream with Stream-based interface Feb 14, 2016
@acoburn acoburn changed the title Replace Iterator-based RdfStream with Stream-based interface Replace Iterator-based RdfStream with Java 8 Streams Feb 14, 2016
* @author acoburn
* @since Dec 4, 2015
*/
public enum RdfContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a bad coupling in the type system? If I add a new algorithm for generating triples, do I have to add a class and a member here? Seems like we only need the class…

@ajs6f
Copy link
Contributor

ajs6f commented Feb 15, 2016

I don't understand the advantage of enumerating the algorithms for generating triples.

@acoburn
Copy link
Contributor Author

acoburn commented Feb 15, 2016

The entire reason for creating this enum is so that the fcrepo-http-* modules don't reference fcrepo-modeshape classes directly. Otherwise, (and this is the current situation) it would not be possible to reuse the fcrepo-http-* modules with a non modeshape impl.

On Feb 15, 2016, at 8:52 AM, A. Soroka <notifications@github.commailto:notifications@github.com> wrote:

I don't understand the advantage of enumerating the algorithms for generating triples.


Reply to this email directly or view it on GitHubhttps://github.com//pull/962#issuecomment-184267190.

@ajs6f
Copy link
Contributor

ajs6f commented Feb 15, 2016

Can't we just get rid of the ability to select different triple generation algorithms? Just have a simple getTriples() without parameters?

@acoburn
Copy link
Contributor Author

acoburn commented Feb 15, 2016

And how would the prefer headers work? Especially the inbound refs and/or child resources? Fetch all triples all the time and then filter? Wouldn't there be a significant performance penalty for that?

On Feb 15, 2016, at 9:07 AM, A. Soroka <notifications@github.commailto:notifications@github.com> wrote:

Can't we just get rid of the ability to select different triple generation algorithms? Just have a simple getTriples() without parameters?


Reply to this email directly or view it on GitHubhttps://github.com//pull/962#issuecomment-184274020.

@ajs6f
Copy link
Contributor

ajs6f commented Feb 15, 2016

If the concern is the Prefer header, why doesn't the new enum contain direct matches to the available values for that header?

}

private final Function<Version, Iterator<Triple>> version2triples = new Function<Version, Iterator<Triple>> () {
private final Function<Version, Stream<Triple>> version2triples = new Function<Version, Stream<Triple>> () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use lambda notation here?

@acoburn
Copy link
Contributor Author

acoburn commented Feb 17, 2016

Initial code review comments addressed

* @author acoburn
* @since Dec 4, 2015
*/
public enum RdfContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more what I would be hoping for. @awoods , @escowles , @barmintor can I get you to glance at this? Is this in line with our prospective moves towards developing versioning, fixity, etc. as extensions to LDP? I would think that anything that develops from a Prefer header value or as an extension based on LDP type would show up here in some reasonably explicit way.

I'm still not happy about having to expose this stuff using an enum instead of polymorphism, but I don't want to drag us around and around with that right now. The question I want to be a little more sure about is whether we are capturing here the "flavors" of RDF that any Fedora impl would have to be able to supply, to the best of our current ability.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we may need to iterate on the completeness of this enum based on feedback from the spec-writing exercises and associated alt-F4-implementations, but that should not hold-up this solid initial cut.

I hope to finish my review of this entire ticket today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a lot of context, but 1) If this is supposed to be a complete list, shouldn't the minimal container Prefer value be here? 2) Doesn't using an enum here fairly sharply limit the reusability of the code? 3) Strictly speaking the Prefer header's 'include' and 'omit' params are multivalued lists, so I hope this enum is used in a way that supports that?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barmintor, what do you mean by # 2:
"Doesn't using an enum here fairly sharply limit the reusability of the code?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is all why I would rather not use the enum here, although I appreciate that @acoburn is just trying to get information passed around in a reasonable way. The context is: @acoburn is redoing RDF generation mechanics to use the new Java 8 Stream API, which is the general-purpose and well-supported version of what @cbeer and I did when we first built this stuff. In order to do that, he wants (quite reasonably) to provide some means by which clients can indicate what kind of RDF they want. There is no point to generating RDF and then filtering it, and it could be very expensive for some implementations. That's where this comes in. We were passing around actual values for classes (Foo.class) but that couples the client directly to impls (boo!). The ideal, to my mind, would be to develop a good set of interface types that actually model the RDF generation flavors we expect to support, and then pass those around as types. No coupling, extensible, no problem. But that admittedly would slow this down and be more work for @acoburn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that using an enum will prevent the kind of future extensibility we want to encourage, but I think it's fine to address that in another ticket and not hold this up for that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. My preference is for incremental steps. @acoburn has done a great job of initial decoupling from fcrepo-kernel-modeshape with this PR. I would be inclined to move forward with the enum approach for this increment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask the question another way? I don't grok how the enum part has
anything to do with the stream/iterator part; that seems like it should
have been another ticket, and another PR. I trust @acoburn's assessment
about whether that's possible or not, but this seems like a PR that is
trying to to do a little too much. Maybe that's a consequence of mvn
juggling?

On Wed, Feb 17, 2016 at 10:47 AM, Andrew Woods notifications@github.com
wrote:

In fcrepo-kernel-api/src/main/java/org/fcrepo/kernel/api/RdfContext.java
#962 (comment):

  • * Unless required by applicable law or agreed to in writing, software
  • * distributed under the License is distributed on an "AS IS" BASIS,
  • * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  • * See the License for the specific language governing permissions and
  • * limitations under the License.
  • /
    +package org.fcrepo.kernel.api;
    +
    +/
    *
  • * A collection of RDF contexts that can be used to extract triples from FedoraResources
  • * @author acoburn
  • * @SInCE Dec 4, 2015
  • */
    +public enum RdfContext {

Agreed. My preference is for incremental steps. @acoburn
https://github.com/acoburn has done a great job of initial decoupling
from fcrepo-kernel-modeshape with this PR. I would be inclined to move
forward with the enum approach for this increment.


Reply to this email directly or view it on GitHub
https://github.com/fcrepo4/fcrepo4/pull/962/files#r53182947.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barmintor in one sense these are two separate tickets: 1) change RdfStream to be Stream-based instead of Iterator-based and 2) decouple the modeshape-based RdfStream implementations from the HTTP layers.

Approaching these separately, however, would mean completely rewriting the RdfStream implementations (all 18 of them), and then rewriting them again. IMO, this is like removing a band-aid: it is better to do it all at once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm sorry to be a gadfly here when @acoburn has done so much work.
Like I said, I am out of context for this ticket.

On Wed, Feb 17, 2016 at 11:02 AM, Benjamin Armintor armintor@gmail.com
wrote:

Can I ask the question another way? I don't grok how the enum part has
anything to do with the stream/iterator part; that seems like it should
have been another ticket, and another PR. I trust @acoburn's assessment
about whether that's possible or not, but this seems like a PR that is
trying to to do a little too much. Maybe that's a consequence of mvn
juggling?

On Wed, Feb 17, 2016 at 10:47 AM, Andrew Woods notifications@github.com
wrote:

In fcrepo-kernel-api/src/main/java/org/fcrepo/kernel/api/RdfContext.java
#962 (comment):

  • * Unless required by applicable law or agreed to in writing, software
  • * distributed under the License is distributed on an "AS IS" BASIS,
  • * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  • * See the License for the specific language governing permissions and
  • * limitations under the License.
  • /
    +package org.fcrepo.kernel.api;
    +
    +/
    *
  • * A collection of RDF contexts that can be used to extract triples from FedoraResources
  • * @author acoburn
  • * @SInCE Dec 4, 2015
  • */
    +public enum RdfContext {

Agreed. My preference is for incremental steps. @acoburn
https://github.com/acoburn has done a great job of initial decoupling
from fcrepo-kernel-modeshape with this PR. I would be inclined to move
forward with the enum approach for this increment.


Reply to this email directly or view it on GitHub
https://github.com/fcrepo4/fcrepo4/pull/962/files#r53182947.

@awoods
Copy link

awoods commented Feb 17, 2016

Done with review.

@acoburn
Copy link
Contributor Author

acoburn commented Feb 17, 2016

Updates applied.

@awoods
Copy link

awoods commented Feb 18, 2016

Thanks for your patience, @acoburn, with all of the back-and-forth. I think you landed this in an excellent place.
Now the foundation is set for the enum debates.
@ajs6f, I will wait on your feedback before squashing and merging.

@ajs6f
Copy link
Contributor

ajs6f commented Feb 18, 2016

After discussion today, it seems like merging is the right thing to do.

@awoods
Copy link

awoods commented Feb 18, 2016

I'm on it...

awoods pushed a commit to awoods/fcrepo-module-auth-xacml that referenced this pull request Feb 19, 2016
Demonstrate webac failing source
@awoods
Copy link

awoods commented Feb 19, 2016

Resolved with: 523953d

@awoods awoods closed this Feb 19, 2016
awoods pushed a commit to fcrepo-exts/fcrepo-module-auth-xacml that referenced this pull request Feb 19, 2016
@acoburn acoburn deleted the fcrepo-1312 branch February 19, 2016 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants