added support for named graphs in the processor classes #48

Merged
merged 2 commits into from Jan 30, 2015

acoburn
Contributor

@acoburn acoburn commented Jan 28, 2015

https://jira.duraspace.org/browse/FCREPO-1310

This makes it easier for implementors to use named graphs with external triplestores.

@awoods

awoods commented Jan 30, 2015

If the FCREPO_NAMED_GRAPH header is set, I gather that means that all Fedora-related updates will be inserted/deleted relative to that named graph, which would amount to a single Fedora named graph in the external triplestore, correct?
If so, the utility of that is then being able to potentially clean out all Fedora-related triples in isolation from other triples in the store?
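If every Fedora-related update does end up in one named graph, clearing those triples in isolation comes down to a single SPARQL 1.1 Update operation. A minimal sketch, assuming a graph name of `http://example.org/graphs/fedora`; the class and method names are hypothetical and not part of fcrepo-camel:

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of fcrepo-camel. */
final class ClearGraphSketch {

    /** Builds a SPARQL 1.1 Update request that empties one named graph. */
    static String clearGraph(String graphIri) {
        // URI.create throws IllegalArgumentException on malformed input,
        // which also rejects characters such as '>' that would break the IRI syntax.
        URI iri = URI.create(graphIri);
        if (!iri.isAbsolute()) {
            throw new IllegalArgumentException("graph name must be an absolute IRI: " + graphIri);
        }
        // SILENT makes the operation succeed even if the graph does not exist.
        return "CLEAR SILENT GRAPH <" + iri + ">";
    }

    public static void main(String[] args) {
        // Assumed graph name, for illustration only.
        System.out.println(clearGraph("http://example.org/graphs/fedora"));
    }
}
```

Triples held in other graphs, including the default graph, are left alone by this operation.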

@ajs6f
Contributor

ajs6f commented Jan 30, 2015

Doesn't that depend on whether the header's value varies from request to request, or is a constant for the repository?

@awoods

awoods commented Jan 30, 2015

I see where the header is being read, but not where it is being set... and therefore assume it is a configuration element.

@ajs6f
Contributor

ajs6f commented Jan 30, 2015

Isn't the setting of the header exactly what we'd want to leave up to each site? Each site might want to partition its triples very differently, for all kinds of unpredictable-to-us reasons.

@awoods

awoods commented Jan 30, 2015

Where would/does that logic exist for setting the graph-name header?

@ajs6f
Contributor

ajs6f commented Jan 30, 2015

It would be, as I understand it, in the integration at a given site: perhaps in a more elaborate Camel route, or in some proxying element between the repo and Camel. I'd do it in Camel, myself, because you've already got it in play. We might want to include some defaulting behavior and document some recipes for partitioning into one graph per repo or one graph per resource.

@awoods

awoods commented Jan 30, 2015

I agree that some documented defaulting behavior and recipes for assigning the graph-name header at different scopes would be helpful.
Otherwise, this particular PR is probably ready to go... unless there is anything that you can think of to put in here, @acoburn, that would make the picture more clear.

@acoburn
Contributor Author

acoburn commented Jan 30, 2015

The header is not set anywhere in the camel component -- that is up to implementors. For instance, one may want to partition the fedora nodes into separate (possibly overlapping) named graphs. The default behavior is to use no named graph (i.e. everything goes into the default graph).

For instance, to partition into named graphs, based on a dynamically assigned property placeholder value:

from("activemq:topic:fedora")
  .filter(some-type-of-filter)
    .to("fcrepo:localhost:8080/rest")
    .setHeader(FcrepoHeaders.FCREPO_NAMED_GRAPH).simple("{{named.graph}}")
    .process(new SparqlUpdateProcessor())
    .to("http4:localhost:3030/ds/update");

Or, to partition based on some existing RDF property:

from("activemq:topic:fedora")
  .filter(some-type-of-filter)
    .to("fcrepo:localhost:8080/rest")
    .setHeader(FcrepoHeaders.FCREPO_NAMED_GRAPH)
      .xpath("/rdf:RDF/rdf:Description/ex:namedGraph/text()", String.class, ns)
    .process(new SparqlUpdateProcessor())
    .to("http4:localhost:3030/ds/update");

Or, you may want to have a "public" and a "private" graph in the triplestore:

from("activemq:topic:fedora")
  .to("fcrepo:localhost:8080/rest")
  .multicast().to("direct:public", "direct:private");

from("direct:public")
  .filter(some-predicate)
    .setHeader(FcrepoHeaders.FCREPO_NAMED_GRAPH)
      .constant("public")
    .process(new SparqlUpdateProcessor())
    .to("http4:localhost:3030/ds/update");

from("direct:private")
  .filter(some-other-predicate)
    .setHeader(FcrepoHeaders.FCREPO_NAMED_GRAPH)
      .constant("private")
    .process(new SparqlUpdateProcessor())
    .to("http4:localhost:3030/ds/update");

But the setting of the header is really up to the specific implementation -- it may be hard-coded; it may come from an RDF property; it may come from some dynamically assigned property.
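To make the effect of the header concrete, here is a rough sketch of how an update for one resource might differ with and without a graph name. This is an illustration of the general SPARQL Update shape only: the class, method names, and example IRIs are hypothetical, and the request that SparqlUpdateProcessor actually emits may differ.

```java
/** Hypothetical illustration; not the code used by SparqlUpdateProcessor. */
final class NamedGraphUpdateSketch {

    /**
     * Wraps a pattern in GRAPH <name> { ... } when a graph name is given,
     * and returns it unchanged (default graph) when the name is null or empty.
     */
    static String scope(String pattern, String graphName) {
        if (graphName == null || graphName.isEmpty()) {
            return pattern;
        }
        return "GRAPH <" + graphName + "> { " + pattern + " }";
    }

    /** Deletes all triples about one subject, then inserts a fresh set of triples. */
    static String replaceResource(String subject, String triples, String graphName) {
        String delete = "DELETE WHERE { " + scope("<" + subject + "> ?p ?o", graphName) + " }";
        String insert = "INSERT DATA { " + scope(triples, graphName) + " }";
        return delete + ";\n" + insert;
    }

    public static void main(String[] args) {
        // Assumed resource and triples, for illustration only.
        String subject = "http://localhost:8080/rest/a";
        String triples = "<" + subject + "> <http://purl.org/dc/elements/1.1/title> \"A\" .";

        // No header set: everything goes to the default graph.
        System.out.println(replaceResource(subject, triples, null));
        System.out.println();
        // Header set: both operations are scoped to the named graph.
        System.out.println(replaceResource(subject, triples, "http://example.org/graphs/public"));
    }
}
```

Because the delete and the insert are scoped to the same graph, a resource indexed into two graphs (as in the overlapping case mentioned above) can be updated in one graph without touching its copy in the other.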

@awoods

awoods commented Jan 30, 2015

Thanks, @acoburn, those examples are helpful.

awoods pushed a commit that referenced this pull request Jan 30, 2015
added support for named graphs in the processor classes
@awoods awoods merged commit 40e2b42 into fcrepo-exts:master Jan 30, 2015
@acoburn
Contributor Author

acoburn commented Jan 30, 2015

I'll add these to the documentation

@acoburn acoburn deleted the named-graph-support branch January 30, 2015 22:54
@Conal-Tuohy

@acoburn thanks for the examples above. Does this, and more explanatory material, appear in the documentation? I haven't been able to find better documentation than what appears on this GitHub issue. I would like to be able to use each node's identifier as the name of the RDF graph.

@acoburn
Contributor Author

acoburn commented Feb 21, 2017

@Conal-Tuohy I don't believe there are very good examples of this, but I am planning to implement something soon that will index each resource's triples into separate named graphs in a triplestore. Keep an eye on https://gitlab.amherst.edu/acdc/repository-extension-services for a new triplestore indexer in the next week or two.

@Conal-Tuohy

Thanks @acoburn - I will keep an eye on that repo.

It would be most helpful though if any of this could be documented in the official documentation.

@acoburn
Contributor Author

acoburn commented Feb 24, 2017

@Conal-Tuohy the named graph support is documented in the official documentation, though that documentation doesn't explain the patterns one might employ for using named graphs.

The key thing is to have this in your camel code:

.setHeader(FCREPO_NAMED_GRAPH).constant("info:myuri")

Or, to index each resource into its own named graph:

.setHeader(FCREPO_NAMED_GRAPH).header(FCREPO_URI)

A full example of indexing each resource into its own named graph is available here: https://gitlab.amherst.edu/acdc/repository-extension-services/blob/master/acrepo-connector-triplestore/src/main/java/edu/amherst/acdc/connector/triplestore/TriplestoreRouter.java

The destination triplestore can then be configured to make the default graph a union of all the named graphs, which makes it possible to query across graphs. Alternatively, one can query a particular named graph using the SELECT * FROM <graph-uri> WHERE { ... } syntax.
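The query forms can be sketched as plain strings. The helper below is hypothetical (it is not part of fcrepo-camel or any triplestore client) and the graph IRI is an assumed example; it shows the FROM form mentioned above next to the standard SPARQL GRAPH keyword, which is another way to address named graphs:

```java
/** Hypothetical helper for illustration; builds SPARQL query strings only. */
final class NamedGraphQuerySketch {

    /** FROM <g> builds the query's default graph from the given named graph. */
    static String selectFrom(String graphIri, String pattern) {
        return "SELECT * FROM <" + graphIri + "> WHERE { " + pattern + " }";
    }

    /** GRAPH <g> { ... } matches the pattern inside one named graph of the dataset. */
    static String selectGraph(String graphIri, String pattern) {
        return "SELECT * WHERE { GRAPH <" + graphIri + "> { " + pattern + " } }";
    }

    /** GRAPH ?g { ... } matches in every named graph and binds ?g to the graph name. */
    static String selectAcrossGraphs(String pattern) {
        return "SELECT * WHERE { GRAPH ?g { " + pattern + " } }";
    }

    public static void main(String[] args) {
        // Assumed graph name: a Fedora resource URI used as its own graph name.
        String graph = "http://localhost:8080/rest/a";
        System.out.println(selectFrom(graph, "?s ?p ?o"));
        System.out.println(selectGraph(graph, "?s ?p ?o"));
        System.out.println(selectAcrossGraphs("?s ?p ?o"));
    }
}
```

The last form is useful with one graph per resource even when the store is not configured with a union default graph, since ?g reports which resource's graph each result came from.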

@Conal-Tuohy

Thanks again @acoburn - your second example is I think the pattern I'm after (where the URI of the Fedora item is used as the name of the graph in the graph store). I will give that a try.

Regarding the documentation on the Wiki, I originally found some related documentation at https://wiki.duraspace.org/display/FEDORA471/Setup+Camel+Message+Integrations (and subordinate pages), but it had no mention of named graphs. There was a reference to fcrepo-camel-toolbox, which is how I found my way to this GitHub issue and discovered the named graph feature.

But searching again just now, using Google, I found that the corresponding page for Fedora 4.2 does include material on using named graphs in the external graph store. I don't know why the version of that page for Fedora 4.7.1 doesn't also include that documentation, but it doesn't. Is the feature not supported in the latest Fedora? Or is it just missing from the docs?


4 participants