added support for named graphs in the processor classes #48
added unit and integration tests
If the FCREPO_NAMED_GRAPH header is set, I gather that means that all Fedora-related updates will be inserted/deleted relative to that named graph, which would amount to a single Fedora named graph in the external triplestore, correct?
Doesn't that depend on whether the header's value varies from request to request, or is a constant for the repository?
I see where the header is being read, but not where it is being set... and therefore assume it is a configuration element.
Isn't the setting of the header exactly what we'd want to leave up to each site? Each site might want to partition its triples very differently, for all kinds of unpredictable-to-us reasons.
Where would/does that logic exist for setting the graph-name header?
It would be, as I understand it, in the integration at a given site. Perhaps in elaborated Camel, or in some proxying element between the repo and Camel. I'd do it in Camel, myself, because you've already got it in play. We might want to include some defaulting behavior and document some recipes for partitioning into one graph per repo or one graph per resource.
I agree that some documented defaulting behavior and recipes for assigning the graph-name header at different scopes would be helpful.
The header is not set anywhere in the Camel component -- that is up to implementors. One may, for example, want to partition the Fedora nodes into separate (possibly overlapping) named graphs. The default behavior is to use no named graph (i.e. everything goes into the default graph). For instance, to partition into named graphs, based on a dynamically assigned property placeholder value:
Or, to partition based on some existing RDF property:
Or, you may want to have a "public" and a "private" graph in the triplestore:
But the setting of the header is really up to the specific implementation -- it may be hard-coded; it may come from an RDF property; it may come from some dynamically assigned property.
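The three partitioning ideas above (a configured value, a value taken from an RDF property, and a public/private split) can be sketched in plain Java, with no Camel on the classpath. This is only an illustration of the decision logic, not the code from the original comment: the class `NamedGraphHeaders`, the `named.graph` property key, the `example.org` graph URIs, and the string used as the header key are all assumptions made for the example. In a real route, the chosen value would be placed in the FCREPO_NAMED_GRAPH header before the message reaches the processor classes.

```java
import java.util.HashMap;
import java.util.Map;

final class NamedGraphHeaders {

    // Assumption: a stand-in key for the FCREPO_NAMED_GRAPH header; the real
    // constant's string value is not shown in this thread.
    static final String FCREPO_NAMED_GRAPH = "FCREPO_NAMED_GRAPH";

    // Strategy 1: a site-wide graph name read from configuration.
    // "named.graph" is a hypothetical property key.
    static String fromConfig(Map<String, String> config) {
        return config.get("named.graph");
    }

    // Strategy 2: the graph name is the value of some RDF property that has
    // already been extracted from the resource (predicate -> object).
    static String fromProperty(Map<String, String> properties, String predicate) {
        return properties.get(predicate);
    }

    // Strategy 3: two fixed graphs, one public and one private.
    // Both URIs are placeholders.
    static String byVisibility(boolean isPublic) {
        return isPublic
                ? "http://example.org/graph/public"
                : "http://example.org/graph/private";
    }

    // Returns a copy of the headers with the graph name applied. A null or
    // empty name leaves the header unset, i.e. the default graph is used.
    static Map<String, String> withGraph(Map<String, String> headers, String graphName) {
        Map<String, String> result = new HashMap<>(headers);
        if (graphName != null && !graphName.isEmpty()) {
            result.put(FCREPO_NAMED_GRAPH, graphName);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("named.graph", "http://example.org/graph/repository");

        Map<String, String> headers = withGraph(new HashMap<>(), fromConfig(config));
        System.out.println(headers.get(FCREPO_NAMED_GRAPH));

        // No graph name configured: the header stays absent.
        Map<String, String> unset = withGraph(new HashMap<>(), fromConfig(new HashMap<>()));
        System.out.println(unset.containsKey(FCREPO_NAMED_GRAPH));
    }
}
```

Leaving the header unset when no name is available mirrors the default described above, where everything goes into the default graph.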
Thanks, @acoburn, those examples are helpful.
I'll add these to the documentation.
@acoburn thanks for the examples above. Does this, and more explanatory material, appear in the documentation? I haven't been able to find better documentation than what appears on this GitHub issue. I would like to be able to use each node's identifier as the name of the RDF graph.
@Conal-Tuohy I don't believe there are very good examples of this, but I am planning to implement something soon that will index each resource's triples into separate named graphs in a triplestore. Keep an eye on https://gitlab.amherst.edu/acdc/repository-extension-services for a new triplestore indexer in the next week or two.
Thanks @acoburn - I will keep an eye on that repo. It would be most helpful though if any of this could be documented in the official documentation.
@Conal-Tuohy the named graph support is documented in the official documentation, though that documentation doesn't explain the patterns one might employ for using named graphs. The key thing is to have this in your Camel code:
Or, to index each resource into its own named graph:
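As a rough sketch of the one-graph-per-resource pattern, the graph name can simply be the resource's own URI, built from the repository base URL and the resource path. The snippet below is plain Java rather than the Camel code from the original comment; the class `ResourceGraphName`, its method, and the example URLs are hypothetical. The resulting string is what would be assigned to the FCREPO_NAMED_GRAPH header for that message.

```java
final class ResourceGraphName {

    // Joins the repository base URL and the resource path into one URI,
    // keeping exactly one slash between them.
    static String forResource(String baseUrl, String identifier) {
        String base = baseUrl;
        while (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        String path = identifier.startsWith("/") ? identifier : "/" + identifier;
        return base + path;
    }

    public static void main(String[] args) {
        // Example values only; a real deployment supplies its own base URL.
        System.out.println(forResource("http://localhost:8080/rest", "/collection/item1"));
        System.out.println(forResource("http://localhost:8080/rest/", "collection/item2"));
    }
}
```

Because each resource's triples land in a graph named after the resource, an update or delete for that resource only has to touch that one graph.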
A full example of indexing each resource into its own named graph is available here: https://gitlab.amherst.edu/acdc/repository-extension-services/blob/master/acrepo-connector-triplestore/src/main/java/edu/amherst/acdc/connector/triplestore/TriplestoreRouter.java The destination triplestore can then be configured to make the default graph a union of all the datasets, which makes it possible to query across graphs. Alternately, one can query a particular named graph using the
Thanks again @acoburn - your second example is, I think, the pattern I'm after (where the URI of the Fedora item is used as the name of the graph in the graph store). I will give that a try. Regarding the documentation on the Wiki, I originally found some related documentation at https://wiki.duraspace.org/display/FEDORA471/Setup+Camel+Message+Integrations (and subordinate pages), but it had no mention of named graphs. There was a reference to fcrepo-camel-toolbox, which is how I found my way to this GitHub issue and discovered the named graph feature. But searching again just now, using Google, I found that the corresponding page for Fedora 4.2 does include material related to using named graphs in the external graph store. I don't know why the version of that page for Fedora 4.7.1 doesn't also include that documentation, but it doesn't. Is the feature not supported in the latest Fedora? Or is it just missing from the docs?
https://jira.duraspace.org/browse/FCREPO-1310
This makes it easier for implementors to use named graphs with external triplestores.