Add reindexing service

acoburn authored and Andrew Woods committed May 29, 2015
1 parent b498b20 commit 170c289
Showing 38 changed files with 1,912 additions and 87 deletions.
15 changes: 15 additions & 0 deletions README.md
@@ -39,6 +39,20 @@ indexes objects into an external Solr server.
This application listens to Fedora's event stream and
indexes objects into an external triplestore.

### Repository Re-Indexer

This application allows a user to initiate a re-indexing process
from any location within the Fedora node hierarchy, sending
re-indexing requests to a specified list of external applications
(e.g. fcrepo-indexing-solr and/or fcrepo-indexing-triplestore).
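This commit view does not show how the re-indexer walks the hierarchy, so what follows is only a hypothetical sketch. It assumes a depth-first descent through each node's `ldp:contains` children (the relation the `children` XPath in the SolrRouter changes selects) and one request per node per recipient endpoint. The in-memory `Map` stands in for the repository, and `ReindexWalk` is an illustrative name, not part of fcrepo-camel-toolbox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch of a depth-first re-index walk; not the fcrepo-reindexing code. */
final class ReindexWalk {

    /**
     * Send one re-index request per recipient for {@code path}, then recurse into
     * its children. {@code children} stands in for the repository's ldp:contains
     * links (an assumption made for illustration).
     */
    static void reindex(final String path,
                        final Map<String, List<String>> children,
                        final List<String> recipients,
                        final BiConsumer<String, String> send) {
        for (final String recipient : recipients) {
            send.accept(recipient, path);
        }
        for (final String child : children.getOrDefault(path, List.of())) {
            reindex(child, children, recipients, send);
        }
    }

    public static void main(final String[] args) {
        final Map<String, List<String>> tree = Map.of(
                "/path", List.of("/path/a", "/path/b"),
                "/path/a", List.of("/path/a/1"));
        final List<String> recipients = List.of(
                "activemq:queue:solr.reindex", "activemq:queue:triplestore.reindex");
        final List<String> sent = new ArrayList<>();
        reindex("/path", tree, recipients, (to, id) -> sent.add(to + " <- " + id));
        sent.forEach(System.out::println);
    }
}
```

Recursion keeps the sketch short. A service walking a large repository would more plausibly push child paths onto a queue, such as the internal `activemq:queue:reindexing` stream configured in this commit, than hold the whole walk on the call stack.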

To choose which applications/endpoints receive these re-indexing
events, POST a JSON array of their endpoint URIs to the re-indexing
service, with the Fedora path to start from appended to the service URL:

curl -XPOST localhost:9080/reindexing/fedora/path -H"Content-Type: application/json" \
-d '["activemq:queue:solr.reindex","activemq:queue:triplestore.reindex"]'
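The same request can be built from Java. This is a minimal sketch using the JDK 11+ `java.net.http` client, assuming the service is reachable at `http://localhost:9080/reindexing` as in the `curl` example. `ReindexRequest` and its methods are hypothetical names, and the JSON encoding escapes only backslashes and quotes, which is enough for queue URIs like these.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds the re-indexing POST shown in the curl example. */
final class ReindexRequest {

    /** Encode endpoint URIs as a JSON array of strings (escapes only backslash and quote). */
    static String toJsonArray(final List<String> endpoints) {
        return endpoints.stream()
                .map(e -> "\"" + e.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    /** POST to serviceBase + fedoraPath; re-indexing starts from fedoraPath. */
    static HttpRequest build(final String serviceBase, final String fedoraPath,
                             final List<String> endpoints) {
        return HttpRequest.newBuilder(URI.create(serviceBase + fedoraPath))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJsonArray(endpoints)))
                .build();
    }

    public static void main(final String[] args) throws Exception {
        final HttpRequest request = build("http://localhost:9080/reindexing", "/fedora/path",
                List.of("activemq:queue:solr.reindex", "activemq:queue:triplestore.reindex"));
        // Sending requires a running re-indexing service on localhost:9080.
        final HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```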

## Building

To build these projects use this command
@@ -57,3 +71,4 @@ Then, you can add any combination of the following applications:
$> feature:install fcrepo-indexing-solr
$> feature:install fcrepo-indexing-triplestore
$> feature:install fcrepo-audit-triplestore
$> feature:install fcrepo-reindexer
3 changes: 2 additions & 1 deletion fcrepo-audit-triplestore/README.md
@@ -25,7 +25,8 @@ This project can be deployed in an OSGi container. For example using
[Apache Karaf](http://karaf.apache.org), you can run the following
command from its shell:

osgi:install -s mvn:org.fcrepo.camel/audit-triplestore/{VERSION}
feature:repo-add mvn:org.fcrepo.camel/fcrepo-camel-toolbox/LATEST/xml/features
feature:install fcrepo-audit-triplestore

Or by copying the compiled bundle into `$KARAF_HOME/deploy`.

@@ -21,6 +21,7 @@

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.camel.EndpointInject;
import org.apache.camel.Exchange;
@@ -64,6 +65,14 @@ protected String getBlueprintDescriptor() {
return "/OSGI-INF/blueprint/blueprint.xml";
}

@Override
protected Properties useOverridePropertiesWithPropertiesComponent() {
final Properties props = new Properties();
props.put("audit.container", auditContainer);
props.put("input.stream", "seda:foo");
return props;
}

@Test
public void testWithoutJms() throws Exception {

71 changes: 67 additions & 4 deletions fcrepo-camel-webapp/README.md
@@ -16,11 +16,11 @@ To build this project with the solr and triplestore indexers, use

MAVEN_OPTS="-Xmx1024m" mvn install -Pis -Pit

To build this project with all three applications, use
To build this project with all four applications, use

MAVEN_OPTS="-Xmx1024m" mvn install -Pis -Pit -Pat
Note: The following syntax is also valid: `mvn install -Pis,it,at`
MAVEN_OPTS="-Xmx1024m" mvn install -Pis -Pit -Pat -Prs

Note: The following syntax is also valid: `mvn install -Pis,it,at,rs`

###Configuration

@@ -160,6 +160,10 @@ The camel URI for the incoming message stream.

input.stream=activemq:topic:fedora

The camel URI for a reindexing queue.

reindexing.stream=activemq:queue:reindexing

The baseUrl for the Solr server. If using Solr 4.x or better, the URL should include
the core name.

@@ -169,6 +173,65 @@ The timeframe (in milliseconds) within which new items should be committed to th

solr.commitWithin=10000

##Fedora Reindexing Service

This application implements a reindexing service for other components,
such as fcrepo-indexing-solr or fcrepo-indexing-triplestore.

###Building

To build this project use

MAVEN_OPTS="-Xmx1024m" mvn install -Prs

###Configuration

A number of application values can be configured externally, through
system properties. These include:

The prefix for the exposed REST endpoint.

fcrepo.reindexing.prefix=/reindexing

The port used for the REST endpoint.

fcrepo.dynamic.reindexing.port=9080
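The webapp's `application.properties` maps these system properties onto `rest.prefix` and `rest.port` with `${system.property:default}` placeholders. The sketch below only mimics that convention to show the precedence: a system property that is set wins, otherwise the text after the first colon is used. It is not the resolver the webapp actually uses, and `PlaceholderSketch` is a hypothetical name.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of ${key:default} resolution; not the webapp's resolver. */
final class PlaceholderSketch {

    // ${key} or ${key:default}; the key stops at the first ':' so defaults may contain colons.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(final String value, final Properties systemProps) {
        final Matcher m = PLACEHOLDER.matcher(value);
        final StringBuilder out = new StringBuilder();
        while (m.find()) {
            final String key = m.group(1);
            final String fallback = m.group(2);   // null when there is no ":default" part
            final String resolved = systemProps.getProperty(key, fallback);
            if (resolved == null) {
                throw new IllegalArgumentException("Unresolved placeholder: " + key);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(final String[] args) {
        // Try: java -Dfcrepo.dynamic.reindexing.port=9090 PlaceholderSketch
        System.out.println("rest.prefix="
                + resolve("${fcrepo.reindexing.prefix:/reindexing}", System.getProperties()));
        System.out.println("rest.port="
                + resolve("${fcrepo.dynamic.reindexing.port:9080}", System.getProperties()));
    }
}
```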

Alternatively, the application can be configured by editing the
`WEB-INF/classes/application.properties` file in the unpacked webapp.
The following values are available for configuration:

In the event of failure, the maximum number of times a redelivery will be attempted.

error.maxRedeliveries=10

If the fedora repository requires authentication, the following values
can be set:

fcrepo.authUsername=<username>
fcrepo.authPassword=<password>
fcrepo.authHost=<host realm>

The baseUrl for the fedora repository.

fcrepo.baseUrl=localhost:8080/fcrepo/rest

The JMS connection URI, used for connecting to a local or remote ActiveMQ broker.

jms.brokerUrl=tcp://localhost:61616

The camel URI for the internal processing queue.

input.stream=activemq:queue:reindexing

The prefix for the REST endpoint.

rest.prefix=/reindexing

The port for the REST endpoint.

rest.port=9080

##Further Information
For more help, see the Apache Camel documentation.

19 changes: 18 additions & 1 deletion fcrepo-camel-webapp/pom.xml
@@ -25,8 +25,10 @@
<is.profile />
<!--Indexing Triplestore-->
<it.profile />
<!--Reindexing Service-->
<rs.profile />

<build.profile.names>${at.profile}${is.profile}${it.profile}</build.profile.names>
<build.profile.names>${at.profile}${is.profile}${it.profile}${rs.profile}</build.profile.names>
</properties>

<dependencies>
@@ -144,6 +146,21 @@
</dependencies>
</profile>

<profile>
<!--Reindexing Service module-->
<id>rs</id>
<properties>
<rs.profile>-rs</rs.profile>
</properties>
<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>fcrepo-reindexing</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</profile>

</profiles>

</project>
7 changes: 7 additions & 0 deletions fcrepo-camel-webapp/src/main/resources/application.properties
@@ -14,9 +14,11 @@ indexing.predicate=${fcrepo.onlyIndexableObjects:false}
fcrepo.defaultTransform=default
solr.baseUrl=${solr.base.url:localhost:8983/solr/collection1}
solr.commitWithin=10000
solr.reindex.stream=activemq:queue:solr.reindex

# Indexing Triplestore module
triplestore.namedGraph=
triplestore.reindex.stream=activemq:queue:triplestore.reindex

# Triplestore modules
triplestore.baseUrl=${fcrepo.audit.triplestore.baseUrl:localhost:3030/test/update}
@@ -25,3 +27,8 @@ triplestore.baseUrl=${fcrepo.audit.triplestore.baseUrl:localhost:3030/test/updat
event.baseUri=${fcrepo.audit.baseUri:http://example.com/event}
audit.container=${fcrepo.audit.container:/audit}

# Reindexing module
rest.prefix=${fcrepo.reindexing.prefix:/reindexing}
rest.port=${fcrepo.dynamic.reindexing.port:9080}
reindexing.stream=activemq:queue:reindexing

7 changes: 4 additions & 3 deletions fcrepo-indexing-solr/README.md
@@ -22,12 +22,13 @@ This project can be deployed in an OSGi container. For example using
[Apache Karaf](http://karaf.apache.org), you can run the following
command from its shell:

osgi:install -s mvn:org.fcrepo.camel/indexing-solr/{VERSION}
feature:repo-add mvn:org.fcrepo.camel/fcrepo-camel-toolbox/LATEST/xml/features
feature:install fcrepo-indexing-solr

##Configuration

The application can be configured by creating a file in
`$KARAF_HOME/etc/org.fcrepo.camel.indexing.triplestore.cfg`. The following
`$KARAF_HOME/etc/org.fcrepo.camel.indexing.solr.cfg`. The following
values are available for configuration:

In the event of failure, the maximum number of times a redelivery will be attempted.
@@ -43,7 +44,7 @@ can be set:

The baseUrl for the fedora repository.

fcrepo.baseUrl=localhost:8080/fcrepo4/rest
fcrepo.baseUrl=localhost:8080/fcrepo/rest

The default `LDPath` transformation to use. This is overridden on a per-object
basis with the `indexing:hasIndexingTransformation` predicate.
7 changes: 6 additions & 1 deletion fcrepo-indexing-solr/pom.xml
@@ -33,7 +33,7 @@
<groupId>org.apache.camel</groupId>
<artifactId>camel-http4</artifactId>
</dependency>

<dependency>
<groupId>org.apache.activemq</groupId>
<artifactId>activemq-camel</artifactId>
@@ -154,6 +154,11 @@
<artifactId>jetty-maven-plugin</artifactId>
<configuration>
<systemProperties>
<force>true</force>
<systemProperty>
<name>fcrepo.home</name>
<value>${project.build.directory}/fcrepo-data</value>
</systemProperty>
<systemProperty>
<name>solr.solr.home</name>
<value>${project.build.directory}/test-classes/solr</value>
@@ -18,19 +18,17 @@
import static org.apache.camel.builder.PredicateBuilder.not;
import static org.apache.camel.builder.PredicateBuilder.or;
import static org.fcrepo.camel.FcrepoHeaders.FCREPO_TRANSFORM;
import static org.fcrepo.camel.HttpMethods.POST;
import static org.fcrepo.camel.JmsHeaders.EVENT_TYPE;
import static org.fcrepo.camel.JmsHeaders.IDENTIFIER;
import static org.fcrepo.camel.RdfNamespaces.INDEXING;
import static org.fcrepo.camel.RdfNamespaces.RDF;
import static org.fcrepo.camel.RdfNamespaces.REPOSITORY;
import static org.slf4j.LoggerFactory.getLogger;

import org.apache.camel.Exchange;
import org.apache.camel.LoggingLevel;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.builder.xml.Namespaces;
import org.apache.camel.builder.xml.XPathBuilder;
import org.fcrepo.camel.HttpMethods;
import org.fcrepo.camel.RdfNamespaces;
import org.slf4j.Logger;

/**
@@ -49,13 +47,18 @@ public class SolrRouter extends RouteBuilder {
*/
public void configure() throws Exception {

final Namespaces ns = new Namespaces("rdf", RDF);
ns.add("indexing", INDEXING);
final Namespaces ns = new Namespaces("rdf", RdfNamespaces.RDF);
ns.add("indexing", RdfNamespaces.INDEXING);
ns.add("ldp", RdfNamespaces.LDP);

final XPathBuilder indexable = new XPathBuilder(
String.format("/rdf:RDF/rdf:Description/rdf:type[@rdf:resource='%s']", INDEXING + "Indexable"));
String.format(
"/rdf:RDF/rdf:Description/rdf:type[@rdf:resource='%s']", RdfNamespaces.INDEXING + "Indexable"));
indexable.namespaces(ns);

final XPathBuilder children = new XPathBuilder("/rdf:RDF/rdf:Description/ldp:contains");
children.namespaces(ns);

/**
* A generic error handler (specific to this RouteBuilder)
*/
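The `indexable` and `children` XPath builders above run against the RDF/XML that Fedora returns. The following standalone sketch evaluates the same two expressions with the JDK's `javax.xml.xpath` API, outside Camel. The `indexing` namespace URI is an assumed value of `RdfNamespaces.INDEXING`, and `IndexableCheck` is a hypothetical helper.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper evaluating the router's XPath expressions with plain JAXP. */
final class IndexableCheck {

    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String LDP = "http://www.w3.org/ns/ldp#";
    // Assumed value of RdfNamespaces.INDEXING; verify against fcrepo-camel.
    static final String INDEXING = "http://fedora.info/definitions/v4/indexing#";

    private static final Map<String, String> PREFIXES =
            Map.of("rdf", RDF, "ldp", LDP, "indexing", INDEXING);

    private static XPath xpath() {
        final XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(final String prefix) {
                return PREFIXES.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
            }

            @Override
            public String getPrefix(final String namespaceURI) {
                return null;   // reverse lookup is not needed for evaluation
            }

            @Override
            public Iterator<String> getPrefixes(final String namespaceURI) {
                return Collections.emptyIterator();
            }
        });
        return xp;
    }

    /** Parse namespace-aware so that prefixed XPath steps can match. */
    static Document parse(final String rdfXml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(rdfXml)));
        } catch (final Exception e) {
            throw new IllegalArgumentException("Not parseable as XML", e);
        }
    }

    /** Same expression as the router's indexable XPathBuilder. */
    static boolean isIndexable(final Document doc) {
        final String expr = String.format(
                "/rdf:RDF/rdf:Description/rdf:type[@rdf:resource='%s']", INDEXING + "Indexable");
        return Boolean.TRUE.equals(eval(expr, doc, XPathConstants.BOOLEAN));
    }

    /** Number of nodes matched by the router's children expression. */
    static int childCount(final Document doc) {
        return ((NodeList) eval("/rdf:RDF/rdf:Description/ldp:contains", doc,
                XPathConstants.NODESET)).getLength();
    }

    private static Object eval(final String expr, final Document doc, final QName type) {
        try {
            return xpath().evaluate(expr, doc, type);
        } catch (final XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```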
@@ -69,21 +72,35 @@ public void configure() throws Exception {
*/
from("{{input.stream}}")
.routeId("FcrepoSolrRouter")
.choice()
.when(header(EVENT_TYPE).isEqualTo(RdfNamespaces.REPOSITORY + "NODE_REMOVED"))
.to("direct:delete.solr")
.otherwise()
.to("direct:index.solr");

/**
* Handle re-index events
*/
from("{{solr.reindex.stream}}")
.routeId("FcrepoSolrReindex")
.to("direct:index.solr");

/**
* Based on an item's metadata, determine if it is indexable.
*/
from("direct:index.solr")
.routeId("FcrepoSolrIndexer")
.removeHeaders("CamelHttp*")
.filter(not(or(header(IDENTIFIER).startsWith(simple("{{audit.container}}/")),
header(IDENTIFIER).isEqualTo(simple("{{audit.container}}")))))
.to("fcrepo:{{fcrepo.baseUrl}}?preferOmit=PreferContainment")
.setHeader(FCREPO_TRANSFORM).xpath(hasIndexingTransform, String.class, ns)
.removeHeaders("CamelHttp*")
.choice()
.when(header(EVENT_TYPE).isEqualTo(REPOSITORY + "NODE_REMOVED"))
.to("direct:delete.solr")
.when(or(simple("{{indexing.predicate}} != 'true'"), indexable))
.to("direct:update.solr")
.otherwise()
.removeHeaders("CamelHttp*")
.to("fcrepo:{{fcrepo.baseUrl}}")
.setHeader(FCREPO_TRANSFORM).xpath(hasIndexingTransform, String.class, ns)
.removeHeaders("CamelHttp*")
.choice()
.when(or(simple("{{indexing.predicate}} != 'true'"), indexable))
.to("direct:update.solr")
.otherwise()
.to("direct:delete.solr");
.to("direct:delete.solr");

/**
* Remove an item from the solr index.
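The reworked routes split the decision into two stages. `FcrepoSolrRouter` sends `NODE_REMOVED` events straight to `direct:delete.solr`. All other events, and anything arriving on `{{solr.reindex.stream}}`, go to `direct:index.solr`, which drops the audit container and then chooses between update and delete. The following is a plain-Java sketch of that decision table with Camel removed. `SolrRouteDecision` and its `Action` values are hypothetical, and the Fedora round trip that determines `indexable` is replaced by a boolean argument.

```java
/** Hypothetical, Camel-free restatement of the Solr routing decision in this commit. */
final class SolrRouteDecision {

    enum Action { UPDATE, DELETE, SKIP }

    /**
     * @param nodeRemoved       true for a NODE_REMOVED event; re-index requests never set it
     * @param identifier        repository path of the resource, e.g. "/a/b"
     * @param auditContainer    value of audit.container, e.g. "/audit"
     * @param indexingPredicate value of indexing.predicate
     * @param indexable         whether the resource is typed indexing:Indexable
     */
    static Action decide(final boolean nodeRemoved, final String identifier,
                         final String auditContainer, final boolean indexingPredicate,
                         final boolean indexable) {
        if (nodeRemoved) {
            return Action.DELETE;   // FcrepoSolrRouter: NODE_REMOVED -> direct:delete.solr
        }
        // direct:index.solr filters out the audit container and everything below it
        if (identifier.equals(auditContainer) || identifier.startsWith(auditContainer + "/")) {
            return Action.SKIP;
        }
        // update unless indexing is restricted to Indexable objects and this one is not
        return (!indexingPredicate || indexable) ? Action.UPDATE : Action.DELETE;
    }

    public static void main(final String[] args) {
        System.out.println(decide(false, "/a/b", "/audit", true, false));
    }
}
```

Checking `auditContainer + "/"` rather than a bare prefix mirrors the route's two-part filter, so a sibling such as `/auditing` is still indexed.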
@@ -105,8 +122,9 @@ public void configure() throws Exception {
"Indexing Solr Object ${headers[CamelFcrepoIdentifier]} " +
"${headers[org.fcrepo.jms.identifier]}")
.to("fcrepo:{{fcrepo.baseUrl}}?transform={{fcrepo.defaultTransform}}")
.setHeader(Exchange.HTTP_METHOD).constant(POST)
.setHeader(Exchange.HTTP_METHOD).constant(HttpMethods.POST)
.setHeader(Exchange.HTTP_QUERY).simple("commitWithin={{solr.commitWithin}}")
.to("http4://{{solr.baseUrl}}/update");

}
}
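The update route fetches the object through the configured LDPath transform and POSTs the result to Solr's `/update` handler with a `commitWithin` query parameter. The equivalent request in plain Java is sketched below. It assumes the transform output is JSON (hence the `Content-Type` header, which the route does not set in the lines shown) and that `solr.baseUrl` is reached over plain HTTP, as with the `http4://` endpoint. `SolrUpdateRequest` is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical plain-Java equivalent of the direct:update.solr POST. */
final class SolrUpdateRequest {

    /** http://{solr.baseUrl}/update?commitWithin={solr.commitWithin} */
    static URI updateUri(final String solrBaseUrl, final long commitWithinMs) {
        return URI.create("http://" + solrBaseUrl + "/update?commitWithin=" + commitWithinMs);
    }

    /** transformedJson: the LDPath-transformed object, assumed to be JSON. */
    static HttpRequest build(final String solrBaseUrl, final long commitWithinMs,
                             final String transformedJson) {
        return HttpRequest.newBuilder(updateUri(solrBaseUrl, commitWithinMs))
                .header("Content-Type", "application/json")   // assumed; not set in the visible route
                .POST(HttpRequest.BodyPublishers.ofString(transformedJson))
                .build();
    }

    public static void main(final String[] args) {
        final HttpRequest req = build("localhost:8983/solr/collection1", 10000,
                "[{\"id\":\"http://localhost:8080/fcrepo/rest/a\"}]");
        System.out.println(req.method() + " " + req.uri());
    }
}
```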
@@ -14,11 +14,12 @@
<cm:property name="fcrepo.authUsername" value=""/>
<cm:property name="fcrepo.authPassword" value=""/>
<cm:property name="fcrepo.authHost" value=""/>
<cm:property name="fcrepo.baseUrl" value="localhost:8080/fcrepo4/rest"/>
<cm:property name="fcrepo.baseUrl" value="localhost:8080/fcrepo/rest"/>
<cm:property name="fcrepo.defaultTransform" value="default"/>
<cm:property name="indexing.predicate" value="false"/>
<cm:property name="jms.brokerUrl" value="tcp://localhost:61616"/>
<cm:property name="input.stream" value="activemq:topic:fedora"/>
<cm:property name="solr.reindex.stream" value="activemq:queue:solr.reindex"/>
<cm:property name="solr.baseUrl" value="localhost:8983/solr/collection1"/>
<cm:property name="solr.commitWithin" value="10000"/>
<cm:property name="audit.container" value="/audit"/>
@@ -37,7 +38,7 @@
<property name="authHost" value="${fcrepo.authHost}"/>
</bean>

<camelContext xmlns="http://camel.apache.org/schema/blueprint">
<camelContext id="FcrepoSolrIndexer" xmlns="http://camel.apache.org/schema/blueprint">
<package>org.fcrepo.camel.indexing.solr</package>
</camelContext>
