Add support for migrating R and X 'local.fedora.server' urls.
mikedurbin authored and Andrew Woods committed May 15, 2015
1 parent b053cfe commit 3deb4d6
Showing 17 changed files with 284 additions and 50 deletions.
22 changes: 11 additions & 11 deletions README.md
@@ -20,27 +20,27 @@ Background work
* If so, you will need all of the export FOXML in a known directory.
* Will you be migrating from a native fcrepo3 filesystem?
* If so, fcrepo3 should not be running, and you will need to determine if you're using legacy or akubra storage
* Determine your fcrepo4 url (ex: http://localhost:8080/rest/, http://yourHostName.ca:8080/fcrepo/rest/)
* There is currently only one implemented pid-mapping strategy, but you can configure it to put all of your migrated content under a given path ([line 90](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L90), sets that value to "migrated-fedora3").
* Determine your fcrepo4 url (ex: http://localhost:8080/rest/, http://yourHostName.ca:8080/fcrepo/rest/) ([line 140](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L140)).
* There is currently only one implemented pid-mapping strategy, but you can configure it to put all of your migrated content under a given path ([line 93](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L93), sets that value to "migrated-fedora3").

Getting started:
* Clone the repository `https://github.com/fcrepo4-labs/migration-utils.git`
* Edit the Spring XML configuration in your editor of choice (`src/main/resources/spring/migration-bean.xml`).
* If you are migrating from exported FOXML, you will leave [line 9](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L9) as is.
* If you are migrating from a native fcrepo4 file system, you will need to change `exportedFoxmlDirectoryObjectSource` to `nativeFoxmlDirectoryObjectSource` in [line 9](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L9).
* If you are migrating from a native fcrepo3 file system, you will need to set the paths to the `objectStore` and `datastreamStore` ([Lines 117-123](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L117-L123)).
* If you are migrating from exported FOXML, you will need to set the path to the directory you have them stored in ([Lines 125-127](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L125-L127)).
* Set your fcrepo4 url ([Lines 99-102](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L99-L102)).
* If you are migrating from a native fcrepo3 file system, you will need to change `exportedFoxmlDirectoryObjectSource` to `nativeFoxmlDirectoryObjectSource` in [line 9](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L9).
* If you are migrating from a native fcrepo3 file system, you will need to set the paths to the `objectStore` and `datastreamStore` ([Lines 143-149](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L143-L149)).
* If you are migrating from exported FOXML, you will need to set the path to the directory you have them stored in ([Lines 151-153](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L151-L153)).
* Set your fcrepo4 url ([Line 140](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L140)).
* If you would like to run the migration in test mode (console logging), you will leave [lines 11-16](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L11-L16) as is.
* If you would like to run the migration, you will need to comment out or remove [line 11](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L11), and uncomment [line 15](https://github.com/fcrepo4-labs/migration-utils/blob/master/src/main/resources/spring/migration-bean.xml#L15).


To run the migration scenario you have configured in the Spring XML configuration file:

```
mvn clean compile exec:java -Dexec.mainClass=org.fcrepo.migration.Migrator
```

## Additional Documentation

* [wiki](https://wiki.duraspace.org/display/FF/Fedora+3+to+4+Data+Migration)
18 changes: 18 additions & 0 deletions src/main/java/org/fcrepo/migration/ExternalContentURLMapper.java
@@ -0,0 +1,18 @@
package org.fcrepo.migration;

/**
* An interface defining a method to replace one URL (represented as a String) with another.
* In the context of migrating objects from fedora 3 to fedora 4, there may be a need to
* make programmatic updates to the URLs found in External or Redirect datastreams. This
* interface is for that purpose.
*
* @author Mike Durbin
*/
public interface ExternalContentURLMapper {

/**
* Gets the String containing a URL that should be used instead of the given String
* for migrated external or redirect datastreams.
*/
public String mapURL(String url);
}
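
A minimal sketch of how the new interface might be implemented, assuming a scenario in which externally hosted content has moved from one URL prefix to another. `PrefixSwappingURLMapper`, `PrefixSwappingDemo`, and the `example.org` URLs are hypothetical and not part of this commit; the interface is repeated locally only so the sketch compiles on its own.

```java
// Local copy of the interface added in this commit, repeated so the sketch is self-contained.
interface ExternalContentURLMapper {
    String mapURL(String url);
}

// Hypothetical mapper (not in the commit): swaps one URL prefix for another
// and returns every other URL unchanged.
final class PrefixSwappingURLMapper implements ExternalContentURLMapper {

    private final String oldPrefix;
    private final String newPrefix;

    PrefixSwappingURLMapper(final String oldPrefix, final String newPrefix) {
        this.oldPrefix = oldPrefix;
        this.newPrefix = newPrefix;
    }

    @Override
    public String mapURL(final String url) {
        if (url.startsWith(oldPrefix)) {
            // Keep everything after the old prefix and attach it to the new one.
            return newPrefix + url.substring(oldPrefix.length());
        }
        return url;
    }
}

class PrefixSwappingDemo {
    public static void main(final String[] args) {
        final ExternalContentURLMapper mapper =
                new PrefixSwappingURLMapper("http://old.example.org/", "http://new.example.org/");
        System.out.println(mapper.mapURL("http://old.example.org/files/a.pdf"));
        System.out.println(mapper.mapURL("http://elsewhere.example.org/b.pdf"));
    }
}
```

The commit itself wires in only `SelfReferencingURLMapper`; a mapper like the one above would need its own wiring, which this diff does not show.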
8 changes: 8 additions & 0 deletions src/main/java/org/fcrepo/migration/MigrationIDMapper.java
@@ -33,4 +33,12 @@ public interface MigrationIDMapper {
*/
public String mapDatastreamPath(String pid, String dsid);

/**
* Gets the fedora 4 base URL. Paths returned by
* {@link #mapDatastreamPath} and {@link #mapObjectPath}
* appended to this value will be resolvable URLs in the
* fedora 4 repository.
*/
public String getBaseURL();

}
@@ -18,13 +18,19 @@ public class ArchiveExportedFoxmlDirectoryObjectSource implements ObjectSource {
private File root;

private URLFetcher fetcher;

private String localFedoraServer;

/**
* archive exported foxml directory object source.
* @param exportDir the export directory
* @param localFedoraServer the domain and port for the server that hosted the fedora objects in the format
* "localhost:8080".
*/
public ArchiveExportedFoxmlDirectoryObjectSource(final File exportDir) {
public ArchiveExportedFoxmlDirectoryObjectSource(final File exportDir, final String localFedoraServer) {
this.root = exportDir;
this.fetcher = new HttpClientURLFetcher();
this.localFedoraServer = localFedoraServer;
}

/**
@@ -37,6 +43,6 @@ public void setFetcher(final URLFetcher fetcher) {

@Override
public Iterator<FedoraObjectProcessor> iterator() {
return new FoxmlDirectoryDFSIterator(root, fetcher);
return new FoxmlDirectoryDFSIterator(root, fetcher, localFedoraServer);
}
}
@@ -52,6 +52,8 @@ public class Foxml11InputStreamFedoraObjectProcessor implements FedoraObjectProc

private URLFetcher fetcher;

private String localFedoraServer;

private InternalIDResolver idResolver;

private XMLStreamReader reader;
@@ -69,12 +71,15 @@ public class Foxml11InputStreamFedoraObjectProcessor implements FedoraObjectProc
* @param is the input stream
* @param fetcher the fetcher
* @param resolver the resolver
* @param localFedoraServer the host and port (formatted like "localhost:8080") of the fedora 3 server
* from which the content exposed by the "is" parameter comes.
* @throws XMLStreamException xml stream exception
*/
public Foxml11InputStreamFedoraObjectProcessor(final InputStream is, final URLFetcher fetcher,
final InternalIDResolver resolver) throws XMLStreamException {
final InternalIDResolver resolver, final String localFedoraServer) throws XMLStreamException {
this.fetcher = fetcher;
this.idResolver = resolver;
this.localFedoraServer = localFedoraServer;
final XMLInputFactory factory = XMLInputFactory.newFactory();
reader = factory.createXMLStreamReader(is);
reader.nextTag();
@@ -302,7 +307,11 @@ public Foxml11DatastreamVersion(final DatastreamInfo dsInfo,
dsContent = idResolver.resolveInternalID(attributes.get("REF"));
} else {
try {
dsContent = new URLCachedContent(new URL(attributes.get("REF")), fetcher);
String ref = attributes.get("REF");
if (ref.contains("local.fedora.server")) {
ref = ref.replace("local.fedora.server", localFedoraServer);
}
dsContent = new URLCachedContent(new URL(ref), fetcher);
} catch (final MalformedURLException e) {
throw new RuntimeException(e);
}
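
The substitution in the hunk above can be sketched in isolation. This assumes a datastream whose `REF` attribute uses the `local.fedora.server` placeholder host; the class name, method name, and sample URL are hypothetical. In the commit the resulting string is then passed to `new URL(...)`.

```java
class LocalFedoraServerDemo {

    // Replaces the "local.fedora.server" placeholder with the configured
    // host and port (for example "localhost:8080"), as the hunk above does.
    static String resolveRef(final String ref, final String localFedoraServer) {
        return ref.replace("local.fedora.server", localFedoraServer);
    }

    public static void main(final String[] args) {
        // Hypothetical REF value for illustration only.
        final String ref = "http://local.fedora.server/fedora/get/demo:1/DS1";
        System.out.println(resolveRef(ref, "localhost:8080"));
    }
}
```

`String.replace` returns the input unchanged when the placeholder is absent, so the `contains` check in the hunk is not strictly required for correctness.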
@@ -27,25 +27,31 @@ public class FoxmlDirectoryDFSIterator implements Iterator<FedoraObjectProcessor
private InternalIDResolver resolver;
private URLFetcher fetcher;

private String localFedoraServer;

/**
* foxml directory DFS iterator.
* @param root the root file
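* @param fetcher the fetcher
* @param localFedoraServer the domain and port for the server that hosted the fedora objects in the format
* "localhost:8080".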
*/
public FoxmlDirectoryDFSIterator(final File root, final URLFetcher fetcher) {
public FoxmlDirectoryDFSIterator(final File root, final URLFetcher fetcher, final String localFedoraServer) {
stack = new Stack<List<File>>();
current = new ArrayList<File>(Arrays.asList(root.listFiles()));
this.fetcher = fetcher;
this.localFedoraServer = localFedoraServer;
}

/**
* foxml directory DFS iterator with four parameters
* @param root the root file
* @param resolver the resolver
* @param fetcher the fetcher
* @param localFedoraServer the domain and port for the server that hosted the fedora objects in the format
* "localhost:8080".
*/
public FoxmlDirectoryDFSIterator(final File root, final InternalIDResolver resolver, final URLFetcher fetcher) {
this(root, fetcher);
public FoxmlDirectoryDFSIterator(final File root, final InternalIDResolver resolver, final URLFetcher fetcher,
final String localFedoraServer) {
this(root, fetcher, localFedoraServer);
this.resolver = resolver;
}

@@ -79,7 +85,7 @@ public FedoraObjectProcessor next() {
} else {
try {
return new Foxml11InputStreamFedoraObjectProcessor(
new FileInputStream(current.remove(0)), fetcher, resolver);
new FileInputStream(current.remove(0)), fetcher, resolver, localFedoraServer);
} catch (final XMLStreamException e) {
throw new RuntimeException(e);
} catch (final FileNotFoundException e) {
@@ -20,6 +20,8 @@ public class NativeFoxmlDirectoryObjectSource implements ObjectSource {

private File root;

private String localFedoraServer;

/**
* A constructor for use with the data storage directories that underlie a
* fedora 3.x repository. First, this constructor will build an index of
@@ -28,12 +30,15 @@ public class NativeFoxmlDirectoryObjectSource implements ObjectSource {
* @param objectStore a directory containing just directories and FOXML files
* @param resolver an InternalIDResolver implementation that can resolve
* references to internally managed datastreams.
* @param localFedoraServer the domain and port for the server that hosted the fedora objects in the format
* "localhost:8080".
*/
public NativeFoxmlDirectoryObjectSource(final File objectStore,
final InternalIDResolver resolver) throws IOException {
final InternalIDResolver resolver, final String localFedoraServer) throws IOException {
this.root = objectStore;
this.resolver = resolver;
this.fetcher = new HttpClientURLFetcher();
this.localFedoraServer = localFedoraServer;
}

/**
@@ -46,7 +51,7 @@ public void setFetcher(final URLFetcher fetcher) {

@Override
public Iterator<FedoraObjectProcessor> iterator() {
return new FoxmlDirectoryDFSIterator(root, resolver, fetcher);
return new FoxmlDirectoryDFSIterator(root, resolver, fetcher, localFedoraServer);
}

}
@@ -23,12 +23,14 @@
import org.fcrepo.client.FedoraResource;
import org.fcrepo.kernel.RdfLexicon;
import org.fcrepo.migration.DatastreamVersion;
import org.fcrepo.migration.ExternalContentURLMapper;
import org.fcrepo.migration.FedoraObjectVersionHandler;
import org.fcrepo.migration.MigrationIDMapper;
import org.fcrepo.migration.ObjectProperty;
import org.fcrepo.migration.ObjectReference;
import org.fcrepo.migration.ObjectVersionReference;
import org.fcrepo.migration.foxml11.DC;
import org.fcrepo.migration.urlmappers.SelfReferencingURLMapper;
import org.slf4j.Logger;

import javax.xml.bind.JAXBException;
@@ -66,14 +68,18 @@ public class BasicObjectVersionHandler implements FedoraObjectVersionHandler {

private boolean importRedirect;

private ExternalContentURLMapper externalContentUrlMapper;

/**
* Basic object version handler.
* @param repo the fedora repository
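* @param idMapper the id mapper
* @param localFedoraServer the domain and port for the server that hosted the fedora objects in the format
* "localhost:8080".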
*/
public BasicObjectVersionHandler(final FedoraRepository repo, final MigrationIDMapper idMapper) {
public BasicObjectVersionHandler(final FedoraRepository repo, final MigrationIDMapper idMapper,
final String localFedoraServer) {
this.repo = repo;
this.idMapper = idMapper;
this.externalContentUrlMapper = new SelfReferencingURLMapper(localFedoraServer, idMapper);
}

/**
@@ -137,7 +143,8 @@ public void processObjectVersions(final Iterable<ObjectVersionReference> version
|| (v.getDatastreamInfo().getControlGroup().equals("R") && !importRedirect)) {
repo.createOrUpdateRedirectDatastream(
idMapper.mapDatastreamPath(v.getDatastreamInfo().getObjectInfo().getPid(),
v.getDatastreamInfo().getDatastreamId()), v.getExternalOrRedirectURL());
v.getDatastreamInfo().getDatastreamId()),
externalContentUrlMapper.mapURL(v.getExternalOrRedirectURL()));
} else {
FedoraDatastream ds = dsMap.get(v.getDatastreamInfo().getDatastreamId());
if (ds == null) {
@@ -1,4 +1,4 @@
package org.fcrepo.migration.idmapers;
package org.fcrepo.migration.idmappers;

import org.fcrepo.migration.MigrationIDMapper;

@@ -16,6 +16,8 @@
*/
public class SimpleIDMapper implements MigrationIDMapper {

private String baseUrl;

private String rootPath;

private int charDepth;
@@ -24,7 +26,8 @@ public class SimpleIDMapper implements MigrationIDMapper {
* simple ID mapper.
* @param baseUrl the fedora 4 base URL
* @param rootPath the root path
*/
public SimpleIDMapper(final String rootPath) {
public SimpleIDMapper(final String baseUrl, final String rootPath) {
this.baseUrl = baseUrl;
this.rootPath = rootPath;
charDepth = 2;
}
@@ -75,4 +78,9 @@ private String pidToPath(final String pid) {
public String mapDatastreamPath(final String pid, final String dsid) {
return pidToPath(pid) + '/' + dsid;
}

@Override
public String getBaseURL() {
return this.baseUrl;
}
}
@@ -0,0 +1,74 @@
package org.fcrepo.migration.urlmappers;

import org.fcrepo.migration.ExternalContentURLMapper;
import org.fcrepo.migration.MigrationIDMapper;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
* An ExternalContentURLMapper implementation that updates redirects that point to the
* fedora repository in which they originated to the destination of that pointed-to resource
* in the fedora 4 repository to which the content is being migrated.
*
* For example, if "http://localhost:8080/fedora/objects/object:1/datastreams/POLICY" was a
* redirect datastream in fedora 3 that redirected to
* "http://localhost:8080/fedora/objects/policy:1/datastreams/XACML/content", this class would
* supply the URL for the content of the migrated XACML datastream on the migrated policy:1
* object.
*
* @author Mike Durbin
*/
public class SelfReferencingURLMapper implements ExternalContentURLMapper {

private static final String OLD_DS_CONTENT_URL_PATTERN = "http://{local-fedora-server}/fedora/get/([^/]+)/(.+)";
private static final String NEW_DS_CONTENT_URL_PATTERN
= "http://{local-fedora-server}/fedora/objects/([^/]+)/datastreams/([^/]+)/content";

private List<Pattern> contentPatterns;

/**
* A pattern that is compared after the content patterns, and if it matches,
* an exception is thrown. This is implemented to allow an error to be thrown
* if any unmatched URLs that reference the fedora 3 repository are found; a
* case that generally indicates a configuration error in the migration scenario.
*/
private Pattern invalidPattern;

private MigrationIDMapper idMapper;

/**
* Basic constructor.
* @param localFedoraServer the domain and port for the server that hosted the fedora objects in the format
* "localhost:8080".
* @param idMapper the MigrationIDMapper used for the current migration scenario
*/
public SelfReferencingURLMapper(final String localFedoraServer, final MigrationIDMapper idMapper) {
this.contentPatterns = new ArrayList<>();
this.contentPatterns.add(parsePattern(OLD_DS_CONTENT_URL_PATTERN, localFedoraServer));
this.contentPatterns.add(parsePattern(NEW_DS_CONTENT_URL_PATTERN, localFedoraServer));
this.idMapper = idMapper;

this.invalidPattern = parsePattern("http://{local-fedora-server}/fedora/.*", localFedoraServer);
}

private Pattern parsePattern(final String pattern, final String localFedoraServer) {
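// Quote the host so that '.' in a host name is matched literally rather than as a regex wildcard.
return Pattern.compile(pattern.replace("{local-fedora-server}", Pattern.quote(localFedoraServer)));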
}

@Override
public String mapURL(final String url) {
for (Pattern p : contentPatterns) {
final Matcher m = p.matcher(url);
if (m.matches()) {
return idMapper.getBaseURL() + idMapper.mapDatastreamPath(m.group(1), m.group(2));
}
}
if (invalidPattern.matcher(url).matches()) {
throw new IllegalArgumentException("Unhandled internal external fedora 3 URL. (" + url + ")");
}
return url;
}
}
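
The mapping logic above can be exercised with a self-contained sketch. Several things here are assumptions made for illustration: the host `localhost:8080`, the base URL `http://localhost:8080/rest/`, and the stand-in `mapDatastreamPath`, which is hypothetical and does not reproduce the path layout that `SimpleIDMapper` builds. The sketch also wraps the host in `Pattern.quote`, which the commit's `parsePattern` does not do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SelfReferenceSketch {

    // Assumed values for illustration only.
    static final String LOCAL_FEDORA_SERVER = "localhost:8080";
    static final String BASE_URL = "http://localhost:8080/rest/";

    // The two fedora 3 datastream content URL styles recognised by the commit.
    static final List<Pattern> CONTENT_PATTERNS = Arrays.asList(
            compile("http://{local-fedora-server}/fedora/get/([^/]+)/(.+)"),
            compile("http://{local-fedora-server}/fedora/objects/([^/]+)/datastreams/([^/]+)/content"));

    // Any other URL under /fedora/ on the old server is treated as a configuration error.
    static final Pattern INVALID_PATTERN = compile("http://{local-fedora-server}/fedora/.*");

    private static Pattern compile(final String template) {
        // Pattern.quote is an addition in this sketch; it makes '.' in the host literal.
        return Pattern.compile(template.replace("{local-fedora-server}", Pattern.quote(LOCAL_FEDORA_SERVER)));
    }

    // Hypothetical stand-in for MigrationIDMapper.mapDatastreamPath.
    static String mapDatastreamPath(final String pid, final String dsid) {
        return "migrated-fedora3/" + pid + "/" + dsid;
    }

    static String mapURL(final String url) {
        for (final Pattern p : CONTENT_PATTERNS) {
            final Matcher m = p.matcher(url);
            if (m.matches()) {
                // group(1) is the pid, group(2) the datastream id.
                return BASE_URL + mapDatastreamPath(m.group(1), m.group(2));
            }
        }
        if (INVALID_PATTERN.matcher(url).matches()) {
            throw new IllegalArgumentException("Unhandled fedora 3 URL. (" + url + ")");
        }
        return url;
    }

    public static void main(final String[] args) {
        System.out.println(mapURL("http://localhost:8080/fedora/objects/policy:1/datastreams/XACML/content"));
        System.out.println(mapURL("http://example.org/unrelated.pdf"));
    }
}
```

Because the mapped URL is the base URL and the path joined by plain concatenation, a doubled or missing slash is possible unless one of them ends, or the other begins, with `/`. The sketch assumes the base URL carries the trailing slash.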
