Description
This issue came to light when using a Pepper importer which relied on SCorpusGraphImpl's default implementation of naming documents:
String namePart = null;
namePart = document.getName();
if (Strings.isNullOrEmpty(namePart)) {
namePart = "doc_" + getCorpora().size();
}
GraphFactory.createIdentifier(document, URI.createURI(corpus.getId() + "/" + namePart).toString());
Relying on this implementation produced ConcurrentModificationExceptions at runtime. They seem to go away when document names are set explicitly, in this case by falling back on PepperImporterImpl's default document naming in #importCorpusStructureRec(URI currURI, SCorpus parent):
SDocument sDocument = null;
if (docFile.isDirectory()) {
sDocument = getCorpusGraph().createDocument(parent, currURI.lastSegment());
} else {
// if uri is a file, cut off file ending
sDocument = getCorpusGraph().createDocument(parent,
currURI.lastSegment().replace("." + currURI.fileExtension(), ""));
}

The line namePart = "doc_" + getCorpora().size(); in particular doesn't sit right. A document named doc_n can be created at any time while getCorpora().size() < n; once getCorpora().size() == n, the fallback produces a second document with the same name. The name becomes part of the document ID, which Pepper in turn uses to calculate execution paths. If the two documents sit in the same corpus, this is likely to trigger a ConcurrentModificationException on the lists of nodes, etc., of either document, as encountered above.
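
To make the collision concrete, here is a minimal, self-contained sketch that does not use Salt or Pepper at all. The class DocNameCollision and both helper methods are hypothetical: sizeBasedName only mimics the "doc_" + getCorpora().size() fallback, and uniqueName is one possible way to avoid the duplicate, not a proposal for how SCorpusGraphImpl should be changed.

```java
import java.util.HashSet;
import java.util.Set;

public class DocNameCollision {

    /** Mimics the fallback quoted above: "doc_" + current number of corpora. */
    static String sizeBasedName(int corporaCount) {
        return "doc_" + corporaCount;
    }

    /** Hypothetical alternative: advance the counter until the name is unused. */
    static String uniqueName(int corporaCount, Set<String> existingNames) {
        int n = corporaCount;
        String candidate = "doc_" + n;
        while (existingNames.contains(candidate)) {
            n++;
            candidate = "doc_" + n;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> existingNames = new HashSet<>();
        // A document is explicitly named "doc_1" while fewer corpora exist.
        existingNames.add("doc_1");

        // Later the corpus count reaches 1 and an unnamed document is added.
        int corporaCount = 1;
        String generated = sizeBasedName(corporaCount);

        // The generated name duplicates the explicitly chosen one.
        System.out.println(existingNames.contains(generated)); // prints "true"

        // Checking against existing names avoids the duplicate.
        System.out.println(uniqueName(corporaCount, existingNames)); // prints "doc_2"
    }
}
```

The sketch only shows that a size-based name is not guaranteed to be unique; whether uniqueness should be enforced in the graph or left to the importer is a separate design decision.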