[SaltXMLExporter]: SCorpusGraphImpl's naming of documents seems broken #128

@sdruskat

This issue came to light when using a Pepper importer which relied on SCorpusGraphImpl's default implementation of naming documents:

String namePart = null;
namePart = document.getName();
if (Strings.isNullOrEmpty(namePart)) {
    namePart = "doc_" + getCorpora().size();
}
GraphFactory.createIdentifier(document, URI.createURI(corpus.getId() + "/" + namePart).toString());

Relying on this implementation produced ConcurrentModificationExceptions at runtime. These seem to have gone away once document names were set explicitly, in this case by falling back on PepperImporterImpl's default implementation of naming documents in #importCorpusStructureRec(URI currURI, SCorpus parent):

SDocument sDocument = null;
if (docFile.isDirectory()) {
    sDocument = getCorpusGraph().createDocument(parent, currURI.lastSegment());
} else {
    // if uri is a file, cut off file ending
    sDocument = getCorpusGraph().createDocument(parent,
            currURI.lastSegment().replace("." + currURI.fileExtension(), ""));
}

The line namePart = "doc_" + getCorpora().size(); in particular doesn't quite sit right. At any time, a document named doc_n could be created while getCorpora().size() < n. Once getCorpora().size() == n, the fallback would produce a second document with the same name. That name is carried over into the document ID, which Pepper in turn uses to calculate execution paths. If the two documents sit in the same corpus, this is likely to trigger a ConcurrentModificationException on either document's lists of nodes, etc., as encountered above.
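
To make the collision concrete, here is a minimal, self-contained sketch. It does not use the Salt or Pepper APIs: the class DocNameSketch, both helper methods, and the plain Set of names standing in for a corpus are hypothetical, and the "keep counting until the name is free" approach is only one possible fix, not necessarily the one Salt should adopt.

```java
import java.util.HashSet;
import java.util.Set;

public class DocNameSketch {

    /** Mirrors the fallback quoted in this issue: "doc_" plus the current corpus count. */
    public static String sizeBasedName(int corporaCount) {
        return "doc_" + corporaCount;
    }

    /**
     * Hypothetical alternative: start from the same counter, but advance it
     * until the candidate name is not yet taken within the corpus.
     */
    public static String uniqueName(Set<String> existingNames, int corporaCount) {
        int i = corporaCount;
        String candidate = "doc_" + i;
        while (existingNames.contains(candidate)) {
            i++;
            candidate = "doc_" + i;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // Names of the documents already present in one corpus (stand-in for the real graph).
        Set<String> names = new HashSet<>();

        // A document is explicitly named "doc_1" while the corpus count is still below 1.
        names.add("doc_1");

        // Later the corpus count reaches 1 and an unnamed document gets the fallback name.
        String fallback = sizeBasedName(1);
        System.out.println(fallback + " already taken: " + names.contains(fallback));

        // The collision-checked variant skips the taken name.
        System.out.println("collision-free name: " + uniqueName(names, 1));
    }
}
```

Whatever form a fix takes, the property that matters is that the generated name is checked against the names already present in the target corpus, rather than being derived from a counter that is unrelated to them.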
