Inconsistent skipping behavior in TextReuseCorpus #90

@tylerandrewscott

Description

I am encountering an issue with the TextReuseCorpus function when I feed in a vector of texts (using the "text = " option in the function): (1) I receive warnings that texts were skipped for insufficient length, even though the character strings should be long enough; and (2) I get a different number of skip warnings each time I run it. I am reading in a large vector (>300,000) of texts, ranging from 155 to 9,900 characters, and usually 30k to 150k of them are skipped for being too short. If I take these same skipped strings and run TextReuseCorpus on them again, they are processed without being skipped. Perhaps I'm simply doing something wrong?
