- The associated Journal of Open Source Software paper has now been published and should be used as the reference.

- `contentmask()` now also accepts sentence-tokenised corpora (the output of `tokenize_sents()`) as input; this option also allows parallel processing.
- `chunk_texts()` now outputs texts that preserve the spacing of the original and no longer inserts spaces around punctuation marks.
- `chunk_texts()` now also accepts tokens objects as input; in that case it returns chunks of sentences whose total length is equal to or greater than the specified length.
- `vectorize()` and the two functions that call it (`delta()` and `ngram_tracing()`) have a new argument, `cross_boundaries`. If `FALSE`, n-grams do not cross sentence boundaries (which was the default behaviour in previous versions). Users can therefore now choose whether n-grams may cross sentence boundaries, which also makes the behaviour of these functions clearer.
- The progress bar is now optional for all authorship analysis functions (the default is `TRUE`).

- Minor bug fixes.

- `contentmask()` no longer has the option to replace non-ASCII characters; the dependency on the `textclean` package has been removed.
- `contentmask()` used with the "frames" algorithm now adopts the Universal POS tags, making it more compatible with other languages.
- `create_corpus()` now checks that file names follow the required syntax and, if any do not, returns an error listing the incorrect file names.
- `create_corpus()` now has an argument to specify the encoding of the texts.

- Minor bug fixes.

- `concordance()` can now take sentences as input and also shows sentence boundaries.
- `lambdaG_visualize()` can now display the text heatmap with sentences ordered either by lambdaG value (default) or by their original order in the text.
- `lambdaG_visualize()` can now visualize negative lambdaG values in an HTML file.
- `ngram_tracing()` contained a major bug when performing tests with multiple known authors, which led to anomalously high and incorrect performance statistics. This has been fixed.
- The progress bar of `performance()` is now optional.
- `performance()` can now run leave-one-out by author rather than only by text.
- Fixed issues after CRAN review.
- Initial CRAN submission.