idiolect (development version)

  • the associated Journal of Open Source Software paper is now published and should be used as the reference for the package.

  • contentmask() now also accepts sentence-tokenised corpora (the output of tokenize_sents()) as input; this option also allows parallel processing.

  • chunk_texts() now preserves the spacing of the original text and no longer inserts spaces around punctuation marks.

  • chunk_texts() now also accepts tokens objects as input, in which case it returns chunks of sentences whose total length is equal to or greater than the specified length.

  • vectorize() and the two functions that call it (delta() and ngram_tracing()) have a new argument called 'cross_boundaries'. If FALSE, n-grams do not cross sentence boundaries, which was the behaviour in previous versions. The user can now choose to let n-grams cross sentence boundaries, and the behaviour of these functions is made explicit.

  • the progress bar is now optional for all authorship analysis functions (the default is TRUE).
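
The effect of 'cross_boundaries' described above can be illustrated with a minimal, language-neutral sketch. This is not the package's implementation: the helper name ngrams() and its signature are hypothetical, and the input is assumed to be a text already split into tokenised sentences.

```python
def ngrams(sentences, n, cross_boundaries=False):
    """Return word n-grams from a list of tokenised sentences.

    Hypothetical helper for illustration only; not part of idiolect.
    """
    if cross_boundaries:
        # Treat the whole text as a single stream of tokens.
        streams = [[tok for sent in sentences for tok in sent]]
    else:
        # Build n-grams inside each sentence separately.
        streams = sentences
    out = []
    for stream in streams:
        for i in range(len(stream) - n + 1):
            out.append(tuple(stream[i:i + n]))
    return out


text = [["the", "cat", "sat"], ["it", "purred"]]
within = ngrams(text, 2)                         # no bigram spans two sentences
across = ngrams(text, 2, cross_boundaries=True)  # includes ("sat", "it")
```

With boundaries respected, the bigram ("sat", "it") is never produced, because its two tokens belong to different sentences; with crossing allowed, it appears alongside the within-sentence bigrams.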

idiolect 1.2.0

  • minor bug fixes

  • contentmask() no longer has the option to replace ASCII; removed dependency on textclean package.

  • contentmask() used with the "frames" algorithm now adopts the Universal POS-tags, making it more compatible with other languages.

  • create_corpus() now checks that the file names follow the correct syntax and returns an error showing which file names are incorrect.

  • create_corpus() includes an argument to specify the encoding of the texts.

idiolect 1.1.1

  • minor bug fixes

  • concordance() can now take sentences as input and will also show sentence boundaries

  • lambdaG_visualize() can now display the text heatmap with sentences ordered either by lambdaG values (default) or by their original order in the text

  • lambdaG_visualize() can now visualize negative lambdaG values in an html file

  • ngram_tracing() contained a major bug when performing tests with multiple known authors, which led to anomalously high and incorrect performance statistics. This has been fixed.

  • the performance() progress bar is now optional

  • performance() can now run leave-one-out by author, not only by text

idiolect 1.0.1

  • Fixed issues after CRAN review.

idiolect 1.0.0

  • Initial CRAN submission.