Adopt corpus_recast() inside vectorize() and then specify in the documentation of other functions that n-grams do not cross sentence boundaries.
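
To make the documented behavior concrete, here is a minimal Python sketch of n-grams that do not cross sentence boundaries. It is an illustration only, not the package's code: the helper names (`split_sentences`, `tokenize`, `ngrams_within_sentences`) are hypothetical, the regex sentence splitter is deliberately naive, and it assumes that the point of calling corpus_recast() inside vectorize() is to split the text into sentences before n-grams are formed.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", sentence.lower())


def ngrams_within_sentences(text, n=2):
    # Build n-grams one sentence at a time, so no n-gram spans two sentences.
    grams = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        # Sentences shorter than n tokens contribute no n-grams.
        grams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


bigrams = ngrams_within_sentences("The cat sat. Dogs bark loudly.", n=2)
```

In this sketch `("cat", "sat")` and `("dogs", "bark")` appear in `bigrams`, but `("sat", "dogs")` does not, because the two words belong to different sentences. A note along these lines could accompany the documentation of the other functions.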