I noticed that the perplexity is calculated in eval.py as follows:

ppl = math.exp(-ll / (n_sentences + n_words - n_oovs))
However, in the book Speech and Language Processing (https://web.stanford.edu/~jurafsky/slp3/4.pdf), perplexity is defined as follows:
"The perplexity of a language model on a test set is the inverse probability of the test set, normalized by the number of words."
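As I read it, that definition in formula form is (with N the number of words in the test set, and log-probabilities in the same base as the exponential):

$$ \mathrm{PP}(W) = P(w_1 w_2 \dots w_N)^{-1/N} = \exp\!\Big(-\tfrac{1}{N}\,\log P(w_1 w_2 \dots w_N)\Big) $$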
I'm not sure why the sentence count is added to the denominator in addition to the number of words.
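To make the difference concrete, here is a minimal sketch of the two normalizations; the sentences and per-token log-probabilities are made up purely for illustration, only the two denominators matter:

```python
import math

# Hypothetical test set and per-token log-probabilities (natural log).
test_sentences = [["the", "cat", "sat"], ["a", "dog", "barked", "loudly"]]
log_probs = [-2.3, -1.7, -4.1, -2.0, -3.5, -2.8, -1.9]  # one value per token, made up

ll = sum(log_probs)                             # total log-likelihood of the test set
n_words = sum(len(s) for s in test_sentences)   # number of words
n_sentences = len(test_sentences)               # number of sentences
n_oovs = 0                                      # assume no OOVs for simplicity

# Textbook normalization: divide by the number of words only.
ppl_textbook = math.exp(-ll / n_words)

# eval.py's normalization: words + sentences - OOVs.
ppl_eval = math.exp(-ll / (n_sentences + n_words - n_oovs))

print(ppl_textbook, ppl_eval)
```

(If the log-likelihood were to include an end-of-sentence event for each sentence, the extra n_sentences term would line up with the word count, but I couldn't tell from the code whether that is the intent.)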
Also, I believe the usual treatment of OOVs is to map them all to an <UNK> token and to train the n-gram model on that token as well. Is this just a simplification in vpyp?
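For reference, this is roughly what I mean by the <UNK> treatment; the helper names, the min_count threshold, and the toy sentences are made up for illustration:

```python
from collections import Counter

UNK = "<UNK>"

def build_vocab(train_sentences, min_count=2):
    # Keep only words seen at least min_count times; everything else maps to <UNK>.
    counts = Counter(w for s in train_sentences for w in s)
    return {w for w, c in counts.items() if c >= min_count}

def unkify(sentence, vocab):
    # Replace out-of-vocabulary words with <UNK> so the n-gram model is
    # trained on (and assigns probability to) <UNK> like any other word.
    return [w if w in vocab else UNK for w in sentence]

train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "rare", "word"]]
vocab = build_vocab(train, min_count=2)
print([unkify(s, vocab) for s in train])

# Test-set OOVs get the same mapping, so they contribute to the likelihood
# instead of being excluded from it.
test = [["the", "platypus", "sat"]]
print([unkify(s, vocab) for s in test])
```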
Thank you.