I noticed that the perplexity is calculated in eval.py as follows:

ppl = math.exp(-ll / (n_sentences + n_words - n_oovs))
However, in the book Speech and Language Processing (https://web.stanford.edu/~jurafsky/slp3/4.pdf), perplexity is defined as follows:
"The perplexity of a language model on a test set is the inverse probability of the test set, normalized by the number of words."
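As I read it, that definition in formula form is (with N the number of words in the test set, and log-probabilities in the same base as the exponential):

$$ \mathrm{PP}(W) = P(w_1 w_2 \dots w_N)^{-1/N} = \exp\!\Big(-\tfrac{1}{N}\,\log P(w_1 w_2 \dots w_N)\Big) $$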
I'm not sure why the sentence count is added to the denominator in addition to the number of words.
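To make the difference concrete, here is a minimal sketch of the two normalizations; the sentences and per-token log-probabilities are made up purely for illustration, only the two denominators matter:

```python
import math

# Hypothetical test set and per-token log-probabilities (natural log).
test_sentences = [["the", "cat", "sat"], ["a", "dog", "barked", "loudly"]]
log_probs = [-2.3, -1.7, -4.1, -2.0, -3.5, -2.8, -1.9]  # one value per token, made up

ll = sum(log_probs)                             # total log-likelihood of the test set
n_words = sum(len(s) for s in test_sentences)   # number of words
n_sentences = len(test_sentences)               # number of sentences
n_oovs = 0                                      # assume no OOVs for simplicity

# Textbook normalization: divide by the number of words only.
ppl_textbook = math.exp(-ll / n_words)

# eval.py's normalization: words + sentences - OOVs.
ppl_eval = math.exp(-ll / (n_sentences + n_words - n_oovs))

print(ppl_textbook, ppl_eval)
```

(If the log-likelihood were to include an end-of-sentence event for each sentence, the extra n_sentences term would line up with the word count, but I couldn't tell from the code whether that is the intent.)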
Also, I believe the usual treatment of OOVs is to map them all to an <UNK> token and to train the n-gram model on that token as well. Is this just a simplification in vpyp?
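For reference, this is roughly what I mean by the <UNK> treatment; the helper names, the min_count threshold, and the toy sentences are made up for illustration:

```python
from collections import Counter

UNK = "<UNK>"

def build_vocab(train_sentences, min_count=2):
    # Keep only words seen at least min_count times; everything else maps to <UNK>.
    counts = Counter(w for s in train_sentences for w in s)
    return {w for w, c in counts.items() if c >= min_count}

def unkify(sentence, vocab):
    # Replace out-of-vocabulary words with <UNK> so the n-gram model is
    # trained on (and assigns probability to) <UNK> like any other word.
    return [w if w in vocab else UNK for w in sentence]

train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "rare", "word"]]
vocab = build_vocab(train, min_count=2)
print([unkify(s, vocab) for s in train])

# Test-set OOVs get the same mapping, so they contribute to the likelihood
# instead of being excluded from it.
test = [["the", "platypus", "sat"]]
print([unkify(s, vocab) for s in test])
```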
Thank you.