For very large corpora, a database backend would be useful, so that the corpus data does not all have to be held in memory at once.
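
One possible shape for such a backend is sketched here, assuming SQLite (via Python's standard `sqlite3` module) as the store. The class name `CorpusStore`, its methods, the `documents` table, and the default file name `corpus.db` are all hypothetical and chosen only for illustration; nothing here reflects an existing API. Documents are written in batches and read back lazily, so only one batch or one row needs to be in memory at a time.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple


class CorpusStore:
    """Hypothetical disk-backed document store (illustrative only)."""

    def __init__(self, path: str = "corpus.db") -> None:
        # Pass ":memory:" for a throwaway store, or a file path to persist on disk.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS documents ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
        )

    def add_documents(self, texts: Iterable[str], batch_size: int = 1000) -> None:
        """Insert documents in batches so the input is never fully materialized."""
        batch: List[Tuple[str]] = []
        for text in texts:
            batch.append((text,))
            if len(batch) >= batch_size:
                self._flush(batch)
                batch = []
        if batch:
            self._flush(batch)

    def _flush(self, batch: List[Tuple[str]]) -> None:
        # Using the connection as a context manager commits the transaction
        # on success and rolls it back if an exception is raised.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO documents (text) VALUES (?)", batch
            )

    def __len__(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]

    def iter_documents(self) -> Iterator[str]:
        """Yield documents one at a time, in insertion order."""
        cursor = self.conn.execute("SELECT text FROM documents ORDER BY id")
        for (text,) in cursor:
            yield text

    def close(self) -> None:
        self.conn.close()
```

With this sketch, a corpus could be loaded from a generator with `store.add_documents(lines)` and later processed with `for doc in store.iter_documents(): ...`. Whether SQLite, another embedded store, or a client/server database is the better fit would depend on corpus size and access patterns.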