A common phenomenon in LDA training is that the first several iterations are very costly. This is largely due to the uniformly random initialization: the word-topic counts, and hence the doc-topic counts, start out quite dense, so the sampler cannot exploit any sparsity early on.
There are two approaches to mitigate this:
- Sparse initialization: constrain each word to a small random subset of all topics (e.g., 1%), and for each token of that word, sample its initial topic from that restricted subset rather than from all topics (see the first sketch after this list).
- Warm start: first train several iterations on a small part of the corpus (e.g., 1%) to initialize the word-topic distribution, which should be much sparser than a uniformly random initialization (see the second sketch after this list).
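A minimal sketch of the first approach, assuming a collapsed Gibbs sampler that keeps word-topic and doc-topic count matrices; all names and sizes (`docs`, `allowed`, `vocab_size`, `n_topics`) are illustrative toy values, not a real API:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, n_topics = 1_000, 200
docs = [[3, 17, 42, 17], [8, 3, 99]]  # toy corpus: lists of word ids

# Restrict each word to a random ~1% of all topics.
topics_per_word = max(1, n_topics // 100)
allowed = {w: rng.choice(n_topics, size=topics_per_word, replace=False)
           for w in {w for doc in docs for w in doc}}

word_topic = np.zeros((vocab_size, n_topics), dtype=np.int32)
doc_topic = np.zeros((len(docs), n_topics), dtype=np.int32)
assignments = []

for d, doc in enumerate(docs):
    z_doc = []
    for w in doc:
        # Sample the initial topic only from the word's restricted
        # subset, so each row of word_topic starts out sparse.
        z = int(rng.choice(allowed[w]))
        z_doc.append(z)
        word_topic[w, z] += 1
        doc_topic[d, z] += 1
    assignments.append(z_doc)
```

After this initialization each row of `word_topic` has at most `topics_per_word` nonzero entries, so a sparsity-aware sampler benefits from the very first iteration; ordinary Gibbs sampling is still free to move tokens to any topic later.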
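And a sketch of the second approach under the same assumptions: run a few plain collapsed-Gibbs sweeps over a small random slice of the corpus, then reuse the resulting word-topic counts to seed the full run. `gibbs_sweep` here is a deliberately naive reference implementation, not a production sampler:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, K, ALPHA, BETA = 1_000, 200, 0.1, 0.01  # toy sizes and priors

def init_counts(docs):
    """Uniformly random topic assignments plus count matrices."""
    word_topic = np.zeros((VOCAB, K), dtype=np.int32)
    doc_topic = np.zeros((len(docs), K), dtype=np.int32)
    z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            word_topic[w, z[d][i]] += 1
            doc_topic[d, z[d][i]] += 1
    return word_topic, doc_topic, z

def gibbs_sweep(docs, word_topic, doc_topic, z):
    """One collapsed-Gibbs pass over every token."""
    totals = word_topic.sum(axis=0)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]  # remove the token's current assignment
            word_topic[w, k] -= 1; doc_topic[d, k] -= 1; totals[k] -= 1
            p = (doc_topic[d] + ALPHA) * (word_topic[w] + BETA) / (totals + BETA * VOCAB)
            k = int(rng.choice(K, p=p / p.sum()))
            z[d][i] = k  # record the new assignment
            word_topic[w, k] += 1; doc_topic[d, k] += 1; totals[k] += 1

docs = [[3, 17, 42, 17], [8, 3, 99], [42, 8, 8]]  # toy corpus
# Warm start: several sweeps over a small slice (~1% in spirit;
# with three toy docs we just take one).
subset = [docs[i] for i in
          rng.choice(len(docs), size=max(1, len(docs) // 100), replace=False)]
wt, dt, z = init_counts(subset)
for _ in range(5):
    gibbs_sweep(subset, wt, dt, z)
# wt is now much sparser than a uniform random init; use it (with
# smoothing) as the starting word-topic table for the full corpus.
```

Note that only the word-topic table transfers to the full run; doc-topic counts are document-specific and are re-initialized for the full corpus.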