(LDA): sparse initialization rather than uniformly random initialization #37

@hucheng

Description

In LDA training, the first several iterations are very costly. This is largely due to uniformly random initialization, which makes the word-topic (and thus doc-topic) counts quite dense.

There are two approaches:

  1. Sparse initialization: randomly constrain each word to only a small subset (e.g. 1%) of all topics, and for each token of that word, sample its initial topic from that constrained subset rather than from all topics (see the sketch after this list).
  2. First train several iterations on a small part of the corpus (e.g. 1%) to initialize the word-topic distribution, which should be much sparser than a uniformly random initialization.
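
A minimal sketch of approach 1, assuming a plain collapsed-Gibbs-style LDA setup in Python; all names here (`docs`, `vocab_size`, `topic_fraction`, etc.) are illustrative and not taken from this repo's code:

```python
import random
from collections import defaultdict

def sparse_init(docs, vocab_size, num_topics, topic_fraction=0.01):
    """Assign initial topics so each word's tokens fall in a small topic subset."""
    subset_size = max(1, int(num_topics * topic_fraction))
    # Randomly pick a small topic subset per word (the ~1% constraint).
    word_topics = {w: random.sample(range(num_topics), subset_size)
                   for w in range(vocab_size)}

    assignments = []               # per-document list of token -> topic
    word_topic = defaultdict(int)  # (word, topic) counts; stays sparse by construction
    doc_topic = defaultdict(int)   # (doc, topic) counts

    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            # Sample only within the word's constrained subset,
            # rather than uniformly over all num_topics topics.
            t = random.choice(word_topics[w])
            doc_assign.append(t)
            word_topic[(w, t)] += 1
            doc_topic[(d, t)] += 1
        assignments.append(doc_assign)
    return assignments, word_topic, doc_topic
```

Approach 2 needs no new machinery: run the ordinary sampler for a few iterations on a small corpus sample (e.g. 1%) and keep the resulting word-topic counts as the starting point for the full corpus.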
