NMF topic models

Topic modeling with non-negative matrix factorization (NMF).

The code minimizes the following loss function using multiplicative updates:

$\|X - WH\|_{\mathrm{Fro}}^{2} + \alpha\,(\|W\|_1 + \|H\|_1)$

Let D be the number of documents and V the vocabulary size. X is the (D x V) data matrix: each row is a document and each column is a feature, e.g. a textual or visual word. W is the document-topic matrix of dimension (D x K), where K is the number of topics, and H is the topic-word matrix of dimension (K x V).
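To make the loss and the matrix shapes concrete, here is a minimal dense NumPy sketch of multiplicative updates for an L1-regularized Frobenius objective. It is not the code in this repository: the names `nmf_loss` and `nmf_multiplicative`, the random initialization, the fixed iteration count, and the `eps` guard are all assumptions made for illustration. The update rules are the standard ones obtained by splitting the gradient into its positive and negative parts.

```python
import numpy as np


def nmf_loss(X, W, H, alpha):
    """Squared Frobenius reconstruction error plus an L1 penalty on W and H."""
    return np.linalg.norm(X - W @ H, "fro") ** 2 + alpha * (W.sum() + H.sum())


def nmf_multiplicative(X, K, alpha=0.1, n_iter=200, eps=1e-10, seed=0):
    """Hypothetical helper: factorize a dense, non-negative X (D x V) as W (D x K) @ H (K x V)."""
    rng = np.random.default_rng(seed)
    D, V = X.shape
    W = rng.random((D, K)) + eps
    H = rng.random((K, V)) + eps
    for _ in range(n_iter):
        # The gradient w.r.t. H is 2 W^T (W H - X) + alpha, so the ratio of its
        # negative part to its positive part is (W^T X) / (W^T W H + alpha / 2).
        H *= (W.T @ X) / (W.T @ W @ H + alpha / 2 + eps)
        # Same reasoning for W, using the gradient 2 (W H - X) H^T + alpha.
        W *= (X @ H.T) / (W @ H @ H.T + alpha / 2 + eps)
    return W, H


# Toy data: 20 documents, 30 vocabulary entries, 3 topics.
X_toy = np.random.default_rng(1).random((20, 30))
W0, H0 = nmf_multiplicative(X_toy, K=3, n_iter=0)   # initial factors only
W_fit, H_fit = nmf_multiplicative(X_toy, K=3, n_iter=200)
print(nmf_loss(X_toy, W0, H0, 0.1), nmf_loss(X_toy, W_fit, H_fit, 0.1))
```

Because every update multiplies by a non-negative ratio, W and H stay non-negative without an explicit projection step.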

The function that performs the factorization is called JAL_NMF. See toy_demo.py for example usage on toy data with a dense X. For real data, X should be a sparse matrix, e.g. scipy.sparse.csr_matrix.
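As an illustration of the sparse input format, the sketch below builds a (D x V) word-count matrix as a `scipy.sparse.csr_matrix` from a few made-up documents. The whitespace tokenization, the example sentences, and the variable names are assumptions chosen for brevity; they are not taken from topics.py.

```python
from collections import Counter

from scipy.sparse import csr_matrix

# Made-up documents, one string per document.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell as markets closed",
]

tokens = [doc.split() for doc in docs]
vocab = sorted({word for doc in tokens for word in doc})
index = {word: j for j, word in enumerate(vocab)}

# Collect (row, column, count) triplets, one per distinct word in each document.
rows, cols, vals = [], [], []
for i, doc in enumerate(tokens):
    for word, count in Counter(doc).items():
        rows.append(i)
        cols.append(index[word])
        vals.append(count)

# X is D x V: one row per document, one column per vocabulary entry.
X = csr_matrix((vals, (rows, cols)), shape=(len(docs), len(vocab)), dtype=float)
print(X.shape, X.nnz)
```

Only the non-zero counts are stored, which matters once the vocabulary is much larger than any single document.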

See topics.py for an example that loads a small text dataset ('text.txt'), builds the sparse matrix X, and infers the topics. The top words for each topic are printed.
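Reading the top words off a topic-word matrix amounts to sorting each row of H in descending order. The sketch below shows one way to do this. The function name `top_words`, the tiny H, and the vocabulary are hypothetical and only illustrate the idea; topics.py may do it differently.

```python
import numpy as np


def top_words(H, vocab, n_top=3):
    """Return the n_top highest-weighted vocabulary entries for each row (topic) of H."""
    vocab = np.asarray(vocab)
    # Negate H so that argsort yields column indices in descending weight order.
    order = np.argsort(-H, axis=1)[:, :n_top]
    return [vocab[row].tolist() for row in order]


# Hypothetical topic-word matrix with K = 2 topics and V = 4 words.
H_demo = np.array([
    [0.9, 0.1, 0.5, 0.0],
    [0.0, 0.8, 0.1, 0.7],
])
vocab_demo = ["cat", "stock", "dog", "market"]

topics = top_words(H_demo, vocab_demo, n_top=2)
for k, words in enumerate(topics):
    print(f"topic {k}: {' '.join(words)}")
# topic 0: cat dog
# topic 1: stock market
```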

cwenhaw/topic-models
