doc-topic distr. #19

@mhbodell

The output saved by "save_doc_theta_estimate = true" has the wrong dimensions, and it also shows counts rather than proportions.

This is what the README.txt file says:

Save the a file with document topic theta estimates (will not include zeros)
Unlike Phi means which are sampled with thinning, theta means is just a simple
average of the topic counts in the last iteration divided by the number of
tokens in the document thus there is not theta_burnin or theta_thinning

save_doc_theta_estimate = true
doc_topic_theta_filename = doc_topic_theta.csv
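
For reference, this is how I read the README description: each document's topic counts from the last iteration, divided by the number of tokens in that document, so every row should sum to 1. Below is a minimal sketch of that calculation in Python. The function name `theta_from_counts` and the example counts are made up for illustration and are not taken from the PCPLDA code.

```python
def theta_from_counts(topic_counts):
    """Turn one document's per-topic token counts into proportions.

    Hypothetical helper: it only illustrates the README's description
    (topic counts divided by the number of tokens in the document).
    """
    total = sum(topic_counts)  # number of tokens in the document
    if total == 0:
        # An empty document has no tokens to distribute over topics.
        return [0.0] * len(topic_counts)
    return [count / total for count in topic_counts]


# Made-up counts for a document with 8 tokens and 4 topics.
counts = [3, 0, 1, 4]
theta = theta_from_counts(counts)
print(theta)  # [0.375, 0.0, 0.125, 0.5]
```

With 200 topics I would therefore expect at most 200 values per document, each between 0 and 1.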

I have a model with 200 topics, but the doc_theta_means file has 400 columns, with one row per document. Why is the number of columns double the number of topics in the model?

Config file:

configs = Spalias
no_runs = 1

[Spalias]
title = PCPLDA
description = 200 topics with alpha 0.2 and extended priorlist
dataset = data/fb_politics_news.txt
scheme = spalias_priors
seed = 1904
topics = 200
alpha = 0.2
beta = 0.01
iterations = 1500
rare_threshold = 0
batches = 4
topic_batches = 4
topic_interval = 500
start_diagnostic = 200
debug = 0
#log_type_topic_density = true
log_document_density = true
log_phi_density = true
phi_mean_filename = phi-mean.csv
phi_mean_burnin = 20
phi_mean_thin = 5
stoplist = nsc-test/PartiallyCollapsedLDA-8.4.0/stoplist-empty.txt
save_vocabulary = true
vocabulary_filename = lda_vocab.txt
topic_prior_filename = wfw/bash/priors/k200_v7.txt
keep_connecting_punctuation = true
log_topic_indicators = true
save_sampler = false
save_doc_theta_estimate = true
doc_topic_theta_filename = doc_topic_theta.csv
save_phi_mean = true

I am attaching a screenshot of part of the output so you can see what it looks like.

Screen Shot 2021-04-20 at 10 06 37
