Calculating CTC CPMI #3

@vantesy

Hi, I recently came across your measures in a literature review for my master's thesis and tried to apply them to the 20 Newsgroups dataset with the code you provided. I preprocessed everything using the steps from the example notebook and passed in the topics (obtained from a BERTopic model) in the format defined in the example, as a list of lists of topic words. After training the CPMI tree on a Colab GPU for quite a long time, I got a CTC CPMI score of over 274.01 for my topics (I had nearly 90 topics, and the CPMI tree was calculated from 86,716 segments). I tried again with only a small percentage of the documents (resulting in 29,910 segments) and got a CTC CPMI score of 95.62. In your paper, the CTC CPMI score is below zero for the BERTopic model, and in the original CPMI paper the score is also not higher than 20. I looked into the code but found no fault, so I wondered whether these results make sense or whether I need to do another averaging step afterwards.
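
To make the question concrete, here is a minimal sketch of the input format and of the kind of extra averaging step I am asking about. The topic words are made-up placeholders rather than my real BERTopic output, `average_over` is a hypothetical helper and not a function from the repository, and 90 is only an approximation of my topic count. Whether either normalization is what the measure intends is exactly the open question.

```python
# Topics as a list of lists of topic words.
# Placeholder words for illustration only, not the real BERTopic topics.
topics = [
    ["space", "nasa", "orbit"],
    ["game", "team", "season"],
]


def average_over(total_score: float, count: int) -> float:
    """Divide a raw score by a count (hypothetical extra averaging step)."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total_score / count


# Raw scores and segment counts reported above.
full_per_segment = average_over(274.01, 86_716)
small_per_segment = average_over(95.62, 29_910)

# Assumption: roughly 90 topics in the full run (the exact count is not stated).
full_per_topic = average_over(274.01, 90)

print(f"full run, per segment:  {full_per_segment:.5f}")
print(f"small run, per segment: {small_per_segment:.5f}")
print(f"full run, per topic:    {full_per_topic:.2f}")
```

Dividing by the segment count gives a similar value for both runs (about 0.0032), which might mean the raw score grows with the number of segments, but I am not sure this is the intended normalization.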

Thank you in advance for your answer.
