the cds formula (context distribution smoothing) is wrong #4

@xiaoouwang

The original formula is:
[image: the original cds formula]
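
The image is not reproduced here. Assuming it shows the standard definition of context distribution smoothing (as given by Levy et al., 2015), the formula would read roughly as follows, where \#(c) is the raw count of context c and \alpha corresponds to `cds_alpha` in the code:

$$
\hat{P}_{\alpha}(c) = \frac{\#(c)^{\alpha}}{\sum_{c'} \#(c')^{\alpha}}
$$

The point that matters for this issue is that the sum over all contexts sits in the denominator.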

However, the code implements it the other way around: the sum ends up in the numerator and the powered count in the denominator, see below.

self.d_alpha = np.sum(np.power([self.terms_counts[c] for c in self.terms_counts], self.cds_alpha))
bar.update()
self.terms_counts_cds_powered = {word: self.d_alpha / np.power(self.terms_counts[word], self.cds_alpha) for word in self.terms_counts}

I've made a pull request.
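
For illustration, here is a minimal sketch of what the corrected computation could look like. It is not the content of the pull request. The function name `cds_probabilities` and the default `cds_alpha=0.75` are assumptions made for this example, and it is written in plain Python so it runs without dependencies; in the project the same expressions would presumably keep using `np.power` and `np.sum`.

```python
def cds_probabilities(terms_counts, cds_alpha=0.75):
    """Smoothed context probabilities: count**alpha / sum(count**alpha).

    `terms_counts` maps each word to its raw count (hypothetical helper,
    mirroring `self.terms_counts` from the snippet above).
    """
    # Denominator: sum of all powered counts (d_alpha in the quoted code).
    d_alpha = sum(count ** cds_alpha for count in terms_counts.values())
    # Numerator: the powered count of the word itself; the sum goes below.
    return {word: (count ** cds_alpha) / d_alpha
            for word, count in terms_counts.items()}


probs = cds_probabilities({"a": 8, "b": 1, "c": 1})
```

With this ordering the values form a probability distribution (they sum to 1), and with `cds_alpha` below 1 the rare words receive a slightly larger share than their raw relative frequency, which is the intended smoothing effect. The reversed version instead yields values of 1 or more that grow as a word gets rarer.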
