GMM classifier with multiple mixture components per accent #3

@sravanareddy

Description

This setup is a bit different from our previous experiment where we had a single GMM with each mixture component corresponding to an accent.

Instead, we'll build distinct GMM models for each accent. Rather than taking the average of the time frames for each speaker, keep all the frames.
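A minimal sketch of this setup using scikit-learn's `GaussianMixture` (the data here is synthetic, and names like `frames` and the choice of 8 components are illustrative assumptions, not part of the assignment):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in for real MFCC data: frames[accent] is an
# (n_frames, n_features) array pooling ALL time frames from every
# training speaker of that accent (no per-speaker averaging).
rng = np.random.default_rng(0)
frames = {
    "accent_a": rng.normal(0.0, 1.0, size=(500, 13)),
    "accent_b": rng.normal(2.0, 1.0, size=(500, 13)),
}

# One GMM per accent. Each of the n_components Gaussians loosely stands
# in for a phone-like cluster; EM (run inside .fit()) decides which
# frames belong to which component automatically.
models = {
    accent: GaussianMixture(n_components=8, covariance_type="diag",
                            random_state=0).fit(X)
    for accent, X in frames.items()
}
```

Diagonal covariances are a common choice for MFCC-style features since full covariances need far more data per component; either would fit this setup.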

Each GMM component is meant to (very approximately) represent a phone. Of course, we don't know what the phones are and which time slice is which phone -- the trick is to try to figure this out automatically with EM. Before Thursday, skim chapter 9 (9.1 and 9.2) in the Bishop PRML textbook to learn about clustering and EM.

We're not going to use the .predict() method of the GMM class at all, since that only tells us which component is the best fit. We don't care about this: in our new setup, the components are the phones, and the GMMs as a whole are the accents.

Instead, when it comes to testing, compute the log probability (remember Naive Bayes?) of each frame of the test sample under each of the GMM models. The winning model is the one with the greatest total log-likelihood summed across all frames.
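The scoring step could look roughly like the following sketch. It assumes `models` is a dict of fitted per-accent `GaussianMixture` objects as described above; `score_samples` returns the per-frame log-likelihood under the whole mixture, and summing over frames is the naive-Bayes-style independence assumption:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy per-accent models fitted on synthetic, well-separated data so the
# classification below is unambiguous (illustrative only).
rng = np.random.default_rng(1)
models = {
    "accent_a": GaussianMixture(n_components=4, random_state=0).fit(
        rng.normal(0.0, 1.0, size=(300, 13))),
    "accent_b": GaussianMixture(n_components=4, random_state=0).fit(
        rng.normal(3.0, 1.0, size=(300, 13))),
}

def classify(test_frames, models):
    """Sum per-frame log-likelihoods under each accent's GMM and
    return the accent with the highest total (treating frames as
    conditionally independent, as in Naive Bayes)."""
    scores = {accent: gmm.score_samples(test_frames).sum()
              for accent, gmm in models.items()}
    return max(scores, key=scores.get)

# Frames drawn near accent_b's distribution should pick accent_b.
test_frames = rng.normal(3.0, 1.0, size=(100, 13))
predicted = classify(test_frames, models)
print(predicted)  # -> accent_b
```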
