Plan for multi pass n-best rescoring

[Guys, I have gym now so I'll submit this and write the rest of this later today. ]

I am creating an issue to describe a plan for multi-pass n-best-list rescoring.  This will also require
new code in k2; I'll create a separate issue for that.
The scenario is that we have a CTC or LF-MMI model and we do the 1st decoding pass from that.
Anything that we can do with lattices, we do first (e.g. including any FST-based LM rescoring).
Let the possibly-LM-rescored lattice be the starting point for the n-best rescoring process.

The first step is to generate a long n-best list for each lattice by calling RandomPaths() with a largish number,
like 1000.  We then choose unique paths based on token sequences, where 'token' is whatever type of token
we are using in the transformer and RNNLM-- probably word pieces.  That is, we use `inner_labels='tokens'`
when composing with the CTC topo to make the decoding graph, and these get propagated
to the lattices, so we can use `lats.tokens`, remove epsilons, and pick the unique paths.
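
To make the "pick the unique paths" step concrete, here is a minimal pure-Python sketch. It assumes each sampled path is already available as a list of integer token IDs with 0 meaning epsilon; in the real pipeline these would come from the lattice's `tokens` attribute via k2. The helper name `unique_token_paths` is hypothetical, not an existing k2 or snowfall function.

```python
from typing import Dict, List, Tuple

EPSILON = 0  # assumption: token ID 0 is epsilon


def unique_token_paths(
    paths: List[List[int]],
) -> Tuple[List[Tuple[int, ...]], List[int]]:
    """Strip epsilons from each path and keep the first occurrence of each
    distinct token sequence.

    Returns (unique, path_to_unique), where path_to_unique[i] is the index
    in `unique` of the token sequence of the i-th sampled path.
    """
    seen: Dict[Tuple[int, ...], int] = {}
    unique: List[Tuple[int, ...]] = []
    path_to_unique: List[int] = []
    for path in paths:
        key = tuple(t for t in path if t != EPSILON)
        if key not in seen:
            seen[key] = len(unique)
            unique.append(key)
        path_to_unique.append(seen[key])
    return unique, path_to_unique


# Four sampled paths for one lattice; three share a token sequence.
sampled = [[0, 5, 0, 7], [5, 7, 0], [5, 0, 8], [0, 5, 7, 0]]
unique, mapping = unique_token_paths(sampled)
print(unique)   # distinct token sequences, in order of first appearance
print(mapping)  # which unique sequence each sampled path maps to
```

With 1000 sampled paths the number of unique sequences can be much smaller, which is why the resulting n-best lists are ragged.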

I think we could have a data structure called Nbest (we could draft this in snowfall for now and later move it
to k2) that contains an Fsa and also a _k2.RaggedShape that dictates how each of the paths relates to the
original supervision segments.  But I guess we could draft this pipeline without the data structure.
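
As a rough illustration of what such an Nbest might hold, here is a pure-Python stand-in. It is only a sketch: plain tuples take the place of the Fsa, and a `row_splits` list takes the place of the ragged shape. The class name `NbestSketch` and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NbestSketch:
    """Hypothetical stand-in for the proposed Nbest.

    paths:      flat list of token sequences (stand-in for the Fsa).
    row_splits: paths[row_splits[i]:row_splits[i + 1]] belong to
                supervision segment i (stand-in for the ragged shape).
    """

    paths: List[Tuple[int, ...]]
    row_splits: List[int]

    def __post_init__(self) -> None:
        # Basic consistency checks on the ragged structure.
        assert self.row_splits[0] == 0
        assert self.row_splits[-1] == len(self.paths)
        assert all(a <= b for a, b in zip(self.row_splits, self.row_splits[1:]))

    @property
    def num_segments(self) -> int:
        return len(self.row_splits) - 1

    def paths_for(self, segment: int) -> List[Tuple[int, ...]]:
        """All paths that belong to one supervision segment."""
        return self.paths[self.row_splits[segment]:self.row_splits[segment + 1]]

    def row_ids(self) -> List[int]:
        """For each path, the index of the segment it belongs to."""
        ids: List[int] = []
        for seg in range(self.num_segments):
            ids.extend([seg] * (self.row_splits[seg + 1] - self.row_splits[seg]))
        return ids


# Two segments: the first has 2 unique paths, the second has 3.
nbest = NbestSketch(
    paths=[(5, 7), (5, 8), (3,), (3, 4), (3, 9)],
    row_splits=[0, 2, 5],
)
print(nbest.paths_for(1))
print(nbest.row_ids())
```

Keeping the paths flat and storing the grouping separately means operations that work on a batch of paths can ignore the segment structure, while per-segment operations only need the shape.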

Supposing we have the Nbest with ragged numbers of paths, we can then add epsilon self-loops and
intersect it with the lattices, after moving the 'tokens' to the 'labels' of the lattices; for each
unique token sequence we'd then get the 1-best path and remove epsilons, so that we get an Nbest
that has just the best path's tokens and no epsilons.
(We could define, in class Nbest, a form of intersect() that does the right thing when composing with an Fsa
representing an FsaVec; we might also define wrappers for some Fsa operations so that they also work on Nbest.)

So at this point we have an Nbest with ragged numbers of paths, up to 1000 per lattice (depending on how many unique
paths we got); each path is just a linear sequence of arcs, one per token, and it has costs defined per
token.  (It may also have other types of label and cost that were passively inherited.)  The way we allocate
these costs, e.g. of epsilons and token-repeats, to each token will of course be a little arbitrary-- it's a function
of how the epsilon removal algorithm works-- and we can try to figure out later on whether it needs to be changed
somehow.
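
To show what "a little arbitrary" means here, the sketch below uses one possible convention: each epsilon's score is folded onto the next real token, and any trailing epsilon score onto the last token. This is purely an illustrative assumption and not a description of what k2's epsilon removal actually does; the function name `fold_epsilon_scores` is hypothetical.

```python
from typing import List, Tuple

EPSILON = 0  # assumption: token ID 0 is epsilon


def fold_epsilon_scores(arcs: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Turn a path of (token, score) arcs into one (token, score) per real
    token, folding epsilon scores onto the following real token.

    Trailing epsilon scores are added to the last real token.  If the path
    has no real tokens at all, the result is empty and the scores are lost.
    """
    out: List[Tuple[int, float]] = []
    pending = 0.0  # epsilon score not yet assigned to a token
    for token, score in arcs:
        if token == EPSILON:
            pending += score
        else:
            out.append((token, score + pending))
            pending = 0.0
    if out and pending != 0.0:
        last_token, last_score = out[-1]
        out[-1] = (last_token, last_score + pending)
    return out


arcs = [(0, -0.5), (5, -1.0), (0, -0.25), (7, -2.0), (0, -0.25)]
per_token = fold_epsilon_scores(arcs)
print(per_token)
```

Folding onto the previous token instead would give different per-token values but the same total for the path, which is why the choice only matters if something downstream looks at individual token scores.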

We get the total_scores of this Nbest object; they will be used in determining which paths to use in the first
n-best list that we rescore.  We can define its total_scores() function so that it returns them as a ragged array,
which is what they logically are.
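
A minimal sketch of ragged total scores and of picking the first batch of paths to rescore. It assumes scores are log-probabilities, so higher is better, and it uses plain nested lists in place of a k2 ragged tensor. The function names and the `row_splits` layout are hypothetical.

```python
from typing import List


def total_scores_ragged(
    token_scores: List[List[float]], row_splits: List[int]
) -> List[List[float]]:
    """Sum the per-token scores of each path, then group the totals by
    supervision segment using row_splits."""
    totals = [sum(scores) for scores in token_scores]
    return [
        totals[row_splits[i]:row_splits[i + 1]]
        for i in range(len(row_splits) - 1)
    ]


def top_k_per_segment(ragged_scores: List[List[float]], k: int) -> List[List[int]]:
    """For each segment, the within-segment indexes of the k best paths,
    best first (assumes higher score is better)."""
    result = []
    for scores in ragged_scores:
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        result.append(order[:k])
    return result


# Five paths in total: 2 for the first segment, 3 for the second.
token_scores = [[-1.5, -2.5], [-1.0, -4.0], [-0.5], [-0.25, -0.125], [-3.0, -1.0]]
row_splits = [0, 2, 5]

ragged = total_scores_ragged(token_scores, row_splits)
best = top_k_per_segment(ragged, k=2)
print(ragged)
print(best)
```

Returning the totals in ragged form keeps the per-segment grouping explicit, so selecting the top paths never compares paths from different segments.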



Plan for multi pass n-best rescoring #232
