This repository was archived by the owner on Oct 13, 2022. It is now read-only.

Plan for multi pass n-best rescoring #232

@danpovey


[Guys, I have gym now so I'll submit this and write the rest of this later today. ]

I am creating an issue to describe a plan for multi-pass n-best-list rescoring. This will also require
new code in k2; I'll create a separate issue for that.
The scenario is that we have a CTC or LF-MMI model and we do the 1st decoding pass from that.
Anything that we can do with lattices, we do first (e.g. including any FST-based LM rescoring).
Let the possibly-LM-rescored lattice be the starting point for the n-best rescoring process.

The first step is to generate a long n-best list for each lattice by calling RandomPaths() with a largish number,
like 1000. We then choose unique paths based on token sequences, where 'token' is whatever type of token
we are using in the transformer and RNNLM-- probably word pieces. That is, we use inner_labels='tokens'
when doing the composition with the CTC topo when making the decoding graph, and these get propagated
to the lattices, so we can use lats.tokens and remove epsilons and pick the unique paths.
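
A rough sketch of the "pick the unique paths" step, in plain Python rather than k2 so it runs on its own. It assumes each sampled path has already been read out as a list of token IDs (e.g. from lats.tokens) and that epsilon is ID 0; the name `unique_token_paths` is made up for illustration and is not an existing k2 or snowfall function.

```python
from typing import Dict, List, Tuple

EPSILON = 0  # assumption: epsilon is token ID 0


def unique_token_paths(paths: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Remove epsilons from each path and keep one path per distinct token sequence.

    Returns (unique token sequences, index of the first sampled path that
    produced each sequence), both in order of first appearance.
    """
    first_seen: Dict[Tuple[int, ...], int] = {}
    for idx, path in enumerate(paths):
        key = tuple(t for t in path if t != EPSILON)
        if key not in first_seen:
            first_seen[key] = idx
    return [list(k) for k in first_seen], list(first_seen.values())


# Three sampled paths; the first two differ only in where the epsilons fall.
sampled = [[0, 5, 0, 7], [5, 7, 0], [5, 8]]
unique, source_idx = unique_token_paths(sampled)
```

Keeping the index of the source path means we can still go back to the original path in the lattice if we need its other attributes.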

I think we could have a data structure called Nbest-- we could draft this in snowfall for now and later move
it to k2-- that contains an Fsa and also a _k2.RaggedShape that dictates how each of the paths relates to the
original supervision segments. But I guess we could draft this pipeline without the data structure.
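
To make the shape of that data structure concrete, here is a minimal stand-in in plain Python. It is only a sketch under assumptions: the paths are held as lists of token IDs instead of an Fsa, and the ragged shape is represented by a row_splits list (path indices where each segment starts), which is how I'd expect the RaggedShape to be used here. `NbestSketch` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NbestSketch:
    # Stand-in for the Fsa: one linear token sequence per path.
    paths: List[List[int]]
    # Stand-in for the ragged shape: segment i owns
    # paths[row_splits[i]:row_splits[i + 1]].
    row_splits: List[int]

    def __post_init__(self) -> None:
        assert self.row_splits[0] == 0
        assert self.row_splits[-1] == len(self.paths)
        assert all(a <= b for a, b in zip(self.row_splits, self.row_splits[1:]))

    @property
    def num_segments(self) -> int:
        return len(self.row_splits) - 1

    def paths_of(self, segment: int) -> List[List[int]]:
        """All paths belonging to one supervision segment."""
        return self.paths[self.row_splits[segment]:self.row_splits[segment + 1]]

    def path_to_segment(self) -> List[int]:
        """For each path, the index of the segment it came from."""
        ids: List[int] = []
        for seg in range(self.num_segments):
            ids.extend([seg] * (self.row_splits[seg + 1] - self.row_splits[seg]))
        return ids


# Two segments: the first has two unique paths, the second has one.
nbest = NbestSketch(paths=[[5, 7], [5, 8], [9]], row_splits=[0, 2, 3])
```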

Supposing we have the Nbest with ragged numbers of paths, we can then add epsilon self-loops and
intersect it with the lattices, after moving the 'tokens' to the 'labels' of the lattices; we'd then
get the 1-best path and remove epsilons so that we get an Nbest that has just the best path's
tokens and no epsilons.
(We could define, in class Nbest, a form of intersect() that does the right thing when composing with an Fsa
representing an FsaVec; we might also define wrappers for some Fsa operations so they work also on Nbest).

So at this point we have an Nbest with ragged numbers of paths up to 1000 (depending on how many unique
paths we got); each path is just a linear sequence of arcs, one per token, with a cost defined per
token. (It may also have other types of label and cost that were passively inherited). The way we allocate
these costs, e.g. of epsilons and token-repeats, to each token will of course be a little arbitrary-- it's a function
of how the epsilon removal algorithm works-- and we can try to figure out later on whether it needs to be changed
somehow.

We get the total_scores of this Nbest object; they will be used in determining which paths to include in the first
n-best list that we rescore. We can define its total_scores() function so that it returns them as a ragged array,
which is what they logically are.
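
As an illustration of what "total scores as a ragged array" would look like, here is a plain-Python sketch, not k2 code. It assumes per-token scores are log-domain values where higher is better, that a path's total score is the sum of its token scores, and that segments are delimited by a row_splits list; `total_scores` and `top_k_paths` are hypothetical helper names.

```python
from typing import List


def total_scores(token_scores: List[List[float]], row_splits: List[int]) -> List[List[float]]:
    """Sum the per-token scores of each path and group the totals by segment."""
    per_path = [sum(scores) for scores in token_scores]
    return [
        per_path[row_splits[i]:row_splits[i + 1]]
        for i in range(len(row_splits) - 1)
    ]


def top_k_paths(ragged_scores: List[List[float]], row_splits: List[int], k: int) -> List[List[int]]:
    """For each segment, the global indices of its k best-scoring paths."""
    selected: List[List[int]] = []
    for seg, scores in enumerate(ragged_scores):
        start = row_splits[seg]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        selected.append([start + i for i in order[:k]])
    return selected


# Three paths over two segments (2 paths, then 1 path).
token_scores = [[-1.0, -0.5], [-0.25, -0.25], [-2.0]]
row_splits = [0, 2, 3]
ragged = total_scores(token_scores, row_splits)
first_pass = top_k_paths(ragged, row_splits, k=1)
```

The selected indices would then define the first, shorter n-best list that gets rescored.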
