Hi @patrick-wilken,
I think it would be great to support statistical significance testing between the hypotheses of two systems. This can be done with bootstrap resampling. The main challenge is deciding what units to sample from SRT files, since in general the SRTs generated by two systems are aligned neither with each other nor with the references. Do you have any comments or ideas on how to do this? I'm also happy to help with the implementation.
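To make the idea concrete, here is a minimal sketch of paired bootstrap resampling for the easy case where both systems are already scored on the same aligned units. The function name `paired_bootstrap` and its parameters are hypothetical and not part of any existing API, and it assumes the metric can be treated as a mean of per-unit scores, which is a simplification for corpus-level metrics. It does not address the SRT alignment problem itself.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resamples in which system A does not beat system B.

    scores_a / scores_b: per-unit scores (higher is better) for the same
    aligned units. Hypothetical helper: assumes alignment already exists.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same aligned units")
    rng = random.Random(seed)
    n = len(scores_a)
    not_better = 0
    for _ in range(n_resamples):
        # Draw n unit indices with replacement; use the SAME indices for both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a <= mean_b:
            not_better += 1
    # A small value suggests A's advantage over B is unlikely to be a sampling artifact.
    return not_better / n_resamples
```

For SRTs, the open question is what a "unit" would be, since the sketch only works once both hypotheses and the reference share the same segmentation.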
Thanks,
Marco