Using the code for tedlium2_v2 with merged apostroph#218
Using the code for tedlium2_v2 with merged apostroph#218
Conversation
| from i6_experiments.common.tools.sctk import compile_sctk | ||
|
|
||
| RASR_BINARY_PATH = compile_rasr_binaries_i6mode( | ||
| branch="apptainer_tf_2_8", configure_options=["--apptainer-patch=2023-05-08_tensorflow-2.8_v1"] |
There was a problem hiding this comment.
I know this is not your change, but I dont like merging something that will not work, since the branch does not exist anymore. I think we can just remove this and replace this by None and then let the assert do the work.
| RASR_BINARY_PATH = compile_rasr_binaries_i6mode( | ||
| branch="apptainer_tf_2_8", configure_options=["--apptainer-patch=2023-05-08_tensorflow-2.8_v1"] | ||
| ) # use most recent RASR |
There was a problem hiding this comment.
| RASR_BINARY_PATH = compile_rasr_binaries_i6mode( | |
| branch="apptainer_tf_2_8", configure_options=["--apptainer-patch=2023-05-08_tensorflow-2.8_v1"] | |
| ) # use most recent RASR | |
| RASR_BINARY_PATH = None # TODO please set RASR |
| SCTK_BINARY_PATH = compile_sctk(branch="v2.4.12") # use last published version | ||
| SCTK_BINARY_PATH.hash_overwrite = "TEDLIUM2_DEFAULT_SCTK_BINARY_PATH" | ||
|
|
||
| SRILM_PATH = tk.Path("/work/tools/users/luescher/srilm-1.7.3/bin/i686-m64/") |
There was a problem hiding this comment.
Similar to RASR I think this needs to be either changed to some u22 path or just asserted as False
| @@ -0,0 +1,423 @@ | |||
| from i6_core.features.filterbank import filter_width_from_channels | |||
There was a problem hiding this comment.
I will just hijack this line just to make sure: You talked about some possible changes to FSA and Silence etc. for GMM, are they already included here? Or is this for now a straight copy?
|
|
||
|
|
||
| def run_tedlium2_common_baseline( | ||
| alias_prefix="baselines/tedlium2/gmm/common_baseline", |
There was a problem hiding this comment.
should this have the same prefix as the other baseline?
This would cause overwriting if both are enabled. But I would not mind keeping the "true" alias to this file.
| from typing import Dict | ||
|
|
||
| from i6_experiments.common.datasets.tedlium2.constants import CONCURRENT | ||
| from i6_experiments.common.datasets.tedlium2_v2.corpus import get_corpus_object_dict |
There was a problem hiding this comment.
I am waiting for the referenced PR to be merged.
|
|
||
| from i6_experiments.common.datasets.tedlium2.constants import CONCURRENT | ||
| from i6_experiments.common.datasets.tedlium2_v2.corpus import get_corpus_object_dict | ||
| from i6_experiments.common.datasets.tedlium2_v2.lexicon import ( |
| from collections import defaultdict | ||
| from typing import Dict | ||
|
|
||
| from i6_experiments.common.datasets.tedlium2.constants import CONCURRENT |
There was a problem hiding this comment.
While I know this does not change, if we do a "full copy" shouldn't we for completeness also copy this? @JackTemaki opinions?
| from i6_core.corpus.convert import CorpusToTxtJob | ||
| from i6_core.lexicon.conversion import LexiconToWordListJob | ||
|
|
||
| from i6_experiments.common.datasets.tedlium2_v2.corpus import get_bliss_corpus_dict |
There was a problem hiding this comment.
As for data.py, some files are not pushed?
d5d3ebc to
cf5884a
Compare
This is the recipes version that uses the fix from here.