Hi! Maybe you can help me with the following:
After creating a conda environment with
conda create --name aw_value python=3.10.13
conda activate aw_value
pip install value-nlp
pip install datasets==2.20.0
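Because the error comes from comparing spaCy and Stanza tokenizations, it may depend on which versions of those libraries "pip install value-nlp" pulled in; that dependency is an assumption, not something confirmed here. A minimal sketch for listing the installed versions, using only the standard library's importlib.metadata (the distribution names "spacy" and "stanza" are assumed to be the PyPI names of the two tokenizers):

```python
from importlib.metadata import PackageNotFoundError, version

# "value-nlp" and "datasets" come from the install commands above; "spacy" and
# "stanza" are ASSUMED to be the PyPI names of the tokenizers in the error.
PACKAGES = ["value-nlp", "datasets", "spacy", "stanza"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the active environment.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running this inside the activated aw_value environment prints one line per package, which makes it easier to compare against an environment where the script still works.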
With that environment active, I run:
export TASK_NAME=sst2
export PYTHONHASHSEED=1234
python run_glue.py \
  --model_name_or_path roberta-base \
  --task_name $TASK_NAME \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir output/$TASK_NAME/roberta_base \
  --dialect "aave" \
  --morphosyntax \
  --do_train
and get the following error:
  File "/home/uu_cs_nlpsoc/awegmann/StyleTokenizer/src/run_value.py", line 743, in <module>
    main()
  File "/home/uu_cs_nlpsoc/awegmann/StyleTokenizer/src/run_value.py", line 553, in main
    raw_datasets = raw_datasets.map(
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/datasets/dataset_dict.py", line 869, in map
    {
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/datasets/dataset_dict.py", line 870, in <dictcomp>
    k: dataset.map(
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3161, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3552, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3421, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/uu_cs_nlpsoc/awegmann/StyleTokenizer/src/run_value.py", line 517, in preprocess_function
    conversions1 = [
  File "/home/uu_cs_nlpsoc/awegmann/StyleTokenizer/src/run_value.py", line 518, in <listcomp>
    dialect.convert_sae_to_dialect(example)
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/multivalue/BaseDialect.py", line 193, in convert_sae_to_dialect
    self.update(string)
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/multivalue/BaseDialect.py", line 218, in update
    self.coref_clusters = self.create_coref_cluster(string)
  File "/hpc/local/Rocky8/uu_cs_nlpsoc/miniconda3/envs/aw_value/lib/python3.10/site-packages/multivalue/BaseDialect.py", line 237, in create_coref_cluster
    assert [tok.text for tok in tokens] == [
AssertionError: Spacy and Stanza word mismatch
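The assert in create_coref_cluster appears to compare a list of spaCy token texts with a list of Stanza word texts for the same sentence; the right-hand side is cut off in the traceback, so exactly what it compares is an assumption. To find where the two tokenizations diverge for a failing sentence, a small hypothetical helper (first_token_mismatch is not part of multivalue) can report the first differing position:

```python
from itertools import zip_longest
from typing import Optional, Sequence, Tuple


def first_token_mismatch(
    left: Sequence[str], right: Sequence[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, left_token, right_token) at the first position where the
    two token lists differ, or None if they are identical.

    If one list is shorter, the missing token is reported as None.
    """
    for i, (a, b) in enumerate(zip_longest(left, right)):
        if a != b:
            return i, a, b
    return None


# Hypothetical token lists for illustration, not real output of either library.
print(first_token_mismatch(["I", "ca", "n't", "go"], ["I", "can", "'t", "go"]))
```

For a real sentence, the inputs could be built as [tok.text for tok in nlp(sentence)] for a spaCy pipeline and [w.text for s in doc.sentences for w in s.words] for a Stanza Document; whether these match the lists BaseDialect builds internally is an assumption.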
Have you run into this error before? Does run_glue.py still work for you in your environment? I also had to delete the mapping argument in AfricanAmericanVernacular(mapping, ...).
FYI: I renamed run_glue.py to run_value.py, which is why the traceback refers to run_value.py.
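Until the mismatch itself is fixed, one possible stopgap (not a fix suggested by the maintainers) is to catch the AssertionError per sentence and keep the original SAE sentence for the examples that trigger it. The sketch below assumes the conversion function takes and returns a single string, as dialect.convert_sae_to_dialect(example) does in the traceback; convert_with_fallback is a hypothetical helper name:

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def convert_with_fallback(convert: Callable[[str], str], sentences: List[str]) -> List[str]:
    """Apply `convert` to each sentence; if it raises AssertionError (such as
    the spaCy/Stanza word mismatch), keep the original sentence instead."""
    converted = []
    n_failed = 0
    for sentence in sentences:
        try:
            converted.append(convert(sentence))
        except AssertionError as err:
            n_failed += 1
            logger.warning("Keeping original sentence (%s): %r", err, sentence)
            converted.append(sentence)
    if n_failed:
        # Report how many examples were left untransformed.
        logger.warning("%d of %d sentences kept unconverted", n_failed, len(sentences))
    return converted
```

In preprocess_function, the conversions1 list comprehension could then become something like convert_with_fallback(dialect.convert_sae_to_dialect, <the batch column being converted>); the actual variable names in run_value.py may differ. The trade-off is that the affected examples stay in SAE, so the resulting dataset is no longer fully transformed, which is why the helper logs how many sentences fell back.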