Hi,
After running the notebook demo/tinylettuce.ipynb, which creates the dataset train_data.json, I ran the following training command as described in the notebook:
```
python scripts/train.py \
    --ragtruth-path train_data.json \
    --model-name jhu-clsp/ettin-encoder-68m \
    --output-dir output/hallucination_detector \
    --batch-size 4 \
    --epochs 6 \
    --learning-rate 1e-5
```
I get the following error:
```
Traceback (most recent call last):
  File "LettuceDetect/scripts/train.py", line 155, in <module>
    main()
  File "LettuceDetect/scripts/train.py", line 98, in main
    ragtruth_data = HallucinationData.from_json(json.loads(ragtruth_path.read_text()))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "LettuceDetect/lettucedetect/datasets/hallucination_dataset.py", line 53, in from_json
    samples=[HallucinationSample.from_json(sample) for sample in json_dict],
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "LettuceDetect/lettucedetect/datasets/hallucination_dataset.py", line 38, in from_json
    dataset=json_dict["dataset"],
            ~~~~~~~~~^^^^^^^^^^^
KeyError: 'dataset'
```
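The traceback shows that `HallucinationSample.from_json` reads `json_dict["dataset"]` with no default, and that `HallucinationData.from_json` iterates over the top-level JSON value, so `train_data.json` has to be a list of sample objects that each carry a `dataset` key (a `language` key turns out to be required as well). A minimal stdlib sketch for back-filling the missing keys is below. The helper names and the values `"tinylettuce"` and `"en"` are hypothetical placeholders, not defaults defined by lettucedetect.

```python
import json
from pathlib import Path

# ASSUMPTION: placeholder values. The traceback only shows that the keys must
# exist; these are not values confirmed by lettucedetect.
DEFAULTS = {"dataset": "tinylettuce", "language": "en"}


def add_missing_keys(samples: list[dict], defaults: dict) -> list[dict]:
    """Return new sample dicts with any missing default keys filled in.

    Keys already present in a sample win over the defaults.
    """
    return [{**defaults, **sample} for sample in samples]


def patch_file(path: Path, defaults: dict = DEFAULTS) -> int:
    """Rewrite a JSON list of samples in place; return how many samples changed."""
    samples = json.loads(path.read_text())
    patched = add_missing_keys(samples, defaults)
    changed = sum(1 for old, new in zip(samples, patched) if old != new)
    path.write_text(json.dumps(patched, indent=2))
    return changed
```

Calling `patch_file(Path("train_data.json"))` rewrites the file and returns the number of samples that gained a key. Existing values are never overwritten, because the sample's own keys take precedence in the `{**defaults, **sample}` merge. This only gets past the `KeyError`; it does not address the error that follows.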
If I add the key 'dataset', a new error arises for the missing key 'language'. If 'language' is also added to train_data.json, yet another error appears:
```
  File "LettuceDetect/scripts/train.py", line 155, in <module>
    main()
  File "LettuceDetect/scripts/train.py", line 124, in main
    train_loader = DataLoader(
                   ^^^^^^^^^^^
  File "LettuceDetect/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 388, in __init__
    sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "LettuceDetect/.venv/lib/python3.12/site-packages/torch/utils/data/sampler.py", line 156, in __init__
    raise ValueError(
```
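The `ValueError` message is cut off above, but PyTorch's `RandomSampler` raises `ValueError` in its constructor when `num_samples` is not a positive integer, which in practice means the dataset passed to a shuffling `DataLoader` is empty. The stdlib sketch below checks for that case. It assumes, without confirmation from `scripts/train.py`, that training samples are selected by a `split` field equal to `"train"`; the field name and both helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def split_counts(samples: list[dict]) -> Counter:
    """Count samples per "split" value; samples without the key count under None."""
    return Counter(sample.get("split") for sample in samples)


def check_training_split(path: Path, train_split: str = "train") -> Counter:
    """Report when no sample would be selected for training.

    ASSUMPTION: training samples are those with sample["split"] == train_split.
    This is a guess about scripts/train.py, not its confirmed behaviour.
    """
    samples = json.loads(path.read_text())
    counts = split_counts(samples)
    if counts[train_split] == 0:
        # An empty training list gives RandomSampler num_samples == 0,
        # which PyTorch rejects with a ValueError.
        print(f"No samples with split={train_split!r}; counts: {dict(counts)}")
    return counts
```

If `check_training_split(Path("train_data.json"))` reports zero training samples, that would be consistent with the `ValueError` above. If it does not, the cause lies elsewhere.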
Do you have an updated working example with tinylettuce?
lettucedetect 0.1.8
python 3.12.3
Cheers,
Freja