
Example training fails with SafeRepresenter choking on .cache folder #55

@aarpon

The example script scripts/train.py crashes on the Fluo-N2DL-HeLa dataset right after data preparation with:

Traceback (most recent call last):
  File "/DATA/Projects/trackastra-train/scripts/train.py", line 1179, in <module>
    vars = train(args)
  File "/DATA/Projects/trackastra-train/scripts/train.py", line 983, in train
    trainer.fit(model_lightning, datamodule=datamodule, ckpt_path=resume_path)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aaron/miniconda3/envs/trackastra-env/lib/python3.13/site-packages/lightning/pytorch/trainer/trainer.py", line 584, in fit
    call._call_and_handle_interrupt(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self,
        ^^^^^
    ...<6 lines>...
        weights_only,
        ^^^^^^^^^^^^^
    )
    ^
  File "/home/aaron/miniconda3/envs/trackastra-env/lib/python3.13/site-packages/lightning/pytorch/trainer/call.py", line 49, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)

...

  File "/home/aaron/miniconda3/envs/trackastra-env/lib/python3.13/site-packages/yaml/__init__.py", line 269, in safe_dump
    return dump_all([data], stream, Dumper=SafeDumper, **kwds)
  File "/home/aaron/miniconda3/envs/trackastra-env/lib/python3.13/site-packages/yaml/__init__.py", line 241, in dump_all
    dumper.represent(data)
    ~~~~~~~~~~~~~~~~^^^^^^

...

  File "/home/aaron/miniconda3/envs/trackastra-env/lib/python3.13/site-packages/yaml/representer.py", line 232, in represent_undefined
    raise RepresenterError("cannot represent an object", data)
yaml.representer.RepresenterError: ('cannot represent an object', PosixPath('/DATA/Projects/trackastra-train/scripts/.cache'))

SafeRepresenter chokes on the cache path because its type is pathlib.PosixPath, and the representer does not know how to handle that type.
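The failure can probably be reproduced outside the training script. The sketch below assumes PyYAML is installed; the `training_args` dict is a hypothetical stand-in that carries only a `cachedir` entry, not the real set of arguments built by train.py.

```python
from pathlib import Path

import yaml
from yaml.representer import RepresenterError

# Hypothetical stand-in for the training arguments; only the Path value matters.
training_args = {"cachedir": Path(".cache")}

error = None
try:
    # safe_dump only knows basic types (str, int, float, list, dict, ...),
    # so a pathlib object falls through to represent_undefined().
    yaml.safe_dump(training_args)
except RepresenterError as exc:
    error = exc

print(type(error).__name__, error)
```

If this raises the same RepresenterError as in the traceback above, the problem is independent of Lightning and of the dataset.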

One way to fix this is to store the path as str in the training_args:

# a modelcheckpoint that uses TrackingTransformer.save() to save the model
class MyModelCheckpoint(pl.pytorch.callbacks.Callback):
    def __init__(self, logdir, training_args: dict, monitor: str = "val_loss"):
        self._logdir = Path(logdir)
        self._monitor = monitor
        self._best = np.inf
        # Patch issue with type of cachedir causing SafeRepresenter to raise an Exception in represent_undefined()
        training_args["cachedir"] = str(training_args["cachedir"])  # This apparently fixes the issue
        self._training_args = training_args
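The patch above converts a single key and modifies the caller's dict in place. A more general variant, sketched here with only the standard library, returns a copy with every pathlib value converted to str. The helper name `stringify_paths` and the sample arguments are hypothetical and not part of the repository.

```python
from pathlib import Path, PurePath


def stringify_paths(obj):
    """Return a copy of obj with every pathlib path replaced by its str form."""
    if isinstance(obj, PurePath):  # covers Path, PosixPath and WindowsPath
        return str(obj)
    if isinstance(obj, dict):
        return {key: stringify_paths(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples are returned as lists, i.e. plain sequences.
        return [stringify_paths(value) for value in obj]
    return obj


# Hypothetical arguments, for illustration only.
args = {"cachedir": Path(".cache"), "epochs": 10, "inputs": [Path("data/train")]}
clean = stringify_paths(args)
print(clean)
```

Calling it as `self._training_args = stringify_paths(training_args)` in `__init__` would leave the original dict untouched, which may matter if other code still expects `cachedir` to be a Path.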
