Model not learning when training or fine-tuning

Dear creators of Spotiflow,

First of all, congratulations on creating this fantastic tool. I have used it many times out of the box and it worked flawlessly.

However, I am currently trying to fine-tune (or train) the model to detect synapses (BRP-labeled), and the results are really poor: the model is not learning anything (see the training log below). I have tried so many different parameter settings by now that it is hard to keep track of everything. Do you have any tips on what to try to make the model actually learn? My data is of course very imbalanced (as I guess all spot detection problems are). The training images vary in size (typically something like 100x200x200). The data was originally anisotropic ([0.17 0.17 0.5]) but has been resampled to isotropic ([0.17 0.17 0.17]). Could this be the issue?
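To make the resampling step concrete, here is a minimal sketch of the arithmetic involved. It assumes a (z, y, x) axis order with the 0.5 spacing along z, and that the spot annotations are stored in voxel coordinates; neither is confirmed above, and the function names are hypothetical. The point it illustrates is that the annotations need to be scaled by the same per-axis factors as the image, otherwise the spots no longer line up with the resampled volume.

```python
def zoom_factors(old_spacing, new_spacing):
    """Per-axis factor by which the voxel count grows after resampling."""
    return tuple(old / new for old, new in zip(old_spacing, new_spacing))


def rescale_spots(spots, factors):
    """Scale voxel-space spot coordinates by the same per-axis factors as the image."""
    return [tuple(c * f for c, f in zip(spot, factors)) for spot in spots]


# Assumed (z, y, x) order: 0.5 along z, 0.17 in-plane.
old_spacing = (0.5, 0.17, 0.17)
new_spacing = (0.17, 0.17, 0.17)

factors = zoom_factors(old_spacing, new_spacing)

# Hypothetical annotation in the original (anisotropic) voxel coordinates.
spots = [(10.0, 50.0, 60.0)]
rescaled = rescale_spots(spots, factors)
print(factors)
print(rescaled)
```

Only the z coordinate changes here, by a factor of roughly 2.94. This is a simplification: the exact mapping (for example half-voxel offsets or corner alignment) depends on the resampling routine actually used.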

Training log:
```
INFO:spotiflow.model.spotiflow:Training config is: SpotiflowTrainingConfig(
	batch_size=32
	crop_size=64
	crop_size_depth=32
	early_stopping_patience=0
	finetuned_from=synth_3d
	flow_loss_f=l1
	heatmap_loss_f=bce
	loss_levels=None
	lr=0.0003
	lr_reduce_patience=10
	num_epochs=100
	num_train_samples=None
	optimize_threshold=True
	optimizer=adamw
	pos_weight=10
	smart_crop=True
)
Normalizing images: 100%|██████████| 270/270 [00:16<00:00, 16.03it/s]WARNING:spotiflow.data.spots:Some images are smaller than the crop size ((32, 32, 32)). Will center pad with zeros.

Normalizing images: 100%|██████████| 68/68 [00:04<00:00, 16.81it/s]WARNING:spotiflow.data.spots:Some images are smaller than the crop size ((32, 32, 32)). Will center pad with zeros.
WARNING:spotiflow.model.spotiflow:Deterministic training is currently not supported in 3D mode. Disabling.

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 3080') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:spotiflow.model.trainer:Creating logdir models/3d_custom_individual_neurons and saving training config...

  | Name            | Type      | Params | Mode 
------------------------------------------------------
0 | model           | Spotiflow | 35.5 M | train
1 | _flow_loss_func | L1Loss    | 0      | train
------------------------------------------------------
35.5 M    Trainable params
0         Non-trainable params
35.5 M    Total params
141.960   Total estimated model params size (MB)
325       Modules in train mode
0         Modules in eval mode
Epoch 0: 100%|██████████| 9/9 [00:02<00:00,  3.65it/s, v_num=0]            INFO:spotiflow.model.trainer:Saved best model with val_loss=0.463.
Epoch 1: 100%|██████████| 9/9 [00:04<00:00,  1.96it/s, v_num=0, val_loss=0.463, val_f1=0.110, val_acc=0.101, heatmap_loss=0.304, flow_loss=0.194, train_loss=0.498]INFO:spotiflow.model.trainer:Saved best model with val_loss=0.423.
Epoch 3: 100%|██████████| 9/9 [00:05<00:00,  1.74it/s, v_num=0, val_loss=0.443, val_f1=0.0525, val_acc=0.050, heatmap_loss=0.289, flow_loss=0.180, train_loss=0.469]INFO:spotiflow.model.trainer:Saved best model with val_loss=0.396.
Epoch 9: 100%|██████████| 9/9 [00:03<00:00,  2.62it/s, v_num=0, val_loss=0.398, val_f1=0.170, val_acc=0.168, heatmap_loss=0.256, flow_loss=0.160, train_loss=0.416] INFO:spotiflow.model.trainer:Saved best model with val_loss=0.365.
Epoch 37: 100%|██████████| 9/9 [00:03<00:00,  2.54it/s, v_num=0, val_loss=0.407, val_f1=0.132, val_acc=0.132, heatmap_loss=0.262, flow_loss=0.159, train_loss=0.421]  INFO:spotiflow.model.trainer:Saved best model with val_loss=0.343.
Epoch 63: 100%|██████████| 9/9 [00:03<00:00,  2.49it/s, v_num=0, val_loss=0.443, val_f1=0.103, val_acc=0.103, heatmap_loss=0.263, flow_loss=0.162, train_loss=0.425]  INFO:spotiflow.model.trainer:Saved best model with val_loss=0.326.
Epoch 99: 100%|██████████| 9/9 [00:09<00:00,  0.99it/s, v_num=0, val_loss=0.395, val_f1=0.191, val_acc=0.191, heatmap_loss=0.259, flow_loss=0.157, train_loss=0.416]  `Trainer.fit` stopped: `max_epochs=100` reached.
INFO:spotiflow.model.spotiflow:Will use device: cuda:0
optimizing threshold: 100%|██████████| 11/11 [00:00<00:00, 106.54it/s]
optimizing threshold: 100%|██████████| 11/11 [00:00<00:00, 118.41it/s]INFO:spotiflow.model.spotiflow:Best thresholds: (np.float64(0.42),)

INFO:spotiflow.model.spotiflow:Best F1-score: (np.float64(0.0),)
INFO:spotiflow.model.trainer:Saved last model with optimized thresholds.
INFO:spotiflow.model.spotiflow:Will use device: cuda:0
optimizing threshold: 100%|██████████| 11/11 [00:00<00:00, 138.77it/s]
optimizing threshold: 100%|██████████| 11/11 [00:00<00:00, 148.77it/s]INFO:spotiflow.model.spotiflow:Best thresholds: (np.float64(0.452),)
INFO:spotiflow.model.spotiflow:Best F1-score: (np.float64(0.0),)

INFO:spotiflow.model.trainer:Saved best model with optimized thresholds.
Epoch 99: 100%|██████████| 9/9 [00:16<00:00,  0.56it/s, v_num=0, val_loss=0.395, val_f1=0.191, val_acc=0.191, heatmap_loss=0.259, flow_loss=0.157, train_loss=0.416]
INFO:spotiflow.model.spotiflow:Training finished.
```

I would appreciate any hints on how to improve my training. I can also gladly provide some training examples to give a better impression of the data.

Thank you very much and kind regards,
Blaž

