Firstly, thank you for your excellent work on extending your PLNet framework to an HNN structure. The theoretical results look really clean, and I have been working on replicating your results on alternative datasets.
I have been adapting your Port-Hamiltonian structure (Section III-B of the paper) to a non-autonomous Input-Output (I/O) modeling setting, where I use an encoder to derive latent state embeddings from windowed measurements of inputs and outputs. However, I have encountered two problems, and I was wondering if you have noticed these during your work as well:
- I have noticed significant training instability when fitting the Stable Port-Hamiltonian Neural Dynamics (pH-SHND) model to I/O data. In particular, gradients frequently explode after initially low training/validation losses (an example is given below):
```
Training model: IO_pH_SHND
Epoch 1/100 | Train Loss: 0.075884 | Val Loss: 0.047648
Epoch 10/100 | Train Loss: 0.001505 | Val Loss: 0.001205
Epoch 20/100 | Train Loss: 0.000694 | Val Loss: 0.002039
Epoch 30/100 | Train Loss: 0.000336 | Val Loss: 0.000529
Epoch 40/100 | Train Loss: 0.001453 | Val Loss: 0.000416
Epoch 50/100 | Train Loss: 0.000158 | Val Loss: 0.000122
```
While I can reduce the chances of this happening with careful (and precise) tuning of the cosine learning-rate scheduler and with early stopping, my first impression is that the Port-Hamiltonian SHND is particularly fragile compared to architectures such as an Input Convex Neural Network (ICNN) embedded within an encoder-decoder structure.
I would appreciate any insights into whether this fragility has been observed on your end, or whether the model is known to be more sensitive in a non-autonomous context.
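For reference, the other mitigation I have been experimenting with is global-norm gradient clipping applied before each optimizer step. A minimal NumPy sketch of the rule (the function name and threshold are my own, illustrative choices, not from your codebase):

```python
import numpy as np

def clip_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so that their global L2 norm
    does not exceed max_norm; also return the pre-clipping norm, which
    is useful for logging when explosions start."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-12)
        grads = [g * scale for g in grads]
    return grads, total_norm

# Example: an exploding gradient of norm 50 gets rescaled to norm <= 1.0
grads = [np.array([30.0, 40.0])]
clipped, norm = clip_global_norm(grads, max_norm=1.0)
print(norm)                        # 50.0 (pre-clipping norm)
print(np.linalg.norm(clipped[0]))  # ~1.0 after clipping
```

Logging the pre-clipping norm each step has at least let me see how far ahead of the loss spike the gradients begin to grow, though it of course only masks rather than explains the instability.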
- A more theoretical question: does introducing an encoder (mapping windowed observations to a latent state) compromise the guarantees around passivity, dissipativity, or contraction? My intuition is that, whether the encoder is fixed or learned jointly, the structural constraints on the overall input-output dynamics should still be preserved, since the encoder is essentially acting as a state estimator rather than adding anything to the structure itself. But I would love to hear your thoughts on this.
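To make my setup concrete, this is the shape of the encoder I am using; all dimensions and the linear parameterization below are illustrative (my actual encoder is a small network learned jointly with the dynamics):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a window of H past (input, output) pairs
# is mapped to a latent state z of dimension n_z.
H, n_u, n_y, n_z = 10, 1, 1, 4

# Encoder e: R^{H*(n_u+n_y)} -> R^{n_z}; a linear map for illustration.
W_e = rng.standard_normal((n_z, H * (n_u + n_y)))

def encode(u_window, y_window):
    """Stack the windowed I/O measurements and map them to a latent state.
    The map acts purely as a state estimator: the latent pH dynamics
    themselves are untouched by it."""
    w = np.concatenate([u_window.ravel(), y_window.ravel()])
    return W_e @ w

u_win = rng.standard_normal((H, n_u))
y_win = rng.standard_normal((H, n_y))
z0 = encode(u_win, y_win)  # initial latent state fed to the pH-SHND model
print(z0.shape)            # (4,)
```

So the encoder only supplies the initial condition in latent coordinates; the structured dynamics then evolve that state forward, which is why I would expect the structural guarantees to carry over.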
Thanks for your time!