Fragility in Training and Theoretical Guarantees in Non-Autonomous Settings #1

@HarshlyDOOM

Description

Firstly, thank you for your excellent work on extending the PLNet framework to an HNN structure. The theoretical results look really clean, and I have been working on replicating them on alternative datasets.

I have been adapting your Port-Hamiltonian structure (Section III-B of the paper) to a non-autonomous input-output (I/O) modeling setting, where I use an encoder to derive latent state embeddings from windowed measurements of inputs and outputs. However, I have encountered two problems and was wondering whether you have noticed them in your own experiments as well:

  1. I have noticed significant training instability when fitting the Stable Port-Hamiltonian Neural Dynamics (pH-SHND) model from I/O data. In particular, there are frequent gradient explosions after initially low training/validation losses (an example is given below):
Training model: IO_pH_SHND
Epoch   1/100 | Train Loss: 0.075884 | Val Loss: 0.047648
Epoch  10/100 | Train Loss: 0.001505 | Val Loss: 0.001205
Epoch  20/100 | Train Loss: 0.000694 | Val Loss: 0.002039
Epoch  30/100 | Train Loss: 0.000336 | Val Loss: 0.000529
Epoch  40/100 | Train Loss: 0.001453 | Val Loss: 0.000416
Epoch  50/100 | Train Loss: 0.000158 | Val Loss: 0.000122

While I can reduce the chance of this happening with careful (and precise) tuning of the cosine learning-rate scheduler and with early stopping, my first impression is that the Port-Hamiltonian SHND is particularly fragile compared to architectures such as an Input Convex Neural Network (ICNN) embedded within an encoder-decoder structure.
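In case it helps: the two mitigations I mentioned can be sketched framework-agnostically as below. The thresholds (`max_norm=1.0`, `patience=10`) are just what worked for me, not values from the paper, and `clip_global_norm` mimics the idea behind PyTorch's `torch.nn.utils.clip_grad_norm_` on a flat list of gradients.

```python
import math

# Illustrative sketch, not your code: global gradient-norm clipping
# plus patience-based early stopping on the validation loss.

def clip_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradients so their global L2 norm
    does not exceed max_norm; gradients below the threshold pass through."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads

class EarlyStopping:
    """Signal a stop when val loss has not improved for `patience` epochs."""
    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training

# An "exploded" gradient of norm 50 gets rescaled to unit norm:
exploded = clip_global_norm([30.0, 40.0])
```

Even with both in place, the explosions still occur often enough that I suspect something structural rather than a tuning issue.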

I would appreciate any insights into whether this fragility has been observed on your end, or whether the model is known to be more sensitive in a non-autonomous context.
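For concreteness, this is roughly how I build the encoder inputs from the I/O measurements (an illustrative sketch; `make_windows`, the window length `W`, and the interleaving order are my own choices, not from your code):

```python
# Hypothetical sketch of the windowing step feeding my encoder.

def make_windows(u, y, W):
    """Stack the last W input/output samples into one flat feature
    vector per time step; the encoder maps each window to a latent state."""
    windows = []
    for t in range(W - 1, len(u)):
        feats = []
        for k in range(t - W + 1, t + 1):
            feats.extend([u[k], y[k]])
        windows.append(feats)
    return windows

u = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 0.9, 2.1, 2.9]
X = make_windows(u, y, W=2)  # 3 windows, each [u[t-1], y[t-1], u[t], y[t]]
```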

  2. A more theoretical question: does introducing an encoder (mapping windowed observations to a latent state) compromise the guarantees around passivity, dissipativity, or contraction? My intuition is that, whether the encoder is fixed or learned jointly, it should still preserve the structural constraints of the overall input-output dynamics, since the encoder essentially acts as a state estimator rather than adding anything to the dynamics themselves. But I would love to hear your thoughts on this.
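To make that intuition concrete, here is the rough argument in standard passivity notation (this is my sketch, not a claim about your proofs): with $H$ the learned Hamiltonian acting as storage function, $\phi$ the encoder, and $z$ the latent state,

```latex
% Dissipation inequality of the latent pH dynamics (standard passivity form),
% with the encoder \phi only supplying the initial latent condition:
H\bigl(z(t_1)\bigr) - H\bigl(z(t_0)\bigr)
  \le \int_{t_0}^{t_1} u(t)^\top y(t)\,\mathrm{d}t,
\qquad
z(t_0) = \phi\bigl(u_{[t_0 - W,\; t_0]},\, y_{[t_0 - W,\; t_0]}\bigr).
% Since the inequality holds for every initial condition z(t_0), composing
% with \phi only selects which latent trajectory is followed; it does not
% alter the bound itself.
```

If that reading is right, the encoder cannot break dissipativity of the latent dynamics, though I am less sure the same holds for contraction once the encoder is trained jointly.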

Thanks for your time!
