Hi.
Thank you for this great contribution. I would like to know whether MRFI can be used at training time, e.g. to perform fault-tolerant training. As I understand it, faults are injected via PyTorch hooks, but will gradients propagate correctly during training in the presence of the injected faults (i.e. for the outputs affected by an injection, is the output correctly decoupled from the input in the backward pass)?
Taking a look at the code, I see the forward hooks added in mrfi.MRFI.__add_hoks(), but I don't see any backward hooks implemented. So I assume that MRFI currently can't be used at training time. Please correct me if I'm wrong.
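For context, here is a minimal sketch (not MRFI's actual API) of why this matters: if a forward hook modifies the module output *out-of-place* with differentiable operations, autograd traces the modification and the backward pass accounts for the fault automatically, with no backward hook needed. The toy model, hook, and stuck-at-zero mask below are all hypothetical names chosen for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy layer standing in for a module under fault injection.
model = nn.Linear(4, 3)

def fault_hook(module, inputs, output):
    # Simulate a stuck-at-zero fault on output channel 0.
    # Masking out-of-place keeps the op in the autograd graph,
    # so gradients through the faulted channel are zeroed too.
    mask = torch.ones_like(output)
    mask[..., 0] = 0.0
    return output * mask

model.register_forward_hook(fault_hook)

x = torch.randn(2, 4, requires_grad=True)
y = model(x)
y.sum().backward()

# Weight rows feeding the faulted channel receive zero gradient,
# because the mask participated in the backward pass.
print(model.weight.grad[0])  # zeros: channel 0 was masked
print(model.weight.grad[1])  # generally nonzero
```

By contrast, if an injector writes faults in-place under `torch.no_grad()` or on detached tensors (as bit-flip injectors often do), the backward pass will compute gradients as if no fault occurred, which is the decoupling concern raised above.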
If that's the case, are there any plans to implement this feature? It would be very useful for building fault-tolerant training pipelines.
Thanks