Use during training time #2

@MALONSO-ARC

Hi.

Thank you for this great contribution. Can MRFI be used during training, e.g. to perform fault-tolerant training? As I understand it, faults are injected through PyTorch hooks, but will gradients propagate correctly during training when faults are injected (i.e. will the outputs affected by an injection be decoupled from the input)?

Looking at the code, I see the forward hooks added in mrfi.MRFI.__add_hooks(), but I don't see any backward hooks implemented. So I assume that MRFI can't currently be used at training time. Please correct me if I'm wrong.

If that's the case, are there any plans to implement this feature? It would be very useful for building fault-tolerant training pipelines.
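
To make the gradient question concrete, here is a minimal sketch in plain PyTorch, not MRFI code. It assumes a hypothetical stuck-at fault on one output unit of a linear layer; `make_stuck_at_hook`, the mask, and the value 5.0 are made up for illustration. When a forward hook returns a new tensor built with differentiable, out-of-place operations, autograd tracks the replacement by itself, so the faulty outputs end up decoupled from the input without any backward hook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def make_stuck_at_hook(mask, value):
    """Build a forward hook that overwrites masked outputs with a constant.

    Hypothetical helper for illustration only; it is not part of MRFI.
    """
    def hook(module, inputs, output):
        # torch.where is out-of-place and differentiable. Masked positions
        # take the constant `value`, so no gradient flows back through them.
        return torch.where(mask, torch.full_like(output, value), output)
    return hook


layer = nn.Linear(4, 3)
mask = torch.tensor([True, False, False])  # assumed fault on output unit 0
handle = layer.register_forward_hook(make_stuck_at_hook(mask, 5.0))

x = torch.randn(2, 4)
y = layer(x)          # the hook's return value replaces the layer output
y.sum().backward()

# Row 0 of the weight gradient belongs to the faulty unit and is all zeros.
# Row 1 belongs to a clean unit and equals the sum of the inputs over the batch.
faulty_row_grad = layer.weight.grad[0]
clean_row_grad = layer.weight.grad[1]

handle.remove()
```

Whether MRFI's own hooks behave like this depends on how they modify the output tensor. A hook that edits the tensor in place under `torch.no_grad()` or through `.data` is invisible to autograd, so the backward pass would proceed as if no fault had been injected. This sketch does not check which approach MRFI takes.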

Thanks
