Currently, running inference via the `inference.predict` module constructs the model and then destructs it again on every call. There should be an option to return the constructed model to the user so it can live in memory and be reused across calls, which is much more resource-efficient (this is how, e.g., Hugging Face pipelines work: load once, predict many times).
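
A minimal sketch of what this could look like. Note that the `return_model` flag, the `model=` keyword, and the exact function name in `inference.predict` are hypothetical here, just to illustrate the proposed shape of the API:

```python
# Hypothetical API sketch -- none of these parameters exist yet.
from inference import predict

# First call: construct the model once and hand it back alongside the predictions.
preds, model = predict.predict(inputs, return_model=True)

# Subsequent calls: pass the in-memory model back in, skipping
# construction/teardown entirely.
more_preds = predict.predict(more_inputs, model=model)
```

An alternative would be a separate `load_model()` entry point that users call once and then pass to `predict()`, but either shape would solve the repeated construct/destruct cost.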