
Running a Pre-Optimized Model with MAX Serving #94

@MarkBruns

Description


This issue demonstrates the MAX platform by running a pre-optimized model from the Modular model repository. Using the `max serve` command, start a local OpenAI-compatible API endpoint serving a model such as Llama 3, then measure the endpoint's throughput (tokens per second) and compare it against other inference methods to show the benefit of Modular's optimizations.
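As a starting point, here is a minimal sketch of the throughput check. It assumes the server was started with `max serve` pointing at a model from the Modular repository and is listening on `http://localhost:8000/v1`; the model id, port, and `max serve` flags shown in the comments are placeholders to adjust for your setup. Because MAX exposes an OpenAI-compatible API, the stock `openai` Python client can drive it, and counting streamed chunks gives a rough tokens-per-second estimate (most servers emit roughly one token per chunk).

```python
# Hypothetical smoke test for a local MAX serving endpoint.
# Assumes the server was started with something like:
#   max serve --model-path modularai/Llama-3.1-8B-Instruct-GGUF
# and is listening on the default http://localhost:8000/v1.
import time

from openai import OpenAI  # pip install openai

# MAX exposes an OpenAI-compatible API, so the stock client works;
# the api_key value is a placeholder since no auth is required locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
chunk_count = 0

# Stream the completion so we can approximate tokens/second:
# roughly one token arrives per streamed chunk.
stream = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",  # placeholder model id
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunk_count += 1

elapsed = time.perf_counter() - start
print(f"~{chunk_count} tokens in {elapsed:.2f}s "
      f"(~{chunk_count / elapsed:.1f} tok/s)")
```

For a fair comparison, run the same prompt and `max_tokens` budget against the other inference backends being evaluated, and average over several runs to smooth out warm-up effects.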
