
Running a Pre-Optimized Model with MAX Serving #94

@MarkBruns

Description


This issue demonstrates the MAX platform by running a pre-optimized model from the Modular model repository. Using the `max serve` command, start a local OpenAI-compatible API endpoint serving a model such as Llama 3, then measure the endpoint's throughput (tokens per second) and compare it against other inference methods to show the benefit of Modular's optimizations.
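As a starting point, here is a minimal sketch of the throughput check. It assumes the server was started with `max serve` pointing at a model from the Modular repository and is listening on `http://localhost:8000/v1`; the model id, port, and `max serve` flags shown in the comments are placeholders to adjust for your setup. Because MAX exposes an OpenAI-compatible API, the stock `openai` Python client can drive it, and counting streamed chunks gives a rough tokens-per-second estimate (most servers emit roughly one token per chunk).

```python
# Hypothetical smoke test for a local MAX serving endpoint.
# Assumes the server was started with something like:
#   max serve --model-path modularai/Llama-3.1-8B-Instruct-GGUF
# and is listening on the default http://localhost:8000/v1.
import time

from openai import OpenAI  # pip install openai

# MAX exposes an OpenAI-compatible API, so the stock client works;
# the api_key value is a placeholder since no auth is required locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
chunk_count = 0

# Stream the completion so we can approximate tokens/second:
# roughly one token arrives per streamed chunk.
stream = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",  # placeholder model id
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunk_count += 1

elapsed = time.perf_counter() - start
print(f"~{chunk_count} tokens in {elapsed:.2f}s "
      f"(~{chunk_count / elapsed:.1f} tok/s)")
```

For a fair comparison, run the same prompt and `max_tokens` budget against the other inference backends being evaluated, and average over several runs to smooth out warm-up effects.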
