
[torch-frontend] add stablehlo IRs for Mixtral model. #254

Open
Vremold wants to merge 4 commits into bytedance:main from Vremold:wjw/mixtral-decoder-stablehlo

Conversation

Vremold (Collaborator) commented May 16, 2024

In this PR, we provide the StableHLO IR of a single Mixtral decoder layer, generated with the ByteIR stack. The IR is printed with the --mlir-elide-resource-strings-if-larger=1000 option, so the dialect resources that store the model weights are elided and not all of them are displayed in the IR.

Note: compilation currently succeeds only with some local patches:

  1. We drop torch.runtime.assert ops during StableHLO conversion, since we haven't decided how to handle them yet.
  2. We apply the patches from PR 3322 and PR 3085 in torch-mlir.

@Vremold Vremold requested a review from qingyunqu May 16, 2024 17:01
@liwenchangbdbz liwenchangbdbz added the enhancement New feature or request label May 21, 2024
Vremold (Collaborator, Author) commented May 30, 2024

Update at 2024.05.31.

We have added the StableHLO IR of the whole Mixtral 8x7B model. Note that, to reduce compilation time and memory consumption, we convert the large weights into splat DenseElementsAttrs (every element holds the same value), so the IR does not carry the real weight values. See frontends/torch-frontend/examples/inference/mixtral/infer_single_mixtral.py for how to run it.
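To give a rough sense of why splatting the large weights helps, here is a back-of-envelope sketch (not how infer_single_mixtral.py does the conversion) comparing dense storage against splat storage for the MoE expert weights alone. It assumes the publicly released Mixtral 8x7B configuration (hidden_size=4096, intermediate_size=14336, 8 experts, 32 layers), three projection matrices per expert, and bf16 weights; the helper names are hypothetical.

```python
# Back-of-envelope sketch: dense vs. splat storage for Mixtral 8x7B expert weights.
# Assumptions (not taken from this PR): public Mixtral 8x7B config values,
# three projection matrices (w1, w2, w3) per expert, bf16 = 2 bytes/element.

HIDDEN = 4096
INTERMEDIATE = 14336
NUM_EXPERTS = 8
NUM_LAYERS = 32
BYTES_PER_ELEM = 2  # bf16


def expert_weight_shapes():
    """Shapes of w1, w2 (down projection) and w3 for one expert."""
    return [(INTERMEDIATE, HIDDEN), (HIDDEN, INTERMEDIATE), (INTERMEDIATE, HIDDEN)]


def expert_shapes_for_model():
    """One entry per expert weight tensor across all layers."""
    return expert_weight_shapes() * NUM_EXPERTS * NUM_LAYERS


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


def storage_bytes(shapes, splat):
    """Payload bytes: a dense attr stores every element, a splat attr one value."""
    if splat:
        return len(shapes) * BYTES_PER_ELEM
    return sum(numel(s) for s in shapes) * BYTES_PER_ELEM


if __name__ == "__main__":
    shapes = expert_shapes_for_model()
    dense = storage_bytes(shapes, splat=False)
    splat = storage_bytes(shapes, splat=True)
    print(f"{len(shapes)} expert weight tensors")
    print(f"dense payload: {dense / 2**30:.1f} GiB, splat payload: {splat} bytes")
```

Under these assumptions the 768 expert tensors hold about 84 GiB of dense bf16 data, versus a few kilobytes when each is a splat, which is why the compiler no longer has to materialize or serialize the weights.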

@Vremold Vremold changed the title [torch-frontend] add elided stablehlo IR for a single Mixtral decoder layer [torch-frontend] add stablehlo IRs for Mixtral model. May 30, 2024