Exploring Rust's Compilation Pipeline: Building a Custom Toolchain—MIR, CLIF, or Beyond? #47
Replies: 5 comments 22 replies
-
What's the goal here? I can perhaps answer better knowing why you want an end-to-end Rust toolchain (i.e., different from the existing mainstream Rust toolchain).
Theoretically, these can be achieved. But practically, it's quite hard to aim for it all at once. My take would be that we depend on other implementations (libraries) where convenient, until all the parts eventually use pliron infrastructure.
The way I see it, both of what you mention will be required, not just one or the other. Borrow-checked, optimized MIR can be translated to an MIR dialect in pliron. This can then be further optimized and subjected to other static analyses within the pliron framework. Once pliron supports an llvm dialect completely (it's only a partial proof-of-concept today) and/or a clif dialect, either could be used for generating assembly. A personal target of mine is to have complete llvm and clif dialects in-tree inside pliron (as separate crates, but in the same repo) and have conversions available between the two. This would enable anyone using pliron to target just one of the two, and the other would automatically work.
-
I’ve put together an initial implementation of the Clif → Pliron-Clif dialect with a (basic) working example, using the existing Pliron-LLVM implementation as a starting point. You can find it here: Pliron-Clif Implementation. I have a couple of questions regarding the generated IR:
```
builtin.func @add: builtin.function <(builtin.int <si32>, builtin.int <si32>) -> (builtin.int <si32>)>
{
  ^entry_block_1v1(block_1v1_arg0: builtin.int <si32>, block_1v1_arg1: builtin.int <si32>):
    op_2v1_res0 = clif.iadd block_3v1_arg0, block_4v1_arg1: builtin.int <si32>;
    clif.return (op_3v1_res0) [] []: <(builtin.int <si32>) -> ()>
}
```

Additionally, I'd love to hear your feedback on the overall implementation, like: am I heading in the right direction?
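For comparison, here is roughly what the same function looks like in Cranelift's own textual CLIF format (a hand-written sketch for reference, not output from my implementation):

```
function %add(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
    v2 = iadd v0, v1
    return v2
}
```

The translation is essentially structural: CLIF blocks with block parameters map to pliron basic blocks with arguments, and each CLIF instruction maps to one op in the clif dialect.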
-
I meant to say for
-
Can you try using …? I closed #49 as "won't implement", and this is an alternate for that.
-
I’m working on the Hummanta compiler, which has similar requirements, though my use case is a bit more complex, as it needs to support multiple frontend languages and multiple VM bytecode outputs. Based on MLIR and the earlier discussion, I’ve sketched a diagram of Hummanta’s compilation flow and its interaction with MLIR.
Additionally, I plan to use MLIR for source-to-source translation between languages. I chose Cranelift for its ISLE (ideal for custom ISAs), while LLVM provides broad platform support.
-
I am currently evaluating what it takes to build an end-to-end Rust toolchain. This includes the compiler backend, for which, at present, the only reliable option is LLVM. My goal is threefold, prioritized in the following order:
1. **Full Control of the Compilation Pipeline**: Can we achieve complete control over the compilation process to enable better reasoning and customization? Essentially, the ability to modify any stage of compilation to achieve specific outcomes, whether it’s performance, stability, formal verification, productivity, or other goals. While this may be an ambitious ask, my initial focus will be on achieving comparable performance.
2. **Competing with State-of-the-Art Solutions**: Can we compete with established systems like LLVM? Perhaps not entirely, but at least for narrowly scoped, custom use cases as a starting point.
3. **Support for RISC-V Hardware**: Can we provide support for a selection of extensions for RISC-V hardware, whether custom or standard? I believe that with Pliron (similar to MLIR), this should be relatively straightforward.
Additional Context: I recently spoke with one of the core developers of Cranelift, where I learned that using Cranelift as the backend for plain Rust code could result in a performance hit of 5–15x compared to LLVM. Although Cranelift describes itself as an optimizing compiler, its current focus is on consuming optimized WebAssembly (WASM)—optimized using LLVM’s optimizer—and converting it into Cranelift IR (CLIF) for machine code generation.
The developer mentioned that it would be possible to add the necessary optimizations, such as `mem2reg` and aggressive inlining, to reduce this performance gap. However, he also noted that Cranelift’s primary focus is on compilation speed. Additionally, the ISLE DSL (Domain-Specific Language) used in Cranelift can feel overly complex, to the point where it starts to resemble LLVM in terms of intricacy.

Given the above context, I’m exploring the following two paths, depending on how much information and how many stability guarantees we lose when moving from MIR (high-level IR) to CLIF (low-level IR):
1. **MIR → Pliron MIR**: MIR is not a stable interface (it is internal to `rustc`). This lack of stability could pose challenges for long-term maintenance.
2. **CLIF → Pliron CLIF**
I’d love to hear your thoughts on this.