Add Windows x64 ABI support for amd64 target #5
- Introduced a new target structure for the Windows x64 calling convention in `amd64/targ.c`.
- Updated `main.c` to include the new Windows x64 target in the target mapping.
- Implemented the Windows x64 ABI logic in `amd64/win64.c`, including argument classification, return value handling, and stack management.
- Created a dedicated emitter for Windows x64 in `amd64/win64_emit.c`, mirroring existing functionality while adhering to the Windows calling convention.
- Added the utility functions for argument handling and register management specific to the Windows ABI.
The block-entry remap step in `rega.c` mishandles the swapped phi copies in this loop. Repro (QBE IL):

```
export function w $gcd(w %a, w %b) {
@b1
	%x =w copy %a
	%y =w copy %b
	jmp @b2
@b2
	%y2 =w phi @b1 %y, @b3 %y3
	%x2 =w phi @b1 %x, @b3 %x3
	%c =w cnew %y2, 0
	jnz %c, @b3, @b4
@b3
	%tmp =w copy %y2
	%rem =w rem %x2, %y2
	%y3 =w copy %rem
	%x3 =w copy %tmp
	jmp @b2
@b4
	ret %x2
}
```

Before the patch this repro miscompiled; the `rega.c` edge-copy change fixes it.
Pull request overview
Adds an amd64_win64 target to bring Windows x64 ABI lowering and emission into the codebase, aiming for feature parity with existing amd64 SysV support (calls/returns/varargs) and enabling Windows support tracked in #4.
Changes:
- Introduces a new Win64 ABI lowering implementation (`amd64/win64.c`) and a dedicated Win64 emitter (`amd64/win64_emit.c`).
- Wires the new target into target selection (`main.c`, `amd64/targ.c`, `amd64/all.h`) and the build (`Makefile`).
- Includes related backend robustness fixes in register allocation (`rega.c`) and memory optimization (`mem.c`).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| rega.c | Improves reg allocation edge-copy handling and adds diagnostics for missing mappings. |
| mem.c | Restricts stack-slot promotion to allocs whose uses stay within the same block. |
| main.c | Registers the new amd64_win64 target in the CLI target map. |
| amd64/win64.c | Implements Win64 ABI lowering (args, returns, varargs, shadow space). |
| amd64/win64_emit.c | New Win64-specific emitter with Win64 save/restore + frame layout. |
| amd64/targ.c | Adds Target T_amd64_win64 wiring for ABI/isel/emitter hooks. |
| amd64/all.h | Exposes Win64 ABI/emitter symbols to the amd64 compilation unit set. |
| Makefile | Adds Win64 sources to amd64 build list. |
```c
void
amd64_win64_abi(Fn *fn)
{
	Blk *b;
	Ins *i, *i0, *ip;
```
Copilot (AI), Feb 11, 2026:
This PR introduces a new ABI/emitter target (amd64_win64), but the existing test harness (tools/test.sh / make check) only exercises the default target and arm64. Consider adding a way to run/compile the ABI tests for -t amd64_win64 (even if execution requires cross tooling like mingw/wine) so regressions in Win64 lowering/varargs are caught automatically.
```c
	off = xmm_base + 16 * save_xmm;
	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
			off -= 16;
			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
				off + 16,
				regtoa(*r, SLong));
```
Copilot (AI), Feb 11, 2026:
The epilogue restores XMM callee-saved registers using a decrementing offset while iterating the rclob list in forward order, but the prologue saves them in forward order with increasing offsets. With more than one saved XMM register this will reload values into the wrong registers. Restore should either iterate the saved XMM registers in reverse order or compute the same per-register offset used during saving.
Suggested change:

```diff
-	off = xmm_base + 16 * save_xmm;
-	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
-		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
-			off -= 16;
-			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
-				off + 16,
-				regtoa(*r, SLong));
+	off = 0;
+	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
+		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
+			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
+				xmm_base + off + 16,
+				regtoa(*r, SLong));
+			off += 16;
```
```c
	off = xmm_base + 16 * save_xmm;
	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
			off -= 16;
			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
				off + 16,
				regtoa(*r, SLong));
		}
	off = gpr_base + 8 * save_gpr;
	for (r=&amd64_win64_rclob[0]; *r>=0 && *r < XMM0; r++)
		if (fn->reg & BIT(*r)) {
			off -= 8;
			fprintf(f, "\tmovq -%"PRIu64"(%%rbp), %%%s\n",
				off + 8,
```
Copilot (AI), Feb 11, 2026:
The GPR restore loop has the same ordering issue as the XMM restore: registers are saved in rclob order with increasing offsets, but restored in rclob order while walking offsets backwards. This will swap the restored values between callee-saved registers. Use a consistent mapping (either restore with increasing offsets in the same order, or reverse the register iteration).
Suggested change:

```diff
-	off = xmm_base + 16 * save_xmm;
-	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
-		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
-			off -= 16;
-			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
-				off + 16,
-				regtoa(*r, SLong));
-		}
-	off = gpr_base + 8 * save_gpr;
-	for (r=&amd64_win64_rclob[0]; *r>=0 && *r < XMM0; r++)
-		if (fn->reg & BIT(*r)) {
-			off -= 8;
-			fprintf(f, "\tmovq -%"PRIu64"(%%rbp), %%%s\n",
-				off + 8,
+	off = xmm_base;
+	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
+		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
+			off += 16;
+			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
+				off,
+				regtoa(*r, SLong));
+		}
+	off = gpr_base;
+	for (r=&amd64_win64_rclob[0]; *r>=0 && *r < XMM0; r++)
+		if (fn->reg & BIT(*r)) {
+			off += 8;
+			fprintf(f, "\tmovq -%"PRIu64"(%%rbp), %%%s\n",
+				off,
```
```c
/* Only GPR callee-saves are modeled to match the shared emitter/reg model. */
int amd64_win64_rclob[] = {
	RBX, RBP, RSI, RDI, R12, R13, R14, R15,
```
Copilot (AI), Feb 11, 2026:
amd64_win64_rclob includes RBP, but the emitter already saves/restores RBP via pushq %rbp / leave, and RBP is also in T.rglob so it will always appear as live and get included in fn->reg. This can lead to redundant save/restore and, worse, restoring %rbp from a frame slot before other restores/leave can corrupt subsequent frame-relative loads. Exclude RBP from amd64_win64_rclob (and treat it like sysv does).
Suggested change:

```diff
-	RBX, RBP, RSI, RDI, R12, R13, R14, R15,
+	RBX, RSI, RDI, R12, R13, R14, R15,
```
```c
/* Windows x64 calling convention summary:
 * - Four register argument slots: RCX, RDX, R8, R9 (GPR) or XMM0..3 (FP) chosen per argument type.
 * - Caller must reserve 32 bytes of shadow space for every call.
 * - Return: scalars in RAX/XMM0. Aggregates > 8 bytes use a hidden sret pointer in RCX; callee returns that pointer in RAX.
 * - We currently treat all XMM registers as caller-saved (safe, but not optimal) to avoid changing the shared register model.
 * - Varargs use the shadow space to access register arguments.
```
Copilot (AI), Feb 11, 2026:
The header comment says all XMM registers are treated as caller-saved, but the implementation models XMM6–XMM15 as callee-saved (amd64_win64_rclob) and the emitter saves/restores them. Please update the comment to match the actual behavior (or adjust the save/clobber sets if the comment is the intended design).
Closes #4