Conversation

@itsfuad itsfuad commented Dec 25, 2025

- Introduced a new target structure for Windows x64 calling convention in `amd64/targ.c`.
- Updated `main.c` to include the new Windows x64 target in the target mapping.
- Implemented the Windows x64 ABI logic in `amd64/win64.c`, including argument classification, return value handling, and stack management.
- Created a dedicated emitter for Windows x64 in `amd64/win64_emit.c`, mirroring existing functionality while adhering to the Windows calling conventions.
- Added necessary utility functions for argument handling and register management specific to the Windows ABI.

Closes #4


itsfuad commented Dec 27, 2025

The block-entry remap step in rega.c can assign a live temp to a register already used by another live temp in the same block.

  • On amd64, this can break div/rem loops where two phi values are live at the loop header, causing the dividend and divisor to share a register (wrong results or infinite loops).

Repro (QBE IL):

export function w $gcd(w %a, w %b) {
@b1
	%x =w copy %a
	%y =w copy %b
	jmp @b2
@b2
	%y2 =w phi @b1 %y, @b3 %y3
	%x2 =w phi @b1 %x, @b3 %x3
	%c =w cnew %y2, 0
	jnz %c, @b3, @b4
@b3
	%tmp =w copy %y2
	%rem =w rem %x2, %y2
	%y3 =w copy %rem
	%x3 =w copy %tmp
	jmp @b2
@b4
	ret %x2
}

Before the patch, qbe -dR shows %x2 and %y2 mapped to the same register at @b2, which makes idiv use the same register for the dividend and divisor. After the patch, they stay in distinct registers.

Fix:

  • In rega.c step “emit copies shared by multiple edges,” skip remapping a temp to a register that is already assigned to another live temp in the block.
  • Only update m->r and the bitset when the target register is free.

Copilot AI review requested due to automatic review settings February 11, 2026 07:13

Copilot AI left a comment


Pull request overview

Adds an amd64_win64 target to bring Windows x64 ABI lowering and emission into the codebase, aiming for feature parity with existing amd64 SysV support (calls/returns/varargs) and enabling Windows support tracked in #4.

Changes:

  • Introduces a new Win64 ABI lowering implementation (amd64/win64.c) and a dedicated Win64 emitter (amd64/win64_emit.c).
  • Wires the new target into target selection (main.c, amd64/targ.c, amd64/all.h) and build (Makefile).
  • Includes related backend robustness fixes in register allocation (rega.c) and memory optimization (mem.c).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

File Description
rega.c Improves reg allocation edge-copy handling and adds diagnostics for missing mappings.
mem.c Restricts stack-slot promotion to allocs whose uses stay within the same block.
main.c Registers the new amd64_win64 target in the CLI target map.
amd64/win64.c Implements Win64 ABI lowering (args, returns, varargs, shadow space).
amd64/win64_emit.c New Win64-specific emitter with Win64 save/restore + frame layout.
amd64/targ.c Adds Target T_amd64_win64 wiring for ABI/isel/emitter hooks.
amd64/all.h Exposes Win64 ABI/emitter symbols to the amd64 compilation unit set.
Makefile Adds Win64 sources to amd64 build list.


Comment on lines +556 to +560
void
amd64_win64_abi(Fn *fn)
{
Blk *b;
Ins *i, *i0, *ip;

Copilot AI Feb 11, 2026


This PR introduces a new ABI/emitter target (amd64_win64), but the existing test harness (tools/test.sh / make check) only exercises the default target and arm64. Consider adding a way to run/compile the ABI tests for -t amd64_win64 (even if execution requires cross tooling like mingw/wine) so regressions in Win64 lowering/varargs are caught automatically.
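One possible shape for such a check is a guarded smoke-test script. This is only a sketch under assumptions: it presumes a mingw-w64 cross gcc and wine are the chosen cross tooling, and the file names and layout are illustrative; each missing prerequisite degrades to a skip so the suite stays green on hosts without the toolchain.

```shell
#!/bin/sh
# Hypothetical Win64 ABI smoke test: compile IL with the new target,
# link with a mingw-w64 cross gcc, and execute under wine.
set -eu

for tool in qbe x86_64-w64-mingw32-gcc wine; do
	if ! command -v "$tool" >/dev/null 2>&1; then
		echo "SKIP: $tool not available"
		exit 0
	fi
done

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Minimal IL: return 42 so the exit code is easy to check under wine.
cat > "$tmp/t.ssa" <<'EOF'
export function w $main() {
@start
	ret 42
}
EOF

# The amd64_win64 target only exists with this PR applied; skip otherwise.
if ! qbe -t amd64_win64 -o "$tmp/t.s" "$tmp/t.ssa" 2>/dev/null; then
	echo "SKIP: amd64_win64 target not supported by this qbe"
	exit 0
fi
x86_64-w64-mingw32-gcc -o "$tmp/t.exe" "$tmp/t.s"

status=0
wine "$tmp/t.exe" >/dev/null 2>&1 || status=$?
if [ "$status" -eq 42 ]; then
	echo "PASS"
else
	echo "FAIL: exit $status"
	exit 1
fi
```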

Comment on lines +518 to +524
off = xmm_base + 16 * save_xmm;
for (r=&amd64_win64_rclob[0]; *r>=0; r++)
if (*r >= XMM0 && (fn->reg & BIT(*r))) {
off -= 16;
fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
off + 16,
regtoa(*r, SLong));

Copilot AI Feb 11, 2026


The epilogue restores XMM callee-saved registers using a decrementing offset while iterating the rclob list in forward order, but the prologue saves them in forward order with increasing offsets. With more than one saved XMM register this will reload values into the wrong registers. Restore should either iterate the saved XMM registers in reverse order or compute the same per-register offset used during saving.

Suggested change:

-	off = xmm_base + 16 * save_xmm;
-	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
-		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
-			off -= 16;
-			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
-				off + 16,
-				regtoa(*r, SLong));
+	off = 0;
+	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
+		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
+			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
+				xmm_base + off + 16,
+				regtoa(*r, SLong));
+			off += 16;

Comment on lines +518 to +531
off = xmm_base + 16 * save_xmm;
for (r=&amd64_win64_rclob[0]; *r>=0; r++)
if (*r >= XMM0 && (fn->reg & BIT(*r))) {
off -= 16;
fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
off + 16,
regtoa(*r, SLong));
}
off = gpr_base + 8 * save_gpr;
for (r=&amd64_win64_rclob[0]; *r>=0 && *r < XMM0; r++)
if (fn->reg & BIT(*r)) {
off -= 8;
fprintf(f, "\tmovq -%"PRIu64"(%%rbp), %%%s\n",
off + 8,

Copilot AI Feb 11, 2026


The GPR restore loop has the same ordering issue as the XMM restore: registers are saved in rclob order with increasing offsets, but restored in rclob order while walking offsets backwards. This will swap the restored values between callee-saved registers. Use a consistent mapping (either restore with increasing offsets in the same order, or reverse the register iteration).

Suggested change:

-	off = xmm_base + 16 * save_xmm;
-	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
-		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
-			off -= 16;
-			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
-				off + 16,
-				regtoa(*r, SLong));
-		}
-	off = gpr_base + 8 * save_gpr;
-	for (r=&amd64_win64_rclob[0]; *r>=0 && *r < XMM0; r++)
-		if (fn->reg & BIT(*r)) {
-			off -= 8;
-			fprintf(f, "\tmovq -%"PRIu64"(%%rbp), %%%s\n",
-				off + 8,
+	off = xmm_base;
+	for (r=&amd64_win64_rclob[0]; *r>=0; r++)
+		if (*r >= XMM0 && (fn->reg & BIT(*r))) {
+			off += 16;
+			fprintf(f, "\tmovdqu -%"PRIu64"(%%rbp), %%%s\n",
+				off,
+				regtoa(*r, SLong));
+		}
+	off = gpr_base;
+	for (r=&amd64_win64_rclob[0]; *r>=0 && *r < XMM0; r++)
+		if (fn->reg & BIT(*r)) {
+			off += 8;
+			fprintf(f, "\tmovq -%"PRIu64"(%%rbp), %%%s\n",
+				off,


/* Only GPR callee-saves are modeled to match the shared emitter/reg model. */
int amd64_win64_rclob[] = {
RBX, RBP, RSI, RDI, R12, R13, R14, R15,

Copilot AI Feb 11, 2026


amd64_win64_rclob includes RBP, but the emitter already saves/restores RBP via pushq %rbp / leave, and RBP is also in T.rglob so it will always appear as live and get included in fn->reg. This can lead to redundant save/restore and, worse, restoring %rbp from a frame slot before other restores/leave can corrupt subsequent frame-relative loads. Exclude RBP from amd64_win64_rclob (and treat it like sysv does).

Suggested change:

-	RBX, RBP, RSI, RDI, R12, R13, R14, R15,
+	RBX, RSI, RDI, R12, R13, R14, R15,

Comment on lines +3 to +8
/* Windows x64 calling convention summary:
* - Four register argument slots: RCX, RDX, R8, R9 (GPR) or XMM0..3 (FP) chosen per argument type.
* - Caller must reserve 32 bytes of shadow space for every call.
* - Return: scalars in RAX/XMM0. Aggregates > 8 bytes use a hidden sret pointer in RCX; callee returns that pointer in RAX.
* - We currently treat all XMM registers as caller-saved (safe, but not optimal) to avoid changing the shared register model.
* - Varargs use the shadow space to access register arguments.

Copilot AI Feb 11, 2026


The header comment says all XMM registers are treated as caller-saved, but the implementation models XMM6–XMM15 as callee-saved (amd64_win64_rclob) and the emitter saves/restores them. Please update the comment to match the actual behavior (or adjust the save/clobber sets if the comment is the intended design).

Successfully merging this pull request may close these issues.

Windows Support

1 participant