iOS PC Game Emulation: Comprehensive Technical Analysis

Executive Summary

Running x86 Windows PC games on iOS is theoretically possible but extraordinarily difficult. After deep analysis of every component in the stack, the most viable architecture is:

Windows x86_64 Game (.exe)
        |
[FEX-Emu] — x86_64 → ARM64 JIT translation (with iOS W^X compliance)
        |
[Wine] — Windows API → Darwin/POSIX API translation (native ARM64 via xtajit)
        |
[DXMT] — DirectX 11 → Metal (direct, no Vulkan intermediary)
[VKD3D-Proton → MoltenVK] — DirectX 12 → Vulkan → Metal (near-term)
[New DX12→Metal layer] — DirectX 12 → Metal (long-term, best perf)
[MoltenVK] — Vulkan → Metal (for native Vulkan games)
[ANGLE] — OpenGL ES → Metal (for simple OpenGL games)
        |
[Metal] — Apple GPU
        |
iPhone Hardware (A17 Pro / A18 / A19)

Estimated feasibility by game category:

DX11 games (Skyrim, Witcher 3, etc.): Possible with significant work
DX9 games (older titles): Possible, easier
DX12 games (RDR2, Cyberpunk): Difficult but viable via two paths (see Section 4)
Vulkan-native games: Possible via MoltenVK
OpenGL games: Limited to ES 3.0 feature set via ANGLE

1. JIT on iOS — The Foundation
2. x86 → ARM64 Translation — FEX-Emu vs Box64
3. Wine on iOS — Windows API Translation
4. Graphics Stack — DX/GL/VK → Metal
5. Apple's Own Tools — Rosetta 2, GPTK, D3DMetal
6. Architectural Options Compared
7. Recommended Architecture
8. iOS-Specific Tricks and Hacks
9. Performance Projections
10. Implementation Roadmap
11. Open Questions and Risks

1. JIT on iOS — The Foundation

How JIT Works on iOS

JIT is the single most critical enabler for this entire project. Without JIT, x86 emulation would fall back to pure interpretation, which is 10-50x slower.

Pre-iOS 26 (iOS 17.4 – 18.x):

StikDebug/StikJIT connects to the device's own debugserver via a loopback VPN
Sends vAttach;<pid> to attach debugserver to the target process
The kernel sets CS_DEBUGGED flag on the process
Immediately sends D (detach)
CS_DEBUGGED persists after detach — the app now has JIT capability
The app can use MAP_JIT + pthread_jit_write_protect_np() for JIT code

iOS 26+ (TXM — Trusted Execution Monitor):

Simple attach-detach no longer works
The debugger must remain attached for the lifetime of JIT usage
The app embeds BRK (breakpoint) instructions at points needing executable memory
The debugger intercepts these and uses _M<size>,rx or page-by-page M commands to mark memory executable
StikDebug runs JavaScript scripts that handle this in a loop
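
The vAttach and _M requests above travel as GDB remote serial protocol packets, which are framed as $<payload>#<checksum>, with the checksum being the payload bytes summed modulo 256 and written as two hex digits. A minimal sketch of that framing follows. The function name gdb_frame_packet is hypothetical and is not taken from StikDebug; it only illustrates what a debugger-side client has to put on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Frame a payload as a GDB remote serial protocol packet: $<payload>#<cs>,
// where <cs> is the sum of the payload bytes modulo 256 as two hex digits.
// Returns the packet length (without the NUL), or -1 if `out` is too small.
// Hypothetical helper for illustration; not taken from StikDebug.
int gdb_frame_packet(const char *payload, char *out, size_t out_len)
{
    assert(payload != NULL && out != NULL);

    size_t n = strlen(payload);
    if (out_len < n + 5)   // '$' + payload + '#' + two hex digits + NUL
        return -1;

    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned char)payload[i];

    return snprintf(out, out_len, "$%s#%02x", payload, sum & 0xffu);
}
```

For example, framing the payload _M4000,rx yields $_M4000,rx#86, which (assuming the size field is hexadecimal) asks debugserver for 0x4000 bytes of read+execute memory.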

Memory Model: W^X (Write XOR Execute)

iOS strictly enforces W^X — a page can never be simultaneously writable and executable.

Pre-TXM approach:

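#include <libkern/OSCacheControl.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mman.h>

// Allocate JIT region (only ONE per process). MAP_JIT memory is requested
// with all three protections; the per-thread toggle below selects whether
// the calling thread sees it as RW or RX, so W^X still holds at any instant.
void *jit_mem = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_JIT, -1, 0);
if (jit_mem == MAP_FAILED) { /* handle allocation failure */ }

// Write code (writable, not executable, for this thread)
pthread_jit_write_protect_np(false);
memcpy(jit_mem, generated_code, code_size);

// Make executable (executable, not writable, for this thread)
pthread_jit_write_protect_np(true);
sys_icache_invalidate(jit_mem, code_size);

// Execute
((void(*)(void))jit_mem)();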

MeloNX's Dual-Mapping Trick (critical innovation):

Same physical memory, two virtual mappings:
  Mapping A: PROT_READ | PROT_WRITE   (for writing generated code)
  Mapping B: PROT_READ | PROT_EXEC    (for executing code)

Created via: mach_make_memory_entry_64() + vm_map()
Advantage: No need to toggle W/X per-thread — write to A, execute from B

This dual-mapping approach is essential for a JIT-heavy workload like x86 emulation because:

Avoids the performance cost of pthread_jit_write_protect_np() toggling
Allows concurrent write (by compiler thread) and execute (by execution thread)
Works with VM_LEDGER_FLAG_NO_FOOTPRINT (memory doesn't count against app limits!)
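
The aliasing property that the dual-mapping trick relies on can be demonstrated with plain POSIX calls. The sketch below is a portable analogue, not MeloNX's code: it backs both views with a temporary file instead of a Mach memory entry, and the second view is read-only rather than executable so that it runs on any POSIX system. The function name dual_map_create is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Create two views of the same pages: one writable, one read-only.
// Portable stand-in for the Mach dual mapping; on iOS the second view
// would carry execute permission instead of plain read.
// Returns 0 on success, -1 on error.
int dual_map_create(size_t size, void **rw_view, void **ro_view)
{
    assert(size > 0 && rw_view != NULL && ro_view != NULL);

    FILE *backing = tmpfile();           // unnamed temp file as shared backing
    if (backing == NULL)
        return -1;
    int fd = fileno(backing);
    if (ftruncate(fd, (off_t)size) != 0) {
        fclose(backing);
        return -1;
    }

    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *ro = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    fclose(backing);                     // mappings remain valid after close

    if (rw == MAP_FAILED || ro == MAP_FAILED) {
        if (rw != MAP_FAILED) munmap(rw, size);
        if (ro != MAP_FAILED) munmap(ro, size);
        return -1;
    }
    *rw_view = rw;
    *ro_view = ro;
    return 0;
}
```

A byte stored through the writable view is immediately visible through the other view at a different virtual address, which is why the compiler thread and the execution thread never need to toggle permissions.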

TXM (iOS 26+) approach via BreakpointJIT:

1. Allocate RW memory normally
2. Write JIT code to it
3. Trigger BRK instruction with address/size in registers
4. Attached debugger intercepts, marks pages executable via debugserver protocol
5. Execute from now-RX pages

JIT Limitations

Constraint	Pre-iOS 26	iOS 26+ (TXM)
`MAP_JIT` regions	1 per process	N/A (debugger-managed)
Simultaneous RWX	No (W^X enforced)	No
Debugger required	Only during setup	Must stay attached
`get-task-allow`	Required	Required
Maximum JIT memory	~512MB practical (MeloNX limit)	Limited by debugger
PAC interaction	Orthogonal (JIT engine handles)	Same
Thread safety	pthread_jit_write_protect_np is per-thread	Dual mapping preferred

TPRO (Thread Permission Region Override) — iOS 17.4+

LiveContainer revealed a newer mechanism: os_thread_self_restrict_tpro_to_rw() / os_thread_self_restrict_tpro_to_ro(). TPRO allows per-thread permission overrides for memory regions. This could potentially be used as an alternative to pthread_jit_write_protect_np() for more granular control.

2. x86 → ARM64 Translation

FEX-Emu (Recommended)

Architecture:

x86/x86-64 bytes → Frontend Decoder → OpcodeDispatcher → IR (SSA) →
Optimization Passes → ARM64 JIT Backend ("Splatter") → Native ARM64

Why FEX-Emu over Box64:

Feature	FEX-Emu	Box64
JIT memory model	Abstracts via AllocatorHooks (W^X adaptable)	Requires RWX (fatal on iOS)
Platform portability	Windows port exists (proven pluggable OS layer)	Linux-only
Syscall abstraction	`SyscallHandler` interface, OS-pluggable	Hardcoded Linux syscalls
IR	Full SSA IR with optimization passes	Direct x86→ARM64, 4-pass
Self-modifying code	Write-protect + fault handler	Also fault-based, but RWX fallback
Thunking	Sophisticated library forwarding system	Library wrapping (~270 libs)
Code quality	Modern C++20, Clang-only	C with some C++

Box64's fatal flaw: It allocates ALL JIT memory as PROT_READ|PROT_WRITE|PROT_EXEC simultaneously across 6+ code paths. Adapting this to iOS's W^X would require rewriting the entire Dynarec memory management, including live code patching that modifies executable memory in-place. FEX-Emu is dramatically more portable.

Porting FEX-Emu to iOS — Work Breakdown

1. Darwin Syscall Layer (~40% of effort)

Implement DarwinSyscallHandler (following the WindowsSyscallHandler pattern)
Map Linux syscalls to BSD/Mach equivalents:
- clone → pthread_create (different threading model)
- futex → os_unfair_lock or __ulock_wait/__ulock_wake
- epoll → kqueue
- brk → not available on Darwin (use mmap)
- /proc/ → sysctl + Mach APIs
- mmap flag differences (MAP_ANONYMOUS vs MAP_ANON)
Emulate procfs: /proc/cpuinfo, /proc/self/maps, etc.
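
One way to organize the mapping above is a table keyed by the guest's syscall number. The sketch below is an assumption about how such a layer could be structured, not FEX-Emu's SyscallHandler interface; the type and function names are hypothetical, and only the x86-64 Linux syscall numbers are real.

```c
#include <assert.h>
#include <stddef.h>

// Strategy a hypothetical Darwin syscall layer could take for a guest syscall.
typedef enum {
    SYS_PASSTHROUGH,   // direct BSD equivalent, after argument translation
    SYS_EMULATE,       // re-implemented on top of a different Darwin primitive
    SYS_UNSUPPORTED    // not handled in this sketch
} sys_strategy;

typedef struct {
    long         linux_nr;          // x86-64 Linux syscall number
    const char  *name;
    sys_strategy strategy;
    const char  *darwin_primitive;  // what the handler would be built on
} sys_entry;

static const sys_entry k_sys_table[] = {
    {   9, "mmap",       SYS_PASSTHROUGH, "mmap (MAP_ANONYMOUS -> MAP_ANON)" },
    {  12, "brk",        SYS_EMULATE,     "mmap-backed heap" },
    {  56, "clone",      SYS_EMULATE,     "pthread_create" },
    { 202, "futex",      SYS_EMULATE,     "__ulock_wait / __ulock_wake" },
    { 232, "epoll_wait", SYS_EMULATE,     "kqueue / kevent" },
};

// Find the table entry for a guest syscall number, or NULL if unknown.
const sys_entry *sys_lookup(long linux_nr)
{
    assert(linux_nr >= 0);
    for (size_t i = 0; i < sizeof k_sys_table / sizeof k_sys_table[0]; i++)
        if (k_sys_table[i].linux_nr == linux_nr)
            return &k_sys_table[i];
    return NULL;
}

// Convenience wrapper: unknown syscalls are reported as unsupported.
sys_strategy sys_strategy_for(long linux_nr)
{
    const sys_entry *e = sys_lookup(linux_nr);
    return e != NULL ? e->strategy : SYS_UNSUPPORTED;
}
```

A real handler would dispatch to a function pointer per entry; the string field here only documents which Darwin primitive each case would rest on.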

2. W^X JIT Compliance (~15% of effort)

Modify CodeBuffer class to use dual-mapped memory (MeloNX pattern)
Replace mmap(PROT_READ|PROT_WRITE|PROT_EXEC) with dual mapping:
- RW view for code generation
- RX view for execution
Add TXM/BreakpointJIT support path for iOS 26+
Modify the Dispatcher to use RX view for execution loop
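
With two views, the JIT writes at an offset in the RW view and hands the dispatcher the same offset in the RX view. A minimal sketch of that bookkeeping follows; the dual_code_buffer structure and dcb_emit function are hypothetical and do not reflect FEX-Emu's actual CodeBuffer class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// A code buffer with separate write and execute views of the same memory.
// Hypothetical structure for illustration; not FEX-Emu's actual CodeBuffer.
typedef struct {
    uint8_t *rw_base;   // writable alias: the JIT emits here
    uint8_t *rx_base;   // executable alias: the dispatcher jumps here
    size_t   size;      // capacity of each view in bytes
    size_t   used;      // bytes emitted so far
} dual_code_buffer;

// Copy `len` bytes of generated code into the RW view and return the
// address of the same bytes in the RX view, or NULL if the buffer is full.
const uint8_t *dcb_emit(dual_code_buffer *cb, const void *code, size_t len)
{
    assert(cb != NULL && code != NULL);
    if (len > cb->size - cb->used)
        return NULL;

    size_t offset = cb->used;
    memcpy(cb->rw_base + offset, code, len);
    cb->used += len;

    // On real hardware the instruction cache must be invalidated for the
    // RX range here (sys_icache_invalidate on Darwin) before execution.
    return cb->rx_base + offset;
}
```

The only address the execution loop ever sees is the one returned by dcb_emit, so no code path needs a writable pointer into executable memory.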

3. Signal/Exception Handling (~15% of effort)

Adapt to Darwin's __darwin_mcontext64 structure
Consider using Mach exception ports instead of Unix signals
Port deferred signal mechanism (mprotect-based fault page)
ARM64 register access in signal handlers needs Darwin-specific code

4. Memory Allocator (~10% of effort)

Replace Linux mmap flags with Darwin equivalents
Replace prctl(PR_SET_VMA) with Mach VM naming
Handle MADV_DONTNEED behavior difference (Linux zeros pages, Darwin may not)
Use mach_vm_allocate/vm_allocate for large allocations
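
The mmap flag replacement is mechanical but easy to get wrong, because the guest passes Linux flag values while the host expects Darwin ones (MAP_ANONYMOUS is 0x20 on x86-64 Linux, MAP_ANON is 0x1000 on Darwin). The sketch below spells out both sets of constants so it compiles on any host; the function name and the decision to reject unknown flags are assumptions.

```c
#include <assert.h>
#include <stdint.h>

// Guest-side (x86-64 Linux) mmap flag values.
#define LNX_MAP_SHARED     0x01u
#define LNX_MAP_PRIVATE    0x02u
#define LNX_MAP_FIXED      0x10u
#define LNX_MAP_ANONYMOUS  0x20u

// Host-side (Darwin) values, written out so the sketch builds anywhere.
#define DRW_MAP_SHARED     0x0001
#define DRW_MAP_PRIVATE    0x0002
#define DRW_MAP_FIXED      0x0010
#define DRW_MAP_ANON       0x1000

// Translate guest mmap flags to host flags.
// Returns -1 if the guest passed a flag this sketch does not know about.
int32_t translate_mmap_flags(uint32_t guest_flags)
{
    const uint32_t known = LNX_MAP_SHARED | LNX_MAP_PRIVATE |
                           LNX_MAP_FIXED | LNX_MAP_ANONYMOUS;
    if (guest_flags & ~known)
        return -1;

    int32_t host = 0;
    if (guest_flags & LNX_MAP_SHARED)    host |= DRW_MAP_SHARED;
    if (guest_flags & LNX_MAP_PRIVATE)   host |= DRW_MAP_PRIVATE;
    if (guest_flags & LNX_MAP_FIXED)     host |= DRW_MAP_FIXED;
    if (guest_flags & LNX_MAP_ANONYMOUS) host |= DRW_MAP_ANON;
    return host;
}
```

Rejecting unknown bits, rather than silently dropping them, makes unsupported guest behavior visible early during bring-up.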

5. Thread Management (~10% of effort)

Replace clone-based threading with pthread APIs
Port TLS (Thread-Local Storage) setup
Handle Darwin thread lifecycle differences

6. iOS-Specific Thunks (~10% of effort)

Replace Linux GPU thunks (OpenGL/Vulkan/SDL2) with iOS equivalents
Metal/MoltenVK thunks for graphics
CoreAudio thunks for audio
UIKit thunks for windowing/input

Apple Silicon Hardware Assists for x86 Emulation

From Rosetta 2 reverse engineering, Apple Silicon has hardware features that dramatically accelerate x86 emulation:

Hardware TSO mode: A bit in ACTLR_EL1 switches the CPU from ARM's weak memory ordering to x86's Total Store Ordering, which removes the need for explicit memory barrier instructions in translated code. Running in TSO mode costs roughly 9% relative to native ARM ordering, but x86 correctness depends on it. FEX-Emu already emulates TSO in software; hardware TSO would be a large speedup over that.
Custom flag computation: Apple added undocumented instructions that compute x86 parity flag (PF) and adjust flag (AF) directly, stored as bits 26-27 of NZCV. Without this, software emulation requires ~5x more instructions.
FEAT_FlagM/FlagM2: Standard ARM extensions for flag manipulation:
- CFINV: Inverts carry flag (x86 subtract-with-borrow)
- RMIF: Moves register bits into flags
- SETF8/SETF16: 8/16-bit flag behavior
- AXFLAG/XAFLAG: FP condition flag conversion
FEAT_AFP: Non-standard FP behavior matching x86 NaN handling
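
To make the cost of the parity and adjust flags concrete, the sketch below shows what a translator has to compute in software after an arithmetic instruction when the hardware assist is unavailable. The function names are hypothetical; the flag definitions follow the x86 architecture (PF covers only the low byte, AF is the carry or borrow between bits 3 and 4).

```c
#include <assert.h>
#include <stdint.h>

// x86 parity flag: 1 when the low 8 bits of the result contain an even
// number of set bits. The XOR folds reduce the byte to a single parity bit.
int x86_parity_flag(uint32_t result)
{
    uint32_t b = result & 0xffu;
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return (int)(~b & 1u);
}

// x86 adjust flag for an add or subtract: 1 when there was a carry or
// borrow between bit 3 and bit 4 of the operation.
int x86_adjust_flag(uint32_t lhs, uint32_t rhs, uint32_t result)
{
    return (int)(((lhs ^ rhs ^ result) >> 4) & 1u);
}
```

Each flag costs several dependent ALU operations per emulated instruction unless the translator can prove the flag is dead, which is why flag elimination passes matter so much in a software-only port.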

Can we use these on iOS? The standard ARM extensions (FlagM, FlagM2) are accessible from userspace. The TSO mode bit requires EL1 (kernel) access — on macOS, sysctl hw.optional.arm.FEAT_TSO reports availability and Rosetta 2 sets it per-process. On iOS, this is not exposed to apps. We would need to use FEX-Emu's software TSO emulation (atomics + barriers), which is slower but correct. If someone found a way to toggle TSO mode on iOS (perhaps via the hypervisor framework?), it would be a game-changer.

3. Wine on iOS — Windows API Translation

Wine's Architecture (Relevant to iOS)

Wine uses a PE/Unix split architecture:

PE side: Windows DLLs compiled as PE binaries (Windows calling conventions)
Unix side: Native .so libraries that implement the actual functionality
__wine_syscall_dispatcher: Bridges PE calls to Unix functions via a table-driven dispatch

Key Wine components and iOS feasibility:

Component	What it does	iOS Feasibility
`ntdll.dll`	Windows NT kernel interface	Core PE DLL, runs in emulator
`kernel32.dll`	Win32 API base	Runs in emulator, calls ntdll
`user32.dll`	Window management	Needs iOS display driver
`gdi32.dll`	Graphics Device Interface	Needs iOS graphics driver
`wineserver`	Windows kernel state (handles, mutex, registry)	Must run as thread (no fork on iOS)
`winemac.drv`	macOS display driver	Template for `wineios.drv`
`winex11.drv`	X11 display driver	Not applicable

Wine Display Driver for iOS (`wineios.drv`)

Wine's driver model is modular — all drivers implement a consistent interface defined in wine/gdi_driver.h. The macOS driver (winemac.drv) uses Cocoa/Quartz natively and serves as the template. An iOS driver would:

Use UIKit instead of AppKit for window management
Use CAMetalLayer for rendering surface
Use GameController.framework for input
Use Core Audio for audio (same as macOS)
Map Windows window management concepts to UIKit views

Wine Process Model on iOS

Major challenge: Wine uses fork() extensively for process creation, and iOS does not allow fork().

Solutions:

In-process wineserver: Run wineserver as a thread within the main process instead of a separate daemon. This is architecturally possible but requires careful refactoring of Wine's IPC model.
Single-process Wine: All "Windows processes" run as threads in a single host process. This mirrors how the Windows port of Wine handles things.
MSync for synchronization: Wine already has macOS-specific synchronization (msync) using Mach semaphores, which is faster than the Linux esync/fsync approach.

Wine WoW64 — Running 32-bit Games

Wine 11.0's WoW64 mode runs 32-bit Windows apps on a purely 64-bit system by thunking 32-bit API calls to 64-bit internally. This is critical for iOS because:

iOS has not supported 32-bit processes since iOS 11
Many PC games are 32-bit
WoW64 eliminates the need for any 32-bit libraries on the host

The xtajit API — Plugging in FEX-Emu

Wine has a dedicated API for plugging in x86 emulators called xtajit (modeled after Windows' own xtajit.dll). The emulator DLL must export:

BTCpuProcessInit — Initialize the emulator
BTCpuSimulate — Main emulation loop (never returns)
BTCpuGetBopCode — Get callback mechanism

FEX-Emu already provides libwow64fex.dll as an xtajit implementation. This means:

Wine handles all API translation natively (in ARM64)
FEX-Emu only emulates application code (x86 → ARM64)
Wine's hundreds of DLLs run at native speed
Only the game's own code runs through the emulator

This is a massive performance advantage over emulating the entire Wine stack.

4. Graphics Stack — DX/GL/VK → Metal

Option Matrix

Game API	Best iOS Path	Layers	Est. Overhead	Feasibility
DirectX 11	DXMT → Metal	1	5-15%	Best option, needs porting
DirectX 9	DXMT (partial) or custom	1-2	10-20%	Moderate
DirectX 12	VKD3D → MoltenVK → Metal (near-term) or new DX12→Metal layer (long-term)	2 or 1	15-30%	Viable — two paths (see below)
Vulkan	MoltenVK → Metal	1	5-15%	Works today on iOS
OpenGL	ANGLE → Metal	1	5-20%	Works today on iOS (ES 3.0 only)

DXMT — The Best Path for DirectX 11 (Recommended)

DXMT translates DX11 directly to Metal, bypassing Vulkan entirely. Key advantages:

Direct Metal targeting: No Vulkan intermediary means one less translation layer
AIR bitcode shaders: Converts DXBC → Apple AIR (LLVM-based) → Metal library. This matches the native Metal shader compilation pipeline.
Geometry shader support: Converts geometry shaders to Metal mesh/object shaders — solving the fatal DXVK blocker
Tessellation support: Maps to Metal's tessellation pipeline
Non-Wine build mode: Has dxmt_native (nativemetal) that doesn't require Wine
macOS-only currently but the Metal APIs it uses are identical on iOS

Porting DXMT to iOS requires:

Change -sdk macosx to -sdk iphoneos in Metal shader compilation
Replace Cocoa.h with UIKit/UIKit.h
Replace MTLCopyAllDevices() (macOS-only) with MTLCreateSystemDefaultDevice()
Add ARM64 as a supported CPU family in meson.build
Test against iOS Metal feature sets

DXVK — Blocked Without GS, But Fixable

DXVK requires geometryShader = true as a mandatory Vulkan feature (line 801 of dxvk_device_info.cpp). MoltenVK does not currently support geometry shaders because Metal has no native GS stage. However, this is solvable via multiple approaches:

Ryujinx's VTG-as-Compute technique — convert GS to compute shaders using storage buffers. Works on ALL Apple GPUs, proven on iOS. Could be implemented inside MoltenVK's SPIR-V translation layer or within DXVK itself.
GS→mesh shader emulation — referencing DXMT's dxbc_converter_gs.cpp. Requires A17 Pro+ for mesh shaders.
MoltenVK PR #1815 working on this upstream.

If GS support is added to MoltenVK (via any approach), DXVK becomes a viable alternative to DXMT for DX11 on iOS. DXMT remains preferred (one fewer translation layer), but DXVK→MoltenVK becomes a solid fallback.

MoltenVK — Solid Foundation

MoltenVK has first-class iOS support:

Vulkan 1.4 on iOS
Minimum iOS 14
Full descriptor indexing on Apple3+ with Tier 2 argument buffers
BC texture compression on A14+/M1+ (critical for PC game assets)
App Store compatible (no private APIs in default mode)
SPIR-V → MSL runtime conversion via SPIRV-Cross

ANGLE — Production-Ready for OpenGL

ANGLE's Metal backend is production-ready on iOS (used by Chrome):

OpenGL ES 2.0 and 3.0: Complete
Works on iOS 12+
Desktop OpenGL not supported (only ES)

DirectX 12 — Two Viable Paths

DX12 on iOS is difficult but achievable through two complementary approaches:

Path A: VKD3D-Proton → MoltenVK → Metal (Near-Term)

VKD3D-Proton translates DX12 to Vulkan, then MoltenVK translates Vulkan to Metal. Detailed requirement analysis:

Hard requirements that MoltenVK satisfies (A17 Pro+):

VKD3D-Proton Requirement	MoltenVK Status
Vulkan 1.3	Supported (MoltenVK 1.3+)
`VK_EXT_robustness2` (nullDescriptor)	Supported
`VK_KHR_push_descriptor`	Supported
`VK_EXT_vertex_attribute_divisor`	Supported
`VK_EXT_custom_border_color`	Supported
`VK_EXT_depth_clip_enable`	Supported
`samplerMirrorClampToEdge`	Supported (Apple7+ / A14+)
`shaderDrawParameters`	Supported
`textureCompressionBC`	Supported (A15+ / Apple GPU Family 8+)
`vulkanMemoryModel` + `DeviceScope`	Advertised (Metal relaxed-only — correctness risk)
`tessellationShader`	Supported (via compute + Metal tessellation)
DXR / Raytracing	Not supported, but fully optional — games fall back
Sparse resources	Limited on iOS — caps device at FL 11.0/11.1

The sole real blocker: geometryShader

VKD3D-Proton requires geometryShader = true in its baseline profile (DX12 FL 11.0 mandates it). MoltenVK does not currently support geometry shaders because Metal has no native GS stage. However, there are multiple proven approaches to resolve this:

Ryujinx/MeloNX's VTG-as-Compute approach (proven, shipping on iOS today). Ryujinx completely bypasses the geometry shader problem by converting the entire vertex+geometry pipeline to compute shaders at the IR level. This is a three-phase pipeline:
- Phase 1 (Vertex as Compute): Vertex shader runs as a compute dispatch — loads attributes from buffer textures, writes outputs to a storage buffer.
- Phase 2 (Geometry as Compute): Geometry shader runs as a compute dispatch — reads Phase 1's output buffer, EmitVertex becomes storage buffer writes, EndPrimitive writes -1 sentinel indices for primitive restart. Uses a topology remap buffer for invocation→vertex mapping.
- Phase 3 (Passthrough Vertex): Synthetic vertex shader reads compute outputs, feeds the fragment shader via normal DrawIndexed with primitive restart.
- Requires no mesh shaders, no transform feedback, no geometry shader Vulkan feature — just compute shaders and storage buffers, which MoltenVK fully supports on all Apple GPUs. Shipping and working on iOS today in MeloNX.
- Key source files: VertexToCompute.cs, GeometryToCompute.cs, VtgAsComputeState.cs in the Ryujinx shader compiler.
- This technique can be implemented at the MoltenVK SPIR-V→MSL layer, or within VKD3D-Proton/DXVK themselves.
Fork MoltenVK and add GS→mesh shader support, using DXMT as reference. DXMT solves this in src/airconv/dxbc_converter_gs.cpp — converts geometry shaders to Metal mesh/object shaders. The core technique transfers to MoltenVK (input is SPIR-V via SPIRV-Cross instead of DXBC). Requires mesh shader support (A17 Pro+ / Apple GPU Family 9).
MoltenVK PR #1815 is actively implementing GS emulation upstream. When this lands, the blocker is removed without forking.
Fork VKD3D-Proton with graceful GS degradation. Many DX12 games don't heavily use geometry shaders — DX12 encourages compute shader alternatives. Report GS as unsupported and let games fall back.

The Ryujinx VTG-as-Compute approach is the most immediately applicable because it works on ALL Apple GPUs (no mesh shader requirement), is battle-tested on iOS, and can be implemented at multiple layers of the stack.
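
Phase 2 of the VTG-as-Compute pipeline can be pictured with a CPU-side model: EmitVertex appends to a vertex storage buffer and records an index, and EndPrimitive appends the restart sentinel. The sketch below is only that model, written in C for illustration; the structure and function names are hypothetical and are not taken from Ryujinx's GeometryToCompute.cs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RESTART_INDEX 0xFFFFFFFFu   // the "-1" sentinel used for primitive restart

typedef struct { float x, y, z, w; } gs_vertex;

// Storage buffers that a compute-based geometry stage would write into.
typedef struct {
    gs_vertex *vertices;  size_t vertex_cap;  size_t vertex_count;
    uint32_t  *indices;   size_t index_cap;   size_t index_count;
} gs_output;

// EmitVertex(): append the vertex and an index that refers to it.
// Returns 0 on success, -1 if either buffer is full.
int gs_emit_vertex(gs_output *out, gs_vertex v)
{
    assert(out != NULL);
    if (out->vertex_count == out->vertex_cap ||
        out->index_count == out->index_cap)
        return -1;

    uint32_t slot = (uint32_t)out->vertex_count;
    out->vertices[slot] = v;
    out->vertex_count++;
    out->indices[out->index_count] = slot;
    out->index_count++;
    return 0;
}

// EndPrimitive(): append the restart sentinel so the strip is cut here.
// Returns 0 on success, -1 if the index buffer is full.
int gs_end_primitive(gs_output *out)
{
    assert(out != NULL);
    if (out->index_count == out->index_cap)
        return -1;
    out->indices[out->index_count] = RESTART_INDEX;
    out->index_count++;
    return 0;
}
```

The passthrough vertex shader in Phase 3 then consumes exactly these two buffers through an indexed draw with primitive restart enabled, so no geometry stage is needed in the graphics pipeline.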

Ryujinx/MeloNX's Other MoltenVK Workarounds (Applicable to Our Project)

MeloNX contains a comprehensive set of MoltenVK workarounds that we should adopt:

Workaround	What It Does	Why
Transform feedback as storage buffers	Replaces `VK_EXT_transform_feedback` with direct storage buffer writes	MoltenVK TF polyfill may be incomplete
Null descriptors disabled	Avoids `robustness2.nullDescriptor` on MoltenVK	Implementation quirks
Push descriptors capped at 8	Limits `maxPushDescriptors` to 8 regardless of device report	MoltenVK stability
Fragment output specialization	Specializes fragment outputs based on render target format	Metal doesn't auto-match output types to attachments
Reduced shader precision	Removes `NoContraction`, allows Metal fast-math	Performance gain, with guards on div-by-zero patterns
Integer blend disabled	Force-disables blend for integer attachments	Metal doesn't support integer blending
Vertex attribute stride clamping	Reduces attribute format when exceeding buffer stride	Metal glitches on oversized attributes
Storage buffer cross-stage visibility	Makes bindings visible to ALL active stages	MoltenVK bug on pre-iOS 17
Barrier handling	Skips render pass breaks at image barriers	Metal has its own hazard tracking
Memory type override	Reports `DedicatedMemory` instead of `UnifiedMemory`	MoltenVK memory model mapping issues
Portability subset	Triangle fans → triangle lists, no point mode, no LOD bias	`VK_KHR_portability_subset` limitations

Performance: Double translation (DX12→VK→Metal) adds real overhead, but because Wine and the translation layers themselves run as native ARM64 (via the xtajit/ARM64EC hybrid model), only the game's own CPU code is emulated. The GPU translation layers run at full native speed. Estimated 15-30% overhead over native Metal, which is acceptable for lighter DX12 titles.

Other risks:

VK_EXT_dynamic_rendering_unused_attachments (mandatory in baseline profile) — MoltenVK support uncertain
vulkanMemoryModel correctness — Metal only supports memory_order_relaxed, which could cause subtle rendering artifacts in games relying on proper Vulkan memory ordering
Build system requires patching (only Linux/Windows currently)

Path B: Open-Source DX12 → Metal Layer (Long-Term, Best Performance)

Build a new open-source DX12-to-Metal translation layer, inspired by Apple's proprietary D3DMetal. This eliminates the Vulkan intermediary entirely.

Apple gives away the hardest part for free: The public Metal Shader Converter handles ~40-50% of the work:

All shader compilation: DXIL (SM 6.0-6.6) → Metal IR
Root signature → argument buffer layout generation
Geometry/tessellation emulation via Metal mesh shaders
Ray tracing pipeline construction
Supports iOS 17+ natively — produces standard .metallib files
Ships with metal_irconverter_runtime.h providing descriptor table management, draw call emission helpers

What remains to build (~50-60%) — the runtime API translation layer:

Component	Maps To	Reference Code
`ID3D12Device`	`MTLDevice` wrapper	vkd3d-proton `d3d12_device.c`
`ID3D12CommandQueue`	`MTLCommandQueue`	DXMT `dxmt_command_queue.hpp`
`ID3D12GraphicsCommandList`	`MTLCommandBuffer` + encoders	vkd3d-proton `d3d12_command_list.c`
`ID3D12DescriptorHeap`	Argument Buffers Tier 2	Metal Shader Converter runtime
`ID3D12RootSignature`	Argument buffer layout	Metal Shader Converter handles this
`ID3D12PipelineState`	`MTLRenderPipelineState` / `MTLComputePipelineState`	DXMT pipeline code
`ID3D12Resource`	`MTLBuffer` / `MTLTexture`	DXMT resource management
`ID3D12Fence`	`MTLSharedEvent`	Relatively direct
`IDXGISwapChain`	`CAMetalLayer` + `MTLDrawable`	DXMT `dxmt_presenter.cpp`
Resource barriers	Metal sync primitives	DXMT synchronization model

Scope: ~50,000-100,000 lines of C++. Two excellent reference implementations exist: vkd3d-proton (~200K LOC) for the DX12 API surface, and DXMT for Metal-specific translation patterns (command queue architecture, binding model, Wine integration). DXMT's architecture — triple-threaded command model, lambda-based command recording, argument encoding context — directly informs how to build this.

Note on D3DMetal RE: No public reverse engineering effort of Apple's D3DMetal exists (the attesor repo is actually Rosetta 2 RE, not D3DMetal). Apple's license prohibits RE of D3DMetal itself. However, the public Metal Shader Converter documentation, DXMT's open-source implementation, and vkd3d-proton's DX12 API coverage provide more than enough architectural reference to build a clean-room implementation.

Recommended DX12 Strategy

Approach	Effort	Performance	Timeline
VKD3D-Proton → MoltenVK (after GS emulation)	Medium	~70-85% native Metal	Near-term (depends on MoltenVK PR #1815)
Fork VKD3D-Proton with GS degradation	Medium	~70-85% (GS games break)	Near-term
New DX12 → Metal layer	Very Large (2-4 eng-years)	~85-95% native Metal	Long-term
Both in parallel	Best strategy	Best of both	Start VKD3D path now, build DX12→Metal long-term

Metal Feature Availability (A17 Pro / A18 / A19)

Modern iPhone GPUs support:

Metal 3.x with MSL 3.1+
Mesh shaders (critical for DXMT geometry shader emulation)
Hardware raytracing
Tier 2 argument buffers (1M+ descriptors)
BC texture compression
SIMD permute/reduction/quad operations
Tessellation, layered rendering
Sampler clamp to border, mirror clamp to edge
128 textures per stage
Non-uniform threadgroups

5. Apple's Own Tools — Rosetta 2, GPTK, D3DMetal

Rosetta 2 — What We Can Learn

Rosetta 2 achieves ~80% native performance through:

AOT translation: Entire binaries pre-translated, cached in /var/db/oah/
Hardware TSO mode: Eliminates memory barriers (~9% cost vs native)
Custom flag hardware: Parity/adjust flags computed in hardware
Peephole optimization: Dead code elimination, constant folding, instruction combining
Translation ratio: ~1.64x code size expansion

What we can apply:

FEX-Emu should use AOT translation for known game binaries (pre-translate during "install")
If we can access hardware TSO on iOS, use it (major speedup)
FEX-Emu's IR already supports many of the same optimizations
The dual AOT+JIT approach (AOT for known code, JIT for dynamic code) is the right strategy

GPTK — Architecture Reference

GPTK = Wine (CrossOver source) + D3DMetal framework. On macOS:

x86_64 Windows Game → Rosetta 2 (CPU) → Wine (API) → D3DMetal (GPU) → Metal

On iOS, Rosetta 2 isn't available, so we replace it with FEX-Emu. D3DMetal is proprietary, so we use DXMT for DX11 and VKD3D-Proton→MoltenVK (near-term) or a new open-source DX12→Metal layer (long-term) for DX12. The architecture becomes:

x86_64 Windows Game → FEX-Emu (CPU) → Wine (API) → DXMT/VKD3D/DX12→Metal (GPU) → Metal

Metal Shader Converter — Usable on iOS

Apple's Metal Shader Converter converts DXIL (DirectX IL) to Metal IR. The output works on iOS — it produces standard .metallib files. This could potentially be used alongside or instead of DXMT's own shader conversion:

DXMT converts DXBC → AIR (its own path)
Metal Shader Converter converts DXIL → Metal IR (Apple's path)
Both produce Metal-compatible shader libraries

CrossOver — Reference Implementation

CrossOver combines:

Wine with macOS driver
D3DMetal for DX11/DX12
DXMT for DX11/DX10 (collaborated with 3Shain)
DXVK fallback for DX9 → Vulkan → MoltenVK
MSync for macOS-native synchronization

This validates our proposed stack: Wine + DXMT + MoltenVK is a proven combination.

6. Architectural Options Compared

Option A: Direct Port (FEX-Emu + Wine + DXMT on iOS) — RECOMMENDED

Windows Game (.exe)
    → FEX-Emu (x86→ARM64 JIT, ported to iOS)
    → Wine (Windows→POSIX API, with wineios.drv)
    → DXMT/MoltenVK/ANGLE (Graphics→Metal)
    → iOS/Metal

Pros: Best performance, fewest layers, proven by CrossOver on macOS
Cons: Largest porting effort, every component needs iOS adaptation

Option B: ARM64 Linux VM + FEX-Emu + Wine

Windows Game (.exe)
    → FEX-Emu (running inside Linux VM)
    → Wine (running inside Linux VM)
    → Virtio-GPU / virglrenderer
    → Host iOS Metal

Pros: FEX-Emu and Wine run unmodified on Linux
Cons: VM overhead, virtio-GPU performance terrible for gaming, iOS hypervisor framework limitations, no GPU passthrough on iOS

Option C: Full x86 Linux Emulation (QEMU-style)

Windows Game (.exe)
    → Wine (inside x86 Linux, no emulation needed for Wine itself)
    → Linux x86 kernel (emulated)
    → QEMU/UTM (x86→ARM64 full system emulation)
    → iOS

Pros: Everything runs unmodified
Cons: Full system emulation is catastrophically slow for gaming (~5-10% native speed)

Option D: ARM64 Windows via Wine (no x86 emulation)

ARM64 Windows Game or Re-compiled Game
    → Wine (Windows→iOS API translation only)
    → DXMT/MoltenVK (Graphics→Metal)
    → iOS/Metal

Pros: No CPU emulation overhead, best possible performance
Cons: Very few ARM64 Windows games exist, games would need recompilation

Verdict

Option A is the only viable path for real gaming performance. Option B adds too much overhead. Option C is too slow. Option D has no game library. The rest of this document focuses on Option A.

7. Recommended Architecture

Full Stack Diagram

┌─────────────────────────────────────────────────┐
│                 iOS Swift App                     │
│  (UIKit, CAMetalLayer, GameController, CoreAudio) │
├─────────────────────────────────────────────────┤
│              JIT Management Layer                 │
│  (StikDebug/BreakpointJIT, Dual-Mapped Memory)   │
├─────────────────────────────────────────────────┤
│                Wine (ARM64 PE)                    │
│  ntdll, kernel32, user32, gdi32, wineserver      │
│  wineios.drv (UIKit display + Metal + GameCtrl)   │
├──────────┬──────────────────┬──────────┬─────────┤
│  DXMT    │ VKD3D→MoltenVK   │ MoltenVK │  ANGLE  │
│(DX11→MTL)│ (DX12→VK→MTL)    │(VK→MTL)  │(GL→MTL) │
│          │ or DX12→MTL new  │          │         │
├──────────┴──────────────────┴──────────┴─────────┤
│              FEX-Emu (x86→ARM64)                  │
│  Frontend Decoder → IR → ARM64 JIT               │
│  Darwin Syscall Handler                           │
│  x86 Linux RootFS (bundled libraries)             │
├─────────────────────────────────────────────────┤
│              iOS / Metal / Darwin Kernel           │
│         (A17 Pro / A18 Pro / A19 Pro)             │
└─────────────────────────────────────────────────┘

Key Design Decisions

Wine runs natively (ARM64) — Wine's DLLs are compiled as ARM64 PE binaries. Only the game's x86 code runs through FEX-Emu. Wine's xtajit API bridges the two.
Single-process model — Everything runs in one iOS process. The wineserver runs as a thread. Game "processes" are threads. This is necessary because iOS doesn't allow fork().
Dual-mapped JIT memory — Use MeloNX's mach_make_memory_entry_64 + vm_map technique for W^X-compliant JIT with zero toggling overhead.
DXMT for DX11 — Direct Metal translation, handles geometry shaders via mesh shaders, ~5-15% overhead.
DX12 dual strategy — VKD3D-Proton→MoltenVK as near-term path (pending geometry shader emulation in MoltenVK), with a new open-source DX12→Metal layer as the long-term high-performance path. Apple's public Metal Shader Converter handles shader compilation (DXIL→Metal IR) for both approaches.
Bundled x86 RootFS — Ship a minimal x86 Linux sysroot with required libraries (libc, libstdc++, etc.) for the guest game to link against. FEX-Emu's thunking system forwards GPU/audio calls to native iOS APIs.
NativeAOT or pre-compiled components — Following MeloNX's pattern, compile Wine and FEX-Emu into native ARM64 dylibs that the Swift app loads.

Component Interaction Flow

1. User selects game → Swift app launches game setup
2. Swift app initializes JIT (StikDebug connection or BreakpointJIT)
3. Swift app initializes FEX-Emu with Darwin syscall handler
4. FEX-Emu loads game .exe from bundled rootfs
5. Game .exe imports Windows DLLs → Wine's ntdll intercepts
6. Wine's xtajit delegates x86 code to FEX-Emu JIT
7. Wine's API calls (file I/O, threading, etc.) → Darwin syscalls (native speed)
8. Game's DX11 calls → DXMT → Metal (native speed via thunking)
   Game's DX12 calls → VKD3D→MoltenVK or DX12→Metal → Metal (native speed)
9. Game's audio → Wine → CoreAudio (native speed via thunking)
10. Display output → CAMetalLayer → Screen

8. iOS-Specific Tricks and Hacks

Trick 1: Dual-Mapped JIT Memory (from MeloNX)

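#include <mach/mach.h>

// jit_size is a placeholder for the page-aligned size of the JIT region
memory_object_size_t size = jit_size;
mach_port_t mem_entry = MACH_PORT_NULL;
vm_address_t rw_addr = 0, rx_addr = 0;
kern_return_t kr;

// Create memory entry for physical backing
kr = mach_make_memory_entry_64(mach_task_self(), &size, 0,
    MAP_MEM_NAMED_CREATE | VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE,
    &mem_entry, MACH_PORT_NULL);
if (kr != KERN_SUCCESS) { /* handle error */ }

// Map RW view (for writing JIT code)
kr = vm_map(mach_task_self(), &rw_addr, size, 0, VM_FLAGS_ANYWHERE,
    mem_entry, 0, FALSE,
    VM_PROT_READ | VM_PROT_WRITE,  // current prot
    VM_PROT_READ | VM_PROT_WRITE,  // max prot
    VM_INHERIT_DEFAULT);
if (kr != KERN_SUCCESS) { /* handle error */ }

// Map RX view (for executing JIT code)
kr = vm_map(mach_task_self(), &rx_addr, size, 0, VM_FLAGS_ANYWHERE,
    mem_entry, 0, FALSE,
    VM_PROT_READ | VM_PROT_EXECUTE,  // current prot
    VM_PROT_READ | VM_PROT_EXECUTE,  // max prot
    VM_INHERIT_DEFAULT);
if (kr != KERN_SUCCESS) { /* handle error */ }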

Trick 2: VM_LEDGER_FLAG_NO_FOOTPRINT

MeloNX uses VM_LEDGER_FLAG_NO_FOOTPRINT when creating memory entries. This makes the JIT memory not count against the app's memory limit (Jetsam). Critical for allocating the large JIT caches needed for x86 emulation.

Trick 3: LiveContainer's dlopen for Component Loading

LiveContainer's technique of converting MH_EXECUTE to MH_DYLIB and loading via dlopen could be used to:

Load pre-compiled Wine components as dylibs
Load graphics translation layers dynamically
Hot-swap emulator components without app rebuilds

In JIT-less mode, re-sign components with ZSign using the host app's certificate.

Trick 4: Hardware TSO (Speculative)

If we can find a way to enable hardware TSO on iOS:

Apple Silicon has a bit in ACTLR_EL1 for TSO mode
Rosetta 2 uses this on macOS
On iOS, sysctl hw.optional.arm.FEAT_TSO might report availability
Hypervisor.framework might allow setting TSO for a virtual CPU, but it is not part of the public iOS SDK, so whether a sideloaded app can reach it is unverified
If accessible, this would eliminate ALL memory barrier overhead for x86 TSO emulation

Trick 5: AOT Pre-Translation

For known game binaries:

Pre-translate x86 code to ARM64 during "game installation"
Cache translated blocks in the app's data container
On subsequent runs, load pre-translated code directly
Only fall back to JIT for dynamically generated code
This amortizes the JIT compilation cost
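
A pre-translation cache is only safe to reuse if it is keyed on both the exact game binary and the translator build that produced it. One way to sketch such a key, assuming a simple FNV-1a content hash and a hypothetical file-name scheme (a real implementation would probably want a stronger hash, and none of these names come from FEX-Emu):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a byte buffer. */
uint64_t fnv1a64(const unsigned char *data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Writes "<hash>-v<translator_version>.aot" into out and returns the
   length snprintf reports. The naming scheme is an assumption. */
int aot_cache_name(char *out, size_t cap,
                   const unsigned char *code, size_t len,
                   unsigned translator_version) {
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%016llx-v%u.aot",
                    (unsigned long long)fnv1a64(code, len),
                    translator_version);
}
```

If either the binary or the translator version changes, the name changes and the stale translation is simply never found, so the runtime falls back to JIT.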

Trick 6: Metal Shader Pre-Compilation

During installation, pre-convert all DX11 shaders to Metal libraries
Use Metal's newLibraryWithURL:error: to load pre-compiled .metallib files
Eliminates shader compilation stuttering during gameplay
DXMT's AIR bitcode path is ideal for this

Trick 7: Increased Memory Entitlements

Two critical entitlements (MeloNX uses both):

com.apple.developer.kernel.increased-memory-limit — Raises the Jetsam physical RAM threshold (how much real memory before iOS kills the app)
com.apple.developer.kernel.extended-virtual-addressing — Expands virtual address space from ~4GB to ~64GB (needed for mapping large guest address spaces, JIT caches, rootfs)

These are different things: increased-memory-limit = more physical RAM allowed, extended-virtual-addressing = more address space to map (even if most is uncommitted). FEX-Emu will need both — x86-64 apps use large VA spaces, and x86 emulation + Wine + graphics layers consume significant physical RAM.

Problem: Free Apple Developer accounts can provision increased-memory-limit but NOT extended-virtual-addressing. Paid accounts ($99/year) can provision both.

Solution: GetMoreRam (used by MeloNX)

GetMoreRam is a sideloadable app that modifies App ID capabilities on Apple's Developer Portal. The flow:

Sideload Mythic with any sideloader (SideStore, AltStore, etc.)
Sideload the GetMoreRam/Entitlement IPA
Open GetMoreRam → Settings → sign in with the same Apple ID used for sideloading
App IDs → Refresh → select Mythic's app ID (e.g., com.mythic.emulator.XXXXXX)
Tap Add Increased Memory Limit (and Extended Virtual Addressing if available)
Delete and reinstall Mythic with the same IPA

What happens under the hood: GetMoreRam uses Apple's Developer Portal API to add the capability to the registered App ID. On reinstall, the sideloader generates a new provisioning profile from Apple's servers which now includes the capability. The entitlement in the binary matches the profile, and iOS honors it.

Alternative: PlumeImpactor — another tool for entitlement injection if GetMoreRam doesn't work.

Alternative: StikDebug's process_control_disable_memory_limit — StikDebug's idevice FFI library has a function that disables Jetsam memory limits at runtime via the debugger protocol (no entitlement needed, but requires StikDebug to stay connected). Not currently exposed in StikDebug's UI but could be.

Runtime detection: The app should check entitlements at runtime and adjust behavior:

checkAppEntitlement("com.apple.developer.kernel.increased-memory-limit")
checkAppEntitlement("com.apple.developer.kernel.extended-virtual-addressing")
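
What "adjust behavior" means is not specified above. The sketch below shows one possible policy: size the JIT cache and the guest address-space reservation from the two check results. The 512 MiB figure mirrors the MeloNX cache size quoted elsewhere in this document; every other number, and all type and function names, are placeholders chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define GIB (1024ULL * MIB)

/* Results of the two entitlement checks (1 = granted). */
typedef struct {
    int increased_memory_limit;
    int extended_virtual_addressing;
} entitlement_state;

/* Hypothetical limits derived from those results. */
typedef struct {
    uint64_t jit_cache_bytes;
    uint64_t guest_va_reserve_bytes;
} runtime_limits;

runtime_limits choose_limits(entitlement_state e) {
    runtime_limits l;
    /* 512 MiB follows the MeloNX figure; 128 MiB is a placeholder. */
    l.jit_cache_bytes = e.increased_memory_limit ? 512 * MIB : 128 * MIB;
    /* Stay well inside the ~4 GB default space unless ~64 GB is available.
       Both reservation sizes are placeholders. */
    l.guest_va_reserve_bytes =
        e.extended_virtual_addressing ? 32 * GIB : 2 * GIB;
    return l;
}
```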

Trick 8: LRU JIT Cache with Eviction

MeloNX implements LRU cache eviction at 85% utilization for JIT blocks. For PC games with large code footprints:

Track when each block was last executed (recency is what LRU orders by)
Evict least-recently-used translated blocks when approaching the limit
Re-translate evicted blocks on demand
Hot blocks stay resident in practice, because every execution refreshes their recency
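
The eviction policy can be sketched with a small fixed-size table. This is not MeloNX's implementation: the slot count, the struct layout, and the function names are assumptions, and only the 85% threshold comes from the text above. A production cache would also have to reclaim and compact the underlying code memory, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 64          /* illustrative slot count */
#define HIGH_WATER_PERCENT 85  /* eviction threshold cited above */

typedef struct {
    uint64_t guest_pc;   /* x86 address the block was translated from */
    size_t   size;       /* bytes of generated ARM64 code */
    uint64_t last_used;  /* logical timestamp of the most recent hit */
    int      live;
} jit_block;

typedef struct {
    jit_block blocks[MAX_BLOCKS];
    size_t    capacity;  /* total bytes in the JIT region */
    size_t    used;      /* bytes held by live blocks */
    int       count;     /* number of live blocks */
    uint64_t  clock;     /* incremented on every insert or hit */
} jit_cache;

void cache_init(jit_cache *c, size_t capacity) {
    *c = (jit_cache){0};
    c->capacity = capacity;
}

/* Returns 1 on a hit (and refreshes recency), 0 on a miss. */
int cache_lookup(jit_cache *c, uint64_t guest_pc) {
    for (int i = 0; i < MAX_BLOCKS; i++) {
        jit_block *b = &c->blocks[i];
        if (b->live && b->guest_pc == guest_pc) {
            b->last_used = ++c->clock;
            return 1;
        }
    }
    return 0;
}

/* Drops the live block with the oldest timestamp, if any. */
void cache_evict_lru(jit_cache *c) {
    int victim = -1;
    for (int i = 0; i < MAX_BLOCKS; i++) {
        if (c->blocks[i].live &&
            (victim < 0 ||
             c->blocks[i].last_used < c->blocks[victim].last_used)) {
            victim = i;
        }
    }
    if (victim >= 0) {
        c->blocks[victim].live = 0;
        c->used -= c->blocks[victim].size;
        c->count--;
    }
}

/* Records a newly translated block, evicting old ones first if the
   insert would cross the high-water mark or exhaust the slots. */
void cache_insert(jit_cache *c, uint64_t guest_pc, size_t size) {
    assert(size > 0 && size <= c->capacity);
    while (c->count > 0 &&
           (c->count == MAX_BLOCKS ||
            (c->used + size) * 100 > c->capacity * HIGH_WATER_PERCENT)) {
        cache_evict_lru(c);
    }
    for (int i = 0; i < MAX_BLOCKS; i++) {
        if (!c->blocks[i].live) {
            c->blocks[i] = (jit_block){guest_pc, size, ++c->clock, 1};
            c->used += size;
            c->count++;
            return;
        }
    }
}
```

On a miss the dispatcher would re-translate the block and call cache_insert again, which is the "re-translate on demand" step.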

9. Performance Projections

Overhead Breakdown

Layer	Estimated Overhead	Notes
FEX-Emu x86→ARM64	30-50% of native	With software TSO; 15-25% with hardware TSO
Wine API translation	5-10%	Runs natively, only API marshaling
DXMT DX11→Metal	5-15%	Direct Metal, AIR bitcode shaders
MoltenVK VK→Metal	5-15%	Well-optimized, iOS-native
iOS overhead	3-5%	Memory pressure, thermal throttling

Composite Performance Estimate

DX11 game on iPhone 16 Pro (A18 Pro):

Native x86 PC performance: 100% (baseline)
After FEX-Emu: ~50-65% (software TSO) or ~70-80% (hardware TSO)
After Wine: ~47-60% (software TSO) or ~65-75% (hardware TSO)
After DXMT: ~40-55% (software TSO) or ~55-70% (hardware TSO)
After iOS overhead: ~38-52% (software TSO) or ~52-67% (hardware TSO)
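
The layers compound multiplicatively: each one keeps a fraction (1 - overhead) of whatever performance reaches it. A small sketch of that arithmetic, using the software-TSO figures above as inputs (the function name is ours):

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of native performance left after stacking layers that each
   cost overhead[i], where 0.10 means a 10% loss. */
double retained_fraction(const double *overhead, size_t n) {
    double r = 1.0;
    for (size_t i = 0; i < n; i++) {
        assert(overhead[i] >= 0.0 && overhead[i] < 1.0);
        r *= 1.0 - overhead[i];
    }
    return r;
}
```

Feeding in the pessimistic end of each range (FEX-Emu 0.50, Wine 0.10, DXMT 0.15, iOS 0.05) gives 0.5 × 0.9 × 0.85 × 0.95 ≈ 0.36, and the optimistic end (0.35, 0.05, 0.05, 0.03) gives about 0.57. Strict compounding therefore brackets the rounded per-step figures listed above.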

For comparison, Winlator on Snapdragon 8 Elite with Box64:

Achieves roughly 40-60% of native x86 performance on demanding games
Games like RDR2 achieve playable (25-35fps) at lower settings

Our iOS target: ~35-55% of native x86 performance with software TSO. This would make many DX11 games playable (30fps+) on A18 Pro at reduced settings, similar to Winlator on Android.

Game Tier Predictions

Game Tier	Example Games	Expected Perf	Playability
Light DX9	Half-Life 2, Portal	60fps+	Excellent
Medium DX9/11	Skyrim, Fallout NV	30-60fps	Good
Heavy DX11	Witcher 3, GTA V	20-40fps	Playable
Very Heavy DX11	RDR2 (DX11 mode)	15-30fps	Marginal
Light DX12	Forza Horizon (low)	20-35fps	Playable (via VKD3D→MoltenVK)
Heavy DX12	Cyberpunk 2077	10-20fps	Marginal (via VKD3D→MoltenVK)
Heavy DX12	Cyberpunk 2077	15-30fps	Playable (via future DX12→Metal)

10. Implementation Roadmap

Phase 1: Foundation (JIT + FEX-Emu Core)

JIT Infrastructure
- Implement dual-mapped memory allocator (MeloNX pattern)
- Build BreakpointJIT framework for iOS 26+ TXM support
- Create StikDebug integration for JIT enablement
- Test W^X code generation lifecycle
FEX-Emu iOS Port
- Remove Linux platform check from CMakeLists.txt
- Implement DarwinSyscallHandler (basic syscalls: mmap, open, read, write, close, stat)
- Port memory allocator to use Mach VM APIs
- Port signal handling to Darwin mcontext
- Build ARM64 JIT backend with W^X dual-mapping
- Test with simple x86 Linux binaries (hello world, basic computation)
iOS App Shell
- Swift app with CAMetalLayer
- JIT enablement UI (StikDebug pairing)
- File browser for selecting games
- Logging system with real-time console
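
For the DarwinSyscallHandler item in Phase 1, the guest issues x86-64 Linux syscall numbers and expects Linux errno values, while the host speaks Darwin. A minimal sketch of the two translation steps follows. The syscall numbers are the standard x86-64 Linux ones; the enum, the function names, and the EIO fallback are assumptions for illustration and do not reflect FEX-Emu's actual handler design.

```c
#include <assert.h>

/* Hypothetical handler identifiers for the basic syscalls listed above. */
typedef enum {
    GUEST_READ, GUEST_WRITE, GUEST_OPEN, GUEST_CLOSE, GUEST_STAT,
    GUEST_MMAP, GUEST_UNIMPLEMENTED
} guest_handler;

/* Map an x86-64 Linux syscall number to a handler. */
guest_handler dispatch_guest_syscall(long nr) {
    switch (nr) {
    case 0:  return GUEST_READ;
    case 1:  return GUEST_WRITE;
    case 2:  return GUEST_OPEN;
    case 3:  return GUEST_CLOSE;
    case 4:  return GUEST_STAT;
    case 9:  return GUEST_MMAP;
    default: return GUEST_UNIMPLEMENTED;
    }
}

/* Convert a Darwin errno into the Linux value the guest expects. */
int darwin_errno_to_linux(int e) {
    assert(e > 0);
    switch (e) {
    case 11: return 35;  /* EDEADLK: 11 on Darwin, 35 on Linux */
    case 35: return 11;  /* EAGAIN: 35 on Darwin, 11 on Linux */
    case 78: return 38;  /* ENOSYS: 78 on Darwin, 38 on Linux */
    default:
        /* 1..34 otherwise share the classic Unix values. Anything higher
           needs a real table entry; reporting EIO (5) is a placeholder. */
        return (e >= 1 && e <= 34) ? e : 5;
    }
}
```

Flag bits (for example in open and mmap) and struct layouts such as stat differ between the two systems as well, so each handler needs the same kind of translation for its arguments.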

Phase 2: Wine Integration

Wine Core on iOS
- Compile Wine for ARM64 Darwin (starting from macOS Wine codebase)
- Implement in-process wineserver (thread-based)
- Implement basic wineios.drv (display to CAMetalLayer)
- Integrate xtajit with FEX-Emu
- Test with simple Windows console apps via Wine
Wine WoW64
- Enable WoW64 mode for 32-bit game support
- Test with 32-bit Windows apps

Phase 3: Graphics

DXMT iOS Port
- Port build system to target iOS
- Replace macOS APIs with iOS equivalents
- Test shader conversion (DXBC → AIR → Metal) on iOS
- Integrate with Wine's DX11 API
MoltenVK Integration
- Configure MoltenVK for iOS with private API support
- Create a VkSurfaceKHR from the CAMetalLayer with vkCreateMetalSurfaceEXT (VK_EXT_metal_surface)
- Test with Vulkan demos
ANGLE Integration
- Build ANGLE with Metal backend for iOS
- Create OpenGL → Metal path for GL games

Phase 4: Polish and Optimization

Performance Optimization
- AOT pre-translation for game binaries
- Metal shader pre-compilation
- JIT cache tuning (size, eviction policy)
- Profile and optimize hot paths
- Explore hardware TSO access
Input/Audio/UI
- GameController framework integration
- On-screen controls
- CoreAudio integration for Wine audio
- Game library management UI
- Settings for resolution, performance tuning

Phase 5: DX12 Support

DX12 via VKD3D-Proton → MoltenVK (Near-Term)
- Fork VKD3D-Proton, patch build system for non-Linux
- Test with MoltenVK on iOS — identify and work around feature gaps
- If MoltenVK PR #1815 (geometry shader emulation) hasn't landed, fork with graceful GS degradation
- Validate transform feedback polyfill, memory model correctness
- Test with DX12 games that don't rely on geometry shaders (many modern titles)
DX12 → Metal Direct Layer (Long-Term)
- Build open-source DX12-to-Metal runtime using Metal Shader Converter for shaders
- Use vkd3d-proton as DX12 API reference, DXMT as Metal translation reference
- Start with core API surface: Device, CommandQueue, CommandList, DescriptorHeap, PSO
- Descriptor heap → Argument Buffer Tier 2 mapping (hardest part)
- Resource barriers → Metal synchronization
- Iteratively add DX12 features guided by game compatibility testing

Phase 6: Advanced Features

Steam/Launcher Support
- Get Steam client running via Wine
- Handle Steam DRM (Steamworks)
- Game download and management
Performance Optimization Pass
- AOT pre-translation for game binaries
- Metal shader pre-compilation and caching
- Hardware TSO investigation
- Profile-guided JIT optimization

11. Open Questions and Risks

Critical Risks

iOS 26+ TXM stability: The BRK-based JIT approach on TXM is new and potentially fragile. StikDebug's iOS 26 support is still evolving. If Apple further restricts this, the entire project is blocked.
Memory limits: Even with increased-memory-limit, iOS Jetsam may kill the app under memory pressure. PC games can use 4-8GB+ RAM; iPhones have 6-8GB total shared between all apps.
Thermal throttling: Sustained heavy CPU/GPU load on an iPhone will trigger thermal throttling, reducing performance significantly after initial minutes.
Apple policy changes: Apple could patch the JIT enablement mechanisms at any time. The project inherently relies on techniques Apple hasn't explicitly endorsed.
Wine complexity: Wine is a massive, complex project. The iOS port will encounter countless edge cases in API translation that require individual attention.

Open Questions

Can hardware TSO be accessed on iOS? This would provide a ~20-30% performance boost for x86 emulation. Needs investigation via Hypervisor.framework or undocumented sysctl.
Can DXMT's geometry shader → mesh shader conversion work on all target games? Some edge cases may not convert correctly.
When will MoltenVK PR #1815 (geometry shader emulation) land? This unblocks VKD3D-Proton→MoltenVK for DX12. If delayed, a fork of VKD3D-Proton with graceful GS degradation is the interim path.
How will Steam DRM interact with the emulation stack? Steam's anti-cheat and DRM may detect the emulated environment.
What is the realistic JIT cache size limit on iOS? MeloNX uses 512MB; PC games may need more.
Can LiveContainer's re-signing approach work for Wine DLLs in JIT-less mode? This could provide a fallback path without JIT for simpler apps.

Things You Didn't Mention That Are Important

Gamescope is NOT needed on iOS. Gamescope is a Wayland compositor for Linux. On iOS, the app owns its own CAMetalLayer and renders directly to it. No compositor is needed.
Proton vs vanilla Wine: Proton is mostly Wine + DXVK + VKD3D + Steam integration patches. For iOS, we want vanilla Wine (or CrossOver's fork) with DXMT in place of DXVK, because DXVK depends on Vulkan geometry shaders, which MoltenVK does not currently provide.
iSH (iOS Shell) precedent: iSH runs a Linux shell on iOS using a usermode x86 emulator. Its execution engine is a threaded-code interpreter rather than a true JIT (it never emits machine code), which is why it runs on stock iOS without any JIT enablement. It demonstrates that Linux syscall emulation on iOS is feasible (its kernel/ directory implements Linux syscalls on top of Darwin).
UTM (iOS VM): UTM runs QEMU on iOS. On stock devices it relies on QEMU's TCG translator (JIT-compiled when JIT can be enabled, interpreted in UTM SE), because Hypervisor.framework is not available to ordinary iOS apps. It demonstrates full system emulation, but at very low performance. Our approach (usermode emulation) avoids emulating a whole machine and should be substantially faster.
Dynarmic: A dynamic recompiler for ARM guest code with an ARM64 host backend, used by some Switch emulators. It is not directly applicable to x86 emulation, but it demonstrates iOS JIT patterns.
Wine on Android (precedent): Projects like Winlator prove that Wine can run on non-desktop ARM64 platforms. The Android + Box64 + Wine stack is directly analogous to our iOS + FEX-Emu + Wine stack.
