YanAIEngine 🚀

YanAIEngine is a multimodal inference engine built natively for Apple Silicon. It bypasses high-level frameworks and controls the GPU directly through Swift and the Metal Shading Language (MSL).

Core Philosophy: The Native Advantage

Modern AI models are often limited by memory bandwidth rather than by compute. YanAIEngine focuses on the silicon and kernel layer: it exploits the Apple Unified Memory Architecture (UMA) so that the CPU and GPU share the same buffers without copies, and it applies techniques such as speculative decoding and paged KV caching to work around the memory wall.
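As a rough illustration of the zero-copy idea, the sketch below allocates page-aligned memory on the CPU and, where Metal is available, wraps it in a shared `MTLBuffer` without copying. The helper names (`pageAlignedLength`, `allocatePageAligned`, `makeSharedBuffer`) are hypothetical and are not taken from `Tensor.swift`; the only Metal call used is `makeBuffer(bytesNoCopy:length:options:deallocator:)`, which requires a page-aligned pointer and length.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

/// Rounds `byteCount` up to a whole number of pages.
func pageAlignedLength(_ byteCount: Int, pageSize: Int) -> Int {
    precondition(byteCount > 0 && pageSize > 0)
    return ((byteCount + pageSize - 1) / pageSize) * pageSize
}

/// Hypothetical helper: allocates zeroed, page-aligned memory on the CPU side.
func allocatePageAligned(byteCount: Int) -> (pointer: UnsafeMutableRawPointer, length: Int)? {
    let pageSize = Int(getpagesize())
    let length = pageAlignedLength(byteCount, pageSize: pageSize)
    var raw: UnsafeMutableRawPointer? = nil
    guard posix_memalign(&raw, pageSize, length) == 0, let pointer = raw else { return nil }
    memset(pointer, 0, length)
    return (pointer, length)
}

#if canImport(Metal)
/// Hypothetical helper: exposes the same memory to the GPU without a copy.
func makeSharedBuffer(device: MTLDevice, byteCount: Int) -> MTLBuffer? {
    guard let allocation = allocatePageAligned(byteCount: byteCount) else { return nil }
    let buffer = device.makeBuffer(
        bytesNoCopy: allocation.pointer,
        length: allocation.length,
        options: .storageModeShared,
        deallocator: { pointer, _ in free(pointer) }  // Metal frees the memory with the buffer
    )
    if buffer == nil { free(allocation.pointer) }     // avoid a leak if wrapping fails
    return buffer
}
#endif
```

With a buffer created this way, CPU writes through the original pointer are visible to GPU kernels bound to the buffer, so no staging copy is needed.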

Key Milestones Completed

Architecture

| Component | Description |
| --- | --- |
| `Tensor.swift` | The foundation. Manages page-aligned CPU/GPU shared memory with zero-copy buffer sharing. |
| `MetalEngine.swift` | The control plane. Handles device discovery, command queues, and kernel caching. |
| `Scheduler.swift` | The brain. Manages continuous batching and the Speculative Decoding draft-verify loop. |
| `LlamaModel.swift` | Llama 3/4 orchestrator. Optimized for $O(1)$ inference with `forwardStep` and batch verification. |
| `SpeculativeSampler.swift` | The verify engine. Implements rejection sampling to validate draft tokens against the target model. |
| `PagedKVCache.swift` | Virtual mapping. Uses Page Tables to map logical sequences to physical blocks in the pool. |
| `BlockAllocator.swift` | Physical pool. Pre-allocates VRAM blocks (16 tokens) to eliminate fragmentation. |
| `SigLIPEncoder.swift` | Vision Transformer (ViT). Encodes raw pixels into high-density visual embeddings. |
| `MultimodalProjector.swift` | The bridge. Aligns visual latent spaces with the LLM's dimensional space. |
| `MoERouter.swift` | Gating network. Sparsely dispatches tokens to expert-partitioned Feed-Forward Networks. |
| `InferenceServer.swift` | The API layer. Asynchronous server exposing a unified Gemini/OpenAI-compatible interface. |
| `gemm.metal` | The math. Hand-optimized C++/MSL kernels for maximum compute utilization. |
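To make the sparse dispatch attributed to `MoERouter.swift` more concrete, here is a minimal CPU-side sketch of top-k gating. It assumes a softmax over per-expert router logits followed by renormalization over the selected experts; the README does not state which gating scheme or value of k the engine uses, and `ExpertRoute` and `routeToken` are hypothetical names.

```swift
import Foundation

/// Hypothetical result type: chosen expert indices and their renormalized weights.
struct ExpertRoute {
    let experts: [Int]
    let weights: [Float]
}

/// Numerically stable softmax over router logits.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Picks the `topK` most probable experts for one token (assumed gating scheme).
func routeToken(routerLogits: [Float], topK: Int) -> ExpertRoute {
    precondition(topK > 0 && topK <= routerLogits.count)
    let probs = softmax(routerLogits)
    // Rank expert indices by descending probability and keep the first k.
    let ranked = probs.indices.sorted { probs[$0] > probs[$1] }
    let chosen = Array(ranked.prefix(topK))
    // Renormalize so the selected experts' weights sum to 1.
    let mass = chosen.reduce(Float(0)) { $0 + probs[$1] }
    return ExpertRoute(experts: chosen, weights: chosen.map { probs[$0] / mass })
}
```

Only the selected experts' feed-forward networks then run for that token, and their outputs are combined using the returned weights.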

Performance & Infrastructure

Breaking the Bandwidth Wall (Speculative Decoding)

During autoregressive decoding, GPU compute cores often sit idle while weight matrices are fetched from memory, because every generated token requires a full pass over the weights. YanAIEngine implements Speculative Decoding (Goal #21): a small, fast "draft" model proposes the next several tokens, and the large "target" model verifies them in a single parallel pass. Each weight fetch is then amortized over several tokens, which shifts the workload from memory-bound sequential decoding toward compute-bound parallel verification and often doubles or triples generation speed on local hardware.
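The verification step can be sketched with the standard speculative rejection-sampling rule: accept a drafted token `x` with probability `min(1, p(x) / q(x))`, where `p` is the target distribution and `q` the draft distribution; on the first rejection, resample from the normalized residual `max(0, p - q)` and stop. This is an illustration under those assumptions, not the contents of `SpeculativeSampler.swift`, and `verifyDraft` and `sampleIndex` are hypothetical names.

```swift
import Foundation

/// Samples an index in proportion to non-negative (not necessarily normalized) weights.
func sampleIndex<R: RandomNumberGenerator>(weights: [Float], using rng: inout R) -> Int {
    let total = weights.reduce(0, +)
    precondition(total > 0, "at least one weight must be positive")
    var threshold = Float.random(in: 0..<total, using: &rng)
    for (index, weight) in weights.enumerated() {
        if threshold < weight { return index }
        threshold -= weight
    }
    // Floating-point fallback: return the last index with positive weight.
    return weights.lastIndex(where: { $0 > 0 })!
}

/// Verifies drafted tokens against the target model's distributions.
/// - draftProbs[i]: draft distribution q at drafted position i.
/// - targetProbs[i]: target distribution p at position i, plus one extra entry
///   for the position after the last drafted token.
/// Returns the accepted prefix followed by exactly one token chosen by the target.
func verifyDraft<R: RandomNumberGenerator>(
    draftTokens: [Int],
    draftProbs: [[Float]],
    targetProbs: [[Float]],
    using rng: inout R
) -> [Int] {
    precondition(draftProbs.count == draftTokens.count)
    precondition(targetProbs.count == draftTokens.count + 1)
    var output: [Int] = []
    for (i, token) in draftTokens.enumerated() {
        let p = targetProbs[i][token]
        let q = draftProbs[i][token]
        precondition(q > 0, "a drafted token must have nonzero draft probability")
        let acceptProbability: Float = min(1, p / q)
        if Float.random(in: 0..<1, using: &rng) < acceptProbability {
            output.append(token)
            continue
        }
        // Rejected: resample from the residual distribution max(0, p - q), then stop.
        let residual = zip(targetProbs[i], draftProbs[i]).map { max(0, $0 - $1) }
        output.append(sampleIndex(weights: residual, using: &rng))
        return output
    }
    // Every draft token was accepted: take one bonus token from the target.
    output.append(sampleIndex(weights: targetProbs[draftTokens.count], using: &rng))
    return output
}
```

Because each call yields at least one target-approved token, the loop never produces fewer tokens per target pass than plain autoregressive decoding; the gain depends on how often the draft model agrees with the target.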

$O(1)$ Throughput (PagedAttention & Continuous Batching)

Virtualizing the KV cache removes the need for a contiguous memory region per sequence. The engine divides cache memory into small fixed-size "pages" managed by a Block Allocator, and a page table maps each sequence's logical positions to physical blocks. Because blocks are allocated on demand, the Scheduler can interleave processing for many users simultaneously (continuous batching). This addresses KV-cache memory fragmentation and supports high-concurrency serving.
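The mapping from logical sequences to physical blocks can be sketched as follows. This is a simplified bookkeeping model, assuming the 16-token block size mentioned for `BlockAllocator.swift`; the type and method names mirror the file names but their members (`allocate`, `release`, `appendToken`, `free`) are hypothetical, and the sketch tracks only block indices, not the key/value tensors themselves.

```swift
import Foundation

/// Hypothetical fixed-size block pool that hands out physical block indices.
struct BlockAllocator {
    let blockSize: Int
    private var freeBlocks: [Int]

    init(blockCount: Int, blockSize: Int = 16) {
        self.blockSize = blockSize
        // Reversed so that popLast() hands out block 0 first.
        self.freeBlocks = Array((0..<blockCount).reversed())
    }

    var freeCount: Int { freeBlocks.count }

    mutating func allocate() -> Int? { freeBlocks.popLast() }
    mutating func release(_ block: Int) { freeBlocks.append(block) }
}

/// Hypothetical page-table layer: sequence ID -> list of physical blocks.
struct PagedKVCache {
    private(set) var allocator: BlockAllocator
    private var pageTables: [Int: [Int]] = [:]
    private var lengths: [Int: Int] = [:]

    init(allocator: BlockAllocator) { self.allocator = allocator }

    /// Reserves a slot for one new token and returns its physical location,
    /// or nil when the pool is exhausted.
    mutating func appendToken(sequence: Int) -> (block: Int, offset: Int)? {
        let length = lengths[sequence, default: 0]
        let offset = length % allocator.blockSize
        if offset == 0 {
            // The current block is full (or the sequence is new): grab another block.
            guard let block = allocator.allocate() else { return nil }
            pageTables[sequence, default: []].append(block)
        }
        lengths[sequence] = length + 1
        return (pageTables[sequence]!.last!, offset)
    }

    /// Returns all of a finished sequence's blocks to the pool.
    mutating func free(sequence: Int) {
        for block in pageTables.removeValue(forKey: sequence) ?? [] {
            allocator.release(block)
        }
        lengths[sequence] = nil
    }
}
```

A sequence's blocks need not be adjacent: if another request allocates in between, the page table simply records non-contiguous block indices, and a finished sequence's blocks are immediately reusable by any other request.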

Multimodal Reasoning (Vision-Language Fusion)

YanAIEngine is fully multimodal. It uses a SigLIP Vision Encoder to process image patches, which are then fused with text tokens via a Multimodal Projector. This allows the engine to "see" and "read" simultaneously, enabling visual question answering and complex scene reasoning.
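The fusion step can be illustrated with a small CPU-side sketch: each image-patch embedding is projected into the LLM's embedding dimension and the result is placed in the same sequence as the text embeddings. The sketch assumes a single linear layer and image-before-text ordering; the README does not specify the projector's architecture or the token ordering, and `LinearProjector` and `fuse` are hypothetical names.

```swift
import Foundation

/// Hypothetical single-layer projector from the vision dimension to the LLM dimension.
struct LinearProjector {
    let weight: [[Float]]  // shape [llmDim][visionDim]
    let bias: [Float]      // shape [llmDim]

    /// Computes weight * embedding + bias for one patch embedding.
    func project(_ embedding: [Float]) -> [Float] {
        precondition(weight.count == bias.count)
        return zip(weight, bias).map { (row, b) -> Float in
            precondition(row.count == embedding.count)
            return zip(row, embedding).reduce(b) { $0 + $1.0 * $1.1 }
        }
    }
}

/// Builds one input sequence: projected image patches followed by text embeddings
/// (the ordering is an assumption made for this sketch).
func fuse(
    imagePatches: [[Float]],
    textEmbeddings: [[Float]],
    projector: LinearProjector
) -> [[Float]] {
    return imagePatches.map { projector.project($0) } + textEmbeddings
}
```

After fusion, every row has the LLM's embedding width, so the language model can attend over image and text positions uniformly.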

Quick Start

Swift CLI Demo

```bash
# Processes a prompt and generates text natively on the GPU
swift run yanaiengine
```

Gemini-Compatible Server

```bash
# Boot the HTTP server on port 8080
swift run yanaiengine --server
```

Querying the Multimodal API

```bash
# Example: Ask about an image using the Gemini API schema
curl http://localhost:8080/v1beta/models/yanai-model:generateContent \
    -X POST -H "Content-Type: application/json" \
    -d '{
      "contents": [{
        "parts": [
          {"text": "What is in this image?"},
          {"inline_data": {"mime_type": "image/png", "data": "BASE64_DATA"}}
        ]
      }]
    }'
```
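For Swift clients, the same request body can be built with `Codable` instead of a hand-written JSON string. The type names below (`GenerateContentRequest`, `Content`, `Part`, `InlineData`) and the `makeRequestBody` helper are hypothetical; only the JSON field names are taken from the curl example above.

```swift
import Foundation

struct InlineData: Codable {
    let mimeType: String
    let data: String  // base64-encoded image bytes

    enum CodingKeys: String, CodingKey {
        case mimeType = "mime_type"
        case data
    }
}

/// A part carries either text or inline image data; nil fields are omitted when encoding.
struct Part: Codable {
    var text: String? = nil
    var inlineData: InlineData? = nil

    enum CodingKeys: String, CodingKey {
        case text
        case inlineData = "inline_data"
    }
}

struct Content: Codable { let parts: [Part] }
struct GenerateContentRequest: Codable { let contents: [Content] }

/// Encodes a text prompt plus a PNG image into the JSON body shown in the curl example.
func makeRequestBody(prompt: String, pngData: Data) throws -> Data {
    let request = GenerateContentRequest(contents: [
        Content(parts: [
            Part(text: prompt),
            Part(inlineData: InlineData(mimeType: "image/png",
                                        data: pngData.base64EncodedString()))
        ])
    ])
    return try JSONEncoder().encode(request)
}
```

The resulting `Data` can be sent as the POST body to the `generateContent` endpoint with a `Content-Type: application/json` header.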
