- Design Overview
- Three-Layer Architecture
- Memory Layout
- Lock-Free Ring Buffer
- Platform Abstraction
- Message Flow
- Error Handling Strategy
- Performance Characteristics
SwiftChannel implements a zero-copy, lock-free IPC mechanism using shared memory and atomic operations. The design prioritizes:
- Low latency: Sub-microsecond message passing
- High throughput: Millions of messages per second
- Zero-copy: Direct memory access, no serialization
- Easy integration: Header-only sender API
- Sender Optimization: The hot path (sending) is header-only and inlined
- Lock-Free: SPSC ring buffer with atomic indices
- Cache-Friendly: Aligned to cache lines, minimal false sharing
- Platform Neutral: Abstraction over Windows/POSIX primitives
┌─────────────────────────────────────────┐
│            Application Layer            │
│      (User Code - Sender/Receiver)      │
└─────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────┐
│         Header-Only Sender API          │  ← Zero friction
│  • Inline send()                        │
│  • Ring buffer writes                   │
│  • No allocations, no syscalls          │
└─────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────┐
│          Compiled Core Runtime          │  ← Stable ABI
│  • Receiver implementation              │
│  • Channel lifecycle                    │
│  • Handshake & versioning               │
│  • Platform abstraction                 │
└─────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────┐
│              OS Primitives              │
│  Windows: File Mapping                  │
│  POSIX: shm_open + mmap                 │
└─────────────────────────────────────────┘
Files: include/swiftchannel/sender/*.hpp
The header-only sender layer provides the fast-path sending logic:
- sender.hpp: Main Sender class
- ring_buffer.hpp: Lock-free SPSC ring buffer
- message.hpp: Type-safe message wrappers
- config.hpp: Configuration structures
Key characteristics:
- ✓ No linking required
- ✓ Fully inlined
- ✓ Zero syscalls in fast path
- ✓ Compile-time optimizations
Files: src/receiver/, src/ipc/
The compiled core runtime handles:
- Shared memory creation/opening
- Channel lifecycle management
- Receiver polling/dispatch
- Handshake and version negotiation
- Statistics and diagnostics
Key characteristics:
- ✓ Stable ABI (can be precompiled)
- ✓ Platform abstraction
- ✓ Resource management
- ✓ Error handling
Files: src/platform/windows/, src/platform/posix/
The platform layer provides platform-specific implementations of:
- Shared memory allocation
- Memory mapping
- Process synchronization (if needed)
┌──────────────────────────────────────────────┐ ← Start of shared memory
│ SharedMemoryHeader (128 bytes, aligned)      │
│ ┌──────────────────────────────────────────┐ │
│ │ magic: 0x53574946 ("SWIF")               │ │
│ │ version: uint32                          │ │
│ │ ring_buffer_size: uint64                 │ │
│ │ write_index: atomic<uint64>              │ │ ← Producer writes here
│ │ read_index: atomic<uint64>               │ │ ← Consumer reads here
│ │ sender_pid: uint32                       │ │
│ │ receiver_pid: uint32                     │ │
│ │ flags: uint64                            │ │
│ │ reserved[8]: uint64                      │ │
│ └──────────────────────────────────────────┘ │
├──────────────────────────────────────────────┤ ← Cache-line aligned
│                                              │
│ Ring Buffer Data                             │
│ (power-of-2 size)                            │
│                                              │
│ Message 1:                                   │
│ ┌──────────────────────────────────────────┐ │
│ │ MessageHeader (32 bytes)                 │ │
│ │ ┌──────────────────────────────────────┐ │ │
│ │ │ magic: 0x53574946                    │ │ │
│ │ │ size: uint32                         │ │ │
│ │ │ sequence: uint64                     │ │ │
│ │ │ timestamp: uint64                    │ │ │
│ │ │ checksum: uint32                     │ │ │
│ │ └──────────────────────────────────────┘ │ │
│ │ Payload (variable size, 8-byte aligned)  │ │
│ └──────────────────────────────────────────┘ │
│                                              │
│ Message 2: ...                               │
│                                              │
│ (wraps around at ring_buffer_size)           │
│                                              │
└──────────────────────────────────────────────┘
- Header Alignment: SharedMemoryHeader is cache-line aligned to prevent false sharing
- Ring Buffer: Power-of-2 size enables fast modulo using bitmask
- Message Alignment: Each message payload is 8-byte aligned
- Wrap-Around: Ring buffer wraps at boundaries, handled transparently
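
The layout rules above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the field order follows the diagram, but the struct below is a hypothetical mirror rather than the library's actual definition, and `align_up8`, `ring_offset`, and `message_footprint` are helper names introduced here. The 32-byte size relies on the 8-byte alignment of `uint64_t` found on common 64-bit ABIs.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the MessageHeader in the layout diagram.
// The fields sum to 28 bytes; with 8-byte struct alignment the compiler
// pads the struct to 32 bytes.
struct MessageHeader {
    std::uint32_t magic;      // 0x53574946 ("SWIF")
    std::uint32_t size;       // payload size in bytes
    std::uint64_t sequence;
    std::uint64_t timestamp;
    std::uint32_t checksum;
};
static_assert(sizeof(MessageHeader) == 32, "header is expected to occupy 32 bytes");

// Round a payload size up to the next multiple of 8 (message alignment).
constexpr std::uint64_t align_up8(std::uint64_t n) {
    return (n + 7) & ~std::uint64_t{7};
}

// Map a monotonically increasing index to a byte offset inside a
// power-of-2 ring. The mask gives the same result as index % ring_size.
constexpr std::uint64_t ring_offset(std::uint64_t index, std::uint64_t ring_size) {
    return index & (ring_size - 1);
}

// Ring bytes consumed by one message: header plus padded payload.
constexpr std::uint64_t message_footprint(std::uint64_t payload_bytes) {
    return sizeof(MessageHeader) + align_up8(payload_bytes);
}
```

For example, a 13-byte payload is padded to 16 bytes and occupies 48 bytes of the ring including its header.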
The ring buffer uses two atomic indices:
- write_index: Updated by sender (producer)
- read_index: Updated by receiver (consumer)
// Pseudo-code for sender
uint64_t current_write = write_index.load(relaxed);
uint64_t current_read = read_index.load(acquire);  // ← Memory barrier
uint64_t available = ring_size - (current_write - current_read);
if (available >= message_size) {
    // Write message to ring[current_write % ring_size]
    write_index.store(current_write + message_size, release);  // ← Memory barrier
    return SUCCESS;
}
return BUFFER_FULL;

// Pseudo-code for receiver
uint64_t current_read = read_index.load(relaxed);
uint64_t current_write = write_index.load(acquire);  // ← Memory barrier
if (current_read < current_write) {
    // Read message from ring[current_read % ring_size]
    read_index.store(current_read + message_size, release);  // ← Memory barrier
    return SUCCESS;
}
return BUFFER_EMPTY;

- acquire: Ensures that everything the other side wrote before its matching release store is visible after this load
- release: Ensures that all writes made before this store are visible to the side that acquire-loads the same index
- relaxed: No synchronization; used when a side reads the index that only it writes
This is the minimum necessary synchronization for correctness.
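
The same protocol can be written as compilable C++ with `std::atomic`. The sketch below is a simplification, not the library's ring buffer: it stores fixed-size slots instead of variable-size messages, lives in ordinary process memory rather than shared memory, and `SpscRing`, `try_push`, and `try_pop` are names chosen for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Single-producer/single-consumer queue of fixed-size slots.
// Capacity must be a power of two so that `index & (Capacity - 1)` wraps.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer only. Returns false when the ring is full.
    bool try_push(const T& item) {
        const std::uint64_t w = write_index_.load(std::memory_order_relaxed);
        const std::uint64_t r = read_index_.load(std::memory_order_acquire);
        if (w - r == Capacity) return false;
        slots_[w & (Capacity - 1)] = item;
        // Release: the slot write above becomes visible before the new index.
        write_index_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Consumer only. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::uint64_t r = read_index_.load(std::memory_order_relaxed);
        const std::uint64_t w = write_index_.load(std::memory_order_acquire);
        if (r == w) return false;
        out = slots_[r & (Capacity - 1)];
        // Release: the slot read above completes before the slot is freed.
        read_index_.store(r + 1, std::memory_order_release);
        return true;
    }

private:
    T slots_[Capacity]{};
    alignas(64) std::atomic<std::uint64_t> write_index_{0};
    alignas(64) std::atomic<std::uint64_t> read_index_{0};
};
```

Each side loads its own index with relaxed ordering because no other thread ever writes it; only the index owned by the opposite side needs an acquire load.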
// write_index and read_index are in separate cache lines
template <typename T>
struct alignas(64) CacheAligned { T value; };

This prevents false sharing: updates to one index don't invalidate the other's cache line.
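
A self-contained way to check this property at compile time is sketched below. It assumes a 64-byte cache line, as the snippet above does; `RingIndices` and `same_cache_line` are hypothetical names used only for this illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Wrapper that gives its member a whole 64-byte cache line.
template <typename T>
struct alignas(64) CacheAligned {
    T value;
};

// Hypothetical grouping of the two ring indices.
struct RingIndices {
    CacheAligned<std::atomic<std::uint64_t>> write_index;  // written by the producer
    CacheAligned<std::atomic<std::uint64_t>> read_index;   // written by the consumer
};

// Each wrapper is padded to a full line, so the indices sit 64 bytes apart.
static_assert(sizeof(CacheAligned<std::atomic<std::uint64_t>>) == 64,
              "wrapper should fill one cache line");
static_assert(offsetof(RingIndices, read_index) -
                  offsetof(RingIndices, write_index) == 64,
              "indices should start on different cache lines");

// Runtime check: do two addresses fall into the same 64-byte line?
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / 64 ==
           reinterpret_cast<std::uintptr_t>(b) / 64;
}
```

If either `static_assert` fails on a given toolchain, the indices would share a line and every producer update would invalidate the consumer's cached copy.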
Shared Memory (Windows): Named File Mapping
HANDLE handle = CreateFileMappingW(
    INVALID_HANDLE_VALUE,         // Use page file
    NULL,                         // Security
    PAGE_READWRITE,               // Access
    size_high, size_low,          // Size (high and low 32 bits)
    L"Local\\SwiftChannel_name"   // Name
);
void* ptr = MapViewOfFile(handle, FILE_MAP_ALL_ACCESS, 0, 0, size);

Shared Memory (POSIX): POSIX shm
int fd = shm_open(
    "/swiftchannel_name",   // Name (must start with /)
    O_CREAT | O_RDWR,       // Flags
    0666                    // Permissions
);
ftruncate(fd, size);
void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

Both implementations are hidden behind:
class SharedMemory {
public:
    // Parameter types are shown for illustration
    static Result<SharedMemory> create_or_open(const std::string& name, std::size_t size, bool create);
    void* data();
    void close();
};

Application
    ↓
sender.send(msg)
    ↓
[Header-only inline code]
    ↓
Check buffer space
    ↓
Write MessageHeader
    ↓
Write payload
    ↓
Atomic update write_index (release)
    ↓
Return to application
Latency: ~50-200 nanoseconds (no syscalls!)
Application
    ↓
receiver.start(handler)
    ↓
[Compiled runtime]
    ↓
Poll loop:
    ↓
Atomic load write_index (acquire)
    ↓
Check if data available
    ↓
Read MessageHeader
    ↓
Validate magic/checksum
    ↓
Read payload
    ↓
Atomic update read_index (release)
    ↓
Call handler(data, size)
    ↓
Repeat
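
The "Validate magic/checksum" step of the receive path might look roughly like the sketch below. The source does not say which checksum SwiftChannel uses, so 32-bit FNV-1a stands in here purely as an assumption; `frame`, `validate`, and the `Validation` enum are likewise hypothetical names for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint32_t kMagic = 0x53574946;  // "SWIF"

struct MessageHeader {
    std::uint32_t magic;
    std::uint32_t size;       // payload size in bytes
    std::uint64_t sequence;
    std::uint64_t timestamp;
    std::uint32_t checksum;   // checksum of the payload
};

// 32-bit FNV-1a. ASSUMPTION: a stand-in for the library's real checksum.
inline std::uint32_t fnv1a(const std::uint8_t* data, std::size_t n) {
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < n; ++i) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

// Sender side: build header + payload in one contiguous buffer.
inline std::vector<std::uint8_t> frame(const void* payload, std::uint32_t size,
                                       std::uint64_t sequence) {
    MessageHeader h{};
    h.magic = kMagic;
    h.size = size;
    h.sequence = sequence;
    h.checksum = fnv1a(static_cast<const std::uint8_t*>(payload), size);
    std::vector<std::uint8_t> out(sizeof h + size);
    std::memcpy(out.data(), &h, sizeof h);
    std::memcpy(out.data() + sizeof h, payload, size);
    return out;
}

enum class Validation { Ok, Truncated, BadMagic, BadChecksum };

// Receiver side: check the header before handing the payload to a handler.
inline Validation validate(const std::uint8_t* bytes, std::size_t available) {
    if (available < sizeof(MessageHeader)) return Validation::Truncated;
    MessageHeader h;
    std::memcpy(&h, bytes, sizeof h);  // copy out instead of casting the pointer
    if (h.magic != kMagic) return Validation::BadMagic;
    if (available - sizeof h < h.size) return Validation::Truncated;
    if (fnv1a(bytes + sizeof h, h.size) != h.checksum) return Validation::BadChecksum;
    return Validation::Ok;
}
```

Checking the magic value first keeps the common rejection path cheap, since the checksum has to touch every payload byte.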
template<typename T>
class Result {
    ErrorCode error_;
    T value_;
public:
    bool is_ok() const;
    bool is_error() const;
    ErrorCode error() const;
    T& value();
};

- Channel Errors: Not found, already exists, full, closed
- Message Errors: Too large, invalid, corrupted
- Memory Errors: Out of memory, mapping failed
- System Errors: Permission denied, resource busy
- Version Errors: Protocol mismatch
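
To make the `Result` pattern concrete, here is a minimal sketch of how such a type could be implemented and returned. It is not the library's code: only `ChannelFull` and `MessageTooLarge` appear in this document, so the other `ErrorCode` enumerators, the constructors, and the `reserve` function are assumptions made for illustration.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical subset of error codes, one or two per category above.
enum class ErrorCode {
    Ok,
    ChannelNotFound,
    ChannelFull,
    MessageTooLarge,
    MappingFailed,
    PermissionDenied,
    VersionMismatch,
};

template <typename T>
class Result {
    ErrorCode error_;
    T value_;

public:
    // Implicit constructors let functions `return value;` or `return code;`.
    Result(T value) : error_(ErrorCode::Ok), value_(std::move(value)) {}
    Result(ErrorCode error) : error_(error), value_() {}

    bool is_ok() const { return error_ == ErrorCode::Ok; }
    bool is_error() const { return !is_ok(); }
    ErrorCode error() const { return error_; }
    T& value() { return value_; }
};

// Hypothetical helper: returns the free bytes left after reserving space
// for a message, or an error code when the message cannot be accepted.
inline Result<std::size_t> reserve(std::size_t message_size,
                                   std::size_t free_bytes,
                                   std::size_t max_message_size) {
    if (message_size > max_message_size) return ErrorCode::MessageTooLarge;
    if (message_size > free_bytes) return ErrorCode::ChannelFull;
    return free_bytes - message_size;
}
```

Returning the error code by value keeps the fast path free of exceptions and allocations, which fits the no-syscall send path described earlier.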
auto result = sender.send(msg);
if (result.is_error()) {
    switch (result.error()) {
        case ErrorCode::ChannelFull:
            // Handle backpressure
            break;
        case ErrorCode::MessageTooLarge:
            // Split message or increase max size
            break;
        default:
            // Fatal error
            break;
    }
}

| Operation | Typical Latency | Notes |
|---|---|---|
| Send (fast path) | 50-200 ns | Header-only, no syscalls |
| Send (buffer full) | 50-200 ns | Just returns error |
| Receive (data available) | 100-300 ns | Includes memcpy |
| Receive (no data) | 50-100 ns | Just atomic load |

| Scenario | Throughput | Notes |
|---|---|---|
| Small messages (64B) | 10-20M msg/sec | Limited by CPU |
| Medium messages (1KB) | 2-5M msg/sec | Memory bandwidth |
| Large messages (64KB) | 100-500K msg/sec | Memory bandwidth limited |
- Fixed memory overhead: 128 bytes (SharedMemoryHeader)
- Per-message memory overhead: 32 bytes (MessageHeader)
- Ring buffer memory: User-configurable (typically 64KB - 16MB)
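
These figures make it easy to estimate how many messages fit in a ring of a given size. The arithmetic below is a sketch that assumes every message costs its 32-byte header plus the payload rounded up to 8 bytes, and ignores any space lost at the wrap-around point; `max_in_flight` and `padded` are names introduced for this example.

```cpp
#include <cstdint>

constexpr std::uint64_t kMessageHeaderBytes = 32;  // MessageHeader size from above

// Payload size rounded up to the 8-byte message alignment.
constexpr std::uint64_t padded(std::uint64_t payload_bytes) {
    return (payload_bytes + 7) & ~std::uint64_t{7};
}

// Upper bound on same-sized messages that can be queued at once.
constexpr std::uint64_t max_in_flight(std::uint64_t ring_bytes,
                                      std::uint64_t payload_bytes) {
    return ring_bytes / (kMessageHeaderBytes + padded(payload_bytes));
}

// A 64KB ring holds at most 682 messages of 64 bytes (96 bytes each)
// and at most 62 messages of 1KB (1056 bytes each).
static_assert(max_in_flight(64 * 1024, 64) == 682, "64B payloads in 64KB");
static_assert(max_in_flight(64 * 1024, 1024) == 62, "1KB payloads in 64KB");
```

For small messages the header is a large share of each slot (32 of 96 bytes at 64B), which is worth keeping in mind when sizing the ring.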
- Sender CPU usage: Near-zero (just memory writes)
- Receiver CPU usage: Depends on polling strategy
  - Spin-loop: 100% of one core
  - Yield: Minimal when idle
  - Sleep: Trades latency for CPU
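
The three polling strategies differ only in what the receiver does after an empty poll. The sketch below illustrates that idea; `PollStrategy`, `idle`, and `poll` are hypothetical names, and the 50-microsecond sleep is an arbitrary value chosen for the example rather than a library default.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

enum class PollStrategy { Spin, Yield, Sleep };

// Called once after each poll that found no data.
inline void idle(PollStrategy strategy) {
    switch (strategy) {
        case PollStrategy::Spin:
            break;  // retry immediately: lowest latency, keeps one core busy
        case PollStrategy::Yield:
            std::this_thread::yield();  // let other runnable threads go first
            break;
        case PollStrategy::Sleep:
            // Arbitrary interval for illustration; adds up to this much latency.
            std::this_thread::sleep_for(std::chrono::microseconds(50));
            break;
    }
}

// Calls `try_receive` until it has succeeded `expected` times or
// `max_polls` attempts have been made. Returns the number of successes.
template <typename TryReceive>
std::uint64_t poll(TryReceive&& try_receive, PollStrategy strategy,
                   std::uint64_t expected, std::uint64_t max_polls) {
    std::uint64_t handled = 0;
    for (std::uint64_t i = 0; i < max_polls && handled < expected; ++i) {
        if (try_receive()) {
            ++handled;
        } else {
            idle(strategy);
        }
    }
    return handled;
}
```

A receiver could also start with spinning and fall back to yielding or sleeping after a run of empty polls, trading a little latency for lower idle CPU use.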
- Multi-producer support: MPSC ring buffer variant
- Zero-copy large messages: Separate buffer pool
- Backpressure API: Blocking send with timeout
- RDMA backend: For RDMA-capable networks
- Monitoring: Built-in metrics and tracing
- Custom allocators: For ring buffer memory
- Message filters: Pre-processing on receive
- Compression: Transparent payload compression
- Encryption: Optional payload encryption

| Feature | SwiftChannel | Unix Sockets | Pipes | Message Queues |
|---|---|---|---|---|
| Latency | Very Low | Medium | Medium | High |
| Throughput | Very High | Medium | Medium | Low |
| Zero-copy | ✓ | ✗ | ✗ | ✗ |
| Easy integration | β | β | β | β |
| Cross-platform | β | Partial | β | Partial |
| Type-safety | ✓ | ✗ | ✗ | ✗ |