A comprehensive, curated repository for Compute Express Link (CXL), covering research papers, specifications, simulation/emulation tools, benchmarks, and resources for Type 1, 2, and 3 devices.
Inspired by the transformative impact of the "Attention Is All You Need" paper in AI, "CXL is All You Need" aims to be the definitive hub for the next revolution in computer architecture: Compute Express Link (CXL).
As data centers evolve towards disaggregated and heterogeneous computing, CXL has emerged as the industry-standard interconnect offering high-bandwidth, low-latency connectivity with memory coherency. This repository aggregates scattered resources into a single, organized knowledge base for researchers, system architects, and developers.
Whether you are looking for the latest IEEE/ACM research papers, setting up a QEMU-based simulation environment, exploring Type 3 memory expansion, or benchmarking latency and throughput, you will find the essential resources here.
Traditional server architecture faces the "Memory Wall" bottleneck and struggles with efficient resource utilization. CXL addresses these critical issues through three main pillars:
Modern workloads (AI/ML, In-memory DB) demand more memory capacity and bandwidth than traditional DRAM slots can provide.
- Expansion: CXL allows CPUs to access memory attached via PCIe-like interfaces with near-DRAM latency.
- Bandwidth: It enables scaling memory bandwidth independently of the main memory channels.
In a heterogeneous computing era (CPU, GPU, FPGA, SmartNIC), efficient data sharing is key.
- Coherency: CXL maintains cache coherency between the host CPU and accelerators without complex software overhead.
- Unified Interface: It supports three protocols (CXL.io, CXL.cache, CXL.mem) to cover various use cases from caching devices (Type 1) to memory buffers (Type 3).
CXL enables the shift from server-centric to data-centric architecture.
- Memory Pooling: Unused memory in one server can be pooled and assigned to another host dynamically, reducing stranded memory and TCO (Total Cost of Ownership).
- Composable Infrastructure: Hardware resources can be composed on-the-fly based on workload requirements.
CXL 1.0 and 1.1 represent the nascent stage of the technology, based on the PCIe 5.0 physical layer (32 GT/s). A key characteristic of this phase is that it is limited to single-node configurations: it supports only a 1:1 connection between one host processor and one CXL device. CXL 1.1 is therefore used primarily for direct-attached memory expansion or simple accelerator connections without switches.

CXL 2.0, announced in 2020, is a pivotal specification driving commercial adoption. While retaining the PCIe 5.0 (32 GT/s) physical layer, it introduced CXL Switching to significantly enhance system flexibility.
- Switching & Pooling: Single-level switches allow a host to access multiple devices, or multiple hosts to share a memory pool. This is the core technology enabling dynamic resource allocation.
- Multi-Logical Device (MLD): Supports partitioning a single physical CXL memory device into up to 16 logical devices, enabling different hosts to use distinct partitions simultaneously.
- Security: CXL IDE (Integrity and Data Encryption) was introduced to ensure data integrity and confidentiality at the link level.
- Global Persistent Flush: Standardized functionality to flush data safely to persistent memory on system power loss.

CXL 3.0, released in August 2022, achieved a leap in data transfer speed and scalability. Based on the PCIe 6.0 physical layer, bandwidth doubled to 64 GT/s, and the signaling method changed from NRZ to PAM4 (Pulse Amplitude Modulation, 4-level).
- Fabric Capabilities: Going beyond single-level switching, it supports multi-level switching, enabling complex fabric topologies such as spine-leaf. This allows large-scale system configurations extending beyond the rack.
- Peer-to-Peer (P2P): Supports direct data exchange between devices without passing through the host CPU, maximizing efficiency for GPU-to-GPU transfers or storage-to-memory communication.
- Memory Sharing: While CXL 2.0 pooling involved partitioning, CXL 3.0 supports true sharing: hardware-based coherency management allows multiple hosts to access and modify the same memory addresses (cache lines) concurrently.

CXL 3.1, released in late 2023, further strengthens fabric scalability and security.
- Port Based Routing (PBR): Improved routing mechanisms for massive fabric expansion.
- Trusted Security Protocol (TSP): Extends the security scope to accelerators and memory devices, supporting virtualization-based Trusted Execution Environments (TEEs) and enabling safe processing of Confidential Computing workloads.
- Extended Metadata: Expanded metadata fields to exchange more sophisticated state information between hosts and devices.
The physical layer of CXL relies heavily on PCIe technology.
- Signal & Speed: CXL 1.x/2.0 uses PCIe 5.0 (32 GT/s), while CXL 3.x uses PCIe 6.0 (64 GT/s). The PAM4 signaling adopted with CXL 3.0 transfers twice the data per transfer cycle but requires stronger Forward Error Correction (FEC) to maintain signal integrity.
- Flit Structure: CXL uses fixed-size Flits (Flow Control Units) for low-latency communication. CXL 1.1/2.0 uses 68-byte Flits, whereas CXL 3.0 introduces 256-byte Flits for higher bandwidth and FEC support; the 68-byte mode remains selectable for latency-sensitive applications.
- Latency: Because CXL.cache and CXL.mem use a streamlined transaction path rather than the full PCIe protocol stack, CXL achieves significantly lower latency than standard PCIe. It typically adds only about 20-40 ns over local DDR access, exhibiting performance characteristics similar to a remote-socket NUMA access.
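As a quick sanity check on the figures above, raw per-direction link bandwidth follows directly from transfer rate times lane count (a first-order estimate that ignores flit framing and FEC/CRC overhead; PCIe 6.0 FLIT mode drops the old 128b/130b encoding, so no encoding factor is applied):

```shell
# Raw per-direction bandwidth in GB/s: GT/s x lanes / 8 bits-per-byte.
echo "x16 @ 32 GT/s (CXL 1.x/2.0): $((32 * 16 / 8)) GB/s"   # prints 64 GB/s
echo "x16 @ 64 GT/s (CXL 3.x):     $((64 * 16 / 8)) GB/s"   # prints 128 GB/s
```

Achievable throughput on real hardware is lower once protocol efficiency and memory-media limits are included.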
Although the CXL interface physically shares a single link, it logically multiplexes three protocols.
- CXL.io: Functionally nearly identical to PCIe. Handles device discovery, enumeration, configuration, interrupts, and non-coherent Load/Store. All CXL devices must support CXL.io.
- CXL.cache: Allows the device to access host (CPU) memory coherently. The device can snoop the CPU cache and read/write data directly without risking data inconsistency.
- CXL.mem: Allows the host (CPU) to access device-attached memory coherently. The CPU can use device memory like system memory via standard Load/Store instructions, without separate drivers or DMA.
Type 1: Caching Devices / Accelerators
- Protocols: CXL.io + CXL.cache
- Characteristics: Accelerators that lack local memory or do not expose it to the host. They use CXL.cache to coherently cache regions of host memory.
- Key Devices: SmartNICs, atomic-operation accelerators. For example, a SmartNIC can process network packets and write the results directly to the receive queue (ring buffer) in host memory via CXL.cache.

Type 2: Accelerators with Attached Memory
- Protocols: CXL.io + CXL.cache + CXL.mem
- Characteristics: Accelerators with their own high-performance memory (HBM, GDDR, etc.). The CPU can push data to the accelerator memory via CXL.mem, and the accelerator can access system memory via CXL.cache.
- Bias Modes: Two modes optimize memory coherency.
  - Host Bias: The CPU manages coherency; device memory is treated like standard system memory.
  - Device Bias: The device manages coherency, reducing overhead when the accelerator accesses its local memory intensively.
- Key Devices: GPUs, FPGAs, AI ASICs.

Type 3: Memory Expansion Devices
- Protocols: CXL.io + CXL.mem
- Characteristics: The most actively adopted type today, used to expand system memory capacity and bandwidth. Because it does not support CXL.cache, the device does not cache host memory; instead, the host uses the device's memory (Host-managed Device Memory, HDM) as main memory.
- OS View: The OS sees Type 3 devices as "CPU-less NUMA nodes," so existing NUMA management policies apply directly.
- Key Devices: CXL memory buffers, memory expander add-in cards, EDSFF (E1.S/E3.S) memory modules.
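The CPU-less-NUMA-node view maps directly onto standard Linux tooling. A minimal command sequence might look like the following (a sketch, not run output: it requires real CXL hardware, a CXL-capable kernel, and the ndctl/daxctl utilities; the device name `dax0.0`, node number, and `./my_workload` are illustrative placeholders):

```shell
# List CXL memory devices enumerated by the kernel (names are system-specific).
cxl list -M

# Reconfigure a CXL DAX device into system-ram mode; the kernel then onlines
# its capacity as a hot-plugged, CPU-less NUMA node.
daxctl reconfigure-device --mode=system-ram dax0.0

# The new node appears alongside the DRAM nodes...
numactl --hardware

# ...and ordinary NUMA policy applies, e.g. binding a workload's allocations
# to the CXL node (node 1 here):
numactl --membind=1 ./my_workload
```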
- QEMU CXL Support
- QEMU offers the most accessible emulation environment for CXL system software development and driver testing. Key CXL 2.0 features such as Type 3 devices, switches, and interleaving are included in mainline QEMU. link
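A minimal invocation, adapted from QEMU's CXL system-emulation documentation, exposing a single emulated Type 3 device to the guest (a sketch: paths and sizes are placeholders, and it assumes QEMU built with CXL support plus a guest kernel with the CXL drivers enabled):

```shell
# q35 machine with CXL enabled; one CXL host bridge (pxb-cxl), one root port,
# and one Type 3 persistent-memory device backed by files on the host.
qemu-system-x86_64 -machine q35,cxl=on -m 4G -smp 4 -nographic \
  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
  -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
```

Inside the guest, the device can then be set up with the usual `cxl`/`ndctl` tooling.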
- gem5 provides cycle-accurate simulation for hardware architectures, used for precise analysis of CXL performance impacts. While QEMU focuses on functional emulation, gem5 excels in timing and latency modeling.
- CXLSim & gem5-CXL: Researchers have released repositories such as 'CXLSim' and 'gem5-CXL' that extend gem5 to model CXL transaction- and link-layer latencies. These models simulate flit packing/unpacking overheads, switch delays (e.g., tens of ns per hop), and PCIe bus contention.
- Usage: In Full System mode, booting a real Linux kernel and running benchmarks allows analysis of how CXL memory latency affects overall system IPC (Instructions Per Cycle) and application performance.
- gem5-CXL link
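The Full System flow sketched above looks roughly as follows. The build target is standard gem5; the example config script name varies across gem5 versions and CXL forks, so treat the second command as a placeholder and check the fork's README for its own configs:

```shell
# Build the X86 gem5 binary (cycle-level simulation; builds and runs are slow).
scons build/X86/gem5.opt -j"$(nproc)"

# Boot a full Linux system image; substitute the config script shipped by the
# gem5 version or CXL fork in use (this path exists in recent mainline gem5).
build/X86/gem5.opt configs/example/gem5_library/x86-ubuntu-run.py
```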
- Yang, Yiwei, et al. "CXLMemSim: A pure software simulated CXL.mem for performance characterization." arXiv preprint arXiv:2303.06153 (2023). PDF github
- Kim, Seongbeom, et al. "CXLSim: A Simulator for CXL Memory Expander." 2025 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE, 2025. PDF
- OCEAN – Open-source CXL Emulation at Hyperscale Architecture and Networking github
- Gioiosa, Roberto, et al. "Hardware-Software Co-Development for Emerging CXL Architectures." PDF
- MemMachine, an open-source memory layer for advanced AI agents. github
- Scalable Memory Development Kit github
- Famfs Shared Memory Filesystem Framework github
- Intel CoFluent: A modeling tool for large-scale data center or cluster-level system design. It is useful for visualizing traffic flows, bottlenecks, and resource utilization in CXL memory pooling scenarios before actual hardware deployment. link1, link2
- DRAMSys: A SystemC TLM-2.0 based DRAM subsystem simulator, which has recently added CXL.mem protocol support. It models memory media (DDR5, HBM) behavior behind the CXL controller precisely, allowing analysis of the impact of refresh cycles or bank conflicts on overall CXL performance. link1, link2
- Weisgut, Marcel, et al. "CXL-Bench: Benchmarking Shared CXL Memory Access." Proceedings of the VLDB Endowment. ISSN 2150-8097. PDF
- Liu, Jinshu, et al. "Systematic CXL memory characterization and performance analysis at scale." Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 2025. PDF
- Liu, Jinshu, et al. "Dissecting CXL memory performance at scale: Analysis, modeling, and optimization." arXiv preprint arXiv:2409.14317 (2024). PDF
- Ji, Houxiang, et al. "Demystifying a CXL Type-2 Device: A Heterogeneous Cooperative Computing Perspective." 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024. PDF
- Zeng, Jianping, et al. "Performance characterizations and usage guidelines of Samsung CXL memory module hybrid prototype." arXiv preprint arXiv:2503.22017 (2025). PDF
- Wang, Xi, et al. "Exploring and evaluating real-world CXL: use cases and system adoption." arXiv preprint arXiv:2405.14209 (2024). PDF
- Lee, KyungSoo, et al. "Improving key-value cache performance with heterogeneous memory tiering: A case study of CXL-based memory expansion." IEEE Micro (2024). PDF
- Lee, Suyeon, et al. "Offloading to CXL-based Computational Memory." arXiv preprint arXiv:2512.04449 (2025). PDF
- Das Sharma, Debendra, Robert Blankenship, and Daniel Berger. "An introduction to the compute express link (cxl) interconnect." ACM Computing Surveys 56.11 (2024): 1-37. PDF
- Sharma, Debendra Das. "Compute Express Link®: An open industry-standard interconnect enabling heterogeneous data-centric computing." 2022 IEEE Symposium on High-Performance Interconnects (HOTI). IEEE, 2022. PDF
- Zhong, Yuhong, et al. "Oasis: Pooling PCIe Devices Over CXL to Boost Utilization." Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles. 2025. PDF
- Park, Junhyeok, et al. "Maximizing Interconnect Bandwidth and Efficiency in Nonvolatile Memory, Express-Based Key-Value Solid-State Devices With Fine-Grained Value Transfer." IEEE Micro 45.6 (2025): 82-90. PDF
- Hermes, Jon, et al. "Udon: A case for offloading to general purpose compute on cxl memory." arXiv preprint arXiv:2404.02868 (2024). PDF
- Wu, Jianbo, et al. "Performance Study of CXL Memory Topology." Proceedings of the International Symposium on Memory Systems. 2024. PDF
- Ham, Hyungkyu, et al. "Low-overhead general-purpose near-data processing in cxl memory expanders." 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024. PDF
- Gouk, Donghyun, et al. "Memory pooling with CXL." IEEE Micro 43.2 (2023): 48-57. PDF
- Li, Huaicheng, et al. "Pond: CXL-based memory pooling systems for cloud platforms." Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 2023. PDF
- Boles, David, Daniel Waddington, and David A. Roberts. "CXL-enabled enhanced memory functions." IEEE Micro 43.2 (2023): 58-65. PDF
- Ahn, Minseon, et al. "Enabling CXL memory expansion for in-memory database management systems." Proceedings of the 18th International Workshop on Data Management on New Hardware. 2022. PDF
- Yang, Qirui, et al. "Performance evaluation on CXL-enabled hybrid memory pool." 2022 IEEE International Conference on Networking, Architecture and Storage (NAS). IEEE, 2022. PDF
- Ha, Minho, et al. "Dynamic capacity service for improving CXL pooled memory efficiency." IEEE Micro 43.2 (2023): 39-47. PDF
- Wang, Zixuan, et al. "The Hitchhiker's Guide to Programming and Optimizing CXL-Based Heterogeneous Systems." arXiv preprint arXiv:2411.02814 (2024). PDF
- Sun, Yan, et al. "Demystifying CXL memory with genuine CXL-ready systems and devices." Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. 2023. PDF
- Cabrera, Anthony M., Aaron R. Young, and Jeffrey S. Vetter. "Design and analysis of CXL performance models for tightly-coupled heterogeneous computing." Proceedings of the 1st International Workshop on Extreme Heterogeneity Solutions. 2022. PDF
- Maruf, Hasan Al, et al. "TPP: Transparent page placement for CXL-enabled tiered-memory." Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. 2023. PDF
- Liu, Dong, and Yanxuan Yu. "CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving." Proceedings of the 2026 ACM/SIGDA International Symposium on Field Programmable Gate Arrays. 2026. PDF
- Jung, Myoungsoo. "Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure." arXiv preprint arXiv:2507.07223 (2025). PDF
- Ji, Houxiang, et al. "Para-ksm: Parallelized Memory Deduplication with Data Streaming Accelerator." 2025 USENIX Annual Technical Conference (USENIX ATC 25). 2025. PDF
- Kim, Hyungyo, et al. "LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading." Proceedings of the 52nd Annual International Symposium on Computer Architecture. 2025. PDF
- Hwang, Minsoon, et al. "MOSAIC®: Chiplet Architecture Based on Die-to-Die Interface for CXL® Memory Applications and Limitations in Bandwidth, Latency, and Power." 2024 31st IEEE International Conference on Electronics, Circuits and Systems (ICECS). IEEE, 2024. PDF
- Zhou, Zhe, et al. "NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering." 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024. PDF
- Wang, Jaylen, et al. "Designing cloud servers for lower carbon." 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). IEEE, 2024. PDF
- Gouk, Donghyun, et al. "Breaking barriers: Expanding GPU memory with sub-two digit nanosecond latency CXL controller." Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems. 2024. PDF
- Ryu, Seokhyun, et al. "System optimization of data analytics platforms using compute express link (CXL) memory." 2023 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE, 2023. PDF
- Sim, Joonseop, et al. "Computational CXL-memory solution for accelerating memory-intensive applications." IEEE Computer Architecture Letters 22.1 (2022): 5-8. PDF
- CXL consortium
- The future of Memory and Storage
- Synopsys Compute Express Link (CXL) IP Solutions
- Controller IP for CXL
- Panmnesia CXL switch
- Xcena CXL SoC
- AMD Versal CXL FPGA
- Intel CXL FPGA
- OCP Global summit 2025
- Memverge
- LIQID Composable Memory Solutions
- CXL Near-Memory Compute and Expansion
- Xconn cxl switch
- JSL, M.S., Inha University
- If you would like to contribute to "CXL is All You Need", please contact wlstjr4425@gmail.com.
- MIT
