A comprehensive, curated repository for Compute Express Link (CXL), covering research papers, specifications, simulation/emulation tools, benchmarks, and resources for Type 1, 2, and 3 devices.
Inspired by the transformative impact of the "Attention Is All You Need" paper in AI, "CXL is All You Need" aims to be the definitive hub for the next revolution in computer architecture: Compute Express Link (CXL).
As data centers evolve towards disaggregated and heterogeneous computing, CXL has emerged as the industry-standard interconnect offering high-bandwidth, low-latency connectivity with memory coherency. This repository aggregates scattered resources into a single, organized knowledge base for researchers, system architects, and developers.
Whether you are looking for the latest IEEE/ACM research papers, setting up a QEMU-based simulation environment, exploring Type 3 memory expansion, or benchmarking latency and throughput, you will find the essential resources here.
Traditional server architecture faces the "Memory Wall" bottleneck and struggles with efficient resource utilization. CXL addresses these critical issues through three main pillars:
Modern workloads (AI/ML, In-memory DB) demand more memory capacity and bandwidth than traditional DRAM slots can provide.
- Expansion: CXL allows CPUs to access memory attached via PCIe-like interfaces with near-DRAM latency.
- Bandwidth: It enables scaling memory bandwidth independently of the main memory channels.
In a heterogeneous computing era (CPU, GPU, FPGA, SmartNIC), efficient data sharing is key.
- Coherency: CXL maintains cache coherency between the host CPU and accelerators without complex software overhead.
- Unified Interface: It supports three protocols (CXL.io, CXL.cache, CXL.mem) to cover various use cases from caching devices (Type 1) to memory buffers (Type 3).
CXL enables the shift from server-centric to data-centric architecture.
- Memory Pooling: Unused memory in one server can be pooled and assigned to another host dynamically, reducing stranded memory and TCO (Total Cost of Ownership).
- Composable Infrastructure: Hardware resources can be composed on-the-fly based on workload requirements.
CXL 1.0 and 1.1 represent the nascent stage of the technology, based on the PCIe 5.0 physical layer (32 GT/s). A key characteristic of this phase is that it is limited to single-node configurations: it supports only a 1:1 connection between one host processor and one CXL device. CXL 1.1 is therefore used primarily for direct-attached memory expansion or simple accelerator connections without switches.

CXL 2.0, announced in 2020, is a pivotal specification driving commercial adoption. While retaining the PCIe 5.0 (32 GT/s) physical layer, it introduced CXL Switching to significantly enhance system flexibility.
- Switching & Pooling: Single-level switches allow a host to access multiple devices, or multiple hosts to share a memory pool. This is the core technology enabling dynamic resource allocation.
- Multi-Logical Device (MLD): Supports partitioning a single physical CXL memory device into up to 16 logical devices, enabling different hosts to use distinct partitions simultaneously.
- Security: CXL IDE (Integrity and Data Encryption) was introduced to ensure data integrity and confidentiality at the link level.
- Global Persistent Flush: Standardized functionality to flush data safely to persistent memory on system power loss.

CXL 3.0, released in August 2022, achieved a leap in data transfer speed and scalability. Based on the PCIe 6.0 physical layer, bandwidth doubled to 64 GT/s, and the signaling method changed from NRZ to PAM4 (Pulse Amplitude Modulation, 4-level).
- Fabric Capabilities: Going beyond single-level switching, it supports multi-level switching, enabling complex fabric topologies such as spine-leaf. This allows large-scale system configurations extending beyond the rack.
- Peer-to-Peer (P2P): Supports direct data exchange between devices without passing through the host CPU, maximizing efficiency for GPU-to-GPU transfers or storage-to-memory communication.
- Memory Sharing: While CXL 2.0 pooling involved partitioning, CXL 3.0 supports true sharing: hardware-based coherency management allows multiple hosts to access and modify the same memory addresses (cache lines) concurrently.

CXL 3.1, released in late 2023, further strengthens fabric scalability and security.
- Port Based Routing (PBR): Improved routing mechanisms for massive fabric expansion.
- Trusted Security Protocol (TSP): Extends the security scope to accelerators and memory devices, supporting virtualization-based Trusted Execution Environments (TEEs) and enabling safe processing of Confidential Computing workloads.
- Extended Metadata: Expanded metadata fields to exchange more sophisticated state information between hosts and devices.
The physical layer of CXL relies heavily on PCIe technology.
- Signal & Speed: CXL 1.x/2.0 uses PCIe 5.0 (32 GT/s), while CXL 3.x uses PCIe 6.0 (64 GT/s). The PAM4 signaling adopted with CXL 3.0 transfers twice the data per transfer cycle but requires stronger Forward Error Correction (FEC) to maintain signal integrity.
- Flit Structure: CXL uses fixed-size Flits (Flow Control Units) for low-latency communication. CXL 1.1/2.0 uses 68-byte Flits, whereas CXL 3.0 introduces 256-byte Flits for higher bandwidth and FEC support; the 68-byte mode remains selectable for latency-sensitive applications.
- Latency: Because CXL.cache and CXL.mem use a streamlined transaction path rather than the full PCIe protocol stack, CXL achieves significantly lower latency than standard PCIe. It typically adds only about 20-40 ns over local DDR access, exhibiting performance characteristics similar to a remote-socket NUMA access.
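As a quick sanity check on the figures above, raw per-direction link bandwidth follows directly from transfer rate times lane count (a first-order estimate that ignores flit framing and FEC/CRC overhead; PCIe 6.0 FLIT mode drops the old 128b/130b encoding, so no encoding factor is applied):

```shell
# Raw per-direction bandwidth in GB/s: GT/s x lanes / 8 bits-per-byte.
echo "x16 @ 32 GT/s (CXL 1.x/2.0): $((32 * 16 / 8)) GB/s"   # prints 64 GB/s
echo "x16 @ 64 GT/s (CXL 3.x):     $((64 * 16 / 8)) GB/s"   # prints 128 GB/s
```

Achievable throughput on real hardware is lower once protocol efficiency and memory-media limits are included.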
Although the CXL interface physically shares a single link, it logically multiplexes three protocols.
- CXL.io: Functionally nearly identical to PCIe. Handles device discovery, enumeration, configuration, interrupts, and non-coherent Load/Store. All CXL devices must support CXL.io.
- CXL.cache: Allows the device to access host (CPU) memory coherently. The device can snoop the CPU cache and read/write data directly without risking data inconsistency.
- CXL.mem: Allows the host (CPU) to access device-attached memory coherently. The CPU can use device memory like system memory via standard Load/Store instructions, without separate drivers or DMA.
Type 1: Caching Devices / Accelerators
- Protocols: CXL.io + CXL.cache
- Characteristics: Accelerators that lack local memory or do not expose it to the host. They use CXL.cache to coherently cache regions of host memory.
- Key Devices: SmartNICs, atomic-operation accelerators. For example, a SmartNIC can process network packets and write the results directly to the receive queue (ring buffer) in host memory via CXL.cache.

Type 2: Accelerators with Attached Memory
- Protocols: CXL.io + CXL.cache + CXL.mem
- Characteristics: Accelerators with their own high-performance memory (HBM, GDDR, etc.). The CPU can push data to the accelerator memory via CXL.mem, and the accelerator can access system memory via CXL.cache.
- Bias Modes: Two modes optimize memory coherency.
  - Host Bias: The CPU manages coherency; device memory is treated like standard system memory.
  - Device Bias: The device manages coherency, reducing overhead when the accelerator accesses its local memory intensively.
- Key Devices: GPUs, FPGAs, AI ASICs.

Type 3: Memory Expansion Devices
- Protocols: CXL.io + CXL.mem
- Characteristics: The most actively adopted type today, used to expand system memory capacity and bandwidth. Because it does not support CXL.cache, the device does not cache host memory; instead, the host uses the device's memory (Host-managed Device Memory, HDM) as main memory.
- OS View: The OS sees Type 3 devices as "CPU-less NUMA nodes," so existing NUMA management policies apply directly.
- Key Devices: CXL memory buffers, memory expander add-in cards, EDSFF (E1.S/E3.S) memory modules.
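The CPU-less-NUMA-node view maps directly onto standard Linux tooling. A minimal command sequence might look like the following (a sketch, not run output: it requires real CXL hardware, a CXL-capable kernel, and the ndctl/daxctl utilities; the device name `dax0.0`, node number, and `./my_workload` are illustrative placeholders):

```shell
# List CXL memory devices enumerated by the kernel (names are system-specific).
cxl list -M

# Reconfigure a CXL DAX device into system-ram mode; the kernel then onlines
# its capacity as a hot-plugged, CPU-less NUMA node.
daxctl reconfigure-device --mode=system-ram dax0.0

# The new node appears alongside the DRAM nodes...
numactl --hardware

# ...and ordinary NUMA policy applies, e.g. binding a workload's allocations
# to the CXL node (node 1 here):
numactl --membind=1 ./my_workload
```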
- QEMU CXL Support
- QEMU offers the most accessible emulation environment for CXL system software development and driver testing. Key CXL 2.0 features such as Type 3 devices, switches, and interleaving are included in mainline QEMU. link
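A minimal invocation, adapted from QEMU's CXL system-emulation documentation, exposing a single emulated Type 3 device to the guest (a sketch: paths and sizes are placeholders, and it assumes QEMU built with CXL support plus a guest kernel with the CXL drivers enabled):

```shell
# q35 machine with CXL enabled; one CXL host bridge (pxb-cxl), one root port,
# and one Type 3 persistent-memory device backed by files on the host.
qemu-system-x86_64 -machine q35,cxl=on -m 4G -smp 4 -nographic \
  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
  -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
```

Inside the guest, the device can then be set up with the usual `cxl`/`ndctl` tooling.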
- gem5 provides cycle-accurate simulation for hardware architectures, used for precise analysis of CXL performance impacts. While QEMU focuses on functional emulation, gem5 excels in timing and latency modeling.
- CXLSim & gem5-CXL: Researchers have released repositories such as 'CXLSim' and 'gem5-CXL' that extend gem5 to model CXL transaction- and link-layer latencies. These models simulate flit packing/unpacking overheads, switch delays (e.g., tens of ns per hop), and PCIe bus contention.
- Usage: In Full System mode, booting a real Linux kernel and running benchmarks allows analysis of how CXL memory latency affects overall system IPC (Instructions Per Cycle) and application performance.
- gem5-CXL link
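The Full System flow sketched above looks roughly as follows. The build target is standard gem5; the example config script name varies across gem5 versions and CXL forks, so treat the second command as a placeholder and check the fork's README for its own configs:

```shell
# Build the X86 gem5 binary (cycle-level simulation; builds and runs are slow).
scons build/X86/gem5.opt -j"$(nproc)"

# Boot a full Linux system image; substitute the config script shipped by the
# gem5 version or CXL fork in use (this path exists in recent mainline gem5).
build/X86/gem5.opt configs/example/gem5_library/x86-ubuntu-run.py
```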
- Yang, Yiwei, et al. "CXLMemSim: A pure software simulated CXL.mem for performance characterization." arXiv preprint arXiv:2303.06153 (2023). PDF github
- Kim, Seongbeom, et al. "CXLSim: A Simulator for CXL Memory Expander." 2025 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE, 2025. PDF
- OCEAN – Open-source CXL Emulation at Hyperscale Architecture and Networking github
- Gioiosa, Roberto, et al. "Hardware-Software Co-Development for Emerging CXL Architectures." PDF
- MemMachine, an open-source memory layer for advanced AI agents. github
- Scalable Memory Development Kit github
- Famfs Shared Memory Filesystem Framework github
- Intel CoFluent: A modeling tool for large-scale data center or cluster-level system design. It is useful for visualizing traffic flows, bottlenecks, and resource utilization in CXL memory pooling scenarios before actual hardware deployment. link1, link2
- DRAMSys: A SystemC TLM-2.0 based DRAM subsystem simulator, which has recently added CXL.mem protocol support. It models memory media (DDR5, HBM) behavior behind the CXL controller precisely, allowing analysis of the impact of refresh cycles or bank conflicts on overall CXL performance. link1, link2
- Weisgut, Marcel, et al. "CXL-Bench: Benchmarking Shared CXL Memory Access." Proceedings of the VLDB Endowment. ISSN 2150-8097. PDF
- Liu, Jinshu, et al. "Systematic CXL memory characterization and performance analysis at scale." Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 2025. PDF
- Liu, Jinshu, et al. "Dissecting CXL memory performance at scale: Analysis, modeling, and optimization." arXiv preprint arXiv:2409.14317 (2024). PDF
- Ji, Houxiang, et al. "Demystifying a CXL Type-2 Device: A Heterogeneous Cooperative Computing Perspective." 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024. PDF
- Zeng, Jianping, et al. "Performance characterizations and usage guidelines of Samsung CXL memory module hybrid prototype." arXiv preprint arXiv:2503.22017 (2025). PDF
- Wang, Xi, et al. "Exploring and evaluating real-world CXL: use cases and system adoption." arXiv preprint arXiv:2405.14209 (2024). PDF
- Lee, KyungSoo, et al. "Improving key-value cache performance with heterogeneous memory tiering: A case study of CXL-based memory expansion." IEEE Micro (2024). PDF
- Lee, Suyeon, et al. "Offloading to CXL-based Computational Memory." arXiv preprint arXiv:2512.04449 (2025). PDF
- Das Sharma, Debendra, Robert Blankenship, and Daniel Berger. "An introduction to the compute express link (cxl) interconnect." ACM Computing Surveys 56.11 (2024): 1-37. PDF
- Sharma, Debendra Das. "Compute Express Link®: An open industry-standard interconnect enabling heterogeneous data-centric computing." 2022 IEEE Symposium on High-Performance Interconnects (HOTI). IEEE, 2022. PDF
- Zhong, Yuhong, et al. "Oasis: Pooling PCIe Devices Over CXL to Boost Utilization." Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles. 2025. PDF
- Park, Junhyeok, et al. "Maximizing Interconnect Bandwidth and Efficiency in Nonvolatile Memory, Express-Based Key-Value Solid-State Devices With Fine-Grained Value Transfer." IEEE Micro 45.6 (2025): 82-90. PDF
- Hermes, Jon, et al. "Udon: A case for offloading to general purpose compute on cxl memory." arXiv preprint arXiv:2404.02868 (2024). PDF
- Wu, Jianbo, et al. "Performance Study of CXL Memory Topology." Proceedings of the International Symposium on Memory Systems. 2024. PDF
- Ham, Hyungkyu, et al. "Low-overhead general-purpose near-data processing in cxl memory expanders." 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024. PDF
- Gouk, Donghyun, et al. "Memory pooling with CXL." IEEE Micro 43.2 (2023): 48-57. PDF
- Li, Huaicheng, et al. "Pond: CXL-based memory pooling systems for cloud platforms." Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 2023. PDF
- Boles, David, Daniel Waddington, and David A. Roberts. "CXL-enabled enhanced memory functions." IEEE Micro 43.2 (2023): 58-65. PDF
- Ahn, Minseon, et al. "Enabling CXL memory expansion for in-memory database management systems." Proceedings of the 18th International Workshop on Data Management on New Hardware. 2022. PDF
- Yang, Qirui, et al. "Performance evaluation on CXL-enabled hybrid memory pool." 2022 IEEE International Conference on Networking, Architecture and Storage (NAS). IEEE, 2022. PDF
- Ha, Minho, et al. "Dynamic capacity service for improving CXL pooled memory efficiency." IEEE Micro 43.2 (2023): 39-47. PDF
- Wang, Zixuan, et al. "The Hitchhiker's Guide to Programming and Optimizing CXL-Based Heterogeneous Systems." arXiv preprint arXiv:2411.02814 (2024). PDF
- Sun, Yan, et al. "Demystifying CXL memory with genuine CXL-ready systems and devices." Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. 2023. PDF
- Cabrera, Anthony M., Aaron R. Young, and Jeffrey S. Vetter. "Design and analysis of CXL performance models for tightly-coupled heterogeneous computing." Proceedings of the 1st International Workshop on Extreme Heterogeneity Solutions. 2022. PDF
- Maruf, Hasan Al, et al. "TPP: Transparent page placement for CXL-enabled tiered-memory." Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. 2023. PDF
- Liu, Dong, and Yanxuan Yu. "CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving." Proceedings of the 2026 ACM/SIGDA International Symposium on Field Programmable Gate Arrays. 2026. PDF
- Jung, Myoungsoo. "Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure." arXiv preprint arXiv:2507.07223 (2025). PDF
- Ji, Houxiang, et al. "Para-ksm: Parallelized Memory Deduplication with Data Streaming Accelerator." 2025 USENIX Annual Technical Conference (USENIX ATC 25). 2025. PDF
- Kim, Hyungyo, et al. "LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading." Proceedings of the 52nd Annual International Symposium on Computer Architecture. 2025. PDF
- Hwang, Minsoon, et al. "MOSAIC®: Chiplet Architecture Based on Die-to-Die Interface for CXL® Memory Applications and Limitations in Bandwidth, Latency, and Power." 2024 31st IEEE International Conference on Electronics, Circuits and Systems (ICECS). IEEE, 2024. PDF
- Zhou, Zhe, et al. "NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering." 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024. PDF
- Wang, Jaylen, et al. "Designing cloud servers for lower carbon." 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). IEEE, 2024. PDF
- Gouk, Donghyun, et al. "Breaking barriers: Expanding GPU memory with sub-two digit nanosecond latency CXL controller." Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems. 2024. PDF
- Ryu, Seokhyun, et al. "System optimization of data analytics platforms using compute express link (CXL) memory." 2023 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE, 2023. PDF
- Sim, Joonseop, et al. "Computational CXL-memory solution for accelerating memory-intensive applications." IEEE Computer Architecture Letters 22.1 (2022): 5-8. PDF
- CXL consortium
- The future of Memory and Storage
- Synopsys Compute Express Link (CXL) IP Solutions
- Controller IP for CXL
- Panmnesia CXL switch
- Xcena CXL SoC
- AMD Versal CXL FPGA
- Intel CXL FPGA
- OCP Global summit 2025
- Memverge
- LIQID Composable Memory Solutions
- CXL Near-Memory Compute and Expansion
- Xconn cxl switch
- JSL, M.S., Inha University
- If you would like to contribute to "CXL is All You Need", please contact wlstjr4425@gmail.com.
- MIT
