docker build fails when dockerd data-root is on NVMe Direct Disk #609

@pderrier

Describe the bug
On Windows Server Kubernetes nodes:

  • When Docker’s data-root is on SSD (ATA bus), docker build works correctly.
  • When Docker’s data-root is on NVMe Direct Disk, docker build consistently fails with:
hcsshim::System::Start: failure in a Windows system call:
The process cannot access the file because it is being used by another process.

At the same time, Kubernetes workloads using containerd run fine with NVMe storage (pods pull images and start normally).

The issue is isolated to the dockerd HostProcess build workflow (image builds).
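For CI triage, a small check like the one below could flag this specific failure in captured `docker build` output. It is only a sketch: the two marker strings are copied from the error above, while the function name and the idea of matching on substrings are assumptions for illustration, not part of Docker or hcsshim.

```python
# Marker strings copied from the error reported in this issue.
HCS_START_MARKER = "hcsshim::System::Start: failure in a Windows system call"
SHARING_VIOLATION_MARKER = (
    "The process cannot access the file because it is being used by another process"
)


def is_hcs_sharing_violation(build_output: str) -> bool:
    """Return True if captured build output contains both parts of the error.

    Hypothetical helper: it only does substring matching, so it works whether
    the two parts appear on one line or on separate lines.
    """
    return (
        HCS_START_MARKER in build_output
        and SHARING_VIOLATION_MARKER in build_output
    )
```

A pipeline could call this on the combined stdout/stderr of a failed build to separate this failure from unrelated build errors.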


To Reproduce
Steps to reproduce the behavior:

  1. Configure Docker HostProcess daemon.json:
    {
      "data-root": "G:\\docker",
      "exec-opts": ["isolation=process"],
      "debug": true
    }
  2. Restart vmcompute, hns, and docker services.
  3. Run:
    docker build -t test .
  4. Observed result:
    hcsshim::System::Start: failure in a Windows system call:
    The process cannot access the file because it is being used by another process.
    
  5. Reconfigure Docker with data-root on SSD (C:\docker-data).
  6. Restart services.
  7. Re-run build.
  8. Observed result: build succeeds.
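To switch between the two configurations without hand-editing, the daemon.json from step 1 can be generated. This is a minimal sketch: the keys and values come from the configuration above, while the helper name `render_daemon_json` is hypothetical, and writing the file to the daemon's config location and restarting the services are left to the steps above.

```python
import json


def render_daemon_json(data_root: str) -> str:
    """Return daemon.json text with the settings from step 1.

    Only data-root varies between the failing and the working setup.
    """
    config = {
        "data-root": data_root,
        "exec-opts": ["isolation=process"],
        "debug": True,
    }
    return json.dumps(config, indent=2)


# The two variants compared in this report.
NVME_CONFIG = render_daemon_json("G:\\docker")      # build fails
SSD_CONFIG = render_daemon_json("C:\\docker-data")  # build succeeds

if __name__ == "__main__":
    print(NVME_CONFIG)
```

`json.dumps` takes care of escaping the backslashes, so the paths are written as plain Windows paths in the Python source.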

Expected behavior
docker build should succeed with data-root on NVMe Direct Disks, just as it does on SSD.


Configuration:

  • Edition: Windows Server 2022 Datacenter Core (Azure AKS Windows nodepool)
  • Base Image: mcr.microsoft.com/windows/servercore:ltsc2022
  • Container engine: Docker (dockerd HostProcess), containerd (for Kubernetes runtime)
  • Container engine version: Docker 20.10.x / 24.x (reproducible across versions)

Disk info (NVMe drive G:):

File System                  : NTFS
Bytes Per Cluster            : 4096
Compression/Dedup/Encryption : disabled
HealthStatus                 : Healthy
OperationalStatus            : OK

Additional context

  • Kubernetes pods (containerd) run fine with NVMe Direct Disk storage.
  • dockerd HostProcess fails only during builds when data-root is on NVMe.
  • Issue occurs at the moment dockerd/HCS tries to create or mount scratch.vhdx in windowsfilter.

Tests attempted (the build still failed in every case with data-root on NVMe):

  • Disabled Windows Defender completely (Tamper Protection off, GPO/registry, exclusions applied)
  • Disabled Windows Search/Indexing
  • Full node reboots after each change
  • Fresh empty G:\docker directory (renamed previous one to .bak)
  • BuildKit disabled (features.buildkit=false)
  • DOCKER_TMPDIR set to SSD and NVMe (both worked fine → NVMe usable outside windowsfilter)
  • Checked open handles with Sysinternals handle.exe → only dockerd DB files, no external locks
  • Checked Containers-Wcifs and BindFlt logs → no SHARING VIOLATION entries
  • Verified NTFS 4K cluster size (Bytes Per Cluster = 4096)

Workarounds:

  • Place data-root on SSD → builds succeed.
  • Use NVMe only for:
    • DOCKER_TMPDIR (build temporary files)
    • Bind mounts / cache directories
  • Optionally, keep the G:\docker path but back it with SSD storage through a directory junction:
    mklink /J G:\docker C:\docker-data
  • ⚠️ Offloading builds to Linux BuildKit does not work for Windows-native images (e.g., servercore, nanoserver).
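Until the root cause is found, a pre-flight check could refuse to start a build when data-root points at an affected drive. The sketch below assumes the affected drive letters are known in advance (G: in this report); the function names are hypothetical. It only inspects the path string, so it does not see through a junction such as the mklink workaround above.

```python
import json
import ntpath


def data_root_drive(daemon_json_text: str) -> str:
    """Return the drive of data-root, e.g. "G:", or "" if it is not set."""
    config = json.loads(daemon_json_text)
    data_root = config.get("data-root", "")
    # ntpath applies Windows path rules regardless of the host OS.
    drive, _ = ntpath.splitdrive(data_root)
    return drive.upper()


def is_on_affected_drive(daemon_json_text: str, affected_drives=("G:",)) -> bool:
    """Return True if data-root sits on one of the given drive letters.

    affected_drives is an assumption supplied by the caller; nothing here
    detects the bus type of a disk.
    """
    affected = {drive.upper() for drive in affected_drives}
    return data_root_drive(daemon_json_text) in affected
```

A CI job could read daemon.json, call `is_on_affected_drive`, and fail fast with a pointer to this issue instead of surfacing the hcsshim error midway through a build.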

Impact:

  • Blocks Windows container builds on nodes with NVMe-only storage.
  • CI/CD pipelines cannot run docker build locally with dockerd HostProcess if data-root is NVMe.
  • Forces SSD allocation for data-root, which reduces performance and complicates infrastructure design.

Labels: bug (Something isn't working), triage (New and needs attention)
