API hangs if one of the mounted NFS servers goes down #4334

@sidoruka

Description

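Requests to the API hang for all NFS file systems if any one of the mounted NFS servers goes down. Most likely this is caused by the global lock shared by all file systems: `mount()` in `NFSStorageMounter` is synchronized, so a thread that blocks while checking the dead mount keeps the lock, and every other `mount()` call waits behind it.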

┌─────────────────────────────────────────────────────────────────────────┐
│  /opt/api/file-systems                                                  │
│  ├── share-A  ──────►  NFS Server 1 (live)   ✓                          │
│  ├── share-B  ──────►  NFS Server 2 (live)   ✓                          │
│  └── share-C  ──────►  NFS Server 3 (dead)   ✗                          │
└─────────────────────────────────────────────────────────────────────────┘
                                      │
                    One request touches share-C
                                      ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  NFSStorageMounter                                                      │
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │  synchronized File mount(NFSDataStorage) ◄── single global lock │   │
│   │       rootMount = .../share-C                                   │   │
│   │       if (!rootMount.exists())  ◄── stat() on dead mount = HANG │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                          │                                              │
│              Thread 1 blocks here forever                               │
└─────────────────────────────────────────────────────────────────────────┘
                                      │
         Other requests (share-A, share-B) call mount()
                                      ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  Thread 2: mount(storage_A)  ──►  waiting for lock  ──►  never runs     │
│  Thread 3: mount(storage_B)  ──►  waiting for lock  ──►  never runs     │
│  ...                                                                    │
│  Result: All NFS operations (and thus API) appear hung                  │
└─────────────────────────────────────────────────────────────────────────┘
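One possible direction, sketched under assumptions rather than taken from the actual codebase: replace the single lock with one lock per mount path, and bound the time spent on the existence check. The class `PerMountLockSketch`, its `mount(String, BooleanSupplier, long)` signature, and the timeout parameter are all hypothetical; the `BooleanSupplier` stands in for the `rootMount.exists()` call from the diagram.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the real NFSStorageMounter.
class PerMountLockSketch {
    // One lock per mount path, so a dead share only blocks callers of that share.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Daemon threads: a probe stuck on a dead mount must not keep the JVM alive.
    private final ExecutorService probes = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "mount-probe");
        t.setDaemon(true);
        return t;
    });

    // Returns true only if the lock was acquired and the check succeeded in time.
    boolean mount(String mountPath, BooleanSupplier existsCheck, long timeoutMs) {
        ReentrantLock lock = locks.computeIfAbsent(mountPath, p -> new ReentrantLock());
        try {
            if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // another caller is still busy with this same share
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            return probe(existsCheck, timeoutMs);
        } finally {
            lock.unlock();
        }
    }

    // Runs the potentially blocking check on a separate thread and gives up after timeoutMs.
    private boolean probe(BooleanSupplier check, long timeoutMs) {
        Callable<Boolean> task = check::getAsBoolean;
        Future<Boolean> result = probes.submit(task);
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // best effort; see the note below
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }
}
```

With this shape, a request for share-A or share-B never waits on the lock held for share-C. One caveat: a thread blocked in a system call on a hard-mounted NFS share generally cannot be interrupted from Java, so `cancel(true)` may not free the probe thread. The timeout only releases the caller, and stuck probe threads can accumulate until the server returns.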

Labels

kind/bug: Something isn't working
