This repository was archived by the owner on Mar 30, 2023. It is now read-only.
Persistent buffers using 100% of available space causes the job to fail #113
Open
Labels: bug (Something isn't working)
Description
We can use 99% (technically 100% minus one unit of granularity) of the available space. However, when we create a buffer to consume the remaining space, it appears to be created successfully, but squeue then reports:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
20 debug create-p test2 PD 0:00 1 (burst_buffer/datawarp: setup: panic: runtime error: index out of range [recovered]
panic: runtime error: index out of range
goroutine 1 [running]:
main.main.func1()
/home/circleci/data-acc/cmd/dacctl/main.go:187 +0xb9
panic(0xb0d7c0, 0x12394c0)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/RSE-Cambridge/data-acc/internal/pkg/dacctl/workflow_impl.sessionFacade.doAllocationAndWriteSession(0xce3440, 0xc000176d00, 0xcdf780, 0xc000179560, 0xce0e80, 0xc00017c750, 0xcccf60, 0x126c0c8, 0x7ffff8268d30, 0x2, ...)
/home/circleci/data-acc/internal/pkg/dacctl/workflow_impl/session.go:166 +0x6ee
github.com/RSE-Cambridge/data-acc/internal/pkg/dacctl/workflow_impl.sessionFacade.CreateSession.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/circleci/data-acc/internal/pkg/dacctl/workflow_impl/session.go:110 +0xfb
github.com/RSE-Cambridge/data-acc/internal/pkg/dacctl/workflow_impl.sessionFacade.submitJob(0xce3440, 0xc000176d00, 0xcdf780, 0xc000179560, 0xce0e80, 0xc00017c750, 0xcccf60, 0x126c0c8, 0x7ffff8268d30, 0x2, ...)
/home/circleci/data-acc/internal/pkg/dacctl/workflow_impl/session.go:53 +0x3e8
github.com/RSE-Cambridge/data-acc/internal/pkg/dacctl/workflow_impl.sessionFacade.CreateSession(0xce3440, 0xc000176d00, 0xcdf780, 0xc000179560, 0xce0e80, 0xc00017c750, 0xcccf60, 0x126c0c8, 0x7ffff8268d30, 0x2, ...)
/home/circleci/data-acc/internal/pkg/dacctl/workflow_impl/session.go:107 +0x1b4
github.com/RSE-Cambridge/data-acc/internal/pkg/dacctl/actions_impl.(*dacctlActions).CreatePerJobBuffer(0xc000179580, 0xcdcac0, 0xc0000e71e0, 0xc000179580, 0x0)
/home/circleci/data-acc/internal/pkg/dacctl/actions_impl/job.go:93 +0x6f5
main.setup(0xc0000e71e0, 0x0, 0x0)
/home/circleci/data-acc/cmd/dacctl/actions.go:92 +0xa3
github.com/urfave/cli.HandleAction(0xade2a0, 0xc15140, 0xc0000e71e0, 0xc0000e71e0, 0x0)
/go/pkg/mod/github.com/urfave/cli@v1.21.0/app.go:514 +0xbe
github.com/urfave/cli.Command.Run(0xbe38a2, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0a8db, 0x47, 0x0, ...)
/go/pkg/mod/github.com/urfave/cli@v1.21.0/command.go:171 +0x4d2
github.com/urfave/cli.(*App).Run(0xc0000dc540, 0xc0000b4000, 0xe, 0xf, 0x0, 0x0)
/go/pkg/mod/github.com/urfave/cli@v1.21.0/app.go:265 +0x733
main.runCli(0xc0000b4000, 0xf, 0xf, 0xbe21f8, 0x1)
/home/circleci/data-acc/cmd/dacctl/main.go:172 +0x1255
main.main()
/home/circleci/data-acc/cmd/dacctl/main.go:194 +0x1f1
)
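The trace shows only that an index went out of range inside doAllocationAndWriteSession (session.go:166). It does not show which slice was indexed or why. As a purely illustrative sketch, and not the data-acc code, the general Go pattern for avoiding this class of panic is to check the length before indexing and return an error instead. The helper `firstFree`, the error `errNoCapacity`, and the slot names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoCapacity is a hypothetical error for "nothing left to allocate".
var errNoCapacity = errors.New("no free capacity available")

// firstFree is a hypothetical helper: it returns the first free slot,
// or an error when the slice is empty, instead of indexing blindly
// (free[0] on an empty slice panics with "index out of range").
func firstFree(free []string) (string, error) {
	if len(free) == 0 {
		return "", errNoCapacity
	}
	return free[0], nil
}

func main() {
	// With nothing free, the caller gets an error it can report cleanly.
	if _, err := firstFree(nil); err != nil {
		fmt.Println("allocation refused:", err)
	}

	// With at least one free slot, the first one is returned.
	slot, err := firstFree([]string{"slot0", "slot1"})
	fmt.Println(slot, err)
}
```

Returning an error here would let the caller report a readable reason to Slurm rather than a recovered panic with a full stack trace.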
After the failure, the burst buffer state looks normal, with FreeSpace now at 0 as expected:
Name=datawarp DefaultPool=default Granularity=1600GiB TotalSpace=24000GiB FreeSpace=0 UsedSpace=24000GiB
Flags=EnablePersistent,PrivateData
StageInTimeout=3600 StageOutTimeout=3600 ValidateTimeout=5 OtherTimeout=3600
GetSysState=/usr/local/bin/dacctl
GetSysStatus=/usr/local/bin/dacctl
Allocated Buffers:
Name=small CreateTime=2019-10-02T14:24:55 Pool=default Size=3200GiB State=allocated UserID=test2(1002)
Name=full CreateTime=2019-10-02T14:23:14 Pool=default Size=20800GiB State=allocated UserID=test2(1002)
Per User Buffer Use:
UserID=test2(1002) Used=24000GiB
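The figures above can be checked against the granularity with a little arithmetic. The sketch below assumes sizes are rounded up to whole granularity units. The values are copied from the output above, while the helper `unitsFor` is hypothetical and not part of data-acc or Slurm.

```go
package main

import "fmt"

// unitsFor returns how many whole granularity units are needed to hold
// sizeGiB, rounding up. Hypothetical helper for illustration only.
func unitsFor(sizeGiB, granularityGiB int) int {
	return (sizeGiB + granularityGiB - 1) / granularityGiB
}

func main() {
	const granularityGiB = 1600 // Granularity=1600GiB
	const totalGiB = 24000      // TotalSpace=24000GiB

	// Buffer sizes as reported under "Allocated Buffers".
	buffers := []struct {
		name    string
		sizeGiB int
	}{
		{"full", 20800},
		{"small", 3200},
	}

	totalUnits := unitsFor(totalGiB, granularityGiB)
	usedUnits := 0
	for _, b := range buffers {
		u := unitsFor(b.sizeGiB, granularityGiB)
		usedUnits += u
		fmt.Printf("%s: %d GiB = %d units\n", b.name, b.sizeGiB, u)
	}
	fmt.Printf("total: %d units, used: %d units, free: %d units\n",
		totalUnits, usedUnits, totalUnits-usedUnits)
}
```

Under that assumption the pool holds 15 units, with 13 used by "full" and 2 by "small", which leaves 0 free and matches the reported FreeSpace=0.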