Bug Description
Running the src/test/recovery/t/033_replay_tsp_drops.pl TAP test fails with:
FATAL: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))", File: "proc.c", Line: 1067)
The primary server crashes during backend process exit because a heavyweight lock is not released.
Root Cause
Session lock leak in ALTER DATABASE ... SET TABLESPACE for utility-mode backends.
movedb() acquires a session-level AccessExclusiveLock on the database via MoveDbSessionLockAcquire(). This lock persists across transaction boundaries by design.
The release happens in CommitTransaction() (src/backend/access/transam/xact.c):
    if (Gp_role == GP_ROLE_DISPATCH || IS_SINGLENODE())
        MoveDbSessionLockRelease();
However, in standalone PostgreSQL instances (used by TAP tests, pg_basebackup recovery, etc.), Gp_role == GP_ROLE_UTILITY and IS_SINGLENODE() == false, so MoveDbSessionLockRelease() is never called. The session lock leaks and triggers the assertion at process exit.
GDB Confirmation from Core Dump
(gdb) print Gp_role
$1 = GP_ROLE_UTILITY
(gdb) print gp_internal_is_singlenode
$2 = false
(gdb) print sessionLockMoveDbOid
$3 = 16394 # moveme_db's OID - lock never released
# The leaked lock:
(gdb) print *(LOCK *)0x7f9b5bc12a80
tag = {locktag_field1=0, locktag_field2=1262, locktag_field3=16394,
locktag_type=LOCKTAG_OBJECT} # AccessExclusiveLock on pg_database entry
# LOCALLOCK shows session-level ownership (owner=NULL):
(gdb) print lockOwners[0]
$4 = {owner = 0x0, nLocks = 1}
Secondary Issue
destroy_tablespace_directories() calls readlink() on in-place tablespace directories (created with allow_in_place_tablespaces=on). Since these are directories, not symlinks, readlink() fails with EINVAL and produces spurious log messages:
LOG: could not read symbolic link "pg_tblspc/16385": Invalid argument
Fix
- Session lock leak: Add Gp_role == GP_ROLE_UTILITY to the release condition in CommitTransaction():
    if (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_UTILITY ||
        IS_SINGLENODE())
        MoveDbSessionLockRelease();
- Spurious log: Skip logging when readlink() fails with EINVAL (expected for directories):
    if (errno != EINVAL)
        ereport(redo ? LOG : ERROR, ...);
How to Reproduce
    # Enable TAP tests in configure, then:
    make -C src/test/recovery installcheck PROVE_TESTS="t/033_replay_tsp_drops.pl"
The test executes ALTER DATABASE moveme_db SET TABLESPACE target_ts followed by other statements on a standalone primary/standby pair. The leaked session lock causes the primary to crash on backend exit.