Fix session lock leak in ALTER DATABASE SET TABLESPACE for utility-mode backends #1626

@yjhjstz

Bug Description

Running src/test/recovery/t/033_replay_tsp_drops.pl TAP test fails with:

FATAL: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))", File: "proc.c", Line: 1067)

The primary server crashes during backend process exit because a heavyweight lock is not released.

Root Cause

Session lock leak in ALTER DATABASE ... SET TABLESPACE for utility-mode backends.

movedb() acquires a session-level AccessExclusiveLock on the database via MoveDbSessionLockAcquire(). This lock persists across transaction boundaries by design.

The release happens in CommitTransaction() (src/backend/access/transam/xact.c):

if (Gp_role == GP_ROLE_DISPATCH || IS_SINGLENODE())
    MoveDbSessionLockRelease();

However, in standalone PostgreSQL instances (used by TAP tests, pg_basebackup recovery, etc.), Gp_role == GP_ROLE_UTILITY and IS_SINGLENODE() == false, so MoveDbSessionLockRelease() is never called. The session lock leaks and triggers the assertion at process exit.

GDB Confirmation from Core Dump

(gdb) print Gp_role
$1 = GP_ROLE_UTILITY

(gdb) print gp_internal_is_singlenode
$2 = false

(gdb) print sessionLockMoveDbOid
$3 = 16394    # moveme_db's OID - lock never released

# The leaked lock:
(gdb) print *(LOCK *)0x7f9b5bc12a80
tag = {locktag_field1=0, locktag_field2=1262, locktag_field3=16394,
       locktag_type=LOCKTAG_OBJECT}  # AccessExclusiveLock on pg_database entry

# LOCALLOCK shows session-level ownership (owner=NULL):
(gdb) print lockOwners[0]
$4 = {owner = 0x0, nLocks = 1}
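To make the gdb output easier to read, here is a minimal sketch of how the three tag fields identify the locked object. `ObjectLockTag` and `is_database_lock` are hypothetical stand-ins, not the server's `LOCKTAG` type. The sketch assumes the usual object-lock layout: field1 is the database OID (0 for shared objects), field2 is the catalog OID (1262 is `pg_database`), and field3 is the object OID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the tag fields printed by gdb (not the real LOCKTAG). */
typedef struct {
    uint32_t field1; /* database OID; 0 means a shared (cluster-wide) object */
    uint32_t field2; /* OID of the catalog that holds the object */
    uint32_t field3; /* OID of the object itself */
} ObjectLockTag;

/* OID of the pg_database catalog. */
#define PG_DATABASE_CATALOG_OID 1262u

/* Returns 1 when the tag names the pg_database row for db_oid, else 0. */
static int is_database_lock(const ObjectLockTag *tag, uint32_t db_oid)
{
    assert(tag != NULL);
    return tag->field1 == 0 &&
           tag->field2 == PG_DATABASE_CATALOG_OID &&
           tag->field3 == db_oid;
}
```

With the values from the core dump (0, 1262, 16394), the check matches database OID 16394, which is the same value stored in sessionLockMoveDbOid.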

Secondary Issue

destroy_tablespace_directories() calls readlink() on in-place tablespace directories (created with allow_in_place_tablespaces=on). Because these are real directories rather than symlinks, readlink() fails with EINVAL and produces spurious log messages:

LOG: could not read symbolic link "pg_tblspc/16385": Invalid argument
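The EINVAL behaviour can be checked outside the server. The sketch below is a standalone POSIX illustration, not code from destroy_tablespace_directories(). The helper names and the /tmp scratch path are assumptions made for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 0 if readlink() succeeds on path, otherwise the errno it set. */
static int readlink_errno(const char *path)
{
    char buf[PATH_MAX];

    assert(path != NULL);
    errno = 0;
    if (readlink(path, buf, sizeof(buf)) < 0)
        return errno;
    return 0;
}

/*
 * Creates a scratch directory (hypothetical path under /tmp), calls
 * readlink() on it, removes it, and returns the observed errno.
 * Returns -1 if the directory could not be created.
 */
static int readlink_errno_on_fresh_directory(void)
{
    char tmpl[] = "/tmp/inplace_tsp_XXXXXX";
    int err;

    if (mkdtemp(tmpl) == NULL)
        return -1;
    err = readlink_errno(tmpl);
    rmdir(tmpl);
    return err;
}
```

On a POSIX system, readlink_errno_on_fresh_directory() is expected to return EINVAL, because the path exists but is not a symbolic link.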

Fix

  1. Session lock leak: Add Gp_role == GP_ROLE_UTILITY to the release condition in CommitTransaction():
if (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_UTILITY ||
    IS_SINGLENODE())
    MoveDbSessionLockRelease();
  2. Spurious log: Skip logging when readlink() fails with EINVAL (expected for directories):
if (errno != EINVAL)
    ereport(redo ? LOG : ERROR, ...);
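To show which backends change behaviour under item 1, the two release conditions can be modelled as plain predicates. This is a sketch for reasoning only. `GpRoleValue` here is a local stand-in enum, the `singlenode` flag stands in for IS_SINGLENODE(), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the server's role values; not the real header. */
typedef enum {
    GP_ROLE_UTILITY,
    GP_ROLE_DISPATCH,
    GP_ROLE_EXECUTE
} GpRoleValue;

/* Release condition as quoted from CommitTransaction() before the fix. */
static bool releases_before_fix(GpRoleValue role, bool singlenode)
{
    return role == GP_ROLE_DISPATCH || singlenode;
}

/* Release condition proposed by the fix: utility-mode backends also release. */
static bool releases_after_fix(GpRoleValue role, bool singlenode)
{
    return role == GP_ROLE_DISPATCH || role == GP_ROLE_UTILITY || singlenode;
}
```

For the failing case (GP_ROLE_UTILITY with singlenode false), the first predicate is false and the second is true. Every other combination evaluates the same under both, so the change only affects utility-mode backends.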

How to Reproduce

# Enable TAP tests in configure, then:
make -C src/test/recovery installcheck PROVE_TESTS="t/033_replay_tsp_drops.pl"

The test executes ALTER DATABASE moveme_db SET TABLESPACE target_ts followed by other statements on a standalone primary/standby pair. The leaked session lock causes the primary to crash on backend exit.
