Summary
If pg_repack loses its connection (e.g. due to an HAProxy timeout) during the initialization stage (specifically, during the data copy phase) while the --no-kill-backend flag is in use, the following serious issue occurs:
- The PostgreSQL backend process keeps running the data copy query (e.g. INSERT INTO repack.table_* SELECT ...) in the active state, even though the client (pg_repack) has disconnected.
Example of the error pg_repack reports when the connection drops:
ERROR: query failed: SSL SYSCALL error: EOF detected
DETAIL: query was: INSERT INTO repack.table_16451 SELECT
- At this point, pg_repack begins the cleanup phase, which requires an ACCESS EXCLUSIVE lock on the target table.
- However, it cannot acquire that lock, because the still-running data copy query holds a conflicting lock on the same table.
- Since --no-kill-backend is used, pg_repack does not terminate the blocking backend.
- As a result, all other read/write queries on the affected table queue up behind the pending ACCESS EXCLUSIVE request and stay blocked, potentially for a very long time.
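The blocking follows from how PostgreSQL queues table-level locks. The orphaned INSERT ... SELECT holds ACCESS SHARE on public.big_data_tiny, pg_repack's ACCESS EXCLUSIVE request must wait for it, and any later request that conflicts with the queued ACCESS EXCLUSIVE (every mode does) waits behind it. The Python sketch below is a simplified model of that queue, not PostgreSQL's lock manager: it covers only the three lock modes involved here and ignores deadlock detection, lock_timeout and other refinements. The class and method names, and pid 9001 (standing in for an ordinary application query), are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Subset of PostgreSQL's table-level lock conflict matrix (only the modes used here).
ACCESS_SHARE = "ACCESS SHARE"          # SELECT, including the SELECT side of INSERT ... SELECT
ROW_EXCLUSIVE = "ROW EXCLUSIVE"        # INSERT / UPDATE / DELETE
ACCESS_EXCLUSIVE = "ACCESS EXCLUSIVE"  # requested by pg_repack's cleanup

CONFLICTS = {
    ACCESS_SHARE: {ACCESS_EXCLUSIVE},
    ROW_EXCLUSIVE: {ACCESS_EXCLUSIVE},
    ACCESS_EXCLUSIVE: {ACCESS_SHARE, ROW_EXCLUSIVE, ACCESS_EXCLUSIVE},
}


@dataclass
class TableLock:
    """Hypothetical FIFO lock queue for a single table."""
    granted: List[Tuple[int, str]] = field(default_factory=list)
    waiting: List[Tuple[int, str]] = field(default_factory=list)

    def _must_wait(self, pid: int, mode: str, ahead: List[Tuple[int, str]]) -> bool:
        # Wait if another backend already holds a conflicting lock...
        held = any(p != pid and m in CONFLICTS[mode] for p, m in self.granted)
        # ...or if a conflicting request is already queued ahead of this one.
        queued = any(m in CONFLICTS[mode] for _, m in ahead)
        return held or queued

    def request(self, pid: int, mode: str) -> bool:
        """Return True if granted immediately, False if the request has to queue."""
        if self._must_wait(pid, mode, self.waiting):
            self.waiting.append((pid, mode))
            return False
        self.granted.append((pid, mode))
        return True

    def release(self, pid: int) -> None:
        """Drop all locks held by pid, then grant queued requests in FIFO order."""
        self.granted = [(p, m) for p, m in self.granted if p != pid]
        pending, self.waiting = self.waiting, []
        for p, m in pending:
            if self._must_wait(p, m, self.waiting):
                self.waiting.append((p, m))
            else:
                self.granted.append((p, m))


if __name__ == "__main__":
    table = TableLock()
    print(table.request(4273, ACCESS_SHARE))      # orphaned copy: True
    print(table.request(4370, ACCESS_EXCLUSIVE))  # pg_repack cleanup: False (queued)
    print(table.request(9001, ACCESS_SHARE))      # plain SELECT: False (behind 4370)
    table.release(4273)  # only happens once the copy finishes or is terminated
    print(table.granted, table.waiting)
```

Even a plain SELECT (pid 9001) is refused while the ACCESS EXCLUSIVE request is queued, and after the orphaned copy goes away it still waits until pg_repack releases its lock.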
Reproduction Steps
- Start pg_repack on a large table with --no-kill-backend and --wait-timeout=10 (or any value).
- Use a load balancer (e.g. HAProxy) whose idle timeout is shorter than the expected duration of the data copy phase, for example:
global
    log stdout format raw local0
    maxconn 4096

defaults
    log global
    mode tcp
    option tcplog
    timeout connect 10s
    timeout client 30s
    timeout server 30s
    retries 3

frontend ha-frontend
    bind *:5432
    default_backend ha-backend

backend ha-backend
    server rds-primary PUT_HERE_YOUR_URI:5432 check

- Observe:
  - The backend on PostgreSQL keeps running the INSERT ... SELECT in the active state.
  - pg_repack starts the cleanup phase and tries to acquire an ACCESS EXCLUSIVE lock.
  - It fails to acquire the lock but, because of --no-kill-backend, does not terminate the blocking backend.
  - The table stays inaccessible (locked) for all other clients until the backend finishes or is terminated manually.
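The copy phase is what trips the timeout because pg_repack sends the INSERT ... SELECT as a single statement and receives nothing until it completes, so the proxied TCP session carries no data in either direction for the whole copy. A minimal arithmetic sketch, assuming `timeout client` and `timeout server` act purely as inactivity timers that start together when the query is sent; the function names and the 120 s copy duration are hypothetical.

```python
def idle_close_after(timeout_client_s: float, timeout_server_s: float) -> float:
    """Seconds of two-way silence after which the proxy would drop the session.

    Sketch assumption: nothing is sent in either direction while the query runs,
    so both inactivity timers run at once and the shorter one fires first.
    """
    return min(timeout_client_s, timeout_server_s)


def copy_survives_proxy(copy_phase_s: float,
                        timeout_client_s: float,
                        timeout_server_s: float) -> bool:
    """True if a silent copy phase of copy_phase_s ends before the proxy drops it."""
    return copy_phase_s < idle_close_after(timeout_client_s, timeout_server_s)


if __name__ == "__main__":
    # "timeout client 30s" / "timeout server 30s" from the reproduction config.
    print(idle_close_after(30, 30))          # 30
    print(copy_survives_proxy(10, 30, 30))   # True: a short copy completes
    print(copy_survives_proxy(120, 30, 30))  # False: the connection is cut at ~30 s
```

With the reproduction config, any copy that runs longer than about 30 seconds loses its client connection while the server-side INSERT keeps going.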
After starting pg_repack:
pid | query_duration | state | wait_event_type | query
------+-----------------+---------------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------
4418 | 00:00:02.106592 | active | IO | INSERT INTO repack.table_16451 SELECT id,user_id,status,score,flag,event_time,notes,meta FROM ONLY public.big_data_tiny
4420 | 00:00:02.133738 | idle in transaction | Client | LOCK TABLE public.big_data_tiny IN ACCESS SHARE MODE
After the sudden connection loss:
pid | query_duration | state | wait_event_type | query
------+-----------------+--------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------
4273 | 00:01:39.248618 | active | IO | INSERT INTO repack.table_16451 SELECT id,user_id,status,score,flag,event_time,notes,meta FROM ONLY public.big_data_tiny
4370 | 00:00:00.009245 | active | Lock | LOCK TABLE public.big_data_tiny IN ACCESS EXCLUSIVE MODE
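The two listings differ in one telling way: after the connection loss, a repack INSERT is still active while a LOCK ... IN ACCESS EXCLUSIVE MODE statement waits on a Lock event. The hypothetical helper below flags that pattern in rows shaped like the listings above (query_duration omitted). It is a text-matching heuristic only and does not prove that the waiter is blocked by that particular copy; against a live server, PostgreSQL's `pg_blocking_pids()` function would be the more reliable check.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Activity:
    """One row of the activity listings (hypothetical structure)."""
    pid: int
    state: str
    wait_event_type: Optional[str]
    query: str


def find_orphaned_copy(rows: List[Activity]) -> Optional[Tuple[int, int]]:
    """Return (copy_pid, waiter_pid) if the pattern from this report is present."""
    # A pg_repack data copy that is still executing.
    copy = next(
        (r for r in rows
         if r.state == "active" and r.query.startswith("INSERT INTO repack.")),
        None,
    )
    # A session stuck waiting for an ACCESS EXCLUSIVE table lock.
    waiter = next(
        (r for r in rows
         if r.wait_event_type == "Lock" and "IN ACCESS EXCLUSIVE MODE" in r.query),
        None,
    )
    if copy is not None and waiter is not None:
        return copy.pid, waiter.pid
    return None


if __name__ == "__main__":
    copy_sql = ("INSERT INTO repack.table_16451 SELECT id,user_id,status,score,"
                "flag,event_time,notes,meta FROM ONLY public.big_data_tiny")
    during_run = [
        Activity(4418, "active", "IO", copy_sql),
        Activity(4420, "idle in transaction", "Client",
                 "LOCK TABLE public.big_data_tiny IN ACCESS SHARE MODE"),
    ]
    after_loss = [
        Activity(4273, "active", "IO", copy_sql),
        Activity(4370, "active", "Lock",
                 "LOCK TABLE public.big_data_tiny IN ACCESS EXCLUSIVE MODE"),
    ]
    print(find_orphaned_copy(during_run))  # None
    print(find_orphaned_copy(after_loss))  # (4273, 4370)
```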