Bug Report: AOCS Sampling Fails with Large Objects (BLOB)
Summary
In Greenplum 6.29.0, ANALYZE and auto_stats fail on AOCS tables containing large objects (large values in TEXT/JSONB columns) with the error:
ERROR: Advance not called on large datum stream object (datumstream.c:276)
Root Cause
Problem location: src/backend/access/aocs/aocsam.c, function aocs_gettuple_column()
if (chkvisimap && !isSnapshotAny && !AppendOnlyVisimap_IsVisible(&scan->visibilityMap, &aotid))
{
    ret = false;
    goto out; // ← Returns WITHOUT calling datumstreamread_advance()
}
datumstreamread_find(ds, rownum - ds->blockFirstRowNum); // Not reached for invisible rows
When a BLOB block is read, largeObjectState is set to HaveAoContent. If aocs_gettuple_column() returns early (because of the visibility check or for other reasons), datumstreamread_advance() is never called, so largeObjectState stays at HaveAoContent.
On the next sample row iteration, datumstreamread_nth() is called (line 1015 in elog DEBUG2), which raises the error because largeObjectState == HaveAoContent.
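The state transition described above can be illustrated with a small standalone model. This is a sketch only, not Greenplum code: every `Toy*`/`toy_*` name is hypothetical, the two-state enum is a simplification of DatumStreamLargeObjectState, and the order of the read/check calls is assumed from the description in this report. The `advance_on_invisible` flag models the change proposed under Suggested Fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-state simplification of DatumStreamLargeObjectState. */
typedef enum {
    TOY_LOB_NONE,            /* no large-object block pending */
    TOY_LOB_HAVE_AO_CONTENT  /* large block read, advance not yet called */
} ToyLobState;

typedef struct {
    ToyLobState state;
} ToyStream;

/* Reading a large-object block leaves the stream waiting for an advance. */
static void toy_read_block(ToyStream *ds) { ds->state = TOY_LOB_HAVE_AO_CONTENT; }

/* Advancing consumes the pending large object and resets the state. */
static void toy_advance(ToyStream *ds) { ds->state = TOY_LOB_NONE; }

/* Stand-in for the check that raises "Advance not called on large datum
 * stream object": returns false when a stale large object is pending. */
static bool toy_nth_ok(const ToyStream *ds) { return ds->state != TOY_LOB_HAVE_AO_CONTENT; }

/*
 * Walk a sample of rows. Returns the index of the row at which the stale
 * state is detected, or -1 if the whole sample is processed without error.
 */
int toy_sample(const bool *visible, size_t nrows, bool advance_on_invisible)
{
    ToyStream ds = { TOY_LOB_NONE };

    assert(visible != NULL || nrows == 0);

    for (size_t i = 0; i < nrows; i++) {
        if (!toy_nth_ok(&ds))
            return (int) i;              /* the ERROR would be raised here */

        toy_read_block(&ds);             /* assumed: each sampled row hits a BLOB block */

        if (!visible[i]) {
            /* Early exit, like "goto out" in aocs_gettuple_column(). */
            if (advance_on_invisible && ds.state == TOY_LOB_HAVE_AO_CONTENT)
                toy_advance(&ds);        /* proposed fix: reset state first */
            continue;
        }

        toy_advance(&ds);                /* normal path consumes the large object */
    }
    return -1;
}
```

For a sample of rows {visible, invisible, visible}, `toy_sample(rows, 3, false)` returns 2 (the row after the invisible one trips the check), while `toy_sample(rows, 3, true)` returns -1.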
Reproduction
- Create AOCS table with TEXT/JSONB column containing large values (>block size)
- Enable auto_stats:
SET gp_autostats_mode = 'on_change';
- INSERT/COPY data into the table
- Error occurs during auto_stats or manual ANALYZE
Workaround
Disable auto_stats:
SET gp_autostats_mode = 'none';
Then run ANALYZE manually using the legacy method, or skip ANALYZE on affected tables.
Suggested Fix
In aocs_gettuple_column(), call datumstreamread_advance() even for invisible rows to properly transition largeObjectState:
if (chkvisimap && !isSnapshotAny && !AppendOnlyVisimap_IsVisible(&scan->visibilityMap, &aotid))
{
    // Advance position for large objects to reset state
    if (ds->largeObjectState == DatumStreamLargeObjectState_HaveAoContent)
        datumstreamread_advance(ds);
    ret = false;
    goto out;
}
Affected Tables
Tables with:
- appendonly=true, orientation=column
- TEXT, JSONB, or other varlena columns with large values (BLOBs)
Stack Trace
acquire_sample_rows -> analyze_rel -> vacuum -> auto_stats
datumstreamread_nthlarge (datumstream.c:276)
Related Files
- src/backend/access/aocs/aocsam.c: aocs_gettuple_column(), aocs_gettuple()
- src/backend/utils/datumstream/datumstream.c: datumstreamread_nthlarge() (line 276)
- src/include/utils/datumstream.h: DatumStreamLargeObjectState enum
Status in other branches
Bug is NOT fixed in any branch (checked 2025-01-21):
- origin/master: NOT FIXED (same goto out pattern)
- origin/OPENGPDB_STABLE: NOT FIXED
- origin/OPENGPDB_6_29_STABLE: NOT FIXED
The problematic goto out without calling datumstreamread_advance() exists in all branches.