AOCS Sampling Fails with Large Objects (BLOB) #349

@Sanikadze

Bug Report: AOCS Sampling Fails with Large Objects (BLOB)

Summary

In Greenplum 6.29.0, ANALYZE and auto_stats fail on AOCS tables containing large objects (large values in TEXT/JSONB columns) with the error:

ERROR: Advance not called on large datum stream object (datumstream.c:276)

Root Cause

Problem location: src/backend/access/aocs/aocsam.c, function aocs_gettuple_column()

if (chkvisimap && !isSnapshotAny && !AppendOnlyVisimap_IsVisible(&scan->visibilityMap, &aotid))
{
    ret = false;
    goto out;  // ← Returns WITHOUT calling datumstreamread_advance()
}

datumstreamread_find(ds, rownum - ds->blockFirstRowNum);  // Never reached on the early return

When a BLOB block is read, largeObjectState is set to HaveAoContent. If aocs_gettuple_column() returns early (visibility check or other reasons), datumstreamread_advance() is never called, leaving largeObjectState = HaveAoContent.

On the next sample row iteration, datumstreamread_nth() is called (line 1015 in elog DEBUG2), which raises the error above when largeObjectState == HaveAoContent.
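
The state transition described above can be modeled in isolation. This is a toy sketch, not Greenplum code: the state names follow the ones mentioned in this report, but ToyStream, toy_read_block(), toy_advance(), and toy_nth() are hypothetical stand-ins for the real datum stream functions, and the transitions are reduced to what this report describes.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: these names are hypothetical, not the Greenplum API. */
typedef enum
{
    TOY_NONE,              /* no large object in flight */
    TOY_HAVE_AO_CONTENT,   /* large block read, advance not yet called */
    TOY_POSITION_ADVANCED  /* advance called, value may be fetched */
} ToyLargeObjectState;

typedef struct
{
    ToyLargeObjectState largeObjectState;
} ToyStream;

/* Reading a block that holds a large value arms the state machine. */
static void
toy_read_block(ToyStream *ds)
{
    ds->largeObjectState = TOY_HAVE_AO_CONTENT;
}

/* Advancing moves HaveAoContent to PositionAdvanced. */
static void
toy_advance(ToyStream *ds)
{
    if (ds->largeObjectState == TOY_HAVE_AO_CONTENT)
        ds->largeObjectState = TOY_POSITION_ADVANCED;
}

/* Fetching fails if advance was skipped; returns false in place of ERROR. */
static bool
toy_nth(const ToyStream *ds)
{
    if (ds->largeObjectState == TOY_HAVE_AO_CONTENT)
    {
        fprintf(stderr, "Advance not called on large datum stream object\n");
        return false;
    }
    return true;
}
```

In this model, calling toy_read_block() followed directly by toy_nth() fails, while inserting toy_advance() between the two succeeds. The early return in aocs_gettuple_column() corresponds to the first sequence.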

Reproduction

  1. Create an AOCS table with a TEXT/JSONB column containing large values (larger than the block size)
  2. Enable auto_stats: SET gp_autostats_mode = 'on_change';
  3. INSERT/COPY data into the table
  4. The error occurs during auto_stats or manual ANALYZE

Workaround

Disable auto_stats:

SET gp_autostats_mode = 'none';

Then run ANALYZE manually using the legacy method, or skip ANALYZE on affected tables.

Suggested Fix

In aocs_gettuple_column(), call datumstreamread_advance() even for invisible rows to properly transition largeObjectState:

if (chkvisimap && !isSnapshotAny && !AppendOnlyVisimap_IsVisible(&scan->visibilityMap, &aotid))
{
    // Advance position for large objects to reset state
    if (ds->largeObjectState == DatumStreamLargeObjectState_HaveAoContent)
        datumstreamread_advance(ds);

    ret = false;
    goto out;
}
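
To illustrate the intent of this change, here is a small simulation of a sampling loop over rows with mixed visibility. It is a sketch under simplifying assumptions, not the real scan code: SimStream and sim_sample() are hypothetical names, every row is assumed to sit in its own large-value block, and a stale HaveAoContent state is assumed to make the next fetch fail, as this report describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in Greenplum. */
typedef enum
{
    SIM_NONE,
    SIM_HAVE_AO_CONTENT,
    SIM_POSITION_ADVANCED
} SimState;

typedef struct
{
    SimState state;
} SimStream;

/*
 * Walk the sampled rows in order. Returns the index of the first row whose
 * fetch would fail, or -1 if every fetch succeeds.
 */
static int
sim_sample(const bool *visible, size_t nrows, bool apply_fix)
{
    SimStream ds = { SIM_NONE };

    for (size_t i = 0; i < nrows; i++)
    {
        /* Stale state left by an earlier early return: the fetch errors out. */
        if (ds.state == SIM_HAVE_AO_CONTENT)
            return (int) i;

        /* The block holding this row's large value is read. */
        ds.state = SIM_HAVE_AO_CONTENT;

        if (!visible[i])
        {
            /* Suggested fix: advance before leaving so the state is reset. */
            if (apply_fix && ds.state == SIM_HAVE_AO_CONTENT)
                ds.state = SIM_POSITION_ADVANCED;
            continue;           /* models "ret = false; goto out;" */
        }

        /* Visible row: the normal path advances before fetching. */
        ds.state = SIM_POSITION_ADVANCED;
    }
    return -1;
}
```

With visibility flags {true, false, true}, the model without the fix fails on the row after the invisible one (index 2), and with the fix it completes without failure.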

Affected Tables

Tables with:

  • appendonly=true, orientation=column
  • TEXT, JSONB, or other varlena columns with large values (BLOBs)

Stack Trace

acquire_sample_rows -> analyze_rel -> vacuum -> auto_stats
datumstreamread_nthlarge (datumstream.c:276)

Related Files

  • src/backend/access/aocs/aocsam.c - aocs_gettuple_column(), aocs_gettuple()
  • src/backend/utils/datumstream/datumstream.c - datumstreamread_nthlarge() (line 276)
  • src/include/utils/datumstream.h - DatumStreamLargeObjectState enum

Status in other branches

Bug is NOT fixed in any branch (checked 2025-01-21):

  • origin/master - NOT FIXED (same goto out pattern)
  • origin/OPENGPDB_STABLE - NOT FIXED
  • origin/OPENGPDB_6_29_STABLE - NOT FIXED

The problematic goto out without calling datumstreamread_advance() exists in all branches.
