Skip to content

[VL] Memory Leak possibly TableScan #9456

@nimesh1601

Description

@nimesh1601

Backend

VL (Velox)

Bug description

There seems to be a memory leak from Table Scan
Got this response from Velox team : Velox Issue

Sample query
WITH

cdl_summary AS (
SELECT
user_id,
CASE
WHEN <TRACKING_LABEL_COL> = 'store_front' THEN 'storefront'
ELSE <TRACKING_LABEL_COL>
END AS source,

SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_7D   AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_7d,
SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_14D  AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_14d,
SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_28D  AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_28d,
SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_56D  AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_56d,
SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_112D AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_112d,
 
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_7D   AND :END_DATE THEN 1 ELSE 0 END) AS click_count_7d,
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_14D  AND :END_DATE THEN 1 ELSE 0 END) AS click_count_14d,
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_28D  AND :END_DATE THEN 1 ELSE 0 END) AS click_count_28d,
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_56D  AND :END_DATE THEN 1 ELSE 0 END) AS click_count_56d,
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_112D AND :END_DATE THEN 1 ELSE 0 END) AS click_count_112d,
 
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_7D   AND :END_DATE THEN 1 ELSE 0 END) AS order_count_7d,
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_14D  AND :END_DATE THEN 1 ELSE 0 END) AS order_count_14d,
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_28D  AND :END_DATE THEN 1 ELSE 0 END) AS order_count_28d,
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_56D  AND :END_DATE THEN 1 ELSE 0 END) AS order_count_56d,
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_112D AND :END_DATE THEN 1 ELSE 0 END) AS order_count_112d,
 
AVG(CASE 
      WHEN name = 'marketplace_scrolled'
       AND datestr BETWEEN :START_7D AND :END_DATE THEN 0
      WHEN name IN ('feed_item_card_scrolled','feed_item_dish_card_scrolled')
       AND datestr BETWEEN :START_7D AND :END_DATE
       THEN feed.display_item_position
      ELSE NULL
    END) AS avg_scroll_depth_7d,
 
 
MAX(CASE WHEN name IN (<INTERACTION_EVENTS>)
         AND datestr BETWEEN :START_112D AND :END_DATE
     THEN epoch_ms ELSE NULL END) AS latest_interaction_time

FROM <SCHEMA_CDL>.<TABLE_CDL> cdl
WHERE TRUE
AND datestr BETWEEN :START_112D AND :END_DATE
AND name IN (<IMPRESSION_EVENTS>, <CLICK_EVENTS>, <ORDER_EVENTS>)
AND COALESCE(<SESSION_ID_COL>, user_id) IS NOT NULL
AND is_first_event = TRUE
AND user_id IS NOT NULL AND user_id <> ''
AND <FEED_CONTEXT_COL> IN ('home','vertical','allstores','all_stores')

GROUP BY 1,2
),

xlb_summary AS (
SELECT
user_id,
CASE
WHEN <TRACKING_LABEL_COL> = 'store_front' THEN 'storefront'
ELSE <TRACKING_LABEL_COL>
END AS source,

MAX(CASE WHEN name IN (<INTERACTION_EVENTS>)
         AND datestr BETWEEN :START_112D AND :END_DATE
     THEN epoch_ms ELSE NULL END) AS latest_interaction_time

FROM <SCHEMA_XLB>.<TABLE_XLB> xlb
WHERE TRUE

GROUP BY 1,2
),

data_summary AS (
SELECT
COALESCE(cdl.user_id, xlb.user_id) AS user_id,
COALESCE(cdl.source, xlb.source) AS source,
COALESCE(cdl.impression_count_7d, 0)
+ COALESCE(xlb.impression_count_7d, 0) AS impression_count_7d,

GREATEST(cdl.latest_interaction_time, xlb.latest_interaction_time)
  AS latest_interaction_time

FROM cdl_summary cdl
FULL OUTER JOIN xlb_summary xlb
ON cdl.user_id = xlb.user_id
AND cdl.source = xlb.source
),

summary_agg AS (
SELECT
user_id,
SUM(impression_count_7d) AS total_impression_7d,

SUM(order_count_112d)     AS total_order_112d

FROM data_summary
GROUP BY 1
)

INSERT OVERWRITE TABLE <SCHEMA_TARGET>.<TABLE_TARGET>
PARTITION (datestr)
SELECT
CONCAT(d.user_id, '|', d.source) AS uuid,
d.user_id,
d.source,
d.impression_count_7d,

agg.total_impression_7d,

d.latest_interaction_time,
:END_DATE AS datestr
FROM data_summary d
JOIN summary_agg agg
ON d.user_id = agg.user_id
WHERE d.source IS NOT NULL;

Gluten version

No response

Spark version

None

Spark configurations

NA

System information

NA

Relevant logs

E20250427 14:32:18.128706   176 VeloxMemoryManager.cc:401] Failed to release Velox memory manager after 43350ms as there are still outstanding memory resources. 
E20250427 14:32:18.128832   176 MemoryPool.cpp:435] [MEM] Memory leak (Used memory): Memory Pool[op.0.0.0.TableScan LEAF root[root] parent[node.0] MALLOC track-usage thread-safe]<unlimited max capacity unlimited capacity used 276.00KB available 748.00KB reservation [used 276.00KB, reserved 1.00MB, min 0B] counters [allocs 114189, frees 114184, reserves 0, releases 1, collisions 0])>
E20250427 14:32:18.128959   176 Exceptions.h:66] Line: cpp/velox/memory/VeloxMemoryManager.cc:102, Function:removePool, Expression: pool->reservedBytes() == 0 (1048576 vs. 0), Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (1048576 vs. 0)
Retriable: False
Expression: pool->reservedBytes() == 0
Function: removePool
File: cpp/velox/memory/VeloxMemoryManager.cc
Line: 102
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN6gluten20ListenableArbitrator10removePoolEPN8facebook5velox6memory10MemoryPoolE
# 4  _ZN8facebook5velox6memory13MemoryManager8dropPoolEPNS1_10MemoryPoolE
# 5  _ZN8facebook5velox6memory14MemoryPoolImplD2Ev
# 6  _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE24_M_release_last_use_coldEv
# 7  _ZN6gluten18VeloxMemoryManagerD1Ev
# 8  _ZN6gluten18VeloxMemoryManagerD0Ev
# 9  _ZN6gluten13MemoryManager7releaseEPS0_
# 10 Java_org_apache_gluten_memory_NativeMemoryManagerJniWrapper_release
# 11 0x00007fc63ca15074

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingtriage

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions