official support for datafusion.runtime.memory_limit and co #537

@e-kotov

Issue Report: datafusion.runtime.memory_limit enforcement and visibility

Description

The datafusion.runtime.memory_limit setting is correctly enforced by the SedonaDB (DataFusion) runtime: queries that exceed the limit are blocked. However, the setting is "invisible" to the SQL SHOW ALL command (it does not appear in the output), so users cannot easily verify their current configuration.

Reproduction Script

import sedonadb
import pandas as pd
import numpy as np
import pyarrow as pa

# 1. Setup data (1 million rows, roughly 16 MB as two 8-byte columns)
print("--- Setup ---")
table = pa.Table.from_pandas(pd.DataFrame({
    "id": np.arange(1000000), 
    "v": np.random.randn(1000000)
}))
sd = sedonadb.connect()
sd.create_data_frame(table).to_view("data")

# 2. PROOF OF INVISIBILITY
print("\n--- Test 1: Invisibility ---")
sd.sql("SET datafusion.runtime.memory_limit = '2G'").execute()
df_show = sd.sql("SHOW ALL").to_pandas()
is_visible = any(df_show["name"].str.contains("memory_limit"))
print(f"Setting 'memory_limit' found in SHOW ALL: {is_visible}")

# 3. PROOF OF ENFORCEMENT
print("\n--- Test 2: Enforcement ---")
sd.sql("SET datafusion.runtime.memory_limit = '1M'").execute()
print("Limit set to 1MB. Attempting a sort...")

try:
    # Sort triggers memory allocation
    sd.sql("SELECT * FROM data ORDER BY v").head(1).execute()
    print("Failure: sort succeeded (limit was ignored)")
except Exception as e:
    print("Success: Enforcement confirmed. Execution blocked.")
    print(f"Error caught: {str(e)[:150]}...")

# 4. Cleanup/Recovery
sd.sql("SET datafusion.runtime.memory_limit = '10G'").execute()
print(f"\nFinal count after lifting limit: {sd.view('data').count()}")
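
For a rough sense of scale, the limits used in the script can be converted to bytes and compared with the raw size of the test data. This is only a sketch: `parse_size` is a hypothetical helper (not part of sedonadb or DataFusion), it assumes the K/M/G suffixes are binary multiples, and it assumes `id` is stored as int64 and `v` as float64. The memory a sort actually needs can differ from the raw column size.

```python
# Hypothetical helper, not part of sedonadb or DataFusion.
# Assumption: suffixes are binary multiples (K = 1024, M = 1024**2, G = 1024**3).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(limit: str) -> int:
    """Convert a string such as '1M' or '2G' to a byte count."""
    text = limit.strip()
    if len(text) < 2 or text[-1] not in _UNITS:
        raise ValueError(f"unsupported memory limit: {limit!r}")
    return int(float(text[:-1]) * _UNITS[text[-1]])


# Assumption: two 8-byte columns (int64 id, float64 v), one million rows each.
ROWS = 1_000_000
data_bytes = ROWS * 2 * 8

for limit in ("1M", "2G", "10G"):
    fits = parse_size(limit) >= data_bytes
    print(f"{limit}: {parse_size(limit)} bytes, raw data fits: {fits}")
```

Under these assumptions, only the '1M' limit is smaller than the raw data, which is consistent with the sort being blocked at that setting.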

Actual Output

--- Setup ---

--- Test 1: Invisibility ---
Setting 'memory_limit' found in SHOW ALL: False

--- Test 2: Enforcement ---
Limit set to 1MB. Attempting a sort...
Success: Enforcement confirmed. Execution blocked.
Error caught: External error: Resources exhausted: Additional allocation failed with top memory consumers (across reservations) as:
  TopK[0]#2(can spill: false) co...

Final count after lifting limit: 1000000

Maybe this just hasn't been propagated to SedonaDB yet from apache/datafusion#18452?
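
Until the setting shows up in SHOW ALL, one possible stopgap is to remember on the client side what was passed to SET. The sketch below is an illustration only: `TrackedSettings` is a hypothetical wrapper, and `_FakeConn` / `_FakeResult` are stand-ins so the example runs without SedonaDB. With a real connection, the only calls assumed are `sql(...)` and `.execute()`, as used in the script above. It records what was requested, not what the runtime actually holds.

```python
class TrackedSettings:
    """Hypothetical wrapper that remembers values passed to SET."""

    def __init__(self, conn):
        self._conn = conn
        self._values = {}

    def set(self, key: str, value: str) -> None:
        # Same statement shape as in the reproduction script.
        self._conn.sql(f"SET {key} = '{value}'").execute()
        # Record the value only after the statement ran without raising.
        self._values[key] = value

    def get(self, key: str, default=None):
        return self._values.get(key, default)


class _FakeResult:
    """Stand-in for the object returned by sql()."""

    def execute(self):
        return None


class _FakeConn:
    """Stand-in for sedonadb.connect(), so the sketch runs on its own."""

    def __init__(self):
        self.statements = []

    def sql(self, query: str):
        self.statements.append(query)
        return _FakeResult()


conn = _FakeConn()
settings = TrackedSettings(conn)
settings.set("datafusion.runtime.memory_limit", "2G")
print(settings.get("datafusion.runtime.memory_limit"))
```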
