
Calculate batch size based on measurements and available workers information #20

@vodkolav

Description

You can read the memory available on each worker from PySpark, but `getExecutorMemoryStatus()` lives on the Scala `SparkContext`, not on the `JavaSparkContext` wrapper, so you have to reach it through the py4j gateway and convert the Scala map before iterating it in Python. Core/parallelism information is available directly as `sc.defaultParallelism`.

from pyspark.sql import SparkSession

# Initialize SparkSession (replace <master-node> with your master's hostname)
spark = SparkSession.builder.master("spark://<master-node>:7077").appName("MemoryCheck").getOrCreate()
sc = spark.sparkContext

# getExecutorMemoryStatus() is on the Scala SparkContext, so go through the
# (private) py4j gateways sc._jsc / sc._jvm and convert the returned
# Scala Map[String, (Long, Long)] into a Java map that Python can iterate.
scala_map = sc._jsc.sc().getExecutorMemoryStatus()
executors_info = sc._jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(scala_map).asJava()

# Print memory details for each executor. Keys are "host:port" strings;
# values are Scala tuples (max memory for caching, remaining memory) in bytes,
# accessed through py4j as ._1() and ._2().
for executor, memory_info in executors_info.items():
    worker_address = executor.split(":")[0]  # Executor host
    total_memory = memory_info._1()  # Max memory available for storage, in bytes
    free_memory = memory_info._2()   # Remaining free memory, in bytes

    print(f"Worker: {worker_address}")
    print(f"  Total Memory: {total_memory / (1024 * 1024):.2f} MB")
    print(f"  Free Memory: {free_memory / (1024 * 1024):.2f} MB")

# Stop SparkSession
spark.stop()
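Once you have the free-memory figures and a core count (e.g. `sc.defaultParallelism`), the batch-size calculation itself is plain arithmetic. Below is a minimal sketch of one possible heuristic; `suggest_batch_size`, the `safety_factor`, and the assumed bytes-per-record are all hypothetical choices for this issue, not anything Spark provides:

```python
def suggest_batch_size(free_memory_bytes, num_cores, bytes_per_record, safety_factor=0.5):
    """Suggest records per batch so that one batch per core fits in a
    safety-discounted fraction of the free executor memory."""
    usable = free_memory_bytes * safety_factor   # leave headroom for overhead
    per_core = usable / max(num_cores, 1)        # memory budget per concurrent task
    return max(1, int(per_core // bytes_per_record))

# Example: 4 GiB free, 8 cores, ~1 KiB per record
print(suggest_batch_size(4 * 1024**3, 8, 1024))  # 262144
```

You would typically take the minimum of the free-memory values across workers so the batch fits on the most constrained executor; the 0.5 safety factor is a guess to be tuned against your actual serialization overhead.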
