
Calculate batch size based on measurements and available workers information #20

@vodkolav

Description

You can read the memory available on each worker from PySpark, but `getExecutorMemoryStatus()` lives on the Scala `SparkContext`, not on the `JavaSparkContext` wrapper, so you have to reach it through the py4j gateway and convert the Scala map before iterating it in Python. Core/parallelism information is available directly as `sc.defaultParallelism`.

from pyspark.sql import SparkSession

# Initialize SparkSession (replace <master-node> with your master's hostname)
spark = SparkSession.builder.master("spark://<master-node>:7077").appName("MemoryCheck").getOrCreate()
sc = spark.sparkContext

# getExecutorMemoryStatus() is on the Scala SparkContext, so go through the
# (private) py4j gateways sc._jsc / sc._jvm and convert the returned
# Scala Map[String, (Long, Long)] into a Java map that Python can iterate.
scala_map = sc._jsc.sc().getExecutorMemoryStatus()
executors_info = sc._jvm.scala.collection.JavaConverters.mapAsJavaMapConverter(scala_map).asJava()

# Print memory details for each executor. Keys are "host:port" strings;
# values are Scala tuples (max memory for caching, remaining memory) in bytes,
# accessed through py4j as ._1() and ._2().
for executor, memory_info in executors_info.items():
    worker_address = executor.split(":")[0]  # Executor host
    total_memory = memory_info._1()  # Max memory available for storage, in bytes
    free_memory = memory_info._2()   # Remaining free memory, in bytes

    print(f"Worker: {worker_address}")
    print(f"  Total Memory: {total_memory / (1024 * 1024):.2f} MB")
    print(f"  Free Memory: {free_memory / (1024 * 1024):.2f} MB")

# Stop SparkSession
spark.stop()
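Once you have the free-memory figures and a core count (e.g. `sc.defaultParallelism`), the batch-size calculation itself is plain arithmetic. Below is a minimal sketch of one possible heuristic; `suggest_batch_size`, the `safety_factor`, and the assumed bytes-per-record are all hypothetical choices for this issue, not anything Spark provides:

```python
def suggest_batch_size(free_memory_bytes, num_cores, bytes_per_record, safety_factor=0.5):
    """Suggest records per batch so that one batch per core fits in a
    safety-discounted fraction of the free executor memory."""
    usable = free_memory_bytes * safety_factor   # leave headroom for overhead
    per_core = usable / max(num_cores, 1)        # memory budget per concurrent task
    return max(1, int(per_core // bytes_per_record))

# Example: 4 GiB free, 8 cores, ~1 KiB per record
print(suggest_batch_size(4 * 1024**3, 8, 1024))  # 262144
```

You would typically take the minimum of the free-memory values across workers so the batch fits on the most constrained executor; the 0.5 safety factor is a guess to be tuned against your actual serialization overhead.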
