
Investigating Task Overload Issues in Azure Batch with Automatic Scaling (maxParallelTasks) #184

@zoltix

I have a strange issue with my Azure Batch setup. I have a pipeline that creates about 30 jobs, and each job contains roughly 200 tasks. Each job is configured to run at most two tasks in parallel (maxParallelTasks=2 at the job level). However, I’ve noticed that sometimes one of the jobs runs 15-20 tasks simultaneously.

Most of the time my jobs work fine, with only two tasks running at once as expected. But fairly often I observe many tasks running simultaneously in a single job, despite the limit being set to two.

When I increase the number of VMs in the pool (in my tests), the problem becomes significantly worse. Sometimes I observe 40, 50, or even 60 tasks running simultaneously. This creates major issues in production because it consumes a lot of resources, drastically slows everything down, and makes the system very inefficient.
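To make the symptom easier to reproduce and report, here is a minimal sketch of the check I have in mind. It assumes I already have one job id per currently running task (collecting that list from the Batch service is not shown), and the function name `jobs_over_limit` is hypothetical:

```python
from collections import Counter


def jobs_over_limit(running_task_job_ids, limit=2):
    """Return {job_id: running_count} for jobs exceeding the parallel limit.

    running_task_job_ids: iterable with one job id per running task
    (assumed to be gathered elsewhere, e.g. from a task listing).
    limit: the expected per-job cap, 2 in my setup.
    """
    counts = Counter(running_task_job_ids)
    return {job_id: n for job_id, n in counts.items() if n > limit}


# Example: job-b has three running tasks while the cap is two.
snapshot = ["job-a", "job-a", "job-b", "job-b", "job-b"]
print(jobs_over_limit(snapshot))
```

Running this on periodic snapshots would show which jobs exceed the cap and by how much.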

I don’t understand this behavior. Below is the autoscale formula I’ve written. Maybe the error lies there.

$varActiveTasks = max($ActiveTasks.GetSample(15));
$varRunningTasks = max($RunningTasks.GetSample(15));
$varPendingTasks = max($PendingTasks.GetSample(15));
$varTaskSlotsPerNode = max($TaskSlotsPerNode);
$varCurrentDedicatedNodes = $CurrentDedicatedNodes;
$varUsableNodeCount = max($UsableNodeCount.GetSample(15));
$varPreemptedNodeCount = max($PreemptedNodeCount.GetSample(15));
$varTargetDedicatedNodes = $TargetDedicatedNodes;
$varMaxTargetDedicatedNodes = 10;  
$CurTime = time() + (2 * TimeInterval_Hour); 
$CurHour = $CurTime.hour;
$CurMinute = $CurTime.minute;
$IsNightTime = ($CurHour == 21 && $CurMinute >= 50) || ($CurHour >= 22) || ($CurHour < 3);
$minCapacity = $IsNightTime ? 0 : 0;
$samples = $PendingTasks.GetSamplePercent(TimeInterval_Minute * 15);
$LastSampledActiveTasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) : max($PendingTasks.GetSample(1), avg($PendingTasks.GetSample(TimeInterval_Minute * 15)));
$smoothedPendingTasks = avg($PendingTasks.GetSample(TimeInterval_Minute * 15));
$requiredNodes = ceil($smoothedPendingTasks / $varTaskSlotsPerNode);
$adjustmentFactor = 0.5;
$recommendedNode = $TargetDedicatedNodes + ($requiredNodes - $TargetDedicatedNodes) * $adjustmentFactor;
$TargetDedicatedNodes = min(max($recommendedNode, $minCapacity), $varMaxTargetDedicatedNodes);
$NodeDeallocationOption = taskcompletion;
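For readers who want to follow the arithmetic, the last steps of the formula above (required nodes, damped adjustment, clamping) can be mirrored in plain Python. This is only a sketch of the arithmetic, not of how the Batch service evaluates the formula; the function and parameter names are hypothetical, and the inputs are assumed to be already-averaged sample values:

```python
import math


def next_target_nodes(smoothed_pending, slots_per_node, current_target,
                      adjustment=0.5, min_capacity=0, max_nodes=10):
    """Mirror the final steps of the autoscale formula.

    smoothed_pending: average of $PendingTasks over the last 15 minutes
    slots_per_node:   value of $TaskSlotsPerNode
    current_target:   current $TargetDedicatedNodes
    """
    # Nodes needed to give every pending task a slot.
    required = math.ceil(smoothed_pending / slots_per_node)
    # Move only part of the way toward the required count.
    recommended = current_target + (required - current_target) * adjustment
    # Clamp between the minimum capacity and the pool maximum.
    return min(max(recommended, min_capacity), max_nodes)


# 40 pending tasks, 4 slots per node, currently targeting 2 nodes.
print(next_target_nodes(40, 4, 2))
```

Note that this calculation uses only the pool-wide pending task count. It determines how many nodes the pool should have and does not itself refer to the per-job maxParallelTasks setting.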

How can I limit the number of tasks running at the same time in all jobs?
Thanks in advance for your help.
