When there are multiple nodes with enough resources available to deploy pods, the Kubernetes scheduler selects the node with the highest score. Let's discuss how Kubernetes prioritizes nodes based on resources.

The Kubernetes scheduler has three algorithms related to resources. The first one is the least_requested algorithm, with which the Kubernetes scheduler tends to spread pods out and keep the resource utilization rate on every node low. The algorithm looks like this:
// The unused capacity is calculated on a scale of 0-10
// 0 being the lowest priority and 10 being the highest.
// The more unused resources the higher the score is.
func calculateUnusedScore(requested int64, capacity int64, node string) int64 {
    if capacity == 0 {
        return 0
    }
    if requested > capacity {
        glog.V(10).Infof("Combined requested resources %d from existing pods exceeds capacity %d on node %s",
            requested, capacity, node)
        return 0
    }
    return ((capacity - requested) * 10) / capacity
}
allocatableResources := nodeInfo.AllocatableResource()
totalResources := *podRequests
totalResources.MilliCPU += nodeInfo.NonZeroRequest().MilliCPU
totalResources.Memory += nodeInfo.NonZeroRequest().Memory
cpuScore := calculateUnusedScore(totalResources.MilliCPU, allocatableResources.MilliCPU, node.Name)
memoryScore := calculateUnusedScore(totalResources.Memory, allocatableResources.Memory, node.Name)
The final score is the average of cpuScore and memoryScore. The code above shows that nodes with a lower resource utilization rate get a higher score, and thus a higher priority for deploying pods. If there are two nodes (with 2 CPU and 4 CPU respectively) available when scheduling a pod requesting 1 CPU, the least_requested algorithm tends to select the node with 4 CPU.
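To make the 2-CPU vs. 4-CPU example concrete, here is a minimal, runnable sketch of the scoring function above with the glog call dropped so it is self-contained (the node names are made up for illustration):

```go
package main

import "fmt"

// calculateUnusedScore mirrors the least_requested scoring above,
// minus the logging statement.
func calculateUnusedScore(requested int64, capacity int64, node string) int64 {
	if capacity == 0 {
		return 0
	}
	if requested > capacity {
		return 0
	}
	return ((capacity - requested) * 10) / capacity
}

func main() {
	// A pod requesting 1 CPU (1000 milliCPU) scored against both nodes.
	fmt.Println(calculateUnusedScore(1000, 2000, "node-2cpu")) // (2000-1000)*10/2000 = 5
	fmt.Println(calculateUnusedScore(1000, 4000, "node-4cpu")) // (4000-1000)*10/4000 = 7
}
```

The 4-CPU node scores 7 against the 2-CPU node's 5, so it wins.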
The most_requested algorithm behaves in the opposite way: with it, the Kubernetes scheduler tends to deploy pods onto nodes with the highest resource utilization rate. The code to score nodes is below:
// The used capacity is calculated on a scale of 0-10
// 0 being the lowest priority and 10 being the highest.
// The more resources are used the higher the score is. This function
// is almost a reversed version of least_requested_priority.calculateUnusedScore
// (10 - calculateUnusedScore). The main difference is in rounding. It was added to
// keep the final formula clean and not to modify the widely used (by users
// in their default scheduling policies) calculateUnusedScore.
func calculateUsedScore(requested int64, capacity int64, node string) int64 {
    if capacity == 0 {
        return 0
    }
    if requested > capacity {
        glog.V(10).Infof("Combined requested resources %d from existing pods exceeds capacity %d on node %s",
            requested, capacity, node)
        return 0
    }
    return (requested * 10) / capacity
}
If there are two nodes (with 2 CPU and 4 CPU respectively) available when scheduling a pod requesting 1 CPU, the most_requested algorithm tends to select the node with 2 CPU.
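Running the same two hypothetical nodes through the most_requested scoring function (again with the glog call dropped so the sketch is self-contained) shows the preference flipping:

```go
package main

import "fmt"

// calculateUsedScore mirrors the most_requested scoring above,
// minus the logging statement.
func calculateUsedScore(requested int64, capacity int64, node string) int64 {
	if capacity == 0 {
		return 0
	}
	if requested > capacity {
		return 0
	}
	return (requested * 10) / capacity
}

func main() {
	// A pod requesting 1 CPU (1000 milliCPU) scored against both nodes.
	fmt.Println(calculateUsedScore(1000, 2000, "node-2cpu")) // 1000*10/2000 = 5
	fmt.Println(calculateUsedScore(1000, 4000, "node-4cpu")) // 1000*10/4000 = 2
}
```

This time the 2-CPU node scores 5 against the 4-CPU node's 2, so the busier node wins.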
The third algorithm is balanced_resource_allocation, with which the Kubernetes scheduler tries to balance the utilization rates of CPU and memory. Its related code looks like:
allocatableResources := nodeInfo.AllocatableResource()
totalResources := *podRequests
totalResources.MilliCPU += nodeInfo.NonZeroRequest().MilliCPU
totalResources.Memory += nodeInfo.NonZeroRequest().Memory
cpuFraction := fractionOfCapacity(totalResources.MilliCPU, allocatableResources.MilliCPU)
memoryFraction := fractionOfCapacity(totalResources.Memory, allocatableResources.Memory)
score := int(0)
if cpuFraction >= 1 || memoryFraction >= 1 {
    // if requested >= capacity, the corresponding host should never be preferred.
    score = 0
} else {
    // Upper and lower boundary of difference between cpuFraction and memoryFraction are -1 and 1
    // respectively. Multiplying the absolute value of the difference by 10 scales the value to
    // 0-10 with 0 representing well balanced allocation and 10 poorly balanced. Subtracting it from
    // 10 leads to the score which also scales from 0 to 10 while 10 representing well balanced.
    diff := math.Abs(cpuFraction - memoryFraction)
    score = int(10 - diff*10)
}
The code above first calculates the CPU and memory utilization rates, then their difference. The node with the largest difference between the two utilization rates gets the lowest priority. The Kubernetes scheduler also doesn't prefer nodes at 100% CPU or memory utilization: when either resource on a node is fully requested, its score is 0 and it has the lowest priority for deploying pods.
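The balancing logic can be isolated into a small runnable sketch. The helper below is not the scheduler's own function; it just takes the two utilization fractions directly, under the assumption that fractionOfCapacity has already been applied:

```go
package main

import (
	"fmt"
	"math"
)

// balancedScore reproduces the scoring branch above, taking the
// precomputed utilization fractions as inputs.
func balancedScore(cpuFraction, memoryFraction float64) int {
	// If either resource is fully requested, the host should never be preferred.
	if cpuFraction >= 1 || memoryFraction >= 1 {
		return 0
	}
	diff := math.Abs(cpuFraction - memoryFraction)
	return int(10 - diff*10)
}

func main() {
	fmt.Println(balancedScore(0.5, 0.5))   // perfectly balanced: 10
	fmt.Println(balancedScore(0.75, 0.25)) // fractions 50% apart: 5
	fmt.Println(balancedScore(0.5, 1.0))   // memory fully requested: 0
}
```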
When pods don't request resources explicitly (in Resources.Requests of the deployment config), the Kubernetes scheduler treats them as requesting 0.1 CPU and 200MB of memory by default when scoring nodes (non-zero.go):
// For each of these resources, a pod that doesn't request the resource explicitly
// will be treated as having requested the amount indicated below, for the purpose
// of computing priority only. This ensures that when scheduling zero-request pods, such
// pods will not all be scheduled to the machine with the smallest in-use request,
// and that when scheduling regular pods, such pods will not see zero-request pods as
// consuming no resources whatsoever. We chose these values to be similar to the
// resources that we give to cluster addon pods (#10653). But they are pretty arbitrary.
// As described in #11713, we use request instead of limit to deal with resource requirements.
const DefaultMilliCpuRequest int64 = 100 // 0.1 core
const DefaultMemoryRequest int64 = 200 * 1024 * 1024 // 200 MB
The Kubernetes scheduler has the --algorithm-provider flag to configure the algorithms used to prioritize nodes, which has two options: DefaultProvider and ClusterAutoScalerProvider. Both options include the balanced_resource_allocation algorithm. The only difference between the two is that DefaultProvider uses the least_requested algorithm, while ClusterAutoScalerProvider uses the most_requested algorithm.