
Creating a dashboard with a formula "percentage of Kubernetes pod usage (requests/limits)"

shakib
Guide

So I am able to see the percentage of CPU requested per pod, compared to that pod's CPU limit, using the following formula:

(
builtin:cloud.kubernetes.pod.cpuRequests:last:splitBy("dt.entity.cloud_application_instance")
/ builtin:cloud.kubernetes.pod.cpuLimits:last:splitBy("dt.entity.cloud_application_instance")
* 100
):setUnit(Percent):sort(value(sum,descending))
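To make the arithmetic of this expression concrete, here is a minimal Python sketch of what it computes per pod: take the last requests and limits value, divide, multiply by 100, and sort descending. The pod names and millicore values are made up, nothing here calls Dynatrace, and skipping pods without a limit is a choice of this sketch, not a statement about how Dynatrace pairs series.

```python
# Hypothetical last-value CPU requests and limits per pod, in millicores.
cpu_requests = {"pod-a": 250, "pod-b": 500, "pod-c": 300}
cpu_limits = {"pod-a": 1000, "pod-b": 500, "pod-c": 400}


def request_to_limit_pct(requests, limits):
    """Return (pod, percent) pairs sorted by percent, highest first.

    Pods with no limit (or a zero limit) are skipped because the ratio is undefined.
    """
    result = []
    for pod, req in requests.items():
        limit = limits.get(pod)
        if not limit:
            continue
        result.append((pod, req / limit * 100))
    return sorted(result, key=lambda item: item[1], reverse=True)


for pod, pct in request_to_limit_pct(cpu_requests, cpu_limits):
    print(f"{pod}: {pct:.1f}%")
```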

 

But I am trying to add a filter so that I only get the data for a specific Kubernetes cluster. When I create a dashboard tile for the above formula and then add a dynamic filter on the dashboard to limit it to a specific Kubernetes cluster, that dynamic filter doesn't work.

So now I am trying to put a Kubernetes cluster filter into the formula itself, but I'm running into issues. Any suggestions on how I can achieve this? I tried adding "dt.entity.kubernetes_cluster" to the splitBy of both metrics, as below, but it does not work.

(
builtin:cloud.kubernetes.pod.cpuRequests:last:splitBy("dt.entity.cloud_application_instance", "dt.entity.kubernetes_cluster")
/ builtin:cloud.kubernetes.pod.cpuLimits:last:splitBy("dt.entity.cloud_application_instance","dt.entity.kubernetes_cluster")
* 100
):setUnit(Percent):sort(value(sum,descending))

3 replies

shakib
Guide

In the example below I see the percentage of a node's allocatable CPU that is requested. I want this same data at the pod level, which is what I am trying to get with my formula above.


(
builtin:kubernetes.node.requests_cpu:last:splitBy("dt.entity.kubernetes_node","dt.entity.kubernetes_cluster"):sum
/ builtin:kubernetes.node.cpu_allocatable:last:splitBy("dt.entity.kubernetes_node","dt.entity.kubernetes_cluster"):sum
* 100
):setUnit(Percent):sort(value(sum,descending))

shakib
Guide

OK, I figured out how to get it to work with an environment tag filter. But the problems Dynatrace is reporting to me are different from the ones I see with my formula, which now looks like this:

(
builtin:cloud.kubernetes.pod.cpuRequests:filter(and(or(in("dt.entity.cloud_application_instance",entitySelector("type(cloud_application_instance),fromRelationship.runsOn(type(KUBERNETES_NODE),tag(~"Environment:TagHERE~"))"))))):splitBy("dt.entity.cloud_application_instance")
/ builtin:cloud.kubernetes.pod.cpuLimits:filter(and(or(in("dt.entity.cloud_application_instance",entitySelector("type(cloud_application_instance),fromRelationship.runsOn(type(KUBERNETES_NODE),tag(~"Environment:TagHERE~"))"))))):splitBy("dt.entity.cloud_application_instance")
* 100
):sort(value(auto,descending)):setUnit(Percent)
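The filter in this expression keeps only pods that run on a node carrying the Environment tag, and then computes the same requests/limits percentage. A minimal Python sketch of that logic, assuming a made-up pod-to-node mapping and node tags (none of these names come from Dynatrace; they only stand in for the runsOn relationship and the tag condition):

```python
# Hypothetical topology: which node each pod runs on, and each node's tags.
pod_node = {"pod-a": "node-1", "pod-b": "node-2", "pod-c": "node-1"}
node_tags = {"node-1": {"Environment:TagHERE"}, "node-2": {"Environment:Other"}}

# Hypothetical last-value CPU requests and limits per pod, in millicores.
cpu_requests = {"pod-a": 250, "pod-b": 500, "pod-c": 300}
cpu_limits = {"pod-a": 1000, "pod-b": 500, "pod-c": 400}


def pods_on_tagged_nodes(tag):
    """Pods whose node carries `tag` (stands in for runsOn(...),tag(...))."""
    return {pod for pod, node in pod_node.items() if tag in node_tags.get(node, set())}


def filtered_pct(tag):
    """Requests/limits percentage for pods on nodes with `tag`, highest first."""
    keep = pods_on_tagged_nodes(tag)
    rows = [
        (pod, req / cpu_limits[pod] * 100)
        for pod, req in cpu_requests.items()
        if pod in keep and cpu_limits.get(pod)
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


print(filtered_pct("Environment:TagHERE"))
```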

 

I guess this means the CPU request saturation alerts Dynatrace is showing me are based on some other statistic. (They're worthless to me anyway, since they point to a node rather than to an actual pod I can single out as the problem.)

 

 

florian_g
Dynatrace Mentor

Hi 👋,

Here are a few thoughts from my end - I hope they help 🙂

  1. The metrics you're using in your expression have since been deprecated. I recommend using the following alternative instead:
    (
    builtin:kubernetes.workload.requests_cpu:last:splitBy("dt.entity.cloud_application", "dt.entity.kubernetes_cluster")
    / builtin:kubernetes.workload.limits_cpu:last:splitBy("dt.entity.cloud_application","dt.entity.kubernetes_cluster")
    * 100
    ):setUnit(Percent):sort(value(sum,descending))

    With that expression, the filtering works as well.

  2. I'm interested in the use-case behind the expression: Why do you want to know CPU requests compared to limits on a pod level? What I usually see is people comparing
    1. usage to requests on a workload level ("is my workload on average using what it is guaranteed and what it is blocking -> mostly about cost-efficiency by optimizing requests of workloads")
    2. OR they compare usage to limits ("is my workload hitting the limits? -> hitting the limits results in CPU throttling in case of CPU and OOM kills in case of memory").
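The ratios discussed in this reply (requests vs. limits from the workload-level expression, plus the usage-vs-requests and usage-vs-limits comparisons) can be sketched in a few lines of Python. Everything here is hypothetical: the workload names, cluster names and millicore values are invented, the cluster filter only mirrors what a dashboard filter does when the cluster is one of the split dimensions, and the 90% threshold is an arbitrary example, not a Dynatrace default. Only CPU is covered; the memory/OOM case would follow the same pattern.

```python
from dataclasses import dataclass


@dataclass
class WorkloadCpu:
    workload: str
    cluster: str
    usage: float     # average CPU usage in millicores (hypothetical values)
    requests: float  # CPU requests in millicores
    limits: float    # CPU limits in millicores


workloads = [
    WorkloadCpu("checkout", "prod-cluster", usage=150, requests=500, limits=1000),
    WorkloadCpu("search", "prod-cluster", usage=950, requests=800, limits=1000),
    WorkloadCpu("checkout", "staging-cluster", usage=50, requests=200, limits=1000),
]


def in_cluster(items, cluster=None):
    """Keep only one cluster's workloads (None keeps all), like a cluster filter on a tile."""
    return [w for w in items if cluster is None or w.cluster == cluster]


def requests_vs_limits_pct(w):
    """The ratio from the workload-level expression: requests / limits * 100."""
    return w.requests / w.limits * 100


def usage_vs_requests_pct(w):
    """Usage vs. requests: low values hint at over-provisioned requests (cost)."""
    return w.usage / w.requests * 100


def usage_vs_limits_pct(w):
    """Usage vs. limits: values near 100% hint at CPU throttling."""
    return w.usage / w.limits * 100


# Arbitrary example threshold, not a Dynatrace default.
THROTTLING_RISK_PCT = 90

for w in in_cluster(workloads, "prod-cluster"):
    flag = "near limit" if usage_vs_limits_pct(w) >= THROTTLING_RISK_PCT else "ok"
    print(
        f"{w.workload}@{w.cluster}: requests {requests_vs_limits_pct(w):.0f}% of limit, "
        f"usage {usage_vs_requests_pct(w):.0f}% of requests, "
        f"{usage_vs_limits_pct(w):.0f}% of limit ({flag})"
    )
```

Keeping the three ratios as separate functions makes it easy to see that they answer different questions: the first is what the expression above charts, while the other two are the cost and throttling views described in the reply.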

Best,

Florian

One does not simply run a container...
