Solved: sum(builtin:tech.generic.cpu.usage) versus builtin:host.cpu.usage or builtin:host.cpu.user

PierreLamonzie · 19 Apr 2024

Hello all,

I wanted to check something very basic: whether the sum of the CPU usage of my processes equals my host CPU usage.

I did not expect exactly the same values; a difference of a few percent would be fine.

But I see huge differences, quite often 30% to 50%, in both directions.

The figures are closer when I compare sum(builtin:tech.generic.cpu.usage) with builtin:host.cpu.user rather than with builtin:host.cpu.usage, but the results are still far from "good".

Using a coarser resolution, for example 15m instead of 1m, reduces the differences. Even so, in about 20% of the cases the two metrics still differ by more than 20%.
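The effect of the resolution can be sketched with a small Python example. The numbers are invented for illustration, and the helper names (`relative_diff`, `downsample`, `share_above`) are hypothetical, not Dynatrace functions. The sketch assumes that the per-minute process sum jitters around the host value, which is only one possible explanation for the behaviour described here.

```python
from statistics import mean


def relative_diff(process_sum, host):
    """Relative difference between the two values, as a fraction of the host value."""
    return abs(process_sum - host) / host


def downsample(series, factor):
    """Average consecutive groups of `factor` samples (factor=15 turns 1m data into 15m data)."""
    return [mean(series[i:i + factor]) for i in range(0, len(series), factor)]


def share_above(process_series, host_series, threshold):
    """Fraction of samples whose relative difference exceeds `threshold`."""
    diffs = [relative_diff(p, h) for p, h in zip(process_series, host_series)]
    return sum(d > threshold for d in diffs) / len(diffs)


# Invented 1-minute samples (percent CPU): the process sum alternates
# around a constant host value of 4%.
process_1m = [2.5 if i % 2 == 0 else 5.5 for i in range(30)]
host_1m = [4.0] * 30

# Same data averaged into 15-minute buckets.
process_15m = downsample(process_1m, 15)
host_15m = downsample(host_1m, 15)

share_1m = share_above(process_1m, host_1m, 0.20)
share_15m = share_above(process_15m, host_15m, 0.20)

print(f"1m:  {share_1m:.0%} of samples differ by more than 20%")
print(f"15m: {share_15m:.0%} of samples differ by more than 20%")
```

With this contrived data every 1-minute sample is off by more than 20%, while none of the 15-minute averages are. Real data will sit somewhere in between.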

I am wondering where my approach goes wrong, and how I can create and use:

* problems on builtin:tech.generic.cpu.usage thresholds

* problems on builtin:host.cpu.user thresholds

while being sure that the two are consistent with each other.

With the data I have, I could raise a problem saying "this process CPU usage is higher than x%" while the host CPU usage is lower than x%, which is clearly inconsistent.

(We are running on-premises.)

Thanks and regards.

Pierre

Eric_Yu · 22 Apr 2024

Hi Pierre,

Regarding the description of each metric, you can get an idea by checking the metrics tab:

For example, for CPU user vs CPU usage %, here's the difference:

The one you're looking for should be builtin:host.cpu.usage. As for the comparison between that and builtin:tech.generic.cpu.usage per process, the two should be roughly the same when aggregated, which is what I see in my own tests.

Can you share your metric selectors? That would help to see what's going on there.
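For anyone who wants to reproduce the comparison, here is a minimal Python sketch that only builds the query URLs for the Metrics API v2 endpoint (`/api/v2/metrics/query`). The selectors actually used in this thread are not shown, so the two below are assumptions, and `BASE_URL` and `metrics_query_url` are hypothetical placeholders. Note that `splitBy()` with no arguments aggregates over every process or host in the environment; to compare against a single host you would still need to filter both selectors down to that host.

```python
from urllib.parse import urlencode

# Placeholder only: replace with your own Managed or SaaS environment URL.
BASE_URL = "https://dynatrace.example.com/e/ENVIRONMENT_ID"

# Assumed selectors, not the ones from the thread.
PROCESS_SUM_SELECTOR = "builtin:tech.generic.cpu.usage:splitBy():sum"
HOST_USAGE_SELECTOR = "builtin:host.cpu.usage:splitBy():avg"


def metrics_query_url(base_url, metric_selector, resolution="1m", time_from="now-2h"):
    """Build a Metrics API v2 query URL; no request is sent."""
    params = {
        "metricSelector": metric_selector,
        "resolution": resolution,
        "from": time_from,
    }
    return f"{base_url}/api/v2/metrics/query?{urlencode(params)}"


process_url = metrics_query_url(BASE_URL, PROCESS_SUM_SELECTOR)
host_url = metrics_query_url(BASE_URL, HOST_USAGE_SELECTOR, resolution="15m")

print(process_url)
print(host_url)
```

Querying both URLs with the same time frame and resolution makes it easier to line the two series up point by point.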

Regards,

Eric Yu

PierreLamonzie · 22 Apr 2024

Hello Eric,

Thank you so much for your answer.

I was using the same selector as you, but my discrepancies appear to be larger than yours:

The absolute difference between the two values is low, but the relative difference can be very high.
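A quick Python sketch shows why a small absolute gap turns into a large relative one when usage is low. The values are invented for illustration and `differences` is a hypothetical helper name.

```python
def differences(process_sum, host):
    """Return (absolute difference in percentage points, relative difference in percent of host)."""
    absolute = abs(process_sum - host)
    relative = absolute * 100 / host
    return absolute, relative


# Same 1-point gap, once on a nearly idle host and once on a busy one.
low_abs, low_rel = differences(3.0, 2.0)
high_abs, high_rel = differences(51.0, 50.0)

print(f"low usage:  {low_abs:.1f} points absolute, {low_rel:.1f}% relative")
print(f"high usage: {high_abs:.1f} points absolute, {high_rel:.1f}% relative")
```

The gap is one percentage point in both cases, but it is 50% of the host value on the idle host and only 2% on the busy one.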

This uncertainty may be intrinsic to the way the data is captured: since the usage values are quite low anyway, it is hard to get perfect accuracy from the Linux OS itself.

Thank you again for your feedback. Let's consider this topic closed.

Best regards.

Pierre