Hello there. On a Windows server machine with ~30 instrumented .NET applications running (a mix of Windows services and IIS pools), the privileged CPU usage is high (the machine icon is constantly red due to over 15% privileged CPU usage). With DT 5.6 there was no such problem. Also, if debug info (full PDB) in the instrumented binaries is switched off, the problem goes away as well. This has been observed on 2 different machines.
The Windows Performance Monitor shows that each of the instrumented processes has exactly one thread that spikes the privileged CPU % measure every 10 seconds. Processes that have more threads exhibit higher CPU usage percentages. Many processes spike up to 100%. When viewing threads in SysInternals Process Explorer, there is indeed one thread that spikes every 10 seconds - this thread has a depth of 35 stack frames, with dtagentcore.dll!setAgentCorePath near the top, and no stack frames belonging to our apps. There is also perfproc.dll!CollectSysProcessObjectData on the stack, as well as ntoskrnl.exe!PsResumeProcess, etc. Again, this spike is observed on all processes, but release builds have much shorter and smaller spikes.
In effect, this high CPU usage makes the affected machines nearly unusable - these are test environment machines, and we have to run debug builds there. Fiddling with sensors, even disabling all sensors on the system profile and agent group, has no effect on the spikes. I suspect that this periodic activity has to do with gathering per-process CPU/memory/etc. metrics.
If the suspicion is correct, is it possible to reduce that polling frequency, e.g. 10-fold?
Has anyone experienced the same issue? There was no such problem with DT 5.6. We're running v.6.1 with 8105 patch level (fixpack).
Thanks, Victor.
Answer by Adam R. ·
Victor, thanks for the improved workaround. We'll give this a shot.
I suspect that with several hundred processes, we may need to back off even further than 60 seconds.
Andreas, if it helps, the support ticket I submitted was 00752352; it was opened April 23, 2013, and closed May 17 with the resolution "disable performance counters".
Answer by Victor B. ·
UPDATE
1) There is a different workaround, as suggested by DT tech support: setting the environment variable DT_PERFCOUNTERINTERVAL to 60 (60 seconds instead of the default 10 seconds) dramatically reduced kernel CPU usage. It's now at an acceptable level, so we can still take advantage of per-process .NET performance metrics, albeit at a slightly lower resolution (which matters only for a short period of time, as the data is not warehoused at the higher resolution);
2) There is actually no difference between debug and release builds - a new series of tests on a single machine shows that the build type is not a factor;
3) The issue seems to be new (affecting Dynatrace 6), and hopefully will be addressed in a future fixpack.
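For anyone who wants to try item 1, here is a minimal sketch. Support only gave the variable name and the value; setting it machine-wide with setx (the same way DT_DISABLEPERFCOUNTERS is set elsewhere in this thread) and restarting the instrumented processes afterwards are assumptions on my part.

```batch
rem Assumption: the agent reads this variable from the machine environment at process start.
rem Run from an elevated command prompt; /M writes a system-wide (machine) variable.
setx DT_PERFCOUNTERINTERVAL 60 /M

rem setx does not change processes that are already running, so the instrumented
rem services and IIS pools have to be restarted (a reboot may be needed for services).
```

If 60 seconds is still too aggressive for hosts with many processes, the same command with a larger value should presumably work, though only 60 has been tried here.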
Answer by Victor B. ·
Hi Adam. Your fix worked - CPU dropped from 50...100% to 10% !!! Thanks a lot for sharing this.
Regards, Victor.
Hi Victor. I would still open a support ticket for this. If this is a known issue on certain Windows/.NET versions, then we should get an official statement from our support team. If it is not a well-known issue, then support needs to address this with engineering.
Andi
Answer by Adam R. ·
We experienced a similar issue with Dynatrace 5.x ... except with 200+ processes, instead of 30. As you can imagine, our host was not responsive.
YMMV, but we were OK after simply disabling the per-process performance counter monitoring by setting the appropriate environment variable, i.e.,
setx DT_DISABLEPERFCOUNTERS true /M
FWIW, we opened a ticket with DT support for this issue for other workarounds, but I can't recall the outcome, and unfortunately can't find the ticket in the DT JIRA support system now (it was a SF support ticket).
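A quick way to confirm the setting took effect, as a sketch: setx /M stores machine-level variables under the Session Manager registry key, so the value can be read back from there. That the agent only picks the variable up when a process starts is an assumption, not something confirmed by support.

```batch
rem Read back the machine-level variable written by "setx ... /M".
reg query "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" /v DT_DISABLEPERFCOUNTERS

rem In a NEW command prompt (existing ones keep their old environment):
echo %DT_DISABLEPERFCOUNTERS%
```

Already-running services and IIS pools keep their old environment, so they need a restart before the change can have any effect.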