Hi, I have a question regarding the realtime analyzer queue. We are seeing periodic backups in the queue on the self-monitoring server health dashboard. Sometimes they are spikes that last for 20 minutes and then go back down. Other times, the queue spikes really high and stays at that level. What are the ways to debug this type of issue? Can this be the result of a BT someone is running? If so, is there a way to identify this at runtime? We are evaluating the number of purepaths to ensure that there is no unusual spike in activity from the agents. So far, we don't see any. What else could cause this? Eventually, the heap starts struggling and we start getting measurement gaps.
Answer by Anderson T. ·
Typically the queue size is between 4,000 and 8,000 after a restart. It can stay in that range for an entire day with no problems (sometimes). But in the last 10 hours, it has been steadily rising to 750,000.
Other times, we have seen smaller spikes to 500,000, and then it recovers after 20-30 minutes.
Heap and GC look decent, no GC suspensions over 1 sec. But, there are some occasional large spikes in CPU.
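One way to separate the two patterns described above (spikes that recover in 20-30 minutes versus a backlog that never drains) is to scan the queue-size samples for episodes above a threshold. The following is only a minimal sketch: it assumes the samples have been copied out of the dashboard by hand as (minute, queue size) pairs, and the function name, the `Episode` type, and the 100,000 threshold are all hypothetical, not part of any product API.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    start_min: int   # minute of the first sample above the threshold
    end_min: int     # minute of the first sample back under it (or the last sample seen)
    peak: int        # largest queue size observed during the episode
    recovered: bool  # False if the queue was still backed up at the end of the data


def find_backlog_episodes(samples, threshold=100_000):
    """Group consecutive samples above `threshold` into episodes.

    `samples` is a list of (minute, queue_size) pairs sorted by minute.
    The threshold is an assumed value; pick one that fits your baseline.
    """
    episodes = []
    start = None
    peak = 0
    last_minute = None
    for minute, size in samples:
        if size > threshold:
            if start is None:
                start, peak = minute, size
            else:
                peak = max(peak, size)
        elif start is not None:
            # Queue dropped back under the threshold: the episode recovered.
            episodes.append(Episode(start, minute, peak, True))
            start = None
        last_minute = minute
    if start is not None:
        # Data ended while the queue was still above the threshold.
        episodes.append(Episode(start, last_minute, peak, False))
    return episodes


# Made-up 15-minute samples shaped like the numbers in this thread.
samples = [
    (0, 6000), (15, 500000), (30, 450000), (45, 7000),
    (60, 8000), (75, 300000), (90, 600000), (105, 750000),
]
episodes = find_backlog_episodes(samples)
for e in episodes:
    kind = "recovered" if e.recovered else "still backed up"
    print(f"{e.start_min}-{e.end_min} min, peak {e.peak}: {kind}")
```

Episodes that never recover are the ones worth lining up against CPU spikes, purepath volume, or a particular BT being active in the same window.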
Answer by Rob V. ·
So when you're seeing the 750K, what's the display duration for the dT Server Health dashboard? (default is 72 hours). If you take it down to say the last 6 hours or so, and focus on the timeframe when the RTA is getting backed up, at that resolution are you seeing the 750K numbers? That's not a good number.
Is UEM involved, and what version are you running? In unpatched 5.5, I've run into a blocking condition having to do with threads processing Visits, but that's been subsequently fixed.
Rob
Answer by Anderson T. ·
If I do a 6 or 12 hour timeframe, with a 15 minute resolution, I still see the 750K. Since the metrics are avg/max, and not count-based, the change in resolution doesn't affect it much.
We run 5.6 with UEM. UEM is disabled, but we are still capturing all purepaths at the webserver agent layer. Then, about 10% of the app servers below the web tier are instrumented with the java agent.
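The point above about avg/max metrics and chart resolution can be illustrated with a small sketch. It is a generic illustration of how max and average behave when samples are merged into coarser buckets, not a description of how the dashboard aggregates internally; the `rebucket` helper and the sample values are made up.

```python
def rebucket(samples, factor):
    """Merge every `factor` consecutive samples into one bucket.

    Returns a list of (average, maximum) pairs, one per bucket.
    """
    buckets = []
    for i in range(0, len(samples), factor):
        chunk = samples[i:i + factor]
        buckets.append((sum(chunk) / len(chunk), max(chunk)))
    return buckets


# Fifteen hypothetical one-minute samples: a normal queue, then a backlog.
one_minute = [6000] * 10 + [750000] * 5

fine = rebucket(one_minute, 1)     # 1-minute resolution
coarse = rebucket(one_minute, 15)  # 15-minute resolution

peak_fine = max(m for _, m in fine)
peak_coarse = max(m for _, m in coarse)
print(peak_fine, peak_coarse)  # the maximum is the same at both resolutions
print(coarse[0][0])            # the average is pulled down by the quiet minutes
```

The maximum survives any change in resolution, while the average only drops when the backlog is shorter than the bucket. A backlog that is sustained for hours will show roughly the same value either way.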
Answer by Rob V. ·
The 15 minute resolution is what I was getting at. 750K for that resolution is not good. I can't diagnose that further without seeing more details unfortunately. I suspect that you'll need to open a support case.
One thing to consider is if your dT server is sized appropriately for the load. Adequate cores, RAM, etc. If it is, then a support case is definitely warranted.
Rob
Answer by Anderson T. ·
Thanks Rob for your input. We are definitely pursuing a support case on this one. I suspect this may have to do with the volume of purepaths coming from the webserver agents.
Answer by Rob V. ·
Will you please send me the support case number at rob.vollum@compuware.com? I'd like to play along...
Rob
Answer by Noli Y. ·
Hi, Rob. How are you doing? It's been a while.
We are having a similar issue, but it only happens during a load test. I'm guessing that we are hitting some kind of limit. Network, CPU, and GC all look good. Looking at the Realtime Analyzer Queue Size, we are peaking at around 400K and then it goes down to about 150K with a corresponding spike in Skipped Purepaths (non-analyzed). We are running 5.6 with UEM.
Can you please let me know if this support case was already resolved?
Thanks,
Noli