Douglas V. asked ·

Using Dynatrace to detect the source of a JVM freeze

Hello everyone,


One of our applications is suffering from sudden JVM freezes. When this happens, the AppMon dumps (thread, CPU, and memory) stop working. Is there another way to use AppMon to detect the source of the JVM freeze?


Thanks, everyone, and sorry for my English.

appmon java 7.2

Joe H. answered ·

As Sebastian alludes to, I would not focus so much on transactional metrics, but more on environmental metrics, such as GC, memory, CPU, etc. If a worker thread went 100% Compute, the thread scheduler would still work and you would still be able to get CPU metrics, etc. But (as you state), if the whole JVM hangs, there's something else causing this that's systemic. Also look at your JVM STDOUT/STDERR logs for any telltale messages in the few minutes before the hang. The real issue likely happened a few minutes before the freeze.

If you suspect memory, try taking a lightweight memory snapshot every 5 minutes, continuously. Then, when the freeze happens, you can compare the last 10 or so snapshots to see whether any pattern in the heap points to the cause of the unhealthy JVM state.
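If AppMon snapshots are not an option, the same idea can be roughed out inside the application with the standard `java.lang.management` API. This is only a sketch: the `HeapTrendSampler` class, its method names, and the choice of writing to STDERR are assumptions for illustration, not AppMon features. It records per-pool heap usage on a timer and keeps the last few samples, so the lines leading up to a freeze are already in the log.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not an AppMon API): samples memory pool usage periodically.
class HeapTrendSampler {
    private final int capacity;
    private final Deque<String> samples = new ArrayDeque<>();

    HeapTrendSampler(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    // Takes one sample, stores it in the bounded buffer, and returns the log line.
    synchronized String sampleOnce() {
        StringBuilder line = new StringBuilder();
        line.append(System.currentTimeMillis());
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        line.append(" heap.used=").append(heap.getUsed());
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                line.append(' ').append(pool.getName().replace(' ', '_'))
                    .append(".used=").append(usage.getUsed());
            }
        }
        if (samples.size() == capacity) {
            samples.removeFirst(); // drop the oldest sample
        }
        samples.addLast(line.toString());
        return line.toString();
    }

    // Returns a copy of the retained samples, oldest first.
    synchronized List<String> recent() {
        return new ArrayList<>(samples);
    }

    // Starts periodic sampling on a daemon thread; each sample goes to STDERR.
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heap-trend-sampler");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(() -> System.err.println(sampleOnce()), 0, period, unit);
        return timer;
    }

    public static void main(String[] args) {
        // Single sample for demonstration; an application would call start(5, TimeUnit.MINUTES).
        HeapTrendSampler sampler = new HeapTrendSampler(10);
        System.err.println(sampler.sampleOnce());
    }
}
```

Because the sampler runs inside the JVM, it stops along with everything else once the freeze hits, so it is only useful for the trend before the hang.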

Another approach is to remove JVM memory and CPU directives. Sometimes people add memory and CPU and GC directives to the java command line and they actually cause problems, even to the point of the JVM becoming unresponsive. Try letting the JVM run itself with default values and see if things change.


I'll test these suggestions.


Thanks for the help Joseph

Douglas V. answered ·
All the Java/JMX-related metrics stop working when it happens. The only thing that behaves differently is some of the CPU cores.


When it happens, one, two, or three CPU cores get stuck at 100% utilization. This is the only notable symptom when the freeze happens.


Thanks.


Do you have any PurePaths from the moment right before the issue?


Yes, but only after about 10 minutes, when they show as "corrupted". There's no pattern, since a lot of different requests appeared as corrupted at the moment of the freeze. I don't know whether this behaviour is the cause or the consequence.


Try setting up an incident on a CPU usage measure that triggers CPU sampling when CPU is above 50% (for example). The idea is to start it before the application becomes unresponsive. As actions for this incident, set up the CPU sampling plugin and the thread dump plugin. This gives you a chance to collect some data for analysis. Unfortunately, if the CPU spike is too rapid, it may not work, but it is worth trying.
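To illustrate the idea outside AppMon, here is a minimal in-process sketch using standard JDK management beans. The `CpuSpikeWatchdog` class and its method names are hypothetical, the 50% threshold is just the example value from above, and `com.sun.management.OperatingSystemMXBean` is assumed to be available (it is on HotSpot-based JDKs). It is not how the AppMon plugins work internally.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not an AppMon API): dumps threads when process CPU crosses a threshold.
class CpuSpikeWatchdog {
    private final double threshold; // fraction of total CPU capacity, 0.0 to 1.0

    CpuSpikeWatchdog(double threshold) {
        this.threshold = threshold;
    }

    // Process CPU load as a fraction of ALL cores, or a negative value if unavailable.
    // Note: one pegged core on an N-core machine shows up as roughly 1/N here.
    static double processCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
        }
        return -1.0;
    }

    // Builds a text dump of all live threads with state, CPU time, and stack frames.
    static String dumpThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuTime = threads.isThreadCpuTimeSupported() && threads.isThreadCpuTimeEnabled();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            long nanos = cpuTime ? threads.getThreadCpuTime(info.getThreadId()) : -1L;
            out.append('"').append(info.getThreadName()).append("\" state=")
               .append(info.getThreadState()).append(" cpuNanos=").append(nanos);
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\n\tat ").append(frame);
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Returns a thread dump if the given load is at or above the threshold, else null.
    String checkOnce(double load) {
        return load >= threshold ? dumpThreads() : null;
    }

    public static void main(String[] args) {
        CpuSpikeWatchdog watchdog = new CpuSpikeWatchdog(0.5);
        double load = processCpuLoad();
        String dump = watchdog.checkOnce(load);
        System.err.println("process CPU load: " + load);
        if (dump != null) {
            System.err.println(dump);
        }
    }
}
```

In practice `checkOnce(processCpuLoad())` would be called from a scheduled task every few seconds. As with the incident approach, this only helps if the check fires before the JVM becomes unresponsive.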

Sebastian K. answered ·

What are the symptoms? CPU, memory, garbage collection?

Sebastian
