Finding out who is calling System.gc()

Question by Steven P. · Jan 28, 2015 at 08:43 PM · diagnostics

Hi,

 

In this particular case there are 2 production servers running the same application, inside a JBoss EAP 6.1.1 container (Oracle/Sun JDK 1.7_25).
Both servers seemed to be doing excessive garbage collection, so we added the "-XX:+DisableExplicitGC" JVM parameter on one of them.

Can we somehow find out where these GC invocations are coming from? The only difference between the two servers is the 'DisableExplicitGC' parameter; without it, both JVMs show the same behaviour.
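
For reference, one way we are considering to double-check this (just a sketch; these are the standard HotSpot GC logging flags, and the log file name is only an example) is enabling GC logging on the server that still allows explicit GC. On this JDK, full GCs triggered by System.gc() should show up tagged as "(System)" in the log:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc-check.log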

I included pictures for reference.

Bad:

[Broken image]

 

Good (explicit GC disabled):

[Broken image]

 

Regards,
Steven


5 Replies

Answer by Balazs B. · Mar 09, 2015 at 07:49 PM

Hi!

If you would like to see which methods initiate the JVM garbage collection, my suggestion is to put a method sensor on the System.gc() and Runtime.gc() methods. You will then be able to see all their occurrences in the Methods dashlet, and it will allow you to drill down to the PurePaths.

However, I don't know a lot about your application; you could also try configuring the RMI garbage collection interval with these JVM parameters. Maybe it will help:

-Dsun.rmi.dgc.server.gcInterval=3600000

-Dsun.rmi.dgc.client.gcInterval=3600000
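
If instrumenting those methods is not practical, another idea (only a rough sketch, assuming HotSpot JDK 7u4 or later; the notification classes live in com.sun.management and are not part of the standard API) is to register a listener that logs the cause of every collection, since explicit collections report "System.gc()" as their cause:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;
import com.sun.management.GarbageCollectionNotificationInfo;

public class GcCauseLogger {
    // Call once at application startup, e.g. from a servlet context listener.
    public static void install() {
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // On HotSpot the GC beans also implement NotificationEmitter.
            ((NotificationEmitter) gcBean).addNotificationListener(new NotificationListener() {
                public void handleNotification(Notification n, Object handback) {
                    if (GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) {
                        GarbageCollectionNotificationInfo info =
                                GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
                        // Cause is "System.gc()" for explicit requests, otherwise e.g. "Allocation Failure".
                        System.out.println(info.getGcName() + " run, cause: " + info.getGcCause());
                    }
                }
            }, null, null);
        }
    }
}

It won't tell you which method made the call, but it confirms whether explicit GC is happening at all.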

 

Please inform us about the results! Thanks.




Comment by Steven P. · Mar 09, 2015 at 08:27 PM

Hi,

 

Sorry for not coming back sooner with the definitive cause: it was the permgen space that was nearly full (over 90%), as I suspected in my last comment, causing the CMS collector to be triggered constantly. So no one was calling System.gc() (or Runtime.gc()); it was rather a badly configured JVM. Since increasing the permgen a little, the issue hasn't occurred again.
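
For completeness, the settings involved are along these lines (the sizes are illustrative, not our exact values):

-XX:PermSize=256m -XX:MaxPermSize=256m (initial and maximum permanent generation size)

-XX:+CMSClassUnloadingEnabled (allow CMS to unload classes from the perm gen)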

 

FYI: the RMI interval is already set to 'MAX_INT'; as far as I know, that mechanism would cause a full GC instead of a CMS cycle anyway, at least in our configuration.

 

Regards,
Steven


Answer by Steven P. · Feb 04, 2015 at 01:56 AM

Hi,

I now see this in another application; however, this one is nearly idle as far as user requests are concerned. With such a low load, I see a pattern emerging:
once a GC occurs on the young generation, the time needed on the old generation drops as well.

On this JVM, explicit GC is still allowed, but a CPU sampling doesn't reveal any suspects.
CMSClassUnloading is also enabled.

Has anyone seen this behaviour before? How can it be explained? Looking at this, I don't think anyone is calling System.gc(), since that would also trigger a collection of the young generation, if I'm not mistaken.

Regards,

Steven


Comment by Steven P. · Feb 04, 2015 at 03:14 AM

Replying to myself...

I suspect that the perm gen space is not big enough (or there is a leak). This would mean the CMS constantly gets invoked to clear out some space in the perm gen, but fails to do so as everything is in use, so it immediately triggers again.

Looking at the graphs for perm gen (a custom chart in DT, or using JConsole), I can see that it is nearly full. At this moment, of those 4 JVMs, 1 has this issue, and it is also the one with the highest amount of used perm gen space.

I have now increased the perm gen space for these 2 applications.
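
For anyone wanting to check the same thing without DT or JConsole, a quick sketch that prints the perm gen pool usage (the pool name match is an assumption; on our HotSpot 7 JVMs the pool is called something like 'PS Perm Gen' or 'CMS Perm Gen', and you would run this inside the JVM in question or adapt it to connect over JMX):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PermGenCheck {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Perm Gen")) {
                MemoryUsage usage = pool.getUsage();
                // A pool hovering close to its max is the pattern described above.
                long pct = usage.getMax() > 0 ? usage.getUsed() * 100 / usage.getMax() : -1;
                System.out.println(pool.getName() + ": " + usage.getUsed() + " of "
                        + usage.getMax() + " bytes used (" + pct + "%)");
            }
        }
    }
}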


Answer by Steven P. · Jan 31, 2015 at 01:09 AM

The CPU sampling doesn't turn up any likely suspect; the methods with a large execution time are 'wait(long)', 'socketRead0()', etc.
Sorting on 'CPU time' gives me the 'park(boolean,long)' method, as well as socketRead0, and the like.

 

For now the application doesn't show that behaviour any more, which is strange in itself (the samples were taken when the issue was still present).
Should it reappear, are there any other suggestions?


Comment by Rick B. · Jan 31, 2015 at 01:15 AM

Not sure... from a strict Java performance engineering perspective, it's usually better to let the JVM manage the memory. Without knowing anything about your app it's hard to say, but perhaps the DisableExplicitGC setting is appropriate for normal production runtime? andreas.grabner@dynatrace.com, the man who literally wrote the book, may have more insight on that... (smile)

Rick B

 

Comment by Steven P. (in reply to Rick B.) · Jan 31, 2015 at 01:28 AM

I've been through that book, really useful.

It would surprise me if it were 'normal' to activate that option in production; I think it was added for admins like me who have to deal with unknown, possibly badly written, code. I do have access to the git repository, though; a search through both the code and its history came up empty, except for a single test class (JUnit).

Comment by Rick B. (in reply to Steven P.) · Jan 31, 2015 at 01:36 AM

It's possible then that it's a lower-level method executing within a library.  Back to a question Andi posed earlier:  During the period when the issue was happening, do you see a lot of contribution from suspension (orange) in the PurePath or the Response Time Hotspots view?  Can you attribute it to one method or one API?

Rick B

Comment by Steven P. (in reply to Rick B.) · Jan 31, 2015 at 01:47 AM

I did see a lot of suspension in the PurePaths and Response Time Hotspots, but most of it was in the 'Servlet' API; that's the part every request comes through, and it accounts for the largest part of the response time anyway, about 80-90%. If I'm not mistaken, suspension time from one place also shows up in other places, so it is a bit harder to pinpoint on a busy server.

Please do correct me if I'm wrong on that assumption.

The data itself has already been cleared out of dynatrace, except for the CPU samplings. So I think I'll let this rest until it pops up again.

Thanks for your input.


Answer by Andreas G. · Jan 29, 2015 at 02:30 AM

I like Richard's idea of the CPU Sampling. The methods that show up as having high exec time are probably the ones calling it.

On the other hand, I would also look at those PurePaths that run long in the timeframe when the GC happens. Use the Methods Hotspot and see which methods take a long time; you might be lucky and find the problematic method as well. It could, however, be that gc() is called from a different thread that is not currently captured by a PurePath. In that case, CPU Sampling is the better approach.



Answer by Steven P. · Jan 28, 2015 at 08:52 PM

It seems the images are not properly shown, so I'll briefly describe them.

Both images are from the old generation heap of the JVM. The 'Bad' one shows the used memory as an almost straight horizontal line, and the GC invocations, drawn as grey rectangles, don't have any space in between them. In the 'Good' image, the used memory line has a clear upward trend until it reaches approximately 90%; then there is one grey bar for a GC, and the line drops back down to the same level as the used memory in the other image. Both images were taken on a 30-minute scale.


Comment by Rick B. · Jan 29, 2015 at 02:24 AM

Here's how you can accomplish this: run a CPU Sample at the highest sampling rate for a fixed period of time. If you are constantly seeing GC and expect it to happen at least a couple of times in the course of a minute, run it for 60 seconds; otherwise run it for longer. My recommendation, though, is to configure it to run only for a specified period of time. I've seen cases where people have accidentally left this running in prod for days because they forgot it was running. Doc: Runtime Specific Dashlets and CPU Sampling

Typically the other approach would be to temporarily place a sensor on the method, but since this is a native method we can't do that here.

Hope that helps,

Rick B
