Hi,
We noticed intermittent high response time in one of our webservice apps, and on checking DT found that for most of the slow requests, the purepath breakdown shows 100% IO. However, this time is being shown as being spent in Servlet service() method. Please refer screenshots of purepaths (API/agent names masked for confidentiality) .There were few other purepaths which had almost 100% time spent in sync, trying to get hold of the log, while one of the threads were writing to log file (we use log4j). The only external dependency for this app is a MySQL DB which does not show any bottleneck in terms of connection acquistion or query execution.
We also have an independent infra dashboard which shows appserver and JVM metrics (completely done outside of DT) , and for this time window, the iostats average wait time was high. Please let me know the cause/effect here. Is the service() method being slow due to IO wait , or is it the other way round. I am guessing IO wait triggered the slowness, but do we have any way to confirm it via DT?
Answer by Joseph A. ·
Somehow hit "post" before attaching screenshots. Please refer purepath_1.png - PP showing hotspot in service() method, and purepath_2.png (API distribution) which again shows "Servlet" as the main contributor.
Thanks,
Joseph
Answer by Andreas G. ·
Hi Alex
Can you export a PurePath and attach it? If you do so - please export the PurePath including Time Series Data.
Answer by Andreas G. ·
Thanks. I looked at it but am puzzled why there is no Auto Sensor data. Did you turn that feature off?
It seems like the service method is doing something for most of entirety of this purepath until it starts executing these SQL Statements and spawning these additional threads. Is it possible that your app makes external calls through custom/propriatory channels?
Answer by Joseph A. ·
Hi Andreas,
I don't think auto sensors are disabled, how can we confirm that?. The SQL statements are not taking time, and the threads are for executing some functions in parallel. This has not been causing any issue so far. There is no external call at all for this app, only interaction is with DB. Attaching 2 purepaths where hotspot was related to logging, one while writing to the log (I/O) purepath_io.png, and the other trying to get a hold of the log handle (sync) purepath_sync.png. I just want to know if this is being triggered due to some underlying infrastructure I/O issue unrelated to the app.
Thanks,
Joseph
perfect - these screenshots is what i was hoping for. they show that auto sensors (grey methods) are picking up the areas of your code that are the real performance issue. it clearly shows that the time is spent in writing your log messages to the file system. Please have a look at the Host Health Monitoring Dashboard for that machine where this application is running on. dynaTrace automatically monitors disk i/o and available disk space.
If this doesnt tell you why writing to disk is slow you may want to look at other metrics of this OS or the JVM process, e.g: opened file handles, ...
I have to admit that I am not an expert when it comes to I/O - but - I am sure those folks that are familiar with the OS that your app is running on will know how to analyze issues with I/O. Explore the log files. Maybe there are other processes that also access them at the same time?
andi
Hi Andi,
Those purepaths give an indication about logging, but there a bunch of others which just show servlet service() method like this one. pp_servlet.png. Here 100% of the time is shown in I/O but we don't see which method is stuck in I/O . What can be done in such cases? . In "API distribution" for this purepath, pp_api_dist.png, it just shows all the time in "Servlet" component, but only a very small fraction is CPU time. This seems puzzling to me. Is there any other way to monitor this server within DT - maybe a different view other than Purepaths to know the I/O behavior. I am also checking with the infra monitoring team here to see if they can help.
Thanks,
Joseph
Hi.
Auto Sensors work in a way that they capture additonal/extended PurePath information in a way that they dont impact application performance. Therefore it is not guaranteed that we get this information for every PurePath. If there is no auto sensor information the time is shown on the last instrumented sensor node - in your case that is the service method.
In order to "proof" that it is the logging you could create a custom sensor rule for the log method on your Category class. This will then ensure that the method execution is always captured for every PurePath. I am pretty sure it is the log method in all cases - but - I would go ahead and create this custom sensor rule as a next step. The easiest way to do this is to right click on the log method in the Purepath where it shows up -> Create Sensor Rule -> Include Selected Method. This will create a new sensor. In order for the sensor to work you either need to do a Hot Sensor Placement (HSP) or restart the JVM. HSP can be done in the AGent OVerview. It is one of the context menu options on a Java Agent. It will "force" the JVM to reload its classes which allows dynaTrace to add our custom instrumentation for your sensors
Hope this helps
Answer by Christian S. ·
hi Joseph,
I totally second Andi's thoughts about the logging and how you should proceed!
and I'm also puzzled by the fact that you don't get auto-sensor data for a PurePath which lasts more than 11 seconds. could you tell us which JVM you are using? or could you maybe even create a system information archive (client cockpit -> status overview -> system information) including the corresponding agent log files? that could help us find out more.
thx, Christian
Answer by Joseph A. ·
Hi,
I couldn't to get this for a while. Reading through the posts again, i have attached the info requested on the agent log file - agent_log.txt properties dt_agent_props.txt, and deployed sensors info (from "Agents Overview" and clicking on the agent) deployed_sensor.txt. Please point me to the sensor setup that can be done for more extensive monitoring. I tried following "Purepath where it shows up -> Create Sensor Rule -> Include Selected Method, but seems the change did not persist. Do i need to be have admin privileges to make such changes? . Can you also suggest the changes to be done for infra level monitoring, CPU , disk , I/O etc. - currently, it shows all of them healthy host_health_status.png, but can't see any detailed metrics - host_info.png
Thanks,
Joseph
JANUARY 15, 3:00 PM GMT / 10:00 AM ET