I have a very standard 2 Web Server + 4 Java Agent setup, and I am using the dT Cloud instance. It has never worked properly. As soon as I put some load on it, data stops coming in about 5-10 minutes into the run (the charts just drop off). I do see new PurePaths (PPs) in the PP dashlet, but it is unclear whether everything is being captured. In the Incidents view, I see ">= 1000 PurePaths are corrupted". That's the only clue I can find. I've looked at the server and FE server monitoring charts and nothing jumps out.
I am not sure what could cause this corruption. Is it related to the data missing from the charts? Any advice on the cause or on what to check is appreciated!
Answer by Reinhard W.
Roy,
did you check the collectors? Might there be a bottleneck between the collectors and the agents? Where are your collectors running?
Reinhard
It is a single collector running on a node that serves backoffice (staff-facing) traffic. That node is not instrumented with a Java agent, but it is in the same network as the nodes that are. How can I check whether it is being overrun?
UPDATE: Thanks for the hint, Reinhard. I do see this a few times in Collector.0.0.log. What do you think?
2015-07-21 13:47:56 WARNING [OutOfProcessEventHandler] Exception happened while trying to send buffer. Retry to send buffer...
2015-07-21 13:47:56 WARNING [OutOfProcessEventHandler] com.dynatrace.diagnostics.communication.tcp.exception.CommunicationException: class java.net.SocketException: Connection timed out, while executing request: class com.dynatrace.diagnostics.collector.OutOfProcessEventHandler$SendBufferRequest
    at com.dynatrace.diagnostics.communication.tcp.core.DefaultRequest.handleException(DefaultRequest.java:15)
    at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession$SessionRequest.handleException(DefaultSession.java:757)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection.handleException(ManagedSocketConnection.java:56)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection.access$000(ManagedSocketConnection.java:26)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection$RequestWrapper.handleException(ManagedSocketConnection.java:75)
    at com.dynatrace.diagnostics.communication.tcp.connection.SocketConnection.handleException(SocketConnection.java:153)
    at com.dynatrace.diagnostics.communication.tcp.connection.SocketConnection.executeRequest(SocketConnection.java:143)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection.executeRequest(ManagedSocketConnection.java:98)
    at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession.executeRequestOnConnection(DefaultSession.java:631)
    at com.dynatrace.diagnostics.communication.tcp.session.DefaultService.executeRequestOnConnection(DefaultService.java:126)
    at com.dynatrace.diagnostics.collector.OutOfProcessEventHandler.sendBuffer(SourceFile:540)
    at com.dynatrace.diagnostics.collector.shared.protocol.server.SendQueue$BufferSender.run(SourceFile:158)
Caused by: java.net.SocketException: Connection timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at com.dynatrace.diagnostics.sdk.io.IoStatsAwareInputStream.read(IoStatsAwareInputStream.java:19)
    at java.io.DataInputStream.readBoolean(DataInputStream.java:242)
    at com.dynatrace.diagnostics.sdk.io.DataInputStreamDataInput.readBoolean(DataInputStreamDataInput.java:54)
    at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession$SessionRequest.executeRequestResponse(DefaultSession.java:723)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection$RequestWrapper.executeRequestResponse(ManagedSocketConnection.java:70)
    at com.dynatrace.diagnostics.communication.tcp.connection.SocketConnection.executeRequest(SocketConnection.java:135)
    ... 5 more
2015-07-21 13:47:56 WARNING [DefaultSession] unable to get outbound connection - session costco-hybris.compuwareapmaas.com:Plain:Uncompressed:Collector is in status: Disconnected. subsequent messages of this kind will be logged to log level FINE.
2015-07-21 15:37:02 WARNING [OutOfProcessEventHandler] Exception happened while trying to send buffer. Retry to send buffer...
2015-07-21 15:37:02 WARNING [OutOfProcessEventHandler] com.dynatrace.diagnostics.communication.tcp.exception.CommunicationException: class java.net.SocketException: Connection timed out, while executing request: class com.dynatrace.diagnostics.collector.OutOfProcessEventHandler$SendBufferRequest
    at com.dynatrace.diagnostics.communication.tcp.core.DefaultRequest.handleException(DefaultRequest.java:15)
    at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession$SessionRequest.handleException(DefaultSession.java:757)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection.handleException(ManagedSocketConnection.java:56)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection.access$000(ManagedSocketConnection.java:26)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection$RequestWrapper.handleException(ManagedSocketConnection.java:75)
    at com.dynatrace.diagnostics.communication.tcp.connection.SocketConnection.handleException(SocketConnection.java:153)
    at com.dynatrace.diagnostics.communication.tcp.connection.SocketConnection.executeRequest(SocketConnection.java:143)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection.executeRequest(ManagedSocketConnection.java:98)
    at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession.executeRequestOnConnection(DefaultSession.java:631)
    at com.dynatrace.diagnostics.communication.tcp.session.DefaultService.executeRequestOnConnection(DefaultService.java:126)
    at com.dynatrace.diagnostics.collector.OutOfProcessEventHandler.sendBuffer(SourceFile:540)
    at com.dynatrace.diagnostics.collector.shared.protocol.server.SendQueue$BufferSender.run(SourceFile:158)
Caused by: java.net.SocketException: Connection timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at com.dynatrace.diagnostics.sdk.io.IoStatsAwareInputStream.read(IoStatsAwareInputStream.java:19)
    at java.io.DataInputStream.readBoolean(DataInputStream.java:242)
    at com.dynatrace.diagnostics.sdk.io.DataInputStreamDataInput.readBoolean(DataInputStreamDataInput.java:54)
    at com.dynatrace.diagnostics.communication.tcp.session.DefaultSession$SessionRequest.executeRequestResponse(DefaultSession.java:723)
    at com.dynatrace.diagnostics.communication.tcp.connection.ManagedSocketConnection$RequestWrapper.executeRequestResponse(ManagedSocketConnection.java:70)
    at com.dynatrace.diagnostics.communication.tcp.connection.SocketConnection.executeRequest(SocketConnection.java:135)
    ... 5 more
2015-07-21 15:37:02 WARNING [DefaultSession] unable to get outbound connection - session costco-hybris.compuwareapmaas.com:Plain:Uncompressed:Collector is in status: Disconnected. subsequent messages of this kind will be logged to log level FINE.
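The warnings in this excerpt all start with a "timestamp LEVEL [Component] message" prefix, so a small script can tally how often they occur and when the session was reported as Disconnected. The following is only a rough sketch: it assumes the log file holds one entry per line in that layout, and the function name summarize is made up for illustration rather than being part of any dynaTrace tooling.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: "YYYY-MM-DD HH:MM:SS LEVEL [Component] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<component>[^\]]+)\] (?P<msg>.*)$"
)


def summarize(lines):
    """Count WARNING/SEVERE entries per component and collect disconnect timestamps."""
    per_component = Counter()
    disconnects = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Stack-trace lines ("at ...", "Caused by: ...") do not match and are skipped.
        if match is None or match["level"] not in ("WARNING", "SEVERE"):
            continue
        per_component[match["component"]] += 1
        if "is in status: Disconnected" in match["msg"]:
            disconnects.append(match["ts"])
    return per_component, disconnects
```

Passing an open file object for Collector.0.0.log to summarize returns the per-component counts and the disconnect timestamps, which can then be lined up against the times at which the charts drop off.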
The Collector.0.log file is the place to look for anything suspicious. It should also show whether the network connection to the server has dropped or is slow.
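As a rough first check on the network path from the collector host to the server, one could time a handful of TCP connects. This is a minimal sketch, not a dynaTrace utility: time_connects is a hypothetical helper, and the port is left as a parameter because the thread does not state which port the collector uses to reach the server.

```python
import socket
import time


def time_connects(host, port, attempts=5, timeout=5.0):
    """Return one entry per attempt: connect time in seconds, or None on failure."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Open and immediately close a TCP connection; only the handshake is timed.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results.append(None)
    return results
```

Note that the timeouts in the log occur while reading a response (socketRead0), not while connecting, so fast connects here would not rule out a problem; they would only show that the server is reachable. Running the check during a load test, when the charts drop off, is more telling than running it while idle.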