Question by Gregg K. · Apr 09, 2015 at 02:02 AM

Agent to Collector Timeout

What is the timeout for an agent attempting to connect to a collector? The documentation is inconsistent in at least versions 5.5 and 6.1. The definition of the DT_WAIT environment variable says 20 seconds, but the .NET Agent Troubleshooting section says 60 seconds. The agent log file also indicates that it will try for 60 seconds. Considering that the default Windows service timeout is 30 seconds, doesn't 60 seconds seem long? This appears to have caused one of our applications, which runs as a Windows service, to fail to start when the collector was not available. When I set DT_WAIT to 10, the service started fine.
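
To make the timing concern concrete, here is a back-of-the-envelope sketch. It assumes that when the collector is unreachable the agent blocks for the full DT_WAIT before the service continues starting, and that the service needs some time of its own to initialise. The function names and the 5-second initialisation figure are illustrative assumptions, not values from the documentation.

```python
# Simplified model; names and the app_init_s figure are illustrative assumptions.

def worst_case_start_s(dt_wait_s: float, app_init_s: float) -> float:
    """Start-up time when the collector is unreachable: the agent blocks for
    the full DT_WAIT, then the application initialises on its own."""
    return dt_wait_s + app_init_s


def starts_in_time(dt_wait_s: float, app_init_s: float,
                   service_timeout_s: float = 30.0) -> bool:
    """True if the worst-case start-up fits inside the service timeout."""
    return worst_case_start_s(dt_wait_s, app_init_s) < service_timeout_s


if __name__ == "__main__":
    app_init_s = 5.0  # hypothetical: time the service itself needs to start
    for dt_wait_s in (60, 20, 10):
        verdict = "starts" if starts_in_time(dt_wait_s, app_init_s) else "times out"
        total = worst_case_start_s(dt_wait_s, app_init_s)
        print(f"DT_WAIT={dt_wait_s}s: worst case {total:.0f}s, service {verdict}")
```

Under this model a 60-second wait can never fit inside a 30-second service timeout, while 10 or 20 seconds leaves room for the service's own start-up. Real behaviour may depend on more than these two numbers.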

7 Replies

Answer by Guenter H. · Apr 11, 2015 at 06:28 AM

Thanks Gregg,
it would be great if you tried the new default and it worked!

I'll talk with our good support guys and the devs in charge on Monday about the exact reason for the change and whether they can think of any new implications.

Have a nice weekend!
G.

Answer by Christian S. · Apr 14, 2015 at 12:14 AM

Actually, we changed the default from 60s to 20s for exactly this reason: there were services (especially on Windows) that did not come up in time when the Collector was, for whatever reason, not reachable.

Answer by Gregg K. · Apr 11, 2015 at 12:00 AM

We tested increasing the service timeout to 1m and then 2m, but it didn't help. Windows seems to have only that one global setting for all services, so we were not that keen on the idea anyway. We only tried 10s for DT_WAIT, but I suppose we could have gone with 20s since that is the default going forward.
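
For reference, a minimal sketch of how that global setting could be inspected. The post does not name the setting; this assumes it is the ServicesPipeTimeout registry value (a DWORD in milliseconds under HKLM\SYSTEM\CurrentControlSet\Control), which Windows treats as 30 seconds when it is absent. The helper names are illustrative.

```python
import sys

DEFAULT_SERVICE_TIMEOUT_MS = 30_000  # the 30 s default mentioned in the question


def effective_timeout_ms(registry_value):
    """Return the configured timeout, or the default when the value is absent."""
    return DEFAULT_SERVICE_TIMEOUT_MS if registry_value is None else int(registry_value)


def read_services_pipe_timeout():
    """Read ServicesPipeTimeout (ms); None if it is unset or this is not Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                            r"SYSTEM\CurrentControlSet\Control") as key:
            value, _value_type = winreg.QueryValueEx(key, "ServicesPipeTimeout")
            return value
    except FileNotFoundError:
        return None  # value not present, so the default applies


if __name__ == "__main__":
    timeout_ms = effective_timeout_ms(read_services_pipe_timeout())
    print(f"Service start timeout: {timeout_ms / 1000:.0f} s (applies to all services)")
```

Because the value is machine-wide, raising it affects every service on the host, which is why lowering DT_WAIT for the one affected process was the less intrusive option.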

Answer by Guenter H. · Apr 10, 2015 at 06:29 PM

Good morning Gregg,
I reverted the copy-edit changes and, instead of duplicating the value, referenced the DT_WAIT constant in the .NET troubleshooting section so it won't haunt us in the future.

While editing the troubleshooting section I think I found (quite literally) a good pointer to your problem:

"If a configuration is found, the Agent tries to connect to the given Collector ('server' setting) and creates a new dt_<agentName>bootstrap<pid>.log file in the <DT_HOME>\log folder. On connection problems, the Agent tries for DT_WAIT seconds (default timeout setting) and blocks the process from executing. When the Agent times out, no instrumentation is done."

I suspect the Server > Collector > Agent update chain gets in the way time-wise. If you set the timeout to 10s, the app is blocked for only 10s and will come up, but uninstrumented.
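
The blocking behaviour described in the quoted passage can be sketched roughly as follows. This is not the Agent's actual implementation, only an illustration of "try for DT_WAIT seconds, then give up and continue uninstrumented". The function name, retry interval, and the address in the usage example are placeholders.

```python
import socket
import time


def wait_for_collector(host: str, port: int, wait_seconds: float,
                       retry_interval: float = 0.1) -> bool:
    """Block until a TCP connection succeeds or wait_seconds has elapsed.

    Returns True if a connection was made (instrumentation could proceed),
    False if the deadline passed (the process continues uninstrumented).
    """
    deadline = time.monotonic() + wait_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Never wait longer on one attempt than the time that is left.
            with socket.create_connection((host, port), timeout=remaining):
                return True
        except OSError:
            # Refused or timed out: pause briefly, then retry until the deadline.
            pause = min(retry_interval, max(0.0, deadline - time.monotonic()))
            time.sleep(pause)


if __name__ == "__main__":
    # Placeholder address; a real setup would use the configured 'server' setting.
    connected = wait_for_collector("127.0.0.1", 9998, wait_seconds=1.0)
    print("instrumented" if connected else "continuing uninstrumented")
```

The caller is blocked for at most wait_seconds, which matches the observation that a 10-second setting delays the application by only 10 seconds when the Collector cannot be reached.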

Edit: How about increasing the Windows service start-up timeout instead of decreasing DT_WAIT?

If we can't nail it with Christian's next reply, I suggest you contact support with a support archive so they can look at it more deeply and with better visibility.

Thanks
G.

Answer by Gregg K. · Apr 10, 2015 at 04:39 AM

This could happen during maintenance if the application starts while the collector is being rebooted for patching. This is a critical application, and it cannot be prevented from starting for any reason.

We tested this scenario because the application previously would not start before the collector had connected to the dt server for the first time. As it happened, the firewall rules needed to allow that traffic were not in place, although the collector was reachable by the agent. After the collector's initial connection to the dt server, the agents started up fine.

Answer by Christian S. · Apr 10, 2015 at 03:12 AM

Hi Gregg,

Apart from the changed defaults, this is a scenario that you should not experience on a regular basis, as it indicates that the Agent could not connect to the Collector. In this case the Agent will not instrument and will provide only some metrics, but no PurePaths and such.

So changing this timeout to a lower value also increases the possibility that the Agent will not work as expected.
So my question is: what is the reason for the Agent not connecting to the Collector? Is this expected from your side?

Best,
Christian

Answer by Guenter H. · Apr 09, 2015 at 06:21 PM

Sorry for the confusion, Gregg!
It's a combination of bad luck and ignorant, overzealous copy-editing that you tripped over.

Firstly, the bad luck:
The 5.5 you looked at was the last version where the timeout was 60 seconds.

I normally put change notes in the new documentation when such parameters change, but I don't have (or take) the time to put forward references in the old docs.

Secondly, the copy-editing:
My original in 5.6 for wait=<seconds>, written when there were no copy editors yet:
Optional: Specifies the initial wait timeout – the maximum time to wait for a connection to a dynaTrace Collector in seconds. If the connection cannot be established within this timeframe, the application continues uninstrumented. Defaults to 20 seconds now; was 60s until 5.5.

After the first round of copy-editing:
Optional: Specifies the initial wait timeout – the maximum time to wait for a connection to a dynaTrace Collector in seconds. If the connection is not established within this timeframe, the application continues without instrumentation. It defaults to 20 seconds.

Any reference to the change is gone. I have no idea what the problem with this text is, but I know they edited out other back references (which Java version was needed for installation on *NIX, and until when), and they can't tell a *NIX chmod mask (777) from a (European) area code (0777) and edited out exactly the wrong version... <grrrrrin>

Thanks for bringing this up! I will try to find all occurrences and revert them.
G.
