Question by Shrimant S. · Mar 25, 2015 at 10:33 AM ·

Agent Up/Down Status in a Load Balanced Collector Cloud Environment

Hello,

 

My client is running OpenStack, an open source cloud computing platform. All of our collectors sit behind a load balancer, and collector/agent communication goes through it. The client is using only Host agents (the basic kind, not the z/OS kind).

Our goal is to create a dashboard that illustrates the up/down status of an agent. We are trying to use a standard URL monitor to do this, because the measures "agent availability" and "agent connected" are not available to us.

 

The problem is that some collectors cannot see some agents. We have no control over which collector talks to which agent, and because of the load balancer an agent may report to several different collectors over the course of its uptime.

 

The URL monitor tells specific collectors what to run, so of course some collectors will return "agent not connected" when they can't reach the agent, even if the agent is connected to the system itself.

 

What can we do instead to get this status output? The Availability Monitor Plugin (judging from comments) seems to be broken with 6.1. The Generic Execution plugin comes to mind, but I'm unclear how it works: does it execute on the agent or on the collector, and how is the command sent to the agent? Generic Execution would also require us to write our own script, a simple one-liner, that returns the status we are looking for in a chartable format (i.e. an int/bool/double instead of a string).
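A minimal sketch of what such a status script could look like in Python. It assumes the plugin can chart a number printed to stdout (not confirmed in this thread) and that the target host has some TCP port that is actually listening, for example SSH. The function name `probe` and the command-line usage are hypothetical.

```python
import socket
import sys


def probe(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection to host:port succeeds, else 0."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return 0


# Hypothetical usage: python probe.py <host> <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    print(probe(sys.argv[1], int(sys.argv[2])))
```

The script prints an int rather than a string, so the result can be charted as a 1/0 series. Note that it reports whether the host answers on that port, not whether the agent process is running.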

 

Any ideas? Also, why are "agent available" and "agent connected" not available for host agents?


8 Replies


Answer by Chris G. · Apr 01, 2015 at 03:44 AM

 

The host agent installed on all our Dev/Test servers listens on port 9998 (the default). We specified that port in a Generic Execution test for an SSH connection and got a positive response from all servers. Unfortunately, we found that the positive result only meant the command was run from the collector against the agent, not that a response was returned.


Comment by Rick B. · Apr 01, 2015 at 04:03 AM

As I stated above, there is no listener port on these agents. The port specified in the host agent .ini file is the collector port that the agent attempts to connect to.


Answer by Rick B. · Mar 31, 2015 at 12:51 AM

The agents do not listen on a port. When communicating with the collector, they use ephemeral (client) ports, as in any other client/server interaction. You will probably have to describe what you have set up and what is not working in more specific detail.

 

Rick B
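To illustrate the client/server point, here is a small self-contained Python sketch using plain loopback sockets. It is not Dynatrace code: the "collector" and "agent" are stand-ins, and the listening port is chosen by the OS rather than being 9998.

```python
import socket

# Stand-in for a collector: listen on an OS-chosen port on loopback.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_port = server.getsockname()[1]

# Stand-in for an agent: it only dials out and never calls listen().
client = socket.create_connection(("127.0.0.1", server_port))
conn, peer = server.accept()

client_port = client.getsockname()[1]  # ephemeral port picked by the OS
print("collector listens on", server_port)
print("agent connected from ephemeral port", client_port)

# The collector replies over the same established connection.
conn.sendall(b"ping")
reply = client.recv(4)

for s in (conn, client, server):
    s.close()
```

The client side gets a different local port on each connection, which is why there is no fixed agent port to open in a firewall. Only the path from agent to collector needs to be allowed.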


Answer by Sreerag M. · Mar 29, 2015 at 03:05 AM

One thing I can tell you is that the number at the end of the agent name is the application process ID, not a port the agent is listening on.

Also, the agent phones in to the collector, so the firewall port needs to be opened only from agent to collector. Once communication is established, the collector can communicate with the agent on any available high port.

-Sreerag


Answer by Shrimant S. · Mar 28, 2015 at 02:41 AM

The collectors are attached to a VIP (load balancer). The collector listens on port 9998 for data from the agent. Meanwhile, the agent also needs to listen on some port for communication from the collector. This connection is held open and assigned on a round-robin basis from agent to collector. Because of this, however, certain collectors cannot communicate with certain agents (due to the firewall), even though all agents can communicate with any collector.

What would fix this issue for us is being able to manually specify the port that the agent is listening on for collector traffic. If we can do this, we can ask security to open the firewall access rules to allow this communication, and any collector should be able to connect to any agent on the specified port.

I think right now the load balancer is acting in a one-way capacity, while a firewall prevents traffic from going the other way.

 

In agents overview, we see agents listed in the format AgentName @ AgentHost : port.

 

Can we manually edit that port and make it the same throughout the installation?


Answer by Shrimant S. · Mar 28, 2015 at 02:11 AM

The reason we were avoiding the Infrastructure Overview is that the agent could be down while the host itself is up and functioning normally. This would give us a false positive.


Comment by Rick B. · Mar 28, 2015 at 02:16 AM

Trying to understand your initial post... Your monitors aren't working because sometimes the collector can't reach the servers due to the firewall, is that correct? Why not deploy a collector to the proper network zone and use just that collector for your monitors? Sorry if I misunderstand.

Rick B

 


Answer by Andreas G. · Mar 28, 2015 at 02:02 AM

Either the Generic Execution plugin or maybe the TCP Port Monitor plugin. Plugins are executed on collectors, though, so make sure that these collectors have access to these hosts. Based on your description, certain hosts might not be reachable from certain collectors. What if you define custom host groups, e.g. "Hosts reachable from Collector1", "Hosts reachable from Collector2", and so on, and then set up the Generic Execution or TCP monitor on each of your collectors? The list of hosts to execute the plugin against is then the special host group for that collector.

This makes administration rather easy, as the only thing you need to do is make sure your hosts are assigned to the right host group.

Hope this makes sense
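The host group idea can be sketched in Python as a simple assignment problem. This only illustrates the bookkeeping and does not use any Dynatrace API. The collector names, host names, and the function `build_host_groups` are hypothetical, and the reachability map is assumed to be known up front (for example from firewall rules).

```python
from typing import Dict, List, Set


def build_host_groups(reachable: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each host to one collector that can reach it.

    `reachable` maps a collector name to the set of hosts it can reach.
    Hosts go to the first collector (in sorted name order) that reaches
    them, so every reachable host is monitored exactly once.
    """
    groups: Dict[str, List[str]] = {name: [] for name in reachable}
    assigned: Set[str] = set()
    for collector in sorted(reachable):
        for host in sorted(reachable[collector] - assigned):
            groups[collector].append(host)
            assigned.add(host)
    return groups


# Hypothetical collectors and hosts, for illustration only.
reachable = {
    "Collector1": {"hostA", "hostB"},
    "Collector2": {"hostB", "hostC"},
}
groups = build_host_groups(reachable)
for collector, hosts in groups.items():
    print(f"Hosts reachable from {collector}: {hosts}")
```

Each resulting list corresponds to one custom host group, and the monitor for that group would run only on the matching collector.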


Answer by Chris G. · Mar 28, 2015 at 01:57 AM

 

Infrastructure Overview is what the Operations group will use as a consolidated monitoring window, and agent up/down will work for now. But what is needed is a host up/down status, and the agent being down doesn't necessarily mean the host is down. We need something that will not be handled directly by a specific collector. Generic Execution plug-in?


Answer by Andreas G. · Mar 28, 2015 at 12:55 AM

Hi

Have you looked at the Infrastructure Overview in Dynatrace? It shows all the machines that report data through your host agents. If a host agent is no longer sending data, e.g. because it is offline, the Infrastructure Overview will show that agent as offline. We also have incidents that trigger when an agent unexpectedly goes offline.

Wouldn't that solve your problem?
