Hello,
My client is running OpenStack, an open source cloud computing platform. We have all of our collectors attached to a load balancer, and the collector/agent communication happens through this load balancer. The client is using only Host agents (not the z/OS kind, the other basic kind).
Our goal is to create a dashboard that illustrates the up/down status of an agent. We are trying to use a standard URL monitor to do this, because the measures "agent availability" and "agent connected" are not available to us.
The problem we run into is that some collectors are not able to see some agents. We have no control over which collector is speaking to which agent, and because of the load balancer an agent may relay information to several different collectors over the course of its uptime.
The URL monitor tells specific collectors what to run, so of course some collectors will return "agent not connected" when they can't reach the agent, even if the agent is connected to the system itself.
What can we do instead to get this status output? The Availability Monitor Plugin (judging from comments) seems to be broken with 6.1. The Generic Execution plugin comes to mind, but I'm unclear how it works (does it execute on the agent or on the collector? And how is the command sent to the agent?). Generic Execution will also require us to create our own "script", a simple one-liner, that can return the status we are looking for in a chartable format (i.e. an int/bool/double instead of a string).
Any ideas? Also, why are "agent available" and "agent connected" not available for host agents?
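To illustrate the chartable-format requirement, here is a minimal sketch of the kind of conversion such a script would need. It assumes the check produces a text status like the "agent not connected" message mentioned above; the exact status strings, the function name `status_to_int`, and the use of -1 for an unrecognized status are illustrative assumptions, not documented plugin behavior.

```python
def status_to_int(status_text: str) -> int:
    """Map a textual agent status to a chartable number.

    1 = connected, 0 = not connected, -1 = unrecognized.
    The status strings matched here are assumptions for illustration.
    """
    text = status_text.strip().lower()
    # Check the negative form first, because "not connected"
    # also contains the substring "connected".
    if "not connected" in text:
        return 0
    if "connected" in text:
        return 1
    return -1
```

For example, `status_to_int("agent not connected")` returns 0, which can be charted directly.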
Answer by Chris G. ·
The host agent installed on all our Dev/Test servers listens on port 9998 (default). We specified that port for a Generic Execution test for an SSH connection and got a positive response from all servers. Unfortunately, we found that the positive result only confirmed that the command was run from the collector to the agent, not that a response was returned.
Answer by Rick B. ·
The agents do not listen on a port. In communication with the collector, they use ephemeral, or client, ports as with any other client/server interaction. You will probably have to describe what you have set up and what is not working in more specific detail.
Rick B
Answer by Sreerag M. ·
One thing I can tell you is that the number at the end of the agent name is the application process ID, not the port the agent is listening on.
Also, the agent phones in to the collector, so the firewall port needs to be opened only from agent to collector. Once communication is established, the collector can communicate with the agent on any available high port.
-Sreerag
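Since the connection is initiated from the agent side, one way to verify the firewall rule is to test outbound TCP reachability from the agent host to the collector. The following is a minimal sketch, assuming the collector (or the load balancer VIP in front of it) accepts TCP connections on port 9998 as stated elsewhere in this thread. The function name `can_reach` is hypothetical, and a successful connection only shows that the port is reachable, not that the agent has registered with the collector.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection to host:port succeeds, else 0."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return 0
```

Run from an agent host, a call such as `can_reach("collector-vip.example.com", 9998)` (the hostname is a placeholder) returns 1 or 0, so the result can be charted.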
Answer by Shrimant S. ·
The collectors are attached to a VIP (load balancer). The collector listens on port 9998 for data from the agent. Meanwhile, the agent also needs to listen on some port for communication from the collector. This connection is held open and assigned on a round-robin basis from agent to collector. Because of this, however, certain collectors cannot communicate with certain agents (due to the firewall), even though all agents can communicate with any collector.
What would fix this issue for us is being able to manually specify the port that the agent is listening on for collector traffic. If we can do this, we can ask security to open the firewall access rules to allow this communication, and any collector should be able to connect to any agent on the specified port.
I think right now the load balancer is acting in a one-way capacity while a firewall prevents traffic from going the other way.
In agents overview, we see agents listed in the format AgentName @ AgentHost : port.
Can we manually edit that port and make it the same throughout the installation?
Answer by Shrimant S. ·
The reason we were avoiding using the Infrastructure Overview is that the agent could be down while the host itself could be up and functioning normally. This would give us a false positive.
Trying to understand your initial post... Your monitors aren't working because sometimes the collector can't reach the servers due to the firewall, is this correct? Why not deploy a collector to the proper network zone and use just that collector for your monitors? Sorry if I misunderstand.
Rick B
Answer by Andreas G. ·
Either the Generic Execution plugin or maybe the TCP Port Monitor Plugin. Plugins are executed on Collectors though, so make sure that these collectors have access to these hosts. Based on your description, you have certain hosts that might not be reachable from certain collectors. What if you define custom Host Groups, e.g. "Hosts reachable from Collector1", "Hosts reachable from Collector2", ... and then set up the Generic Execution or TCP monitor on your different collectors? The list of hosts to execute this plugin against is then your special host group for that collector.
This makes administration rather easy, as the only thing you need to do is make sure your hosts are assigned to the right host group.
Hope this makes sense
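The host group idea above can be sketched as a simple mapping from collector to the hosts it can reach. This is only an illustration of the bookkeeping, not how the product stores host groups: the collector names, host names, and the functions `collector_for` and `run_checks` are all hypothetical, and the actual reachability check is passed in as a function.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical host groups: each collector lists the hosts it can reach.
HOST_GROUPS: Dict[str, List[str]] = {
    "Collector1": ["app-host-01", "app-host-02"],
    "Collector2": ["db-host-01"],
}

def collector_for(host: str, groups: Dict[str, List[str]]) -> Optional[str]:
    """Return the collector whose host group contains the host, if any."""
    for collector, hosts in groups.items():
        if host in hosts:
            return collector
    return None

def run_checks(
    groups: Dict[str, List[str]],
    check: Callable[[str, str], bool],
) -> Dict[str, int]:
    """Run check(collector, host) only for hosts in that collector's group.

    Results are stored as 1 (up) or 0 (down) so they can be charted.
    """
    results: Dict[str, int] = {}
    for collector, hosts in groups.items():
        for host in hosts:
            results[host] = 1 if check(collector, host) else 0
    return results
```

Because each host is checked only by the collector that can reach it, a firewall block between other collectors and that host no longer produces a false "down" result.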
Answer by Chris G. ·
Infrastructure Overview is what will be used for the Operations group as a consolidated monitoring window, and agent up/down will work for now. But what is needed is a host up/down status, and the agent being down doesn't necessarily mean the host is down. We need something that will not be handled directly by a specific collector. Generic Execution plug-in?
Answer by Andreas G. ·
Hi
Have you looked at the Infrastructure Overview in Dynatrace? It shows you all your machines that report data through your host agents. If a host agent is no longer sending data, e.g. it is offline, then the Infrastructure Overview will show that agent as offline. We also have Incidents that trigger when an agent unexpectedly goes offline.
Wouldn't that solve your problem?