I can't find any sizing heuristics in the documentation, so I'm hoping DT engineering has done some stress/performance testing in this area.
We are moving heavily into complementing our agent base with Monitors: URL monitoring, application port monitoring, infrastructure monitoring for servers not covered by agents, etc. What I'm trying to learn is how Monitors work with their Collectors and how many polls can run simultaneously.
To provide some context, there are probably about 500-1000 servers we could be polling for general server metrics, most likely driven by the same monitor in a single profile. If we went all the way, I estimate about 25,000 processes could be eligible for URL or process monitoring, spread across 50-80 different profiles.
Currently, to keep things manageable, we have one Collector providing this functionality, but it monitors only a couple of hundred servers and about 30 processes.
Any idea how many simultaneous monitors a single Monitoring Collector can run, assuming, say, 10 measures per host per monitor?
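As a rough way to frame the question, the average number of monitor executions in flight can be estimated with Little's law (concurrency = start rate × execution time). This sketch assumes one execution per host per polling interval that collects all of that host's measures, a 60-second interval, and a hypothetical 5-second execution time. None of these figures come from Dynatrace documentation, so substitute measured values.

```python
def estimated_concurrency(hosts: int, interval_s: float, exec_time_s: float) -> float:
    """Average executions in flight, via Little's law: L = lambda * W.

    Assumes each host gets exactly one execution per interval (hypothetical model).
    """
    start_rate = hosts / interval_s      # executions started per second
    return start_rate * exec_time_s      # average executions running at once


def measures_per_minute(hosts: int, measures_per_host: int, interval_s: float) -> float:
    """Total measure values produced per minute across all hosts."""
    return hosts * measures_per_host * 60.0 / interval_s


if __name__ == "__main__":
    # 500-1000 servers and 10 measures/host come from the question above;
    # the 60 s interval and 5 s execution time are hypothetical.
    for hosts in (500, 1000):
        busy = estimated_concurrency(hosts, interval_s=60, exec_time_s=5)
        rate = measures_per_minute(hosts, measures_per_host=10, interval_s=60)
        print(f"{hosts} hosts: ~{busy:.1f} concurrent executions, {rate:.0f} measures/min")
```

Under these assumptions, 1,000 hosts average about 83 concurrent executions. That is only an average; if every host is due at the top of the minute, the momentary demand is much higher.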
Answer by James M.
Thanks, that's what I was afraid of. The challenge then is configuring each monitor to use the right Collector, hence our RFE to do more with Collector clusters, where Dynatrace automatically partitions and schedules the work across a Collector farm. Our infrastructure monitor alone already covers several hundred servers, and that number will increase significantly next year.
These Collectors only do monitoring; no agents connect to them. We'll work on this, thanks!
Answer by Derek A.
DynaTrace Collectors support only 100 concurrently running threads (Compuware, has this changed?). If the Collector already has other work running at that time, additional monitor executions are queued and processed when a thread becomes available. Depending on how aggressive your monitoring is, some servers may miss their data collection if the Collector is trying to process hundreds and hundreds of servers every minute. I believe 100 is the maximum; at least it was back in 3.5.2, and I don't think it has changed since.
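The queueing behavior described above can be illustrated with a small simulation of a fixed-size thread pool. This is a worst-case sketch, not the Collector's actual scheduler: it assumes every host is due at the same instant, each execution takes a fixed (hypothetical) 5 seconds, and the pool has the 100 threads mentioned above. Any execution that cannot finish within the 60-second interval is counted as late, meaning it would collide with the next scheduled collection.

```python
import heapq


def simulate_interval(jobs: int, threads: int, exec_time_s: float, interval_s: float) -> dict:
    """Simulate one polling interval on a fixed-size thread pool.

    All jobs are queued at t=0 and served first-come, first-served by
    whichever thread frees up first. A job whose finish time exceeds
    interval_s is counted as late.
    """
    free_at = [0.0] * threads            # time at which each worker thread becomes free
    heapq.heapify(free_at)
    late = 0
    last_finish = 0.0
    for _ in range(jobs):
        start = heapq.heappop(free_at)   # earliest available thread
        finish = start + exec_time_s
        heapq.heappush(free_at, finish)
        last_finish = max(last_finish, finish)
        if finish > interval_s:
            late += 1
    return {"late": late, "makespan_s": last_finish}


if __name__ == "__main__":
    for hosts in (1000, 2000):
        print(hosts, simulate_interval(hosts, threads=100, exec_time_s=5, interval_s=60))
```

With these numbers, 1,000 hosts drain in 10 waves of 100 and finish by 50 s, while 2,000 hosts need 20 waves and 800 executions spill past the 60-second mark. That spill is the "missed data collection" effect in miniature.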
We are also doing a lot of agent-less monitoring: a large number of URLs, perfmon on a couple thousand servers, some Windows service and process monitoring, etc. To get around the 100-thread limit, we just spin up additional Collector instances on the same server. So instead of having, say, 20 Collectors each on its own server, you could have 2 servers with 10 instances each, or 5 servers with 4 instances each, or slice it up however your resources allow.

There are some self-monitoring measures for threads you can chart to get an idea of what's going on inside the Collector. Hope this helps.
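Splitting the load across several Collector instances can be planned with simple arithmetic. This sketch assumes the 100-thread limit per instance, the same hypothetical 5-second execution time and 60-second interval, a 70% headroom target (an arbitrary safety margin), and hypothetical host names. The round-robin assignment is just one simple way to balance hosts; it is not how Dynatrace assigns monitors to Collectors.

```python
def per_instance_capacity(threads: int, interval_s: int, exec_time_s: int) -> int:
    """Executions one instance can complete per interval with all threads busy back-to-back."""
    return threads * (interval_s // exec_time_s)


def instances_needed(jobs: int, threads: int = 100, interval_s: int = 60,
                     exec_time_s: int = 5, headroom_pct: int = 70) -> int:
    """Collector instances needed so each runs at no more than headroom_pct of capacity."""
    usable = per_instance_capacity(threads, interval_s, exec_time_s) * headroom_pct // 100
    return max(1, -(-jobs // usable))    # ceiling division on integers


def partition(hosts: list[str], instances: int) -> list[list[str]]:
    """Round-robin hosts onto instances so the shares differ by at most one."""
    buckets: list[list[str]] = [[] for _ in range(instances)]
    for i, host in enumerate(hosts):
        buckets[i % instances].append(host)
    return buckets


if __name__ == "__main__":
    # Hypothetical inventory of 3,000 monitored hosts.
    hosts = [f"srv-{n:04d}" for n in range(3000)]
    n = instances_needed(len(hosts))
    shares = partition(hosts, n)
    print(n, [len(s) for s in shares])
```

Under these assumptions, 3,000 hosts need four instances of 750 hosts each. Those could run on one server or be spread over several, resources permitting. Charting the thread self-monitoring measures is the way to check whether real execution times match the assumed ones.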