
VM availability Test on Custom Device

Debariki
Frequent Guest

Hi Team,

Dynatrace alerts were triggered for a few servers of a custom device. On investigating, we found that these alerts were false: the devices were actually up and running at the time of the issue.

I would like to know the cause of these false alerts, and how we can reduce their priority from P2 to P4.

 

Here are the problem notes from the false alert raised for one of the servers:

 "ImpactedEntity":"VM availability Test on Custom Device 172.20.86.69", "Tags":"Application-SE_FMC, systemservice-firewall", "ProblemSeverity":"AVAILABILITY", "ProblemDetails":"OPEN Problem P-24021767 in environment REBUS-PRODUCTIONnProblem detected at: 02:24 (UTC) 29.02.2024nn1 impacted infrastructure componentnnCustom Devicen172.20.86.69nnVM availability Testn172.20.86.69 is in down state. 

Thanks,
Bharathi

1 REPLY

ChadTurner
DynaMight Legend

Issues like this are best understood in full. The alert you posted is an availability alert. Was this the out-of-the-box availability check or a custom metric? I ask because custom metrics have an option to alert on missing data, which could cause this issue. Timing is also important in conjunction with networking. Suppose you have a network anomaly where connectivity for the OneAgent drops out every 5 min. The agent will continue to collect metric data because it is in fact running, but its communication out to the AG or Cluster is missing. Once communication is re-established, in the next 5 min - 1 hour the agent will dump in all the metrics it collected while it couldn't communicate.

Network issues can be hard to track down, so having a fundamental understanding of what alerted, the duration of the missing data, and so on will help you pinpoint why something was reported down when it was actually up.
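To make "duration of missing data" concrete, here is a rough Python sketch of how you could spot a backfilled window once you have the data points in hand. It is not a Dynatrace API: the function name `find_late_arrivals`, the `(measured_at, received_at)` tuples and the sample timestamps are all made up for illustration, and it assumes you can get both times for each data point from somewhere.

```python
from datetime import datetime, timedelta


def find_late_arrivals(points, max_delay=timedelta(seconds=30)):
    """Return (first, last) measurement times of each run of late data points.

    points is a list of (measured_at, received_at) tuples. Both fields are
    hypothetical; use whatever your own export provides.
    """
    windows = []
    current = None
    for measured_at, received_at in sorted(points):
        late = (received_at - measured_at) > max_delay
        if late:
            if current is None:
                current = [measured_at, measured_at]  # start a new window
            else:
                current[1] = measured_at  # extend the open window
        elif current is not None:
            windows.append(tuple(current))  # an on-time point closes it
            current = None
    if current is not None:
        windows.append(tuple(current))
    return windows


# Made-up sample: one point per minute from 02:20, where the points measured
# from 02:24 to 02:29 were buffered and only delivered at 02:30.
start = datetime(2024, 2, 29, 2, 20)
flush = datetime(2024, 2, 29, 2, 30)
sample = []
for i in range(16):
    measured = start + timedelta(minutes=i)
    buffered = 4 <= i <= 9
    received = flush if buffered else measured + timedelta(seconds=10)
    sample.append((measured, received))

windows = find_late_arrivals(sample)
for first, last in windows:
    print(f"data measured {first:%H:%M} to {last:%H:%M} arrived late")
```

If a window like this lines up with the problem's start time, the device was most likely up and only the reporting path was interrupted, which points toward the network rather than the host.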

I once saw a security agent turning off the OneAgent, which caused these availability alerts. We had to track them live to better understand what started up just before the OneAgent reported connectivity issues, how long the gap lasted, and how the data was repopulated.

-Chad
