Answer by Nyna W.
Inappropriate alert settings often lead to frustration, with false positives triggering unnecessary alerts. A false positive occurs when a single downstream event (often a network event) causes a test run to fail. To prevent false positives from triggering alerts, follow these best practices...
Step 1: Baseline
Before production alerts are put in place, the production tests should be run for a period of time (see below) to generate a baseline for how the application (site, page, etc.) being tested behaves. The goal is a baseline that takes natural traffic patterns and conditions into account, so that daily surges and weekly/monthly events are reflected in it.
24-48 Hours = Best Guess
1 Week = Inadequate
2 Weeks = Minimal
3-4 Weeks = Ideal
Step 2: Calculate Standard Deviation
Once the baseline has been captured, calculate the standard deviation. This can be done by generating a chart and changing the aggregation to standard deviation. Adjusting the chart's time interval makes it easy to average the standard deviation results. Once you have the average standard deviation, use that value to set your alert thresholds.
Warning = baseline average + 1X standard deviation
Critical = baseline average + 2X standard deviation
A variation of the above recommendation is to calculate the standard deviation as a percentage of the baseline average and use that relative value instead of the absolute values.
NOTE: for less aggressive thresholds, use the maximum standard deviation in place of the average standard deviation.
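As a rough illustration, the Python sketch below computes the warning and critical thresholds from a set of baseline response times, along with the percentage-based variation mentioned above. The `baseline_samples` values are hypothetical stand-ins for whatever data you pull from your chart.

```python
import statistics

# Hypothetical baseline: average response times (seconds) sampled
# over the 3-4 week baseline period, one value per chart interval.
baseline_samples = [1.18, 1.25, 1.31, 1.22, 1.40, 1.19, 1.28, 1.35]

baseline_avg = statistics.mean(baseline_samples)
std_dev = statistics.stdev(baseline_samples)

# Thresholds per the recommendation above:
warning_threshold = baseline_avg + 1 * std_dev
critical_threshold = baseline_avg + 2 * std_dev

# Percentage-based variation: express the deviation as a share of
# the baseline average instead of an absolute value.
deviation_pct = std_dev / baseline_avg * 100

print(f"Baseline average:   {baseline_avg:.2f}s")
print(f"Std deviation:      {std_dev:.2f}s ({deviation_pct:.0f}% of baseline)")
print(f"Warning threshold:  {warning_threshold:.2f}s")
print(f"Critical threshold: {critical_threshold:.2f}s")
```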
Step 3: Configure Alerts
Using the above values, set your response time thresholds. In addition to the above recommendations, make sure to configure the alerts with the following settings...
Use dynamic thresholds to account for any increased geographic response times that may be occurring.
Use node thresholds to only drive alerts if errors are being seen from multiple locations. A good rule of thumb is to use 33% if you are monitoring from 4 nodes or more. If monitoring from fewer than 4 nodes, use 50%.
Use consecutive errors to only drive alerts when multiple errors are seen in succession. Setting a higher number of consecutive errors can reduce false positives but may delay the initial alert. (A sketch combining the node and consecutive-error checks follows below.)
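To make the node-threshold and consecutive-error logic concrete, here is a minimal Python sketch of how such an evaluation might work. The `should_alert` and `node_threshold_pct` functions and the `required_consecutive` default of 3 are illustrative assumptions, not Dynatrace's actual implementation.

```python
# Hypothetical alert-evaluation sketch: fire only when enough nodes
# report the problem AND the errors arrive in succession.

def node_threshold_pct(total_nodes: int) -> float:
    """33% when monitoring from 4+ nodes, 50% otherwise (rule of thumb above)."""
    return 0.33 if total_nodes >= 4 else 0.50

def should_alert(failing_nodes: int, total_nodes: int,
                 consecutive_errors: int, required_consecutive: int = 3) -> bool:
    enough_nodes = failing_nodes / total_nodes >= node_threshold_pct(total_nodes)
    enough_in_a_row = consecutive_errors >= required_consecutive
    return enough_nodes and enough_in_a_row

# Example: 2 of 5 nodes failing with 3 errors in a row -> alert fires.
print(should_alert(failing_nodes=2, total_nodes=5, consecutive_errors=3))  # True
# A single transient error from one node is suppressed.
print(should_alert(failing_nodes=1, total_nodes=5, consecutive_errors=1))  # False
```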
Following the above best practices will reduce the number of false positives and make Dynatrace Synthetic alerts more actionable.