We have a business transaction (BT) that filters on a matching URI pattern with value > 1ms. That BT's response time measure is used for two separate incidents:
1) Timeframe of 10s with aggregation set to max. Threshold set to 30,000ms = Warning
2) Timeframe of 1 hour with aggregation set to average. Threshold set to 30,000ms = Severe
These measures were added to the Incident Rule, and the thresholds were then edited to the values above. The issue is that when the threshold is crossed, both incidents trigger and send emails.
Any ideas as to why the 1 hour timeframe set to average would trigger if just one data point crosses the threshold?
Thanks,
Tom
Answer by Thomas L. ·
Hi Andreas,
It seems the team has disabled the incidents, but I went back to Monday for this data:
The 1 hour timeframe incident, with average aggregation and a threshold of 15,000ms, was thrown at 14:58:50. I created a chart at the finest resolution I could and exported it to an Excel document. I then averaged the response times from 13:58 to 14:58 and only got 10,384ms. I tried other 1 hour windows with the data I have but couldn't get the average response time over 15,000ms.
Tom
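One thing worth ruling out when re-computing the average from an exported chart: averaging the per-interval averages is not the same as averaging over the individual requests, because intervals with more requests should weigh more. The sketch below only illustrates that arithmetic. The row layout (average in ms, request count) and the sample numbers are hypothetical, and whether the incident evaluation weights by count in this way is an assumption, not something confirmed in this thread.

```python
def mean_of_averages(rows):
    """Unweighted mean of the per-interval averages (averaging a chart column)."""
    return sum(avg for avg, _ in rows) / len(rows)


def weighted_average(rows):
    """Average over all underlying requests, weighting each interval by its count."""
    total_ms = sum(avg * count for avg, count in rows)
    total_requests = sum(count for _, count in rows)
    return total_ms / total_requests


# Hypothetical exported rows: (average response time in ms, number of requests)
rows = [(2_000, 1), (3_000, 1), (25_000, 8)]

print(mean_of_averages(rows))  # 10000.0, below a 15,000 ms threshold
print(weighted_average(rows))  # 20500.0, above a 15,000 ms threshold
```

If the export includes a count column, comparing both figures for the 13:58 to 14:58 window would show whether weighting explains the gap.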
Answer by Andreas G. ·
Have you charted this measure on two charts, one using a 10s aggregation and one using a 1h aggregation? Could it be that, when a problem occurs, your measure delivers such a high value that even the hourly average exceeds your 30s threshold?
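To put rough numbers on that question, here is a minimal sketch of how large a single data point has to be to lift an average to the threshold. It assumes all other data points in the window sit at one baseline value. The function name and the baseline of 1,000ms are hypothetical, chosen only for illustration.

```python
def min_spike_to_trip(baseline_ms, n_samples, threshold_ms):
    """Smallest single value that lifts the average of n_samples to threshold_ms,
    assuming the other n_samples - 1 values all equal baseline_ms."""
    return threshold_ms * n_samples - baseline_ms * (n_samples - 1)


# With a single data point in the hour, average and max are the same value.
print(min_spike_to_trip(1_000, 1, 30_000))  # 30000

# With five data points, one of them must reach 146,000 ms.
print(min_spike_to_trip(1_000, 5, 30_000))  # 146000
```

So if the BT only records a handful of data points per hour, one slow request can be enough to push the hourly average over the threshold, while a busy BT would need a far more extreme outlier.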