I have this incident rule that should only be triggered after the condition exists for 30 minutes. For this I created a measure to produce the metric and configured the incident rule to evaluate the average value over a period of 30 minutes. Now it seems that the incident is triggered from the first minute the condition exists. What am I doing wrong?
Here is the measure:
This is the incident rule:
This rule is not triggered as expected after 30 minutes, but immediately:
The condition only existed for approx. 20 minutes, yet the incident was triggered.
I use the same measure for another incident rule that needs to fire immediately when the value goes over 5:
That one works fine.
Please advise.
Wim.
Answer by wim d. ·
Andreas, still not convinced:
I need 2 alarms:
- 1 that fires a warning when one thread is hung for more than 30 minutes
- 1 that fires severe when 5 threads are hung for more than 1 minute
For that reason I created the measure with the 2 thresholds (upper severe and upper warning) and 2 incident rules ("WebSphere Concurrent Longrunning Hung Threads Detected" with an evaluation period of 30 minutes, and "WebSphere Concurrent Hung Threads Detected" with an evaluation period of 1 minute).
From what you tell me I guess I need 2 measures then?
Wim
Answer by wim d. ·
If I set it to severe, will it then be evaluated against 5 or against 1 ?
5 - because you have 5 in the upper severe threshold
If you specify "warning or severe" the condition is met if the actual value of the measure is >= the value specified in warning OR >= the value in severe. If you specify "severe" it will only trigger if the value is >= the value in severe
Andi
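The rule described above can be written down as a small sketch. It only models the comparison as explained in this thread; the function name `threshold_violated` and the mode strings are hypothetical, and the thresholds (warning = 1, severe = 5) are the ones from the measure discussed here.

```python
def threshold_violated(value, warning, severe, mode):
    """Return True if `value` meets the condition for the selected dropdown mode.

    mode is either "severe" or "warning or severe" (assumed labels).
    """
    if mode == "severe":
        return value >= severe
    if mode == "warning or severe":
        return value >= warning or value >= severe
    raise ValueError(f"unknown mode: {mode!r}")

# Thresholds from the measure in this thread: upper warning 1, upper severe 5.
print(threshold_violated(1, warning=1, severe=5, mode="warning or severe"))  # True
print(threshold_violated(1, warning=1, severe=5, mode="severe"))             # False
print(threshold_violated(5, warning=1, severe=5, mode="severe"))             # True
```

With one hung thread, only the "warning or severe" setting is satisfied; the "severe" setting needs a value of 5 or more.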
Answer by Andreas G. ·
That's correct - and that is how it should work. However, in your case you have specified "warning or severe", your warning threshold is set to 1, and you will always have at least 1 thread (correct?), so it will more or less trigger immediately as well. So please try changing that setting to "severe". If it still doesn't behave as you think it should, I would open a support ticket. Maybe there is an issue, as it is supposed to work as you explained.
Answer by wim d. ·
I would expect that the first incident is only triggered when the situation exists for at least 30 minutes (as I understand from the documentation, the situation is evaluated over the specified time period, in this case 30 minutes). When the value is 1 for only 20 minutes, it should not trigger an incident (as the average would be < 1: 120 times 1 and 60 times 0).
In the second example the time period is only 1 minute, so it will fire quickly.
Wim.
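The arithmetic behind this expectation can be checked with a minimal sketch. It assumes one sample every 10 seconds (which is what 120 + 60 = 180 samples in 30 minutes implies) and a plain arithmetic mean over the window; the function name `window_average` is hypothetical and not part of the product.

```python
def window_average(samples):
    """Plain arithmetic mean of all samples in the evaluation window."""
    return sum(samples) / len(samples)

# Assumed: 10-second sampling, so 30 minutes = 180 samples.
# Condition present for 20 minutes (120 samples of 1),
# absent for the remaining 10 minutes (60 samples of 0).
samples = [1] * 120 + [0] * 60

avg = window_average(samples)
print(round(avg, 3))  # 0.667
print(avg >= 1)       # False: the average stays below the warning threshold of 1
```

Under these assumptions a 30-minute average only reaches 1 when the value is at least 1 for the whole window, which matches the behaviour expected above.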
Answer by Andreas G. ·
The problem in the first case is that you have "warning or severe" specified in the threshold dropdown. Because you have a Warning Threshold of 1, the incident will trigger if the Average Value is >= 1, so this incident triggers every time.
In the second example you selected the "severe" threshold - that's why it works.