Chris N. asked ·

How do alerting, notifications, and audit logs work in a Managed HA cluster?

Hello:

We recently upgraded from a single-node cluster to a 3-node HA cluster:

(Node ID 1 Original) (Node ID 6 New) (Node ID 7 New)


First, when a problem is detected, how do the other nodes know not to raise an alert on the same issue?

Is there a way to determine from the GUI which node detected a problem?

Is the information copied to all 3 nodes? If so, which log is used?


Secondly, we have a lot of notifications that use external API web hooks.

I see information about the notification attempts in the audit.notifications.0.0.log file.

How is it determined which node will send out the alert to the source?

Is the notification attempt kept only on that single node sending the alert, or is it copied somewhere onto all 3 servers?

I want to thank the community for taking time to look at my post.


Kindly,

Chris




Dynatrace Managed, alerting, problem detection, notifications

Radoslaw S. answered ·

Cluster nodes continuously exchange data with each other so that each node is aware of what the others have seen and executed. The cluster nodes also use distributed data stores, Cassandra and Elasticsearch, so data is replicated and available to all nodes.

There's no need to identify which node detected the problem, as this is transparent to the user. If one node goes down, the others take over the responsibility of correlating events to raise a problem and trigger notifications. This is internal business logic embedded in the cluster node.

Notification attempts are stored in Cassandra and replicated.

Do you have any additional questions?


2 comments

Thank you for that answer!


I would like to know where (and perhaps how) I would check whether an attempt to notify has gone out.


I do understand the transparency to the users. However, if an alert is supposed to go out and does not, I need to know the best way to trace and validate the attempt.

I was working with support and they suggested looking at the audit.notifications.0.0.log on each cluster member.

When I did this, I saw messages related to attempts for specific problems. I did not, however, see the same messages replicated across the 3 logs.

That being said, how would I determine which server to search without having to look through all 3 logs?

Would it be necessary to query the distributed databases from outside the application itself?

Is this supported or a common practice?


Thank you again for your input. :)



If you really need to, you have to browse the logs on all nodes, or use a log processing tool such as Dynatrace, Splunk, Sumo Logic, or another.

However the question still remains... why would you need to bother with that?

Chris N. answered ·

I would need to bother with this because we have had situations where alerts show up in the problem window but don't reach their destination.

For instance, I can see a web-hook attempt in a log, along with whether it succeeded or failed. Here is an example from one log.

So the ability to trace it from a centralized query or location would be more efficient than going through 3 or more logs in an HA cluster environment (1 per node).




1594645456912.png (406.2 KiB)
1 comment

I understand. Currently, there's no better way. Feel free to post a product idea! :)
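
Until something better exists, the per-node search discussed in this thread can at least be scripted. The following is only a rough sketch, not an official Dynatrace tool: it assumes each node's audit.notifications.0.0.log has already been copied to a local directory, and the directory names and the search text "P-12345" are made up for illustration. It uses plain substring matching because nothing is assumed about the log's line format.

```python
from pathlib import Path


def find_notification_attempts(node_logs, search_text):
    """Return (node, line_number, line) for every log line containing search_text.

    node_logs maps a node label to the path of that node's copy of
    audit.notifications.0.0.log. Matching is a plain substring test,
    since the exact log line format is not assumed here.
    """
    matches = []
    for node, path in node_logs.items():
        log_file = Path(path)
        if not log_file.is_file():
            continue  # skip nodes whose log copy is missing
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if search_text in line:
                    matches.append((node, line_number, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    # Hypothetical local copies of the log, one per cluster node.
    logs = {
        "node-1": "logs/node-1/audit.notifications.0.0.log",
        "node-6": "logs/node-6/audit.notifications.0.0.log",
        "node-7": "logs/node-7/audit.notifications.0.0.log",
    }
    # "P-12345" is a placeholder; use any text that identifies the problem.
    for node, line_number, line in find_notification_attempts(logs, "P-12345"):
        print(f"{node}:{line_number}: {line}")
```

Each printed line starts with the node label, so it is immediately visible which node logged the attempt.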
