Solved: How do alerting, notifications, and audit logs work in a managed HA cluster?

runatyr · ‎13 Jul 2020

Hello:

We recently upgraded from a single-node cluster to a 3-node HA cluster.

(Node ID 1 Original) (Node ID 6 New) (Node ID 7 New)

First, when a problem is detected, how do the other nodes know not to raise an alert on the same issue?

Is there a way to determine which node detected a problem from the information in the GUI?

Is the information copied to all 3 nodes? If so, which log is used?

Secondly, we have a lot of notifications that use external API web hooks.

I see information about the notification attempts in the audit.notifications.0.0.log file.

How is it determined which node will send out the alert to the source?

Is the notification attempt kept only on that single node sending the alert, or is it copied somewhere onto all 3 servers?

I want to thank the community for taking time to look at my post.

Kindly,

Chris

Radoslaw_Szulgo · ‎13 Jul 2020

Cluster nodes continuously exchange data with each other so that each one is aware of what the others have seen and executed. Cluster nodes also use distributed data storage engines (Cassandra and Elasticsearch), so data is replicated and available to all nodes.

There's no need to identify which node detected the problem, as this is transparent to the user. If one node goes down, another takes over responsibility for correlating events to raise a problem and trigger notifications. This is internal business logic embedded in the cluster node.

Notification attempts are stored in Cassandra and replicated.

Do you have any additional questions?

Senior Product Manager,
Dynatrace Managed expert

runatyr · ‎13 Jul 2020

Thank you for that answer!

I would like to know where (and perhaps how) I can check whether a notification attempt has gone out.

I do understand the transparency to the users. However, if an alert is supposed to go out and does not, I need to know the best way to trace and validate the attempt.

I was working with support and they suggested looking at the audit.notifications.0.0.log on each cluster member.

When I did this, I saw messages related to attempts for specific problems. However, I did not see the same messages replicated across the 3 logs.

That being said, how would I determine which server to search without having to look through all 3 logs?

Would it be necessary to query the distributed databases from outside the application itself?

Is this supported or a common practice?

Thank you again for your input. 🙂

Radoslaw_Szulgo · ‎13 Jul 2020

If you really need to, you have to browse the logs on all nodes, or use a log-processing tool such as Dynatrace, Splunk, Sumo Logic, or similar.

However, the question still remains: why would you need to bother with that?

runatyr · ‎13 Jul 2020

I would need to bother with this because we have had situations where alerts show up in the problem window but don't reach their destination.

For instance, in a log I can see a web-hook attempt and whether it succeeded or failed. Here is an example from one log.

So the ability to trace it from a centralized query or location would be more efficient than going through 3 or more logs in an HA cluster environment (1 per node).
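
As a stopgap, the per-node logs could be copied to one machine and searched in a single pass. Below is a minimal sketch of that idea, not an official tool. It assumes each node's audit.notifications.0.0.log has already been copied locally; the directory layout, the node labels, and the search string are all hypothetical placeholders. Since the exact log line format is not shown here, it only does a plain substring match.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def find_notification_attempts(
    node_logs: Dict[str, Path], needle: str
) -> List[Tuple[str, int, str]]:
    """Return (node label, line number, line) for every line containing needle."""
    hits = []
    for node, log_path in node_logs.items():
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(log_path, encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((node, line_no, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical local copies of each node's log; adjust to your own layout.
    logs = {
        "node-1": Path("logs/node-1/audit.notifications.0.0.log"),
        "node-6": Path("logs/node-6/audit.notifications.0.0.log"),
        "node-7": Path("logs/node-7/audit.notifications.0.0.log"),
    }
    # Skip any node whose log has not been copied yet.
    existing = {node: path for node, path in logs.items() if path.exists()}
    # "PROBLEM-ID-PLACEHOLDER" stands in for whatever identifier appears in your logs.
    for node, line_no, line in find_notification_attempts(
        existing, "PROBLEM-ID-PLACEHOLDER"
    ):
        print(f"{node}:{line_no}: {line}")
```

Each hit is labelled with the node it came from, so it is clear which cluster member made the attempt without opening the three files one by one.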

Radoslaw_Szulgo · ‎13 Jul 2020

I understand. Currently, there's no better way. Feel free to post a product idea! 🙂