Solved: Problem restoring a cluster from backup

radek_jasinski · ‎11 Jul 2023

Hi,

I have a problem when trying to restore a cluster from a backup. I have machines that are identical in terms of software and hardware. I'm following the step-by-step instructions: https://www.dynatrace.com/support/help/shortlink/managed-cluster-restore#restore-from-backup

In step 3 I get an error when starting Cassandra:

ERROR [main] 2023-07-11 11:13:53,823 UTC CassandraDaemon.java:803 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1603)
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:628)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:888)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:745)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:694)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:395)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:633)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:786)
INFO [StorageServiceShutdownHook] 2023-07-11 11:13:53,877 UTC HintsService.java:210 - Paused hints dispatch
WARN [StorageServiceShutdownHook] 2023-07-11 11:13:53,877 UTC Gossiper.java:1728 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown

I'm betting it's a network connection problem, but I need to confirm it.

Has anyone encountered this before?

Radek

Have a nice day!

islam_zidan · ‎11 Jul 2023

Hi Radek,

How many nodes do you have? Are they on the same subnet? If not, what is the latency between the subnets?

BR,

Islam

Dynatrace Certified Professional - Dynatrace Partner - Yourcompass.ca

Radoslaw_Szulgo · ‎11 Jul 2023

I can confirm this is a network issue. Gossip is the protocol Cassandra uses to negotiate connection details between nodes. Make sure there is a working network connection to the nodes you configured in step 2 with --cluster-nodes.
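
One way to confirm or rule out a connectivity problem is to test whether a TCP connection to each node can be opened at all. The following is a minimal Python sketch, not part of the Dynatrace tooling. It assumes the nodes listen for gossip on port 7000, which is Cassandra's default storage_port; check the port actually used in your deployment. The function names are hypothetical.

```python
import socket


def can_connect(host: str, port: int = 7000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    Port 7000 is Cassandra's default storage_port (an assumption here;
    your deployment may use a different port).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_nodes(hosts, port: int = 7000, timeout: float = 3.0) -> dict:
    """Map each host to the result of can_connect for that host."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling check_nodes with the IP addresses passed to --cluster-nodes returns a dictionary such as {"<node-ip>": False} for every node that cannot be reached. A successful TCP connection only shows that the port is open; it does not prove that gossip itself works.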

Senior Product Manager,
Dynatrace Managed expert

radek_jasinski · ‎11 Jul 2023

Hi Radek,

Yes, I have verified all the requirements. For now, this environment has only one node because it's a test environment and we are testing the backup procedure before running it on the production environment.

Have a nice day!

Radoslaw_Szulgo · ‎11 Jul 2023

Hm.. so if that's only a single node... there should be no networking issue as there's no one to gossip with 😉

Senior Product Manager,
Dynatrace Managed expert

radek_jasinski · ‎11 Jul 2023

The development environment at the client looks like this:

1. The old DT server from which the backup was created
2. The network share on which the backup is stored (mounted on both hosts).
3. The new DT server to which we make a restore from the backup.

Server 1 is disabled when the new server is started after the restore is done. Everything seems to be correct, hence I'm surprised by this error. I'll try again tomorrow and let you know! 😊

Have a nice day!

radek_jasinski · ‎18 Jul 2023

OK, the matter turned out to be simpler than I thought 😁
I had not specified the --seed-ip parameter when running the restore command.

By the way, the Cassandra error message is highly specific in this case.
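
Since a missing --seed-ip only shows up later as a gossip error, a small check on the arguments before running the restore may help. This is a hypothetical Python sketch, not part of the Dynatrace installer. It only knows the two options named in this thread (--cluster-nodes and --seed-ip); the full restore command is described in the linked documentation. Accepting the "--flag=value" form is an assumption.

```python
def missing_restore_flags(argv, required=("--cluster-nodes", "--seed-ip")):
    """Return the required flags that do not appear in argv.

    Handles both '--flag value' and '--flag=value' forms (the second form
    is an assumption; check what the installer actually accepts).
    """
    # Collect the flag names, dropping any '=value' suffix.
    present = {arg.split("=", 1)[0] for arg in argv if arg.startswith("--")}
    return [flag for flag in required if flag not in present]
```

For example, passing the argument list ["--cluster-nodes", "1:10.0.0.5"] (a placeholder address) returns ["--seed-ip"], which is the omission that caused the error in this thread.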

Radek

Have a nice day!