How to restore an Etcd pod after a node returns to the cluster
Challenge
An Etcd pod on the restarted instance is stuck in the “Pending” state, and the Etcd cluster cannot be rebuilt.
Cause
If Acura is deployed as an HA cluster, it remains operable when one of its nodes goes down (is turned off or destroyed). However, after the lost node is returned to the cluster, the clustered Etcd requires several manual steps to rebuild itself.
Solution
To return the cluster from emergency mode to normal operation, perform the following steps:
1. Establish an SSH connection to the restored node and delete the Etcd data:
# sudo rm -rf /acura/etcd/*
2. Clear the stale claim reference (claimRef) from the Etcd persistent volume (PV) that is in the Released state, so that the PV can be bound again:
# kubectl patch pv $(kubectl get pv | grep etcd | awk '/Released/ {print $1}') --type merge --patch '{"spec": {"claimRef": null}}'
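The two steps above can also be sketched as shell functions. This is a minimal sketch, not part of the official procedure. It assumes that the Etcd data directory is /acura/etcd (as in step 1) and that the Etcd PV names contain "etcd" (the same assumption the grep filter in step 2 makes). The function names clean_etcd_data, released_etcd_pvs, and clear_etcd_claim_refs are hypothetical. Unlike the one-line command in step 2, the sketch patches each released Etcd PV separately and stops with an error if none is found.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are hypothetical, not Acura tooling.

# Step 1 (run as root on the restored node): delete the contents of the
# Etcd data directory but keep the directory itself.
# Assumption: the data directory is /acura/etcd, as in step 1.
clean_etcd_data() {
  local dir="${1:-/acura/etcd}"
  if [ ! -d "$dir" ]; then
    echo "Directory $dir not found; nothing deleted." >&2
    return 1
  fi
  # -mindepth 1 keeps $dir itself; unlike the * glob, this also
  # removes hidden entries.
  find "$dir" -mindepth 1 -delete
}

# Step 2 helper: read "name<TAB>phase" lines on stdin and print the
# names of Etcd PVs in the Released phase.
# Assumption: Etcd PV names contain "etcd", as in step 2.
released_etcd_pvs() {
  awk -F '\t' '$1 ~ /etcd/ && $2 == "Released" {print $1}'
}

# Step 2 (run where kubectl is configured for the cluster): clear
# claimRef on every released Etcd PV so that it can be bound again.
clear_etcd_claim_refs() {
  local pvs pv
  pvs=$(kubectl get pv \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' \
    | released_etcd_pvs)
  if [ -z "$pvs" ]; then
    echo "No released Etcd PV found; nothing patched." >&2
    return 1
  fi
  for pv in $pvs; do
    echo "Clearing claimRef on PV $pv"
    kubectl patch pv "$pv" --type merge --patch '{"spec": {"claimRef": null}}'
  done
}
```

To use it, source the file in a root shell on the restored node and call clean_etcd_data, then source it in a shell with kubectl access and call clear_etcd_claim_refs. Reading the PV name and phase through kubectl's jsonpath output avoids depending on the column layout of the default kubectl get pv table.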
After that, the Etcd pod should start and join the cluster.
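To check that the Etcd pod has left the Pending state, you can poll the pod phases. This is a minimal sketch, assuming that the Etcd pod names contain "etcd"; the function names non_running_etcd_pods and wait_for_etcd_pods, the 10-second poll interval, and the default timeout of 300 seconds are hypothetical choices. A Running phase only shows that the pod's containers started; it does not by itself confirm that the member rejoined the Etcd cluster.

```shell
#!/usr/bin/env bash
# Sketch only: function names, poll interval, and timeout are hypothetical.

# Read "namespace<TAB>name<TAB>phase" lines on stdin and print every
# Etcd pod whose phase is not Running, as "namespace/name phase".
# Assumption: Etcd pod names contain "etcd".
non_running_etcd_pods() {
  awk -F '\t' '$2 ~ /etcd/ && $3 != "Running" {print $1 "/" $2 " " $3}'
}

# Poll every 10 seconds until all Etcd pods are Running or the timeout
# (in seconds, default 300) expires. Fails if kubectl itself fails.
wait_for_etcd_pods() {
  local timeout="${1:-300}" waited=0 out pending
  while :; do
    out=$(kubectl get pods --all-namespaces \
      -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}') || return 1
    pending=$(printf '%s\n' "$out" | non_running_etcd_pods)
    if [ -z "$pending" ]; then
      echo "All Etcd pods are Running."
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out; Etcd pods not Running:" >&2
      echo "$pending" >&2
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
}
```

For example, wait_for_etcd_pods 600 waits up to ten minutes and lists the Etcd pods that are still not Running if the timeout expires.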