Abnormal load on one of our Outscale K8S cluster nodes
Incident Report for Elium
Resolved
We performed several tests (including the deployment of a new version of the Elium services) to validate that the new node is stable.
Posted Jul 08, 2021 - 18:16 CEST
Monitoring
We have created a new node using different hardware specifications (CPU type). After several tests, we found that the abnormal load problem no longer occurs on this type of machine. We continue to monitor the behaviour of this node. At the same time, we are reporting our findings to 3DS Outscale support in order to validate that the problem comes from the type of machine used for this node.
Posted Jul 08, 2021 - 14:11 CEST
Update
We are still testing different configurations for the faulty node (a different kernel version, creating another node).
Posted Jul 08, 2021 - 11:34 CEST
Update
We completely recreated the node and redeployed the services. The load continues to increase abnormally and this impacts the customer instances. We have therefore, once again, disabled the services on this node.
Posted Jul 08, 2021 - 10:46 CEST
Update
We are trying to solve the node load problem. When the services restart on the node, the instances of clients hosted on our private hosting (Outscale) slow down.
Posted Jul 08, 2021 - 10:27 CEST
Identified
During rolling updates, restarting containers on the node produces timeouts.
Posted Jul 07, 2021 - 22:41 CEST
Update
Restarting the node solved the load problem. We are still checking why this load occurred. Currently, the services are working properly again.
Posted Jul 07, 2021 - 19:22 CEST
Investigating
We have detected an abnormal load on one of the nodes of our Outscale Kubernetes cluster. We had to restart it.
Posted Jul 07, 2021 - 19:20 CEST
This incident affected: Private Hosting.