At 4:13 AM PDT on 10/28/2020, the log partition on one of our servers reached capacity. While the partition was full, OpenVPN was unable to rewrite the openvpn-status.log file, which several internal tools (including acu-teleport) rely on, so those tools stopped functioning.
During this time, our users experienced delays when performing certain actions, and in some cases the attempted task failed. We sincerely apologize for any inconvenience this caused.
To prevent this issue from happening again, we have implemented the short-term and long-term remedies noted below:
Short-term: Space was cleared on the log partition and OpenVPN was restarted in order to restore the contents of the openvpn-status.log file. Restarting OpenVPN did drop active connections, but they recovered quickly after startup (within 5 minutes).
Short-term: After OpenVPN was restarted, opbok was also restarted, which purged the queue of the remaining maintenance tasks. This allowed incoming provisioning tasks to start sooner.
Long-term: A new version of acu-validate was released into production that reduces how often the large error message is triggered.
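For readers who want to guard their own systems against a full log partition, a usage check along the following lines could raise a warning before capacity is reached. This is only a minimal sketch, not a description of our monitoring: the /var/log path, the 80% threshold, and the function names are assumptions chosen for illustration.

```python
import shutil


def partition_usage_percent(path):
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report a total size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def check_partition(path, warn_percent=80.0):
    """Return a warning string if usage is at or above warn_percent, else None."""
    percent = partition_usage_percent(path)
    if percent >= warn_percent:
        return f"WARNING: {path} is {percent:.1f}% full (threshold {warn_percent:.0f}%)"
    return None


if __name__ == "__main__":
    # "/var/log" is an assumed mount point; substitute the real log partition.
    message = check_partition("/var/log")
    if message:
        print(message)
```

Run on a schedule (for example from cron), a check like this would flag a filling partition while there is still time to clear space.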
We will continue to monitor the situation and ensure that logs don’t grow unnoticed to a size where performance is affected. Once again, we very much appreciate your patience and consideration during this incident. Thank you!