Provisioning Delays - Identified
Postmortem

At 4:13 AM PDT on 10/28/2020, the log partition on one of our servers reached capacity. While the partition was full, OpenVPN was unable to rewrite the openvpn-status.log file, which several internal tools (including acu-teleport) rely on. As a result, acu-teleport and the other tools stopped functioning.

During this time, our users experienced delays when performing certain actions, and in some cases the attempted task failed. We truly apologize for any inconvenience this has caused.

To prevent this issue from happening again, we have implemented some short-term and long-term remedies noted below:

Short-term: Space was cleared on the logging partition and OpenVPN was restarted in order to restore the contents of the openvpn-status.log file. Restarting OpenVPN did drop existing connections, but service recovered quickly after startup (within 5 minutes).

Short-term: After OpenVPN was restarted, opbok was also restarted, which purged the queue of the remaining maintenance tasks. This made way for incoming provisioning tasks to start sooner. 

Long-term: A new version of acu-validate was released into production that reduces how often the large error message is triggered.
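
The failure described above (a full log partition leaving openvpn-status.log stale) is the kind of condition a small periodic check can flag early. The following is a minimal sketch only, not the tooling used in production: the function names, the 90% usage threshold, and the 120-second staleness limit are all assumptions chosen for illustration, and the paths must be supplied by the caller.

```python
import os
import shutil
import time


def partition_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def file_age_seconds(path, now=None):
    """Return the number of seconds since `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def check_log_health(log_dir, status_file, max_used=0.90, max_age=120.0, now=None):
    """Return a list of warnings; an empty list means nothing was flagged.

    max_used and max_age are illustrative defaults, not values from the incident.
    """
    warnings = []

    # Flag the partition before it is completely full.
    used = partition_usage_fraction(log_dir)
    if used >= max_used:
        warnings.append(f"log partition {log_dir} is {used:.0%} full")

    # A status file that is missing or no longer being rewritten is a
    # symptom that tools reading it are working from stale data.
    if not os.path.exists(status_file):
        warnings.append(f"status file {status_file} is missing")
    else:
        age = file_age_seconds(status_file, now)
        if age > max_age:
            warnings.append(f"status file {status_file} not updated for {age:.0f}s")

    return warnings
```

A check like this could be run on a schedule, with any returned warnings forwarded to an alerting channel. Checking both partition usage and file freshness covers the case where the status file stops updating for a reason other than disk space.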

We will continue to monitor this incident and ensure that logs do not grow unnoticed to a size where performance is affected. Once again, your patience and consideration during this incident were very much appreciated. Thank you!

Posted Oct 29, 2020 - 10:41 EDT

Resolved
This incident has been resolved. Please re-attempt any previously failed actions, such as provisioning or setting lockdown rules. We will post a postmortem shortly.
Posted Oct 28, 2020 - 16:07 EDT
Investigating
Our data processing infrastructure is running behind, which is causing provisioning delays. During this time, you may experience issues with provisioning, such as attempts failing or taking a long time to process.

Our Engineering Team is actively working on identifying the root cause so they can implement a fix. We will continue to post updates as they become available.
Posted Oct 28, 2020 - 13:32 EDT
This incident affected: United States (API, Control Center).