Atlassian has finally revealed the exact cause of an ongoing cloud services outage the company estimates could impact some of its customers for up to two more weeks.
When we first reported on this outage, Atlassian told us that a routine maintenance script had blocked some customers’ access to their data after it “unintentionally” disabled the sites of roughly 400 of its more than 200,000 customers.
Maintenance script wipes hundreds of customer sites
Atlassian’s Chief Technology Officer Sri Viswanath shared how hundreds of customer sites were accidentally deleted on April 5th, triggering a weeks-long incident the company is still working to address.
As he explained, the outage resulted from communication issues between two Atlassian teams who were working on deactivating the standalone legacy “Insight – Asset Management” app used by Jira Service Management and Jira Software on all customer sites.
Instead of being given the ID of the app that needed to be deactivated, the deactivation team received the IDs of the cloud sites where the app was installed.
Additionally, the maintenance script they used to disable the app was launched in the wrong execution mode (i.e., permanent deletion of data instead of deletion with a recoverability failsafe).
“The script was executed with the wrong execution mode and the wrong list of IDs. The result was that sites for approximately 400 customers were improperly deleted,” Viswanath explained.
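The two failures Viswanath describes, the wrong kind of ID and the wrong execution mode, are the sort of mistakes a deletion tool can refuse to act on. The following is a minimal sketch of such guardrails, not Atlassian's actual script: the `app:`/`site:` ID prefixes, the `plan_deletion` function, and the `confirm_permanent` flag are all hypothetical names chosen for illustration.

```python
from enum import Enum


class IdKind(Enum):
    # Hypothetical convention: every ID carries a prefix naming its type.
    APP = "app"
    SITE = "site"


class Mode(Enum):
    SOFT_DELETE = "soft"        # data stays recoverable
    PERMANENT = "permanent"     # data is gone for good


def plan_deletion(ids, expected_kind, mode=Mode.SOFT_DELETE, confirm_permanent=False):
    """Validate a deletion request and return the plan, or raise ValueError."""
    prefix = expected_kind.value + ":"
    wrong = [i for i in ids if not i.startswith(prefix)]
    if wrong:
        # Guard 1: reject IDs of the wrong type (e.g. site IDs passed as app IDs).
        raise ValueError(f"expected {expected_kind.value} IDs, got: {wrong}")
    if mode is Mode.PERMANENT and not confirm_permanent:
        # Guard 2: permanent deletion must be requested explicitly, twice.
        raise ValueError("permanent deletion requires confirm_permanent=True")
    return {"mode": mode, "ids": list(ids)}
```

In this sketch, calling `plan_deletion(["site:1001"], IdKind.APP)` raises a `ValueError` before anything is deleted, and the recoverable mode is the default unless the caller opts out of it explicitly.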
Almost 50% of deleted sites restored
The Atlassian status page was updated on Thursday to say that its engineers, using a batch-based approach, have restored functionality for 49% of the users affected by the outage.
Atlassian initially estimated that the restoration efforts would take no more than several days, and it confirmed to BleepingComputer that the outage was not the result of a cyberattack.
However, earlier this week, the company revealed in emails sent to affected customers that restoring the sites of all impacted users will likely take up to two more weeks.
“We are restoring affected customers identified by a mix of multiple variables including site size, complexity, edition, tenure, and several other factors in groups of up to 60 at a time,” the company said.
“The full restoration process involves our engineering teams, our customer support teams, and our customer.”
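To make the batch size concrete, here is a minimal sketch of splitting a restoration queue into groups of at most 60. The `batches` function and the placeholder site names are hypothetical, and the sketch does not model how Atlassian orders sites by size, complexity, edition, or tenure.

```python
def batches(sites, size=60):
    """Split a list of sites into consecutive groups of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [sites[i:i + size] for i in range(0, len(sites), size)]


# Roughly 400 affected sites, using placeholder names.
queue = [f"site-{n}" for n in range(400)]
groups = batches(queue)
print(len(groups), len(groups[-1]))  # prints "7 40"
```

At 60 sites per group, roughly 400 sites would take at least seven groups, the last of them partial.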
This outage comes after Atlassian announced in October 2020 that it would no longer sell licenses for on-premises products starting February 2021, with support for already active licenses to be discontinued three years later, on February 2nd, 2024.