Facebook blames outage on error during routine maintenance

London, October 6 (BNA) Facebook said that the global outage that took Facebook and its other platforms offline for hours was caused by an error during routine maintenance.

Santosh Janardhan, Vice President of Infrastructure at Facebook, said in a blog post that the outage of Facebook, Instagram and WhatsApp was caused not by malicious activity but by a mistake of the company's own making.

The problem occurred when engineers were doing day-to-day work on Facebook’s global backbone network: the computers, routers and software in its data centers around the world, along with the fiber-optic cables that connect them, the AP reports.

Janardhan said on Tuesday: “During one of these routine maintenance jobs, a command was issued with the aim of assessing the availability of global backbone capacity, which inadvertently cut off all connections in our core network, effectively separating Facebook data centers globally.”

Janardhan said Facebook’s systems are designed to catch such errors, but in this case a bug in the audit tool prevented it from properly stopping the command.

The change also triggered a second problem that made matters worse: Facebook’s servers became impossible to reach even though they were still working.

Janardhan said engineers rushed on site to fix the problem, but that this took time because of the extra layers of security.

Data centers are hard to reach, and once you’re inside, devices and routers are designed to be difficult to modify even when you have physical access to them.

Once connectivity was restored, services were brought back gradually to avoid surges in traffic that could cause further disruptions.

It was an “unexpected anomaly” for a faulty maintenance update to take down Facebook’s backbone network, but the company could have avoided a scenario in which its servers were completely disconnected, making it impossible to access the tools needed to fix the problem, said Angelique Medina of ThousandEyes, a Cisco Systems Inc. company that monitors internet outages.

“The big question is why so many internal tools and systems would have a single point of failure,” Medina said. “Facebook would still have been down because of the network outage, but they could have resolved the outage sooner if they had kept internal access.”
