If you follow any form of tech news, you may have seen that shortly after midnight on March 10th, a fire broke out inside a France-based datacenter belonging to OVH, one of the most globally recognised cloud hosting providers.
OVH staff were called onsite to assess the situation, but left after deeming the building too dangerous to enter. Firefighters were on the scene immediately, but despite their efforts to control the blaze, including deploying a water pumping boat, one of the main buildings, SBG2, was destroyed in its entirety, along with the servers and the data they held. Some servers in a nearby building, SBG1, were also damaged or destroyed.
It is my understanding that the majority of the customers affected were those using dedicated servers and VPSes (Virtual Private Servers), including some large names such as the French Government and Rust, a popular online game. Shared web hosting customers may also have been affected, since power was shut off to the entire location, but it is unclear whether any of those customers lost data. Typically, shared web hosting companies should be keeping offsite backups; it is normally only dedicated server (also called bare-metal) and VPS customers who are responsible for making sure they have a good backup solution.
I put out a post about why backups are important and I will likely put out another one about the different types of backups and servers soon.
So what went wrong and what can we learn?
Having watched a video interview with OVH’s founder, Octave Klaba, I understand that the fire is suspected to have been caused by one of their UPSes (Uninterruptible Power Supplies). UPSes are a temporary power source used to keep equipment running without interruption if the normal grid power is disrupted or cut. They come in different types and sizes, most commonly big banks of batteries, and they usually only need to bridge the gap until backup generators (often diesel) take over. It is not clear which type OVH was using, but firefighters apparently used a thermal imaging camera immediately after arriving onsite to try to identify the source. Two of the UPSes, UPS7 and UPS8, were identified as being on fire. Furthermore, Klaba explained that the datacenter had recently had maintenance work performed, specifically on the UPSes, the day before the fire. OVH’s status page confirms this, which is why the UPSes are suspected to be the culprit.
While OVH’s actions in assessing the situation, informing the affected customers and trying to provide solutions as quickly as possible were good, could this have been easily avoided? Firstly, I am personally wondering why this datacenter did not have a proper fire suppression system, specifically a non-toxic gas one. Systems like that are designed to detect early signs of a fire, trigger a series of alarms to alert staff to leave the building and then release a gas to extinguish the fire, while avoiding damage to the servers and other equipment. Secondly, UPSes are very sensitive pieces of equipment, possibly more so than servers, especially the battery type. You may have seen how a smartphone battery can react if damaged or overheated, e.g. the Samsung Galaxy Note 7 incident. Is it possible that the recent maintenance work on their UPSes was carried out without due care? Also, if you search Google for “UPS room”, you’ll see that these types of equipment rooms are usually housed in the same building as the equipment they are powering, such as servers and networking infrastructure. While I’m not a datacenter designer (it does sound fun though), wouldn’t it make sense to keep the different types of equipment in your datacenter separate from each other as much as possible? Apparently some of the networking rooms were ok, so why were these sensitive pieces of equipment placed so close to the ones they are responsible for powering?
I’m glad that nobody appears to have been seriously hurt in this incident, and I hope that those in the industry have learned from this mistake.
I enjoy blogging about recent tech news and will keep it up, in addition to my regular rantings.
Thanks for reading!