Facebook and its platforms, including WhatsApp, Instagram and Messenger, went down for about six hours on Monday evening, with all the memes and panic that come with it. The problem appears to lie with the border gateway protocol.
On Monday evening, a large part of global social media was gone in one fell swoop. The Facebook empire, including the social network itself but also Messenger, Instagram, WhatsApp, Workplace, and even the VR service Oculus, disappeared from the network for hours.
And that disappearance can be taken literally: although the sites themselves were not deleted, no device could find the servers behind them. According to Facebook’s official statement, the problem was “a configuration of the underlying servers that coordinate network traffic.” From there, the problems reportedly spread to communication between the data centres, and internal tools became unusable as a result. Admittedly, that explanation is fairly vague, but it strengthens the suspicion that this was a failed update to the border gateway protocol (BGP).
BGP is one of the internet’s foundational protocols: it ensures that users and devices are routed to the correct location of, say, their chat messages or their uncle’s status post. It works a bit like the domain name system (DNS), which directs devices to the correct IP address, but at a higher level. In a nutshell, BGP tells the rest of the world how to reach the network of a particular provider or, in this case, a tech giant.
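The two layers can be sketched in a few lines of toy code. This is purely illustrative, not real protocol code: the record and route tables, the IP address, and the simplified prefix matching are all assumptions made up for the example.

```python
# Toy model of the two lookup layers involved (illustrative, not real protocol code).
# DNS layer: hostname -> IP address.  BGP layer: IP prefix -> route to the owning network.
# All addresses and table entries below are invented for this sketch.

DNS_RECORDS = {"facebook.com": "157.240.0.35"}          # DNS: name to address
BGP_ROUTES = {"157.240.0.0/16": "AS32934 (Facebook)"}   # BGP: prefix to network path

def resolve_and_route(hostname: str) -> str:
    """Look up a hostname, then find a route to the network that owns its address."""
    ip = DNS_RECORDS.get(hostname)
    if ip is None:
        raise LookupError(f"DNS: no record for {hostname}")
    # Real routers do longest-prefix matching; here we just assume a /16 prefix.
    prefix = ".".join(ip.split(".")[:2]) + ".0.0/16"
    route = BGP_ROUTES.get(prefix)
    if route is None:
        raise LookupError(f"BGP: no route announced for {prefix}")
    return f"{hostname} -> {ip} via {route}"

print(resolve_and_route("facebook.com"))
```

The point of the sketch: even when the DNS answer exists, the second lookup still has to succeed, and that second table is what BGP announcements fill.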
Because Facebook is not part of the local Telenet, the company has its own domain registry and DNS servers and uses its own routing prefixes for its own network. Anyone who opens a Facebook app should therefore be directed to the Facebook network. With Monday evening’s update, however, that signage was effectively deleted, so no device could find the Facebook network any more. That immediately explains why so many sites and apps went down at once. Cloudflare, itself specialized in managing web traffic for sites, reports in a blog post that an update removed all BGP routes to Facebook, making Facebook’s own DNS servers inaccessible.
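The knock-on effect Cloudflare describes can be modelled with a trivial routing table. This is a sketch under assumptions: the prefix is invented, and a real BGP speaker exchanges routes with peers rather than reading a dictionary; the model only shows why withdrawing routes makes servers that are still running unreachable.

```python
# Sketch of why withdrawing BGP routes takes DNS down with it (illustrative only).
# The prefix below is invented; a real router holds routes learned from BGP peers.

announced_routes = {"129.134.0.0/16": "path to Facebook's DNS servers"}

def reachable(prefix: str) -> bool:
    # A destination is reachable only if some route to its prefix is announced.
    return prefix in announced_routes

dns_prefix = "129.134.0.0/16"
assert reachable(dns_prefix)      # before the faulty update: traffic flows

announced_routes.clear()          # the update withdraws all routes to Facebook

# The DNS servers are still running, but no network knows a path to them,
# so every lookup of a Facebook domain now fails at the routing layer.
assert not reachable(dns_prefix)
print("routes withdrawn: Facebook's DNS servers unreachable")
```

This is why the outage looked like the sites had vanished: the machines were up, but the signage pointing to them was gone.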
The fact that it took so long to recover seems to be the result of those ‘internal tools’ becoming inaccessible. Reports from New York Times reporter Sheera Frenkel, among others, indicate that Facebook employees could not log in to their own work servers or even enter the physical buildings with their badges, because all those systems run through Facebook’s own servers … which could not be found. So those who had to fix the mistake were more or less locked out of their own infrastructure.
Currently, there is no indication of malicious intent. This kind of error also has one silver lining: for once, no (extra) data has been leaked, since it appears that only the routing, and not the data servers themselves, was affected.