Last Friday, 25 August, a routing incident caused large-scale internet disruption. It hit Japanese users the hardest, slowing or blocking access to websites and online services for dozens of Japanese companies.
What happened is that Google accidentally leaked BGP prefixes it had learned from peering relationships, in effect becoming a transit provider for those prefixes instead of simply exchanging traffic between two networks and their customers. The leak also exposed internal traffic engineering that had de-aggregated many of these prefixes. Because routers prefer more-specific routes, this made the leaked prefixes more likely to be accepted elsewhere.
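To see why de-aggregation makes a leaked route more likely to win, recall that routers forward on the longest matching prefix. The sketch below uses Python's standard `ipaddress` module; the function name, prefixes, and labels are made up for illustration (documentation address ranges) and are not the ones involved in the incident.

```python
import ipaddress

def best_route(dest, routes):
    """Return the (network, label) pair with the longest prefix covering dest.

    routes is a list of (prefix string, label) pairs; all values here are
    illustrative assumptions, not data from the incident.
    """
    addr = ipaddress.ip_address(dest)
    candidates = [
        (ipaddress.ip_network(prefix), label)
        for prefix, label in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    return max(candidates, key=lambda item: item[0].prefixlen)

routes = [
    ("198.51.100.0/24", "legitimate aggregate"),
    ("198.51.100.0/25", "leaked more-specific"),
]
print(best_route("198.51.100.10", routes))
```

An address inside the /25 follows the leaked route even though the legitimate /24 is still present; an address in the other half of the /24 is unaffected.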
The incident technically lasted less than ten minutes, but it spread quickly around the internet and caused real damage. Connectivity was restored, but persistently slow connection speeds affected industries like finance, transportation, and online gaming for several hours. Google apologized for the trouble, saying it was caused by an errant network setting that was corrected within eight minutes of its discovery.
This incident showed, again, how fragile the global routing system still is against configuration mistakes, to say nothing of malicious attacks.
What it also showed is a lack of defense – the incident propagated seemingly without any attempt from other networks to stop it.
The Internet Society works to address security in many ways, including the Mutually Agreed Norms for Routing Security (MANRS), which encourages operators to build these defense lines. If one line fails, incorrect announcements will be stopped at the next one. Every operator should be offering such defense – not only for its own security and stability of operations, but for the stability and security of the global communication fabric.
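One such defense line is an inbound filter that accepts from a neighbor only the prefixes it is expected to announce. The following is a minimal IPv4-only sketch, assuming a hypothetical per-neighbor `expected` list and a hypothetical `max_len` limit on how specific an accepted prefix may be; real deployments express this in router policy rather than Python.

```python
import ipaddress

def accept_announcement(prefix, expected, max_len=24):
    """Decide whether to accept a prefix announced by a neighbor.

    expected: prefixes this neighbor is allowed to announce (assumed list).
    max_len:  longest prefix length accepted (assumed IPv4 policy value).
    """
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > max_len:
        # Too specific: likely de-aggregated or leaked internal routing.
        return False
    # Accept only prefixes that fall inside something the neighbor is expected to send.
    return any(
        net.version == allowed.version and net.subnet_of(allowed)
        for allowed in map(ipaddress.ip_network, expected)
    )

expected_from_neighbor = ["192.0.2.0/24"]
print(accept_announcement("192.0.2.0/24", expected_from_neighbor))
print(accept_announcement("198.51.100.0/24", expected_from_neighbor))
```

If every network along the path applied a check like this, an unexpected announcement that slipped past one operator would still be dropped by the next.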
Implementing the four MANRS actions leads to better protection against traffic anomalies caused by misconfigurations; cleaner setups resulting in easier troubleshooting and lower time-to-resolution (TTR); improved peering conditions; and opportunities for valuable collaboration with other operators through a discussion forum and professional network. But MANRS has a limited scope. It is the absolute minimum that an operator should consider. Its requirements focus on elementary pieces of network topology, such as customer-provider relationships, and on preventing spoofed traffic from single-homed stub customers.
To be fair, MANRS would not have helped specifically in this case. But if Verizon, which accepted the leaked routes and passed them on, were a MANRS member, perhaps that would have improved its security posture in more complex situations. If Google were a MANRS member, it could have communicated to its peers more clearly what announcements they should expect.
The more operators implement MANRS, the fewer incidents we will see, and the smaller their scope will be. MANRS is not a one-stop solution to all of the internet’s routing woes, but it is an important step toward a globally robust and secure routing infrastructure.
As for route leaks, there is ongoing work in the IETF. The Inter-Domain Routing (IDR) working group has two proposals addressing the route leak problem: “Methods for Detection and Mitigation of BGP Route Leaks”, and “Route Leak Prevention using Roles in Update and Open messages”.
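The general principle behind role-based leak prevention can be sketched as an export rule: a route learned from a provider or a peer should be passed on only to customers. The code below illustrates that principle only and is not the mechanism specified in either draft; the role names and the `may_export` function are hypothetical.

```python
ROLES = {"customer", "peer", "provider"}

def may_export(learned_from, send_to):
    """Return True if a route learned from one neighbor role may be announced to another.

    Both arguments describe the neighbor's role relative to the local network.
    """
    if learned_from not in ROLES or send_to not in ROLES:
        raise ValueError("role must be one of: customer, peer, provider")
    if learned_from == "customer":
        # Customer routes may be announced to everyone.
        return True
    # Routes from peers or providers go to customers only;
    # sending them to a peer or provider would be a route leak.
    return send_to == "customer"

print(may_export("customer", "provider"))
print(may_export("peer", "peer"))
```

Under this rule, a network that re-announces peer-learned prefixes to another peer or to a provider is offering transit it never agreed to provide, which is the kind of announcement the check refuses.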