In November last year we launched, in partnership with BGPmon, a new project called the Routing Resilience Survey. As I wrote in the blog post “New Internet Society Project Aims to Learn: Is Your Internet Routing Secure and Resilient?”, this effort is based on collecting incident data related to routing resilience from an operator’s point of view. This approach allows us not only to filter out false positives (for instance, legitimate configuration changes) but also to record the impact and severity of each incident.
Since the project began four months ago, 25 participants representing more than 300 ASNs have joined the effort. The ASN count is much higher than the participant count because providers often take care of their customers’ routing problems and wanted to include those problems in the survey. In total we asked folks to classify almost 500 events, and more than half of them have indeed been classified! And while we hope to collect more data to get a more statistically representative picture of these incidents and their impacts, there is already something to look at.
So I couldn’t resist the temptation to show you some graphs from this presentation of initial results (full presentation embedded below and available on SlideShare).
Routing Resilience: Impact Severity
First of all, let’s look at the number of registered incidents and their impact [slide 9]. Do not be surprised that the timeline starts back in 2011: for each participant we presented a “historical” overview of significant events and asked them to classify each one. Green is the most prominent color. It marks “false positives,” that is, legitimate configuration changes such as connecting a new customer. But even with our relatively small set of surveyed networks, one can notice moderate and severe incidents.
If we look at the impact from a slightly different angle [slide 10], we see that “false positives” constitute at least half of all events. Real incidents add up to 10% of all events. This share may be higher, depending on what is in the “unknown” category, which holds the events that have not been classified yet.
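To make the “may be higher” point concrete, here is a minimal sketch of how such a range can be computed. The function name and the counts are hypothetical and chosen only for illustration; they are not the survey data. The lower bound assumes that none of the unclassified events turn out to be real incidents, and the upper bound assumes that all of them do.

```python
def real_incident_share_bounds(real, false_positive, unknown):
    """Return (lower, upper) bounds on the share of real incidents.

    lower: every unknown event turns out to be a false positive
    upper: every unknown event turns out to be a real incident
    """
    total = real + false_positive + unknown
    if total == 0:
        raise ValueError("no events to classify")
    lower = real / total
    upper = (real + unknown) / total
    return lower, upper


# Hypothetical counts for illustration only, NOT the survey results.
lower, upper = real_incident_share_bounds(real=10, false_positive=50, unknown=40)
print(f"real incidents: between {lower:.0%} and {upper:.0%} of all events")
```

As more events get classified, the “unknown” count shrinks and the two bounds move closer together.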
Finally, it is interesting to look at how folks became aware of the incidents [slide 11], a question we asked when they classified an event. “Customer call” is the dominant answer, which indicates that reactive measures prevail over proactive ones.
It is still not too late to join this effort. Your contribution will help us better assess the state of security and resilience of the routing system from a risk perspective. In return, you may become more aware of how risky the environment is and, after the project is completed, receive an individual report showing how your risk profile compares to that of other participants.
Please send a request for the creation of your account to email@example.com.
In the request please indicate:
- your AS number
- email address for notifications
You may also include the AS numbers of any customers whose security incidents you would like to monitor and classify.
Hope to see you soon!