How can your company continue to make its website and Internet services available during a massive distributed denial-of-service (DDoS) attack against a DNS hosting provider? In light of last Friday’s attack on Dyn’s DNS infrastructure, many people are asking this question.
One potential solution is to look at using multiple DNS providers for hosting your DNS records. The challenge with Friday’s attack was that so many of the affected companies – Twitter, Github, Spotify, Etsy, SoundCloud and many more – were using ONLY one provider for DNS services. When that DNS provider, Dyn, then came under attack, people couldn’t get to the servers running those services. It was a single point of failure.
You can see this yourself right now. If you go to a command line on a Mac or Linux system and type “dig ns twitter.com”, the answer you will see is something like:
twitter.com. 10345 IN NS ns4.p34.dynect.net.
twitter.com. 10345 IN NS ns3.p34.dynect.net.
twitter.com. 10345 IN NS ns1.p34.dynect.net.
twitter.com. 10345 IN NS ns2.p34.dynect.net.
What this says is that Twitter is using only Dyn. (“dynect.net” is the domain name of Dyn’s “DynECT” managed DNS service.)
Companies using Dyn who also used another DNS provider, though, had less of an issue. Users may have experienced delays in initially connecting to the services, but they were still able to eventually connect. Here is what Etsy’s DNS looks like after Friday (via “dig ns etsy.com”):
etsy.com. 9371 IN NS ns1.p28.dynect.net.
etsy.com. 9371 IN NS ns-870.awsdns-44.net.
etsy.com. 9371 IN NS ns-1709.awsdns-21.co.uk.
etsy.com. 9371 IN NS ns3.p28.dynect.net.
etsy.com. 9371 IN NS ns-1264.awsdns-30.org.
etsy.com. 9371 IN NS ns-162.awsdns-20.com.
etsy.com. 9371 IN NS ns4.p28.dynect.net.
etsy.com. 9371 IN NS ns2.p28.dynect.net.
Etsy is now using a combination of Dyn’s DynECT DNS services and Amazon’s Route 53 DNS services.
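If you want to run this kind of check across many domains, the comparison can be scripted. Here is a minimal sketch in Python that reads the text of a “dig ns” answer and groups the name servers by provider. The KNOWN_PROVIDERS table and the function names are my own inventions for illustration, and the table only covers the two providers that appear in the examples above; a real check would need a much fuller list.

```python
from collections import defaultdict

# Hypothetical lookup table for this sketch: a substring of the name server
# host name mapped to a provider label. Only the two providers from the
# examples above are listed.
KNOWN_PROVIDERS = {
    "dynect.net": "Dyn",
    "awsdns-": "Amazon Route 53",
}

TWITTER_NS = """\
twitter.com. 10345 IN NS ns4.p34.dynect.net.
twitter.com. 10345 IN NS ns3.p34.dynect.net.
twitter.com. 10345 IN NS ns1.p34.dynect.net.
twitter.com. 10345 IN NS ns2.p34.dynect.net.
"""

ETSY_NS = """\
etsy.com. 9371 IN NS ns1.p28.dynect.net.
etsy.com. 9371 IN NS ns-870.awsdns-44.net.
etsy.com. 9371 IN NS ns-1709.awsdns-21.co.uk.
etsy.com. 9371 IN NS ns3.p28.dynect.net.
etsy.com. 9371 IN NS ns-1264.awsdns-30.org.
etsy.com. 9371 IN NS ns-162.awsdns-20.com.
etsy.com. 9371 IN NS ns4.p28.dynect.net.
etsy.com. 9371 IN NS ns2.p28.dynect.net.
"""


def parse_ns_hosts(dig_output):
    """Extract name server host names from 'dig ns' answer lines."""
    hosts = []
    for line in dig_output.splitlines():
        fields = line.split()
        # Answer lines have five fields: name, TTL, class, type, target.
        if len(fields) == 5 and fields[2] == "IN" and fields[3] == "NS":
            hosts.append(fields[4].rstrip(".").lower())
    return hosts


def group_by_provider(hosts):
    """Group host names by provider label; unknown hosts keep their own name."""
    groups = defaultdict(list)
    for host in hosts:
        label = next(
            (name for key, name in KNOWN_PROVIDERS.items() if key in host),
            host,
        )
        groups[label].append(host)
    return dict(groups)


def has_single_point_of_failure(dig_output):
    """True when the NS records point at fewer than two distinct providers."""
    return len(group_by_provider(parse_ns_hosts(dig_output))) < 2


if __name__ == "__main__":
    for label, output in (("twitter.com", TWITTER_NS), ("etsy.com", ETSY_NS)):
        providers = sorted(group_by_provider(parse_ns_hosts(output)))
        print(label, providers, has_single_point_of_failure(output))
```

Matching on substrings of the host name is a deliberate shortcut. Route 53 name servers, for instance, sit under several different top-level domains, so grouping by the last two labels of the host name would not work.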
But wait, you say… shouldn’t this be “DNS 101”?
Aren’t you always supposed to have DNS servers spread out across the world?
Why don’t they have “secondary DNS servers”?
Isn’t that a common best practice?
Well, all of these companies did have secondary servers, and their DNS servers were spread out all around the world. This is why users in Asia, for instance, were able to get to Twitter and other sites while users in the USA and Europe were not able to do so.
So what happened?
It gets a bit complicated.
20 Years Ago…
Jumping back, say, 20 years or so, it was common for everyone to operate their own “authoritative servers” in DNS that would serve out their DNS records. A huge strength of DNS is that it is “distributed and de-centralized”: anyone registering a domain name is able to operate their own “authoritative servers” and publish all of their own DNS records.
To make this work, you publish “name server” (“NS”) records for each of your domain names that list which DNS servers are “authoritative” for your domain. These are the servers that can answer back with the DNS records that people need to reach your servers and services.
You need to have at least one authoritative server to give out your DNS records. Of course, in those early days, if there was a problem with that server and it went offline, people would not be able to get the DNS records that would get them to your other computers and services. Similarly, if you had a problem with your connection to the Internet, people could not reach your authoritative server.
For that reason the best practice emerged of having a “secondary” authoritative DNS server that contained a copy of all of the DNS records for your domain. The idea was to have this in a different geographic location and on a different network.
On the user end, we use what is called a “recursive DNS resolver” to send out DNS queries and get back the IP addresses that our computers need to connect. Our DNS resolvers will get the list of name servers (“NS records”) and choose one to connect to. If an answer doesn’t come back after some short period of time, the resolver will try the next NS record, and the next… until it runs out of NS records to try.
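That retry behavior can be sketched in a few lines of Python. This is a simplified illustration and not how any particular resolver is implemented: real resolvers use their own rules for choosing which server to ask first, while this sketch simply walks the list in order. The function names, the fake query function and the 192.0.2.10 address (from a range reserved for documentation) are all assumptions made for the example.

```python
def make_fake_query(down):
    """Build a stand-in for a DNS query: servers listed in 'down' never answer."""
    def query(server, timeout):
        if server in down:
            raise TimeoutError(f"{server} did not answer within {timeout}s")
        return "192.0.2.10"  # documentation-range address used as a dummy answer
    return query


def resolve_with_failover(name_servers, query, timeout=2.0):
    """Ask each authoritative server in turn and return the first answer.

    Returns (server, answer, attempts). Raises LookupError once the list of
    NS records is exhausted without any server answering.
    """
    attempts = 0
    for server in name_servers:
        attempts += 1
        try:
            return server, query(server, timeout), attempts
        except TimeoutError:
            continue  # no answer in time: move on to the next NS record
    raise LookupError("no authoritative server answered")


if __name__ == "__main__":
    servers = ["ns1.p28.dynect.net", "ns2.p28.dynect.net", "ns-870.awsdns-44.net"]
    query = make_fake_query(down={"ns1.p28.dynect.net", "ns2.p28.dynect.net"})
    print(resolve_with_failover(servers, query))
```

Each failed attempt costs the user one timeout before the next server is tried, and if every server on the list is unreachable the lookup fails outright.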
Back in July 1997, the IETF published RFC 2182 dedicated to this topic: Selection and Operation of Secondary DNS Servers. It’s fun to go back and read through that document almost 20 years later, as a great deal has changed. But back in the day, this was a common practice:
The best approach is usually to find an organisation of similar size, and agree to swap secondary zones – each organization agrees to provide a server to act as a secondary server for the other organisation’s zones.
As noted in RFC 2182, it was common for people to have 2, 3, 4 or even more authoritative servers. One would be the “primary” or master server where changes were made; the others would all be “secondary” servers grabbing copies of the DNS records from the primary server.
Over the years, companies and organizations would spend a great amount of time, energy and money building out their own DNS server infrastructure. Having this kind of geographic and network resilience was critical to ensure that users and customers could get the DNS records that would get them to the organization’s servers and services.
The Emergence of DNS Hosting Providers
But most people really didn’t want to run their own global infrastructure of DNS servers. They didn’t want to deal with all the headaches of establishing secondary DNS servers and all of that. It was costly and complicated – and just more than most companies wanted to deal with.
Over time companies emerged that were called “DNS hosting providers” or “DNS providers” who would take care of all of that for you. You simply signed up and delegated operation of your domain name to them – and they did everything else.
The advantages were – and are today – enormous. Instead of only a couple of secondary DNS servers, you could have tens or even hundreds. Technologies such as anycast made this possible. The DNS hosting provider would take care of all the data center operation, the geographic diversity, the network diversity… everything. And they provided you with all this capability on a global and network scale that very few companies could provide all by themselves.
The DNS hosting providers gave you everything in the RFC 2182 best practices – and so much more!
And so over the past 10 years most companies and people moved to using DNS hosting providers of some form. Often individuals simply use the DNS hosting provided by whatever domain name registrar they use to register their domain name. Companies have outsourced their DNS hosting to companies such as Dyn, Amazon’s Route 53, CloudFlare, Google’s Cloud DNS, UltraDNS, Verisign and so many more.
It’s simple and easy … and probably 99.99% of the time it has “just worked”.
And you only needed one DNS provider because they were giving you all the necessary secondary DNS services and diversity protection.
Until Friday. When for some parts of the Internet the DNS hosting services of Dyn didn’t work.
It’s important to note that Dyn’s overall DNS network still worked. They never lost all their data centers to the attack. People in some parts of the world, such as Asia, continued to be able to get DNS records and connect to all the affected services without any issues.
But on Friday, the many companies and services that were using Dyn as their only DNS provider suddenly found that a substantial part of the Internet’s user community couldn’t get to their sites. They found that they were sharing the same fate as their DNS provider in a way that would not have been true before the large degree of centralization with DNS hosting providers.
Some companies, like Twitter, stayed with Dyn through the entire process and weathered the storm. Others, like Github, chose to migrate their DNS hosting to another provider. Still others chose to start using multiple DNS providers.
Why Doesn’t Everyone Just Use Multiple DNS Providers?
This would seem the logical question. But think about that for a second – each of these major DNS providers already has a global, distributed DNS architecture that goes far beyond what companies could provide in the past.
Now we want to ask companies to use multiple of these large-scale DNS providers?
I put this question out in a number of social networks and a friend of mine whose company was affected nailed the issue with this comment:
Because one DNS provider, with over a dozen points-of-presence (POPs) all over the world and anycast, had been sufficient, up until this unprecedented DDoS. We had eight years of 100% availability from Dyn until Friday. Dealing with multiple vendors (and paying for it) didn’t have very good ROI (and I’m still not sure it does, but we’ll do it anyway).
Others chimed in and I can summarize the answers as:
- CDNs and GLBs – Most websites no longer sit on a single web server publishing a simple set of HTML files. They are large complex beasts pulling in data from many different servers and sites. And they very often sit behind content delivery networks (CDNs) that cache website content and make it available through “local” servers or global load balancers (GLBs) that redirect visitors to different servers. Most of these CDNs and GLBs work by using DNS to redirect people to the “closest” server (chosen by some algorithm). When using a CDN or GLB, you typically wind up having to use only that service for your DNS hosting. I’ve found myself in this situation with a few of my own sites where I use a CDN.
- Features – Many companies use more sophisticated features of DNS hosting providers such as geographic redirection or other mechanisms to manage traffic. Getting multiple providers to modify DNS responses in exactly the same way can be difficult or impossible.
- Complexity – Beyond CDNs and features, using multiple DNS providers simply adds complexity to IT infrastructure. You need to ensure both providers are publishing the same information, and getting that information out to providers can be tricky in some complex networks.
- Cost – The convenience of using a DNS hosting provider comes at a substantial financial cost. For the scale needed by major Internet services, the DNS providers aren’t cheap.
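To give a feel for the Complexity point above, here is a rough Python sketch of the kind of consistency check you would want when two providers are supposed to publish the same zone. It assumes you have already fetched each provider’s records into a plain dictionary keyed by (name, record type). How you fetch them depends on the providers, and everything here, including the example.com records and the function name, is hypothetical.

```python
def diff_record_sets(provider_a, provider_b):
    """Compare two {(name, rtype): set_of_values} mappings.

    Returns only the keys whose values differ, with the values that are
    published by one provider but not by the other.
    """
    mismatches = {}
    for key in sorted(provider_a.keys() | provider_b.keys()):
        a_values = provider_a.get(key, set())
        b_values = provider_b.get(key, set())
        if a_values != b_values:
            mismatches[key] = {
                "only_in_a": a_values - b_values,
                "only_in_b": b_values - a_values,
            }
    return mismatches


# Made-up records for illustration; provider B is missing one address.
PROVIDER_A = {
    ("www.example.com", "A"): {"192.0.2.10", "192.0.2.11"},
    ("example.com", "MX"): {"10 mail.example.com"},
}
PROVIDER_B = {
    ("www.example.com", "A"): {"192.0.2.10"},
    ("example.com", "MX"): {"10 mail.example.com"},
}

if __name__ == "__main__":
    for key, delta in diff_record_sets(PROVIDER_A, PROVIDER_B).items():
        print(key, delta)
```

A check like this only works for static records. Once a provider tailors its answers per visitor, as described under Features, there is no single “right” answer to compare against, which is part of why that case is so much harder.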
For all of these reasons and more, it’s not an easy decision for many sites to move to using multiple DNS providers.
And yet the type of massive DDoS attack we saw on Friday may require companies and organizations to rethink their “DNS strategy”. With the continued deployment of the Internet of Insecure Things, in particular, these types of DDoS attacks may become worse before the situation can improve. (Please read Olaf Kolkman’s post for ideas about how we move forward.) There will be more of these attacks.
As my friend wrote in further discussion:
These days you outsource DNS to a company that provides way more diversity than anyone could in the days before anycast, but the capacity of botnets is still greater than one of the biggest providers, and probably bigger than the top several providers combined.
And even more to the point:
The advantage of multiple providers on Friday wasn’t network diversity, it was target diversity.
The attackers targeted Dyn this time, so companies who use DNS services from Amazon, Google, Verisign or others were okay. Next time the target might be one of the others. Or perhaps attackers may target several.
The longer-term solutions, as Olaf writes about, involve better securing all the devices connected to the Internet to reduce the potential of IoT botnets. They involve continued collaborative work to reduce the effects of malware and bad routing information (e.g., MANRS). They involve continued and improved communication and coordination between network operators and so many others.
But in the meantime, I suspect many companies and organizations will be considering whether it makes sense to engage with multiple DNS providers. Many may be able to do so. Others may need the specialized capabilities of specific providers and find themselves unable to use multiple providers. Some may find that the return on investment does not warrant it, while others may accept that they must do this to ensure that their services are always available.
Sadly, taking DNS resilience to an even higher level may be what is required for today.
What do you think? Do you use multiple DNS providers? If so, what worked for you? If not, why not? I would be curious to hear from readers, either as comments here or out on social networks.
Windows users do not have the ‘dig’ command by default. Instead you can type “nslookup -type=NS <domainname>”. The results may look different than what is shown here, but will have similar information.
NOTE: I want to thank the people who replied to threads on this topic on Hacker News, in the /r/DNS subreddit and on social media. The comments definitely helped in expanding my own understanding of the complexities of the way DNS providers operate today.
Image credit: a photo I took of a friend’s T-shirt at a conference.