Ironing out the BGP Ruffles with Ivan Pepelnjak

If you need a point of contact for all things networking, then look no further than Ivan Pepelnjak. Ivan is the webinar author at ipSpace.net, where he is working on making networking an approachable subject for everyone. From teaching to writing books, Ivan has had a long and storied career, and as a de facto go-to for networking knowledge, you can’t beat him. Ivan and Corey discuss Ivan’s status as a CCIE Emeritus, and the old days of Cisco. Ivan also leverages his network engineering expertise and helps Corey answer some questions about BGP and its implementation. Ivan aptly narrows it down into “layers” that he kindly runs us through. So tune in for a Dante-esque descent into BGP, DNS and Facebook, seeing out the graybeards of tech, and more!

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by my friends at ThinkstCanary. Most companies find out way too late that they’ve been breached. ThinkstCanary changes this and I love how they do it. Deploy canaries and canary tokens in minutes and then forget about them. What’s great is the attackers tip their hand by touching them, giving you one alert, when it matters. I use it myself and I only remember this when I get the weekly update with a “we’re still here, so you’re aware” from them. It’s glorious! There is zero admin overhead to this, there are effectively no false positives unless I do something foolish. Canaries are deployed and loved on all seven continents. You can check out what people are saying at canary.love. And, their Kubeconfig canary token is new and completely free as well. You can do an awful lot without paying them a dime, which is one of the things I love about them. It is useful stuff and not an, “ohh, I wish I had money.” It is spectacular! Take a look; that’s canary.love because it’s genuinely rare to find a security product that people talk about in terms of love. It really is a unique thing to see. Canary.love. Thank you to ThinkstCanary for their support of my ridiculous, ridiculous nonsense.

Corey: Developers are responsible for more than ever these days. Not just the code they write, but also the containers and cloud infrastructure their apps run on. And a big part of that responsibility is app security — from code to cloud.
That’s where Snyk comes in. Snyk is a frictionless security platform that meets developers where they are, finding and fixing vulnerabilities right from the CLI, IDEs, repos, and pipelines. And Snyk integrates seamlessly with AWS offerings like CodePipeline, EKS, ECR, etc., etc., etc., you get the picture! Deploy on AWS. Secure with Snyk. Learn more at snyk.io/scream. That’s S-N-Y-K-dot-I-O/scream. Because they have not yet purchased a vowel.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I have an interesting and storied career path. I dabbled in security engineering slash InfoSec for a while before I realized that being crappy to people in the community wasn’t really my thing; I was a grumpy Unix systems administrator because it’s not like there’s a second kind of those out there; and I dabbled ever so briefly in the wide world of network administration slash network engineering slash plugging the computers in to make them talk to one another, ideally correctly. But I was always a dabbler. When it comes time to have deep conversations about networking, I immediately tag out and look to an expert. My guest today is one such person. Ivan Pepelnjak is oh so many things. He’s a CCIE emeritus, and well, let’s start there. Ivan, welcome to the show.

Ivan: Thanks for having me. And oh, by the way, I have to tell people that I was a VAX/VMS administrator in those days.

Corey: Oh, yes the VAX/VMS world was fascinating. I talked—

Ivan: Yes.

Corey: —to a company that was finally emulating them on physical cards because that was the only way to get them there. Do you refer to them as VAXen, or VAXes, or how did you wind up referring—

Ivan: VAXes.

Corey: VAXes. Okay, I was on the other side of that with the inappropriately pluralizing anything that ends with an X with an en—‘boxen’ and the rest. And that’s why I had no friends for many years.

Ivan: You do know what the first VAX was, right?

Corey: I do not.

Ivan: It was a Swedish Hoover company.

Corey: Ooh.

Ivan: And they had a trademark dispute with Digital over the name, and then they settled that.

Corey: You describe yourself in your bio as a CCIE Emeritus, and you give the number—which is low—number 1354. Now, I’ve talked about certifications on this show in the context of the modern era, and whether it makes sense to get cloud certifications or not. But this is from a different time. Understand that for many listeners, these stories might be older than you are in some cases, and that’s okay. But Cisco at one point, believe it or not, was a shining beacon of the industry, the kind of place that people wanted to work at, and their certification path was no joke.

I got my CCNA from them—Cisco Certified Network Administrator—and that was basically a byproduct of learning how networks worked. There are several more tiers beyond that, culminating in the CCIE, which stands for Cisco Certified Internetworking Expert, or am I misremembering?

Ivan: No, no, that’s it.

Corey: Perfect. And that was known as the doctorate of networking in many circles for many years. Back in those days, if you had a CCIE, you were guaranteed to be making an awful lot of money at basically any company you wanted to because you knew how networking—

Ivan: In the US.

Corey: —worked. Well, in the US. True. There’s always the interesting stories of working in places that are trying to go with the lowest bidder for networking gear, and you wind up spending weeks on end trying to figure out why things are breaking intermittently, and only to find out at the end that someone saved 20 bucks by buying cheap patch cables. I digress, and I still have the scars from those.

But it was fascinating in those days because there was a lab component of getting those tests. There were constant rumors that in the middle of the night, during the two-day certification exam, they would come in and mess with the lab and things you’d set up—

Ivan: That’s totally true.

Corey: —you’d have to fix it the following day. That is true?

Ivan: Yeah. So, in the good old days, when the lab was still physical, they would even turn the connectors around so that they would look like they would be plugged in, but obviously there was no signal coming through. And they would mess up the jumpers on the line cards and all that stuff. So, when you got your broken lab, you really had to work hard, you know, from the physical layer, from the jumpers, and they would mess up your config and everything else. It was, you know, the real deal. The thing you would experience in real world with, uh, underqualified technicians putting stuff together. Let’s put it this way.

Corey: I don’t wish to besmirch our brethren working in the data centers, but having worked with folks who did some hilariously awful things with cabling, and having been one of those people myself from time to time, it’s hard to have sympathy when you just spent hours chasing it down. But to be clear, the CCIE is one of those things where in a certain era, if you’re trying to have an argument on the internet with someone about how networks work and their response is, “Well, I’m a CCIE.” Yeah, the conversation was over at that point. I’m not one to appeal to authority on stuff like that very often, but it’s the equivalent of arguing about medicine with a practicing doctor. It’s the same type of story; it is someone where if they’re wrong, it’s going to be in the very fringes or the nuances, back in this era. Today, I cannot speak to the quality of CCIEs. I’m not attempting to besmirch any of them. But I’m also not endorsing that certification the way I once did.

Ivan: Yeah, well, I totally agree with you. When this became, you know, a mass certification, the reason it became a mass certification is because reseller discounts are tied to reseller status, which is tied to the number of CCIEs they have, it became, you know, this, well, still high-end, but commodity that you simply had to get to remain employed because your employer needed the extra two point discount.

Corey: It used to be that the prerequisite for getting the certification, beyond other certifications, was that you spent five or six years working on things.

Ivan: Well, that was what gave you the experience you needed because in those days, there were no boot camps. Today, you have [crosstalk 00:06:06]—

Corey: Now, there’s boot camp [crosstalk 00:06:07] things where it’s we’re going to train you for four straight weeks of nothing but this, teach to the test, and okay.

Ivan: Yeah. No, it’s even worse, there were rumors that some of these boot camps in some parts of the world that shall remain unnamed, were actually teaching you how to type in the commands from the actual lab.

Corey: Even better.

Ivan: Yeah. You don’t have to think. You don’t have to remember. You just have to type in the commands you’ve learned. You’re done.

Corey: There’s an arc to the value of a certification. It comes out; no one knows what the hell it is. And suddenly it’s, great, you can use that to really identify what’s great and what isn’t. And then it goes at some point down into the point where it becomes commoditized and you need it for partner requirements and the rest. And at that point, it is no longer something that is a reliable signal of anything other than that someone spent some time and/or money.

Ivan: Well, are you talking about bachelor degree now?

Corey: What—no, I don’t have one of those either. I have—

Ivan: [laugh].

Corey: —an eighth grade education because I’m about as good of an academic as it probably sounds like I am. But the thing that really differentiated in my world, the difference between what I was doing in the network engineering sense, and the things that folks like you who were actually, you know, professionals rather than enthusiastic amateurs took into account was that I was always working inside of the LAN—Local Area Network—inside of a data center. Cool, everything here inside the cage, I can make talk to each other, I can screw up the switching fabric, et cetera, et cetera. I didn’t deal with any of the WAN—Wide Area Network—think ‘internet’ in some cases. And at that point, we’re talking about things like BGP, or OSPF in some parts of the world, or RIP. Or RIPv2 if you make terrible life choices.

But BGP is the routing protocol that more or less powers the internet. At the time of this recording, we’re a couple weeks past a BGP… kerfuffle that took Facebook down for a number of hours, during which time the internet was terrific. I wish they could do that more often, in fact; it was almost like a holiday. It was fantastic. I took my elderly relatives out and got them vaccinated. It was glorious.

Now, we’re back to having Facebook and, terrific. The problem I have whenever something like this happens is there’s a whole bunch of crappy explainers out there of, “What is BGP and how might it work?” And people have angry opinions about all of these things. So instead, I prefer to talk to you. Given that you are a networking trainer, you have taught people about these things, you have written books, you have operated large-scale environments—

Ivan: I even developed a BGP course for Cisco.

Corey: You taught it for Cisco, of all places—

Ivan: Yeah. [laugh].

Corey: —back when that was impressive, and awesome and not a has-been. It’s honestly, I feel like I could go there and still wind up going back in time, and still, it’s the same Cisco in some respects: ‘evolve or die dinosaur,’ and they got frozen in amber. But let’s start at the very beginning. What is BGP?

Ivan: Well, you know, when the internet was young, they figured out that we aren’t all friends on the internet anymore. And I want to control what I tell you, and you want to control what you tell me. And furthermore, I want to control what I believe from what you’re telling me. So, we needed a protocol that would implement policy, where I could say, “I will only announce my customers to you, but not what I’ve heard from Verizon.” And you will do the same.

And then I would say, “Well, but I don’t want to hear about that customer of yours because he’s also my customer.” So, we need some sort of policy. And so they invented a protocol where you will tell me what you have, I will tell you what I have and then we would both choose what we want to believe and follow those paths to forward traffic. And so BGP was born.
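The exchange Ivan describes (announce only what you choose, believe only what you choose) can be sketched in a few lines of Python. This is a toy model and not how any router implements BGP. The prefixes come from the documentation ranges, and the `learned_from` labels and the function names `export_to_peer` and `import_from_peer` are made up for illustration.

```python
# Toy model of BGP-style policy: control what you announce and what you believe.
# All names and prefixes are hypothetical (documentation address ranges).

def export_to_peer(routes):
    """Announce only routes learned from our own customers."""
    return [r for r in routes if r["learned_from"] == "customer"]

def import_from_peer(announcements, rejected_prefixes):
    """Believe everything the peer says except prefixes we choose to ignore."""
    return [r for r in announcements if r["prefix"] not in rejected_prefixes]

my_routes = [
    {"prefix": "192.0.2.0/24", "learned_from": "customer"},
    {"prefix": "198.51.100.0/24", "learned_from": "upstream"},  # heard from a big carrier
]

announced = export_to_peer(my_routes)

# The peer also announces a prefix belonging to a customer we serve directly.
peer_says = [
    {"prefix": "203.0.113.0/24", "learned_from": "customer"},
    {"prefix": "192.0.2.0/24", "learned_from": "customer"},
]
accepted = import_from_peer(peer_says, rejected_prefixes={"192.0.2.0/24"})

print([r["prefix"] for r in announced])
print([r["prefix"] for r in accepted])
```

The upstream route never leaves the export filter, and the shared customer's prefix never passes the import filter, which mirrors the two halves of the policy in the conversation.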

Corey: On some level, it seems like it’s this faraway thing to people like me because I have a residential internet connection and I am not generally allowed to make my own BGP announcements to the greater world. Even when I was working in data centers, very often the BGP was handled by our upstream provider, or very occasionally by a router they would drop in with the easiest maintenance instructions in the world for me of, “Step one, make sure it has power. Step two, never touch it. Step three, we’d prefer if you don’t even look at it and remain at least 20 feet away to keep from bringing your aura near anything we care about.” And that’s basically how you should do with me in the context of hardware. So, it was always this arcane magic thing.

Ivan: Well, it’s not. You know, it’s like power transmission: when you know enough about it, it stops being magic. It’s technology, it’s a bit more complicated than some other stuff. It’s way less complicated than some other stuff, like quantum physics, but still, it’s so rarely used that it gets this aura of being mysterious. And then of course, everyone starts getting their opinion, particularly the graduates of the Facebook Academy.

And yes, it is true that usually BGP would be used between service providers, so whenever, you know, we are big enough to need policy, if you just need one uplink, there is no policy there. You either use the uplink or you don’t use the uplink. If you want to have two different links to two different points of presence or to two different service providers, then you’re already in the policy land. Do I prefer one provider over the other? Do I want to announce some things to one provider but other things to the other? Do I want to take local customers from both providers because I want to, you know, have lower latency because they are local customers? Or do I want to use one solely as the backup link because I paid so little for that link that I know it’s shitty.

So, you need all that policy stuff, and to do that, you really need BGP. There is no other routing protocol in the world where you could implement that sort of policy because everything else is concerned mostly with, let’s figure out as fast as possible, what is reachable and how to get there. And BGP is like, “Hey, slow down. There’s policy.”

Corey: Yeah. In the context of someone whose primary interaction with networks is their home internet, where there’s a single cable coming in from the outside world, you plug it into a device, maybe yours, maybe ISPs, maybe we don’t care. That’s sort of the end of it. But think in terms of large interchanges, where there are multiple redundant networks to get from here to somewhere else; which one should traffic go down at any given point in time? Which networks are reachable on the other end of various distant links? That’s the sort of problem that BGP is very good at addressing and what it was built for. If you’re running BGP internally, in a small network, consider not doing exactly that.

Ivan: Well, I’ve seen two use cases—well, three use cases for people running BGP internally.

Corey: Okay, this I want to hear because I was always told, “No touch ‘em.” But you know, I’m about to learn something. That’s why I’m talking to you.

Ivan: The first one was multinationals who needed policy.

Corey: Yes. Many multi-site environments, large-scale companies that have redundant links, they’re trying to run full mesh in some cases, or partial mesh where—between a bunch of facilities.

Ivan: In this case, it was multiple continents and really expensive transcontinental links. And it was, I don’t want to go from Europe to Sydney over US; I want to go over Middle East. And to implement that type of policy, you have to split, you know, the whole network into regions, and then each region is what BGP calls an autonomous system, so that it gets its stack, its autonomous system number and then you can do policy on that saying, “Well, I will not announce Asian routes to Europe through US, or I will make them less preferred so that if the Middle East region goes down, I can still reach Asia through US but preferably, I will not go there.”
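A rough sketch of that regional preference, assuming a heavily simplified best-path rule: the highest local preference wins and a shorter AS path breaks ties. The real BGP decision process has more steps than this. The region names, the preference values, and the private-use AS numbers below are illustrative assumptions, not details from the network Ivan describes.

```python
# Toy best-path choice: highest local preference wins, shorter AS path breaks ties.
# AS numbers are from the private-use range; region names are illustrative.

def best_path(paths):
    """Pick the preferred path among those currently available."""
    usable = [p for p in paths if p["up"]]
    if not usable:
        return None
    return max(usable, key=lambda p: (p["local_pref"], -len(p["as_path"])))

paths_to_asia = [
    {"via": "middle-east", "as_path": [64512, 64520], "local_pref": 200, "up": True},
    {"via": "us", "as_path": [64513, 64520], "local_pref": 50, "up": True},
]

print(best_path(paths_to_asia)["via"])   # middle-east while that region is up

paths_to_asia[0]["up"] = False           # the Middle East region goes down
print(best_path(paths_to_asia)["via"])   # falls back to us
```

The less-preferred path stays in the table the whole time, which is the "I can still reach Asia through US but preferably, I will not go there" behavior.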

The second one is yet again, large networks where they had too many prefixes for something like OSPF to carry, and so their OSPF was breaking down and the only way to solve that was to go to something that was designed to scale better, which was BGP.

And third one is if you want to implement some of the stuff that was designed for service providers, initially, like, VPNs, layer two or layer three, then BGP becomes this kitchen sink protocol. You know, it’s like using Route 53 as a database; we’re using BGP to carry any information anyone ever wants to carry around. I’m just waiting for someone to design JSON in BGP RFC and then we are, you know… where we need to be.

Corey: I feel on some level, like, BGP gets relatively unfair criticism because the only time it really intrudes on the general awareness is when something has happened and it breaks. This is sort of the quintessential network or systems—or, honestly, computer—type of issue. It’s either invisible, or you’re getting screamed at because something isn’t working. It’s almost like a utility. On some level. When you turn on a faucet, you don’t wonder whether water is going to come out this time, but if it doesn’t, there’s hell to pay.

Ivan: Unless it’s brown.

Corey: Well, there is that. Let’s stay away from that particular direction; there’s a beautiful metaphor, probably involving IBM, if we do. So, the challenge, too, when you look at it is that it’s this weird, esoteric thing that isn’t super well understood. And as soon as it breaks, everyone wants to know more about it. And then in full on charging to the wrong side of the Dunning-Kruger curve, it’s, “Well, that doesn’t sound hard. Why are they so bad at it? I would be able to run this better than they could.” I assure you, you can’t. This stuff is complicated; it is nuanced; it’s difficult. But the common question is, why is this so fragile and able to easily break? I’m going to turn that around. How is it that something that is this esoteric and touches so many different things works as well as it does?

Ivan: Yeah, it’s a miracle, particularly considering how crappy the things are configured around the world.

Corey: There have been periodic outages of sites when some ISP sends out a bad BGP announcement and their upstream doesn’t suppress it because hey, you misconfigured things, and suddenly half the internet believes oh, YouTube now lives in this tiny place halfway around the world rather than where it is currently being Anycasted from.

Ivan: Called Pakistan, to be precise.

Corey: Exact—there was an actual incident there; we are not dunking on Pakistan as an example of a faraway place. No, no, a Pakistani ISP wound up doing exactly this and taking YouTube down for an afternoon a while back. It’s a common problem.

Ivan: Yeah, the problem was that they tried to stop local users accessing YouTube. And they figured out that, you know, YouTube is announcing this prefix and if they would announce two more specific prefixes, then you know, they would attract the traffic and the local users wouldn’t be able to reach YouTube. Perfect. But that leaked.

Corey: If you wind up saying that, all right, the entire internet is available on this interface, and a small network of 256 nodes available on the second interface, the most specific route always wins. That’s why the default route or route of last resort is the entire internet. And if you don’t know where to send it, throw it down this direction. That is usually, in most home environments, the gateway that then hands it up to your ISP, where they inspect it and do all kinds of fun things to sell ads to you, and then eventually get it to where it’s going.
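The "most specific route wins" rule is longest-prefix matching, and a minimal sketch with Python's standard `ipaddress` module looks like this. The two-entry routing table and the interface labels are hypothetical.

```python
import ipaddress

# Hypothetical routing table: a default route plus one small, more specific network.
routes = {
    ipaddress.ip_network("0.0.0.0/0"): "interface-1 (default route)",
    ipaddress.ip_network("192.0.2.0/24"): "interface-2 (256 addresses)",
}

def lookup(destination):
    """Return the next hop for the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

print(lookup("192.0.2.55"))   # interface-2 (256 addresses)
print(lookup("8.8.8.8"))      # interface-1 (default route)
```

The same rule explains why a leaked, more specific announcement pulls traffic away from the legitimate, shorter prefix.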

This gets complicated at these higher levels. And I have sympathy for the technical aspects of what happened at Facebook; no sympathy whatsoever for the company itself because they basically do far more harm than they do good and I’ve been very upfront about that. But I want to talk to you as well about something that—people are going to be convinced I’m taking this in my database direction, but I assure you I’m not—DNS. What is the relationship between BGP and DNS? Which sounds like a strange question, sometimes.

Ivan: There is none.

Corey: Excellent.

Ivan: It’s just that different large-scale properties decided to implement the global load-balancing global optimal access to their servers in different ways. So, Cloudflare is a typical example of someone who is doing Anycast, they are announcing the same networks, the same prefixes, from hundreds of locations around the world. So, BGP will take care that you always get to the closest Cloudflare [unintelligible 00:18:46]. And that’s it. That’s how they work. No magic. Facebook didn’t believe in the power of Anycast when they started designing their service. So, what they’re doing is they have DNS servers around the world, and the DNS servers serve the local region, if you wish. And that DNS server then decides what facebook.com really stands for. So, if you query for facebook.com, you’ll get a different answer in Europe than in US.

Corey: Just a slight diversion on what Anycast is. If I ping Google’s public resolver 8.8.8.8—easy to remember—from my computer right now, the packet gets there and back in about five milliseconds.

Wherever you are listening to this, if you were to try that same thing you’d see something roughly similar. Now, one of two things is happening; either Google has found a way to break the laws of physics and get traffic to a central point faster than light, or the 8.8.8.8 that I’m talking to and the one that you are talking to are not in fact the same computer.

Ivan: Well, by the way, it’s 13 milliseconds for me. And between you and me, it’s 200 milliseconds. So yes, they are cheating.
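The arithmetic behind the joke can be checked directly. The sketch below turns a round-trip time into an upper bound on distance using the speed of light in vacuum; real fiber is slower and real paths are not straight lines, so actual servers are closer than these bounds. The function name `max_distance_km` is an assumption for illustration.

```python
# Upper bound on how far away a server can be, given a round-trip time.
# Uses the speed of light in vacuum; real fiber is slower, so true distances are shorter.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458

def max_distance_km(rtt_ms):
    """Half the round trip is the one-way time; distance = speed * time."""
    one_way_seconds = (rtt_ms / 1000) / 2
    return SPEED_OF_LIGHT_KM_PER_S * one_way_seconds

for rtt in (5, 13):
    print(f"{rtt} ms round trip: server is at most {max_distance_km(rtt):.0f} km away")
```

That puts each speaker's 8.8.8.8 within roughly 750 km and 1,950 km respectively, while a 200 ms round trip between the two of them suggests they are much farther apart than that, so a single shared machine is implausible.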

Corey: Just a little bit. Or unless they tunneled through the earth rather than having to bounce it off of satellites and through cables.

Ivan: No, even that wouldn’t work.

Corey: That’s what the quantum computers are for. I always wondered. Now, we know.

Ivan: Yeah. They’re entangling the replies in advance, and that’s how it works. Yeah, you’re right.

Corey: Please continue. I just wanted to clarify that point because I got that one hilariously wrong once upon a time and was extremely confused for about six months.

Ivan: Yeah. It’s something that no one ever thinks about unless, you know, you’re really running large-scale DNS because honestly, root DNS servers were Anycasted for ages. You think there are, like, 12 different root DNS servers; in reality, there are, like, 300 instances hidden behind those 12 addresses.

Corey: And fun trivia fact; the reason there are 12 addresses is because any more than that would no longer fit within the 512 byte limit of a UDP packet without truncating.

Ivan: Thanks for that. I didn’t know that.

Corey: Of course. Now, EDNS extensions that you go out with a larger [unintelligible 00:21:03], but you can’t guarantee that’s going to hit. And what happens when you receive a UDP packet—when you receive a DNS result with a truncate flag set on the UDP packet? It is left to the client. It can either use the partial result, or it can try and re-establish over a TCP connection.
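The truncate flag Corey mentions is the TC bit in the 16-bit flags field of the 12-byte DNS header. Here is a minimal sketch of a client checking it, using hand-built headers rather than real network traffic. The helper name `is_truncated` and the message ID are assumptions for illustration.

```python
import struct

TC_FLAG = 0x0200  # truncation bit in the 16-bit DNS header flags field (RFC 1035)

def is_truncated(message: bytes) -> bool:
    """Return True if a DNS message has the TC (truncated) flag set."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)

# Hand-built example headers: ID, flags (0x8000 marks a response), four zeroed counts.
truncated = struct.pack("!HHHHHH", 0x1234, 0x8000 | TC_FLAG, 0, 0, 0, 0)
complete = struct.pack("!HHHHHH", 0x1234, 0x8000, 0, 0, 0, 0)

for name, msg in (("truncated", truncated), ("complete", complete)):
    action = "retry over TCP" if is_truncated(msg) else "use the answer"
    print(name, "->", action)
```

This sketch always retries over TCP when the bit is set; as noted in the conversation, what a given client actually does with a truncated reply is up to that client.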

That is one of those weird trivia questions they love to ask in sysadmin interviews, but it’s yeah, fundamentally, if you’re doing something that requires the root nameservers, you don’t really want to start going down those arcane paths; you want it to just be something that fits in a single packet and not require a whole bunch of computational overhead.

Ivan: Yeah, and even within those 300 instances, there are multiple servers listening to the same IP address and… incoming packets are just sprayed across those servers, and whichever one gets the packet replies to it. And because it’s UDP, it’s one packet in one packet out. Problem solved. It all works. People thought that this doesn’t work for TCP because, you know, you need a whole session, so you need to establish the session, you send the request, you get the reply, there are acknowledgements, all that stuff.

Turns out that there is almost never two ways to get to a certain destination across the internet from you. So, people thought that, you know, this wouldn’t work because half of your packets will end in San Francisco, and half of the packets will end in San Jose, for example. Doesn’t work that way.

Corey: Why not?

Ivan: Well, because the global Internet is so diverse that you almost never get two equal cost paths to two different destinations because it would be San Francisco and San Jose announcing 8.8.8.8 and it would be a miracle if you would be sitting just in the middle so that the first packet would go to San Francisco, the second one would go to San Jose, and you know, back and forth. That never happens. That’s why Cloudflare makes it work by announcing the same prefix throughout the world.

Corey: So, I just learned something new about how routing announcements work, an aspect of BGP, and you a few minutes ago learned something about the UDP size limit and the root name servers. BGP and DNS are two of the oldest protocols in existence. You and I are also decades into our careers. If someone is starting out their career today, working in a cloud-y environment, there are very few network-centric roles because cloud providers handle a lot of this for us. Given these protocols are so foundational to what goes on and they’re as old as they are, are we as an industry slash sector slash engineers losing the skills to effectively deploy and manage these things?

Ivan: Yes. The same problem that you have in any other sufficiently developed technology area. How many people can build power lines? How many people can write a compiler? How many people can design a new CPU? How many people can design a new motherboard?

I mean, when I was 18 years old, I was wire wrapping my own motherboard, with 8-bit processor. You can’t do that today. You know, as the technology is evolving and maturing, it’s no longer fun, it’s no longer sexy, it stops being a hobby, and so it bifurcates into users and people who know about stuff. And it’s really hard to bridge the gap from one to the other. So, in the end, you have, like, this 20 [graybeard 00:24:36] people who know everything about the technology, and the youngsters have no idea. And when these people die, don’t ask me [laugh] how we’ll get any further on.

Corey: This episode is sponsored by our friends at CloudAcademy. That’s right, they have a different lab challenge up for you called, “Code Red: Repair an AWS Environment with a Linux Bastion Host.” What does it do? Well, it’s going to assess your ability to troubleshoot AWS networking and security issues in a production-like environment. Well, kind of; it’s not quite like production because some exec is not standing over your shoulder, wetting themselves while screaming. But, ya know, you can pretend. In fact, I’m reasonably certain you can retain someone specifically for that purpose should you so choose. If you are the first prize winner who completes all four challenges with the fastest time, you’ll win a thousand bucks. If you haven’t started yet, you can still complete all four challenges between now and December 3rd to be eligible for the grand prize. There are only a few days left until the whole thing ends, so I would get on it now. Visit cloudacademy.com/corey. That’s cloudacademy.com/C-O-R-E-Y; for god’s sake, don’t drop the “E,” that drives me nuts. And thank you again to Cloud Academy for not only promoting my ridiculous nonsense but for continuing to help teach people how to work in this ridiculous environment.

Corey: On some level, it feels like it’s a bit of a down the stack analogy for what happened to me early in my career. My first systems administration job was running a large-scale email system. So, it was a hobby that I was interested in. I basically bluffed my way into working at a university for a year—thanks, Chapman; I appreciate that [laugh]—and it was great, but it was also pretty clear to me that with the rise of things like hosted email, Gmail, and whatnot, it was not going to be the future of what the present day at that point looked like, which was most large companies needed an email administrator. Those jobs were dwindling.

Now, if you want to be an email systems administrator, there are maybe a dozen companies or so that can really use that skill set and everyone else just outsources. That said, at those companies like Google and Microsoft, there are some incredibly gifted email administrators who are phenomenal at understanding every nuance of this. Do you think that is what we’re going to see in the world of running BGP at large scale, where a few companies really need to know how this stuff works and everyone else just sort of smiles, nods and rolls with it?

Ivan: Absolutely. We’re already there. Because, you know, if I’m an end customer, and I need BGP because I have two uplinks to two ISPs, that’s really easy. I mean, there are a few tricks you should follow and hopefully, some of the guardrails will be built into network operating systems so that you will really have to configure explicitly that you want to leak [unintelligible 00:26:15] between Verizon and AT&T, which is great fun if you have two low-speed links to both of them and now you’re becoming transit between the two, which did happen to Verizon; that’s why I’m mentioning them. Sorry, guys.

Anyway, if you are a small guy and you just need two uplinks, and maybe do a bit of policy, that’s easy and that’s achievable, let’s say with some Google and paste, and throwing spaghetti at the wall and seeing what sticks. On the other hand, what the large-scale providers—like for example Facebook because we were talking about them—are doing is, like, light years away. It’s like comparing me turning on the light bulb and someone running, you know, nuclear reactor.

Corey: Yeah, you kind of want the experts running some aspects on that. Honestly, in my case, you probably want someone more competent flipping the light switch, too. But that’s why I have IoT devices here that power my lights; it, on the one hand, keeps me from hurting myself and, on the other, leads to a nice seasonal feel because my house is freaking haunted.

Ivan: So, coming back to Facebook, they have these DNS servers all around the world and they don’t want everyone else to freak out when one of these DNS servers goes away. So, that’s why they’re using the same IP address for all the DNS servers sitting anywhere in the world. So, the name server for facebook.com is the same worldwide. But it’s different machines and they will give you different answers when you ask, “Where is facebook.com?”

I will get a European answer, you will get a US answer, someone in Asia will get whatever. And so they’re using BGP to advertise the DNS servers to the world so that everyone gets to the closest DNS server. And now it doesn’t make sense, right, for the DNS server to say, “Hey, come to European Facebook,” if European Facebook tends to be down. So, if their DNS server discovers that it cannot reach the servers in the data center, it stops advertising itself with BGP.

Why BGP? Because that’s the only thing it can do. That’s the only protocol where I can tell you, “Hey, I know about this prefix. You really should send the traffic to me.” And that’s what happened to Facebook.

They bricked their backbone—whatever they did; they never told—and so their DNS server said, “Gee, I can’t reach the data center. I better stop announcing that I’m a DNS server because obviously I am disconnected from the rest of Facebook.” And that happens to all DNS servers because, you know, the backbone was bricked. And so they just, you know, [unintelligible 00:29:03] from the internet, they've stopped advertising themselves, and so we thought that there was no DNS server for Facebook. Because no DNS server was able to reach their core, and so all DNS servers were like, “Gee, I better get off this because, you know, I have no clue what’s going on.”
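
The behavior Ivan describes, where an anycast DNS server withdraws its own BGP announcement once it can no longer reach the data center, can be sketched roughly as follows. This is a simplified model under stated assumptions, not Facebook's actual tooling: the prefix is a documentation address, the function names are hypothetical, and the command strings are illustrative rather than the syntax of any particular BGP daemon.

```python
from typing import Dict, Iterable, List

ANYCAST_PREFIX = "192.0.2.53/32"  # hypothetical anycast address (documentation range)


def desired_action(backends_reachable: Iterable[bool]) -> str:
    """Announce while at least one backend is reachable; otherwise withdraw."""
    return "announce" if any(backends_reachable) else "withdraw"


def bgp_command(action: str, prefix: str = ANYCAST_PREFIX) -> str:
    """Build an illustrative command string for a local BGP speaker."""
    return f"{action} route {prefix} next-hop self"


def sites_still_announcing(site_health: Dict[str, List[bool]]) -> List[str]:
    """Return the sites whose DNS servers keep advertising the anycast prefix."""
    return [
        site
        for site, checks in site_health.items()
        if desired_action(checks) == "announce"
    ]


# Normal day: one site loses its backends, the others keep serving.
normal = {"eu": [True, True], "us": [False, False], "asia": [True, False]}
print(sites_still_announcing(normal))

# Backbone failure: every site sees its backends as unreachable,
# so every site withdraws and the prefix disappears from the internet.
backbone_down = {site: [False, False] for site in normal}
print(sites_still_announcing(backbone_down))
```

The second case is the point of the story: each server makes a locally sensible decision, and the sum of those decisions is that no DNS server is advertised anywhere.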

So, everything was working fine. Everything was there. It’s just that they didn’t want to talk to us because they couldn’t reach the backend servers. And of course, people blamed DNS first because the DNS servers weren’t working. Of course they weren’t. And then they blamed BGP because it must be BGP if it isn’t DNS. But it’s like, you know, you’re blaming headache and muscle cramps and high fever, but in fact you have flu.

Corey: For almost any other company that wasn’t Facebook, this would have been a less severe outage just because most companies depend on other companies to run infrastructure. When Facebook itself has evolved the way that it has, everything that they use internally runs on the same systems, so they wound up almost with a bootstrapping problem. An example of this in more prosaic terms is: okay, the data center had a power outage. Okay, now I need to power up all the systems again and the physical servers I’m trying to turn on need to talk to a DNS server to finish booting, but the DNS server is a VM that lives on those physical servers. Uh-oh. Now, I’m in trouble. That is an overly simplified but real example of what Facebook encountered trying to get back into this, to my understanding.
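
The bootstrapping problem Corey describes is a cycle in the boot dependency graph. A minimal sketch of how one might check for such a cycle ahead of time is shown below, using a depth-first search. The component names and the `find_cycle` helper are hypothetical and only mirror the example in the conversation.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if there is none.

    deps maps each component to the components it needs in order to boot.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: we found a loop.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in deps:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# The physical servers need DNS to boot, but DNS is a VM on those servers.
circular = {"physical-server": ["dns-vm"], "dns-vm": ["physical-server"]}
print(find_cycle(circular))

# Moving DNS to something that boots on its own breaks the loop.
independent = {"physical-server": ["dns-appliance"], "dns-appliance": []}
print(find_cycle(independent))
```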

Ivan: Yes, so it was worse than that. It looks like, you know, even out-of-band management access didn’t work, which to me would suggest that out-of-band management was using authentication servers that were down. People couldn’t even log in to Zoom because Zoom was using single-sign-on based on facebook.com, and facebook.com was down so they couldn’t even make Zoom calls or open Google Docs or whatever. There were rumors that there was a certain hardware tool with a rotating blade that was used to get into a data center and unbrick a box. But those rumors were vehemently denied, so who knows?

Corey: The idea of having someone trying to physically break into a data center in order to power things back up is hilarious, but it does lead to an interesting question, which is in this world of cloud computing, there are a lot of people in the physical data centers themselves, but they don’t have access, in most cases, to log into any of the boxes. One of the most naive things I see all the time is, “Oh well, the cloud provider can read all of your data.” No, they can’t. These things are audited. And yeah, theoretically, if they’re lying outright, and somehow have falsified all of the third-party audit stuff that has been reported and are willing to completely destroy their business when it gets out—and I assure you, it would—yeah, theoretically, that’s there. There is an element of trust here. But I’ve had to answer a couple of journalists’ questions recently of, “Oh, is AWS going to start scanning all customer content?” No, they physically cannot do it because there are many ways you can configure things where they cannot see it. And that’s exactly what we want.

Ivan: Yeah, like a disk encryption.

Corey: Exactly. Disk encryption, KMS on some level, using—rolling your own, et cetera, et cetera. They use a lot of the same systems we do. The point being, though, is that people in the data centers do not even have login rights to any of these nodes for the physical machines, in some cases, let alone the customer tenants on top of those things. So, on some level, you wind up with people building these systems that run on top of these computers, and they’ve never set foot in one of the data centers.

That seems ridiculous to me as someone who came up visiting data centers because I had to know where things were when they were working so I could put them back that way when they broke later. But that’s not necessary anymore.

Ivan: Yeah. And that’s the problem that Facebook was facing with that outage because you start believing that certain systems will always work. And when those systems break down, you’re totally cut off. And then—oh, there was an article in ACM Queue a long while ago where they were discussing, you know, the results of simulated failures, not real ones, and there were hilarious things like the phone directory was offline because it wasn’t on UPS and so they didn’t know whom to call. Or alerts couldn’t be diverted to a different data center because the management station for alert configuration was offline because it wasn’t on UPS.

Or, you know the one, right, where in New York, they placed the gas pump in the basement, and the diesel generators were on the top floor, and the hurricane came in and they had to carry gas manually, all the way up to the top floor because the gas pump in the basement just stopped working. It was flooded. So, they did everything right, just the fuel wouldn’t come to the diesel generators.

Corey: It’s always the stuff that is under the hood on these things that you can’t make sense of. One of the biggest things I did when I was evaluating data center sites was I’d get a one-line diagram—which is an electrical layout of the entire facility—great. I talked to the folks running it. Now, let’s take a walk and tour it. Hmmm, okay. You show four transformers on your one-line diagram. I see two transformers and two empty concrete pads. It’s an aspirational one-line diagram. It’s a joke that makes it a one-liner diagram and it’s not very funny. So it’s, okay if I can’t trust you for those little things, that’s a problem.

Ivan: Yeah, well, I have another funny story like that. We had two power feeds coming into the house plus the diesel generator, and it was, you know, the properly tested every month diesel generator. And then they were doing some maintenance and they told us in advance that they will cut both power feeds at 2 a.m. on a Sunday morning.

And guess what? The diesel generator didn’t start. Half an hour later the UPS was empty, and we were totally dead in the water with quadruple redundancy because you can’t get someone at 2 a.m. on a Sunday morning to press that button on the diesel generator in half an hour.

Corey: That is unfortunate.

Ivan: Yeah, but that’s how the world works. [laugh].

Corey: So, it’s been fantastic reminding myself of some of the things I’ve forgotten because let’s be clear, in working with cloud, a lot of this stuff is completely abstracted away. I don’t have to care about most of these things anymore. Now, there’s a small team of people at AWS who very much have to care; if they don’t, I will say mean things to them on Twitter, if I let my HugOps position slip up just a smidgen. But they do such a good job at this that we don’t have problems like this, almost ever, to the point where when it does happen, it’s noteworthy. It’s been fun talking to you about this just because it’s a trip down a memory lane that is a lot more aligned with the things that are there and we tend not to think about them. It’s almost a How it’s Made episode.

Ivan: Yeah. And don’t be so relaxed regarding the cloud networking because, you know, if you don’t go full serverless with nothing on-premises, you know what protocol you’re running between on-premises and the cloud on Direct Connect? It’s called BGP.

Corey: Ah. You know, I did not know that. I’ve done some ridiculous IPsec pairings over those things, and was extremely unhappy for a while afterwards, but I never got to the BGP piece of it. Makes sense.

Ivan: Yeah, even over IPsec, if you want to have any dynamic failover, or multiple sites, or anything, it’s BGP.
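
To make the dynamic failover Ivan mentions a little more concrete, here is a deliberately simplified model of choosing between two paths to the cloud, one over a dedicated link and one over an IPsec tunnel. It assumes only two things: a path disappears when its BGP session goes down, and the highest local preference wins among what is left. Real BGP best-path selection has many more tie-breakers, and the `Path` type, names, and preference values here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    via: str           # which link the route was learned over
    local_pref: int    # higher is preferred
    session_up: bool   # routes vanish when the BGP session drops


def best_path(paths: List[Path]) -> Optional[Path]:
    """Pick the usable path with the highest local preference, if any."""
    usable = [p for p in paths if p.session_up]
    if not usable:
        return None
    return max(usable, key=lambda p: p.local_pref)


dedicated = Path("dedicated-link", local_pref=200, session_up=True)
tunnel = Path("ipsec-tunnel", local_pref=100, session_up=True)

# Both sessions up: traffic prefers the dedicated link.
print(best_path([dedicated, tunnel]).via)

# Dedicated link fails: its routes are withdrawn and the tunnel takes over.
failed = Path("dedicated-link", local_pref=200, session_up=False)
print(best_path([failed, tunnel]).via)
```

Nobody has to reconfigure anything when the link fails; the routing protocol notices the lost session and the backup path is selected on its own, which is the reason for running BGP here instead of static routes.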

Corey: I really want to thank you for taking the time to go through all this with me. If people want to learn more about how you view these things, learn more things from you, as I’d strongly recommend they should if they’re even slightly interested by the conversation we’ve had, where can they find you?

Ivan: Well, just go to ipspace.net and start exploring. There’s the blog with thousands of blog entries, some of them snarkier than others. Then there are, like, 200 webinars, short snippets of a few hours of—

Corey: It’s like a one-man version of re:Invent. My God.

Ivan: Yeah, sort of. But I’ve been working on this for ten years, and they do it every year, so I can’t produce the content at their speed. And then there are three different full-blown courses. Some of them are just, you know, the materials from the webinars, plus guest speakers plus hands-on exercises, plus I personally review all the stuff people submit, and they cover data centers, and automation, and public clouds.

Corey: Fantastic. And we will, of course, put links to that into the show notes. Thank you so much for being so generous with your time. I appreciate it.

Ivan: Oh, it’s been such a huge pleasure. It’s always great talking with you. Thank you.

Corey: It really is. Thank you once again. Ivan Pepelnjak, network architect and oh so much more. CCIE #1354 Emeritus. And read the bio; it’s well worth it. I am Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice and a comment formatted as a RIPv2 announcement.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.


2021 Duckbill Group, LLC