The Hari Seldon of Third Party Tooling with Aidan Steele
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.
Corey: Today’s episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes native object store that’s built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you’re defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that’s exactly what MinIO offers. With superb read speeds in excess of 360 gigs and 100 megabyte binary that doesn’t eat all the data you’ve gotten on the system, it’s exactly what you’ve been looking for. Check it out today at min.io/download, and see for yourself. That’s min.io/download, and be sure to tell them that I sent you.
Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by someone who is honestly, feels like they’re after my own heart. Aidan Steele by day is a serverless engineer at Stedi, but by night, he is an absolute treasure and a delight because not only does he write awesome third-party tooling and blog posts and whatnot around the AWS ecosystem, but he turns them into the most glorious, intricate, and technical shit posts that I think I’ve ever seen. Aidan, thank you for joining me.
Aidan: Hi, Corey, thanks for having me. It’s an honor to be here. Hopefully, we get to talk some AWS, and maybe also talk some nonsense as well.
Corey: I would argue that in many ways, those things are one and the same. And one of the things I always appreciated about how you approach things is, you definitely seem to share that particular ethos with me. And there’s been a lot of interesting content coming out from you in recent days. The thing that really wound up showing up on my radar in a big way was back at the start of January—2022, for those listening to this in the glorious future—about using IPv6 to use multi-factor auth, which it is so… I don’t even have the adjectives to throw at this because, one, it is ridiculous, two, it is effective, and three, it is just who thinks like that? What is this and what did you—what monstrosity have you built?
Aidan: So, what did I end up calling it? I think it was ipv6-ghost-ship. And I think I called it that because I’d recently watched, oh, what was that series that was recently on Apple TV? Uh, the Isaac Asimov—
Corey: If it’s not Paw Patrol, I have no idea what it is because I have a four-year-old who is very insistent about these things. It is not so much a TV show as it is a way of life. My life is terrible. Please put me out of my misery.
Aidan: Well, at least it’s not Bluey. That’s the one I usually hear about. That’s Australia’s greatest export. But it was one of the plot devices was a ship that would teleport around the place, and you could never predict where it was next. And so no one could access it. And I thought, “Oh, what about if I use the IPv6 address space?”
Corey: Oh, Foundation?
Aidan: That’s the one. Foundation. That’s how the name came about. The idea, honestly, it was because I saw—when was it?—sometime last year, AWS added support for those IP address prefixes. IPv4 prefixes were small; very useful and important, but IPv6 with more than 2 trillion IP addresses, per instance, I thought there’s got to be fun to be had there.
Corey: 281 trillion, I believe is the—
Aidan: 281 trillion.
Corey: Yeah. It is sarcastically large space. And that also has effectively, I would say in InfoSec sense, killed port scanning, the idea I’m going to scan the IP range and see what’s there, just because that takes such a tremendous amount of time. Now here, in reality, you also wind up with people using compromised resources, and yeah, it turns out, I can absolutely scan trillions upon trillions of IP addresses as long as I’m using your AWS account and associated credit card in which to do it. But here in the real world, it is not an easily discoverable problem space.
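For readers following along, the 281 trillion figure can be checked with a few lines of arithmetic. This is only a sketch: it assumes the per-instance prefix is a /80 (which is what produces 2^48 addresses), and the probe rate is a made-up number chosen purely for illustration.

```python
# Back-of-the-envelope arithmetic only. PREFIX_LENGTH is an assumption that
# matches the "281 trillion" figure; the probe rate below is hypothetical.
ADDRESS_BITS = 128
PREFIX_LENGTH = 80
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def addresses_in_prefix(prefix_length: int) -> int:
    """Number of addresses inside an IPv6 prefix of the given length."""
    return 2 ** (ADDRESS_BITS - prefix_length)


def years_to_scan(address_count: int, probes_per_second: float) -> float:
    """Years needed to probe every address once at a constant rate."""
    return address_count / probes_per_second / SECONDS_PER_YEAR


total = addresses_in_prefix(PREFIX_LENGTH)
print(f"{total:,} addresses in a /{PREFIX_LENGTH}")
print(f"{years_to_scan(total, 1_000_000):.1f} years at one million probes per second")
```

Even at a million probes per second, a single sweep of one prefix works out to several years, which is the sense in which blind scanning stops being practical.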
Aidan: Yeah. I made it as a novelty, really. I was looking for a reason to learn more about IPv6 and subnetting because it’s the term I’d heard, a thing I didn’t really understand, and the way I learn things is by trying to build them, realizing I have no idea what I’m doing, googling the error messages, reluctantly looking at the documentation, and then repeating until I’ve built something. And yeah, and then I built it, published it, and it seemed to be pretty popular. It struck a chord. People retweeted it. It tickled your fancy. I think it spoke to something in all of us who are trying not to take our jobs too seriously, you know, we can have a little fun with this ludicrous tech that we get to play with.
Corey: The idea being, you take the multi-factor auth code that your thing generates, and that is the last series of octets for the IP address you wind up going towards and that is such a large problem space that you’re not going to find it in time, so whatever it is automatically connect to that particular IP address because that’s the only one that’s going to be listening for a 30 to 60-second span for the connection to be established. It is a great idea because SSH doesn’t support this stuff natively. There’s no good two-factor auth approach for this. And I love it. I’d be scared to death to run this in production for something that actually matters.
And we also start caring a lot more about how accurate are the clocks on those instances, all of a sudden. But, oh, I just love the concept so much because it hits on the ethos of—I think—what so much of the cloud does, where these really are fundamental building blocks that we can use to build incredible, awe-inspiring things that are globe-spanning, and also ridiculousness. And there’s so much value in being able to do the same thing, sometimes at the same time.
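The idea Corey describes can be sketched in a few lines. To be clear about what is assumed here: this is not the code from the ipv6-ghost-ship project. The prefix (`2001:db8::/80`, from the documentation range), the shared secret, and the choice to add the decimal code to the base address are all hypothetical, picked only to show the shape of the trick. The one-time code itself follows the standard TOTP construction from RFC 6238.

```python
import hashlib
import hmac
import ipaddress
import struct
import time


def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> int:
    """RFC 6238 TOTP (HMAC-SHA1) for the time step that contains `at`."""
    counter = int(at // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return value % (10 ** digits)


def ghost_address(prefix: str, secret: bytes, at: float) -> ipaddress.IPv6Address:
    """Hypothetical mapping: place the current code in the low bits of the prefix."""
    network = ipaddress.IPv6Network(prefix)
    code = totp(secret, at)
    if code >= network.num_addresses:
        raise ValueError("prefix too small to hold the code")
    return network.network_address + code


if __name__ == "__main__":
    # Placeholder secret and prefix, for illustration only.
    print(ghost_address("2001:db8::/80", b"12345678901234567890", time.time()))
```

One caveat with this particular mapping: a six-digit code can only ever select one of a million addresses, so a design that wanted the whole prefix as its search space would need to take more bits from the HMAC output.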
Aidan: Yeah, it’s interesting, you mentioned, like, never using in prod, and I guess when I was building it, I thought, you know, that would be apparent. Like, “Yes, this is very neat, but surely no one’s going to use it.” And I did see someone raised an issue on the GitHub project which was talking about clock skew. And I mentioned—
Corey: Here at the bank where I’m running this in production, we’re—
Aidan: [laugh].
Corey: —having some trouble with the clock. Yeah, it’s—
Aidan: You know, I mentioned that the underlying 2FA library did account for clock skew of 30 seconds either way, but it made me realize, I might need to put a disclaimer on the project. While the code is probably reasonably sound, I personally wouldn’t run it in production, and it was more meant to be a piece of performance art or something to tickle one’s fancy and to move on, not to roll it out. But I don’t know, different strokes for different folks.
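The "30 seconds either way" tolerance is usually implemented by accepting codes from the time steps adjacent to the server's current one. The sketch below assumes a 30-second step and a window of one step on each side; the function names are invented for illustration and are not taken from any particular 2FA library.

```python
STEP_SECONDS = 30  # assumed TOTP step size


def time_step(timestamp: float, step: int = STEP_SECONDS) -> int:
    """Index of the time step that contains the given Unix timestamp."""
    return int(timestamp // step)


def accepted_steps(server_time: float, window: int = 1,
                   step: int = STEP_SECONDS) -> range:
    """Time steps whose codes the server accepts: current step, plus or minus `window`."""
    current = time_step(server_time, step)
    return range(current - window, current + window + 1)


def client_code_accepted(client_time: float, server_time: float,
                         window: int = 1, step: int = STEP_SECONDS) -> bool:
    """True if a code generated at client_time falls inside the server's window."""
    return time_step(client_time, step) in accepted_steps(server_time, window, step)


if __name__ == "__main__":
    # One second of drift across a step boundary fails without a window...
    print(client_code_accepted(1019, 1020, window=0))
    # ...and succeeds with the one-step window.
    print(client_code_accepted(1019, 1020, window=1))
```

A wider window tolerates more drift, but it also means more codes are valid at any given moment, which is the trade-off behind worrying about clock accuracy in the first place.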
Corey: I have gotten a lot better about calling out my ridiculous shitpost things when I do them. And the thing that really drove that home for me was talking about using DNS TXT records to store information about what server a virtual machine lives on—or container or whatnot—thus using Route 53 as a database. And that was a great gag, and then someone did a Reddit post of “This seems like a really good idea, so I’m going to start doing it, and I’m having these questions.”
And at that point it’s like, “Okay, I’ve got to break character at that point.” And it’s, yeah, “Hi. That’s my joke. Don’t do it because X, Y, and Z are your failure modes, there are better tools for it. So yeah, there are ways you can do this with DNS, but it’s not generally a great idea, and there are some risk factors to it. And okay, A, B, and C are the things you don’t want to do, so let’s instead do it in a halfway intelligent way.” Because it’s only funny if everyone’s laughing. Otherwise, we fall into this trap where people take you seriously and they feel bad as a result when it doesn’t work in production. So, calling it out as this is a joke tends to put a lot of that aside. It also keeps people from feeling left out.
Aidan: Yeah. I realized that because the next novelty project I did a few days later—not sure if you caught it—it was a Rick Roll over ICMPv6 packets, where if you had run ping6 to a certain IP range, it would return the lyrics to music’s greatest treasure. So, I think that was hopefully a bit more self-evident that this should never be taken seriously. Who knows, I’m sure someone will find a use for it in prod.
Corey: And I was looking through this, this is great. I love some of the stuff that you’re doing because it’s just fantastic. And I started digging a bit more to things you had done. And at that point, it was whoa, whoa, whoa, wait a minute. Back in 2020, you found an example of an issue with AWS’s security model where CloudTrail would just start—if asked nicely—spewing other people’s credential sets and CloudTrail events and whatnot into your account.
And, A, that’s kind of a problem. B, it was something that didn’t make that big of a splash when it came out—I don’t even think I linked to it at the time—and, C, it was examples of after the recent revelations around CloudFormation and Glue that the fine folks at Orca Security found out. That wasn’t a one-off because you’d done this a year beforehand. We have now an established track record of cross-account data sharing and, potentially, exploits, and I’m looking at this and I’ve got to level with you, I felt incredibly naive because I had assumed that since we hadn’t heard of this stuff in any real big sense that it simply didn’t happen.
So, when we heard about Azure; obviously, it’s because Azure is complete clown shoes and the excellent people at AWS would never make these sorts of mistakes. Except we now have evidence that they absolutely did and didn’t talk about it publicly. And I’ve got to level with you. I feel more than a little bit foolish, betrayed, naive for all this. What’s your take on it?
Aidan: Yeah, so just to clarify, it wasn’t actually in your account. It was the new AWS custom resource execution model, where you would upload a Lambda function that would run in an Amazon-managed account. And so that immediately set off my spidey sense because executing code in someone else’s account seems fraught with peril. And so—
Corey: Yeah, you can do all kinds of horrifying things there, like, use it to run containers.
Aidan: Yeah. [laugh]. Thankfully, I didn’t do anything that egregious. I stayed inside the Lambda function, but I look—I poked around at what credentials it had, and it would use CloudWatch to reinvoke itself and CloudWatch kept recording CloudTrail. And I won’t go into all the details, but it ended up being that you could see credentials being recorded in CloudTrail in that account, and I could, sort of, funnel them out of there.
When I found this, I was a little scared, and I don’t think I’d reported an issue to AWS before, so I didn’t want to go too far and do anything that could be considered malicious. So, I didn’t actively seek out other people’s credentials.
Corey: Yeah, as a general rule, it’s best once you discover things like that to do the right thing and report it, not proceed to, you know, inadvertently commit felonies.
Aidan: Yeah. Especially because it was my first time. I felt better safe than sorry. So, I didn’t see other credentials, but I had no reason to believe that I wouldn’t see it if I kept looking. I reported it to Amazon. Their security team was incredibly professional, made me feel very comfortable reporting it, and let me know when, you know, they’d remediated it, which was a matter of days later.
But afterwards, it left me feeling a little surprised because I was able to publish about it, and a few people responded, you know, the sorts of people who pay close attention to the industry, but Amazon didn’t publish anything as far as I was aware. And it changed the way I felt about AWS security, because like you, I sort of felt that AWS, more or less had a pretty perfect track record. They would have advisories about possible Xen exploits, and so on. But they’d never published anything about potential for compromise. And it makes me wonder how many of the things might have been reported in the past where the third-party researcher either didn’t end up publishing, or they published and it just disappeared into the blogosphere, and I hadn’t seen it.
Corey: They have a big earn trust principle over there, and I think that they always focus on the trust portion of it, but I think what got overlooked is the earn. When people are giving you trust that you haven’t earned, on some level, the right thing to do is to call it out and be transparent around these things. Yes, I know, Wall Street’s going to be annoyed and headlines, et cetera, et cetera, but I had always had the impression that had there been a cross-account vulnerability or a breach of some sort, they would communicate this and they would have their executives go on a speaking tour about it to explain how defense-in-depth mitigated some of it, and/or lessons learned, and/or what else we can learn. But it turns out that wasn’t what was happening at all. And I feel like they have been given trust that was unearned and now I am not happy with it.
I suddenly have a lot more of a, I guess, skeptical position toward them as a result, and I have very little tolerance left for what has previously been a staple of the AWS security discussions, which is an executive getting on stage for a while and droning on about the shared responsibility model with the very strong implication that “Oh, yeah, we’re fine. It’s all on your side of the fence that things are going to break.” Yeah, turns out, that’s not so true. You just know about the things on your side of the fence in a way that you don’t about the things that are on theirs.
Aidan: Yeah, it’s an interesting one. Like, I think about it and I think, “Well, they never made an explicit promise that they would publish these things,” so, on one hand, I say to myself, “Oh, maybe that’s on me for making that assumption.” But, I don’t know, I feel like the way we felt was justified. Maybe naive in hindsight, but then, you know, I guess… I’m still not sure how to feel because of, like, I think about recent issues and how a couple of AWS Distinguished Engineers jumped on Twitter, and to their credit were extremely proactive in engaging with the community.
But is that enough? It might be enough for say, to set my mind at ease or your mind at ease because we are, [laugh] to put it mildly, highly engaged, perhaps a little too engaged in the AWS space, but Twitter’s very ephemeral. Very few of AWS’s customers—
Corey: Yeah, I can’t link to tweets by distinguished engineers to present to an executive leadership team as an official statement from Amazon. I just can’t.
Aidan: Yeah. Yeah.
Corey: And so the lesson we can take from this is okay, so “Well, we never actually said this.” “So, let me get this straight. You’re content to basically let people assume whatever they want until they ask you an explicit question around these things. Really? Is that the lesson you want me to take from this? Because I have a whole bunch of very explicit questions that I will be asking you going forward, if that is in fact, your position. And you are not going to like the fact that I’m asking these questions.”
Even if the answer is a hard no, people who did not have this context are going to wonder why are people asking those questions? It’s a massive footgun here for them if that is the position that they intend to have. I want to be clear as well; this is also a messaging problem.
It is not in any way, a condemnation of their excellent folks working on the security implementation themselves. This stuff is hard and those people are all-stars. I want to be very clear on this. It is purely around the messaging and positioning of the security posture.
Aidan: Yeah, yeah. That’s a good clarification because like you, my understanding is that the service teams are doing a really stellar, above-average job, industry-wide, and the AWS Security Response Teams, I have absolute faith in them. It is a matter of messaging. And I guess what particularly brings it to front-of-mind is, it was earlier this month, or maybe it was last month, I received an email from a company called Sourcegraph. They do code search.
I’m not even a customer of theirs yet, you know? I’m on a free trial, and I got an email that—I’m paraphrasing here—was something to the effect of, we discovered that it was possible for your code to appear in other customers’ code search results. It was discovered by one of our own engineers. We found that the circumstances hadn’t cropped up, but we wanted to tell you that it was possible. It didn’t happen, and we’re working on making sure it won’t happen again.
And I think about how radically different that is where they didn’t have a third-party researcher forcing their hand; they could have very easily swept it under the rug, but they were so proactive that, honestly, that’s probably what’s going to tip me over the edge into becoming a customer. I mean, other than them having a great product. But yeah, it’s a big contrast. It’s how I like to see other companies work, especially Amazon.
Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They’ve also gone deep in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That’s S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.
Corey: The two companies that I can think of that have had security problems have been CircleCI and Travis CI. Circle had an incredibly transparent early-on blog post, they engaged with customers on the forums, and they did super well. Travis basically denied, stonewalled for ages, and now the only people who use Travis are there because they haven’t found a good way to get off of it yet. It is effectively DOA. And I don’t think those two things are unrelated.
Aidan: Yeah. No, that’s a great point. Because you know, I’ve been in this industry long enough. You have to know that humans write code and humans make mistakes—I know I’ve made more than my fair share—and I’m not going to write off the company for making a mistake. It’s entirely in their response. And yeah, you’re right. That’s why Circle is still a trustworthy business that should earn people’s business and why Travis—why I recommend everyone move away from.
Corey: Yeah, I like Orca Security as a company and as a product, but at the moment, I am not their customer. I am AWS’s customer. So, why the hell am I hearing it from Orca and not AWS when this happens?
Aidan: Yeah, yeah. It’s… not great. On one hand, I’m glad I’m not in charge of finding a solution to this because I don’t have the skills or the expertise to manage that communication. Because like I think you said in the past, there’s a lot of different audiences that they have to communicate with. They have to communicate with the stock market, they have to communicate with execs, they have to communicate with developers, and each of those audiences demands a different level of detail, a different focus. And it’s tricky. And how do you manage that? But, I don’t know, I feel like you have an obligation to when people place that level of trust in you.
Corey: It’s just a matter of doing right by your customers, on some level.
Aidan: Yeah.
Corey: How long have you been working in AWS-side environments? Clearly, this is not like, “Well, it’s year two,” because if so I’m going to feel remarkably behind.
Aidan: [laugh]. So, I’ve been writing code in some capacity or another for 20 years. It took about five years to get anyone to pay me to do so. But yeah, I guess the start of my professional career—and by ‘professional,’ I want to use it in the strictest terms, meaning getting paid money; not that I [laugh] am necessarily a professional—coincided with the launch of AWS. So, I don’t have experience with the before times of data centers, never had to think about Direct Connect, but it means I have been using AWS since sometime in 2008.
I was just looking at my bill earlier, I saw that my first bill was for $70. It was—I was using a c1.xlarge, which was 80 cents an hour, and it had eight-core CPUs. And to put that in context at the time—
Corey: Eight vCPUs, technically I believe.
Aidan: And it basically is—
Corey: —or were they using the ECU model back then?
Aidan: Yeah, no, that was vCPUs. But to me, that was extraordinary. You know, I was somewhere just after high school. It was—the Netflix Prize was around. If you’re not sure what that was, it was Netflix had this open competition where they said anyone who could improve upon their movie recommendation algorithm could win a million dollars.
And obviously being a teenager, I had a massive ego and [laugh] no self-doubt, so I thought I could win this, but I just don’t have enough CPUs or RAM on my laptop. And so when EC2 launched, and I could pay 80 cents an hour, rather than signing up for a 12-month contract with a colocation company, it was just a dream come true. I was able to run my terrible algorithms, but I could run them eight times faster. Unfortunately and obviously, I didn’t win because it turns out, I’m not a world-class statistician. But—
Corey: Common mistake. I make that mistake myself all the time.
Aidan: [laugh]. Yeah. I mean, you know, I think I was probably 19 at the time, so I had—my ego did make me think I was one, but it turned out not to be so. But I think that was what really blew my mind was that me, a nobody, could create an account with Amazon and get access to these incredibly powerful machines for less than a dollar. And so I was hooked.
Since then, I’ve worked at companies that are AWS customers. I’ve worked at places that have zero EC2 servers, worked at places that have had thousands, and places in between. And it’s got to a point, actually, where, I guess, my career is so entwined with AWS that one, my initials are actually AWS, but also—and this might sound ridiculous, and it’s probably just a sign of my privilege—that I wouldn’t consider working somewhere that used another cloud. Not—
Corey: No, I think that’s absolutely the right approach.
Aidan: Yeah.
Corey: I had a Twitter thread on this somewhat recently, and I’m going to turn it into a blog post because I got some pushback. If I were looking at doing something and I were coming into the industry right now, my first choice would be Google Cloud because its developer experience is excellent. But I’m not coming to this without any experience. I have spent a decade or so learning not just how it works, but also how it breaks, understanding the failure mode and what that’s going to look like and what it’s good at and what it’s not. That’s the valuable stuff for running things in a serious way.
Aidan: Yeah. It’s an interesting one. And I mean, for better or worse, AWS is big. I’m sure you will know much better than I do the exact numbers, but if a junior developer came to me and said, “Which cloud should I learn, or should I learn all of them?” I mean, you’re right, Google Cloud does have a better developer experience, especially for new developers, but when I think about the sheer number of jobs that are available for developers, I feel like I would be doing them a disservice by not suggesting AWS, at least in Australia. It seems they’ve got such a huge footprint that you’ll always be able to find a job working as an AWS-familiar engineer. It seems like that would be less the case with Google Cloud or Azure.
Corey: Again, I am not sitting here, suggesting that anyone should, “Oh, clouds are insecure. We’re going to run our own stuff in our own data centers.” That is ridiculous in this era. They are still going to do a better job of security than any of us will individually, let’s be clear here. And it empowers and unlocks an awful lot of stuff.
But with their privileged position as these hyperscale providers that are the default choice for building things, I think comes with a significant level of responsibility that I am displeased to discover that they’ve been abdicating. And I don’t love that.
Aidan: Yeah, it’s an interesting one, right, because, like you’re saying, they have access and the expertise that people doing it themselves will never match. So, you know, I’m never going to hesitate to recommend people use AWS on account of security because your company’s security posture will almost always be better for using AWS and following their guidelines, and so on. But yeah, like you say, with great power comes significant responsibility to earn trust and retain that trust by admitting and publicizing when mistakes are made.
Corey: One last topic I want to get into with you is one that you and I have talked about very briefly elsewhere, that I feel like you and I are both relatively up-to-date on AWS intricacies. I think that we are both better than the average bear working with the platform. But I know that I feel this way, and I suspect you do too that VPCs have gotten confusing as hell. Is that just me? Am I a secret moron that no one bothered to ever tell me this, and I should update my own self-awareness?
Aidan: [laugh]. Yeah, it’s… I mean, that’s been the story of my career with AWS. When I started, VPCs didn’t exist. It was EC2 Classic—well, I guess at the time, it was just EC2—and it was simple. You launched an instance and you had an IP address.
And then along came VPCs, and I think at the time, I thought something to the effect of “This seems like needless complexity. I’m not going to bother learning this. It will never be relevant.” In the end that wasn’t true. I worked in much larger deployments where VPCs made fantastic sense and made a lot of things possible, but I still didn’t go into the weeds.
Since then, AWS has announced that EC2 Classic will be retired; an end of an era. I’m not personally still running anything in EC2 Classic, and I think they’ve done an incredible job of maintaining support for this long, but VPC complexity has certainly been growing year-on-year since then. I recently was using the AWS console—like we all do and no one ever admits to—to edit a VPC subnet route table. And I clicked the drop-down box for a target, and I was overwhelmed by the number of options. There were NAT gateways, internet gateways, carrier gateways, I think there was a thing called a wavelength gateway, ENI, and… I [laugh] I think I was surprised because I just scrolled through the list, and I thought, “Wow, that is a lot of different options. Why is that?”
Especially because it’s not so relevant to me. But I realized a big thing of what AWS has been doing lately is trying to make themselves available to people who haven’t used the cloud yet. And they have these complicated networking needs, and it seems like they’re trying to—reasonably successfully—make anything possible. But with that comes, you know, additional complexity.
Corey: I appreciate that the capacity is there, but there has to be an abstraction model for getting rid of some of this complexity because otherwise, the failure mode is you wind up with this amazingly capable thing that can build marvels, but you also need to basically have a PhD in some of these things to wind up tying it all together. And if you bring someone else in to do it, then you have no idea how to run the thing. You’re effectively a golden retriever trying to fly a space shuttle.
Aidan: Yeah. It’s interesting, like, clearly, they must be acutely aware of this because they have default VPCs, and for many use cases, that’s all people should need. But as soon as you want, say a private subnet, then you need to either modify that default VPC or create a new one, and it’s sort of going from 0 to 100 complexity extremely quickly because, you know, you need to create route tables to everyone’s favorite NAT gateways, and it feels like the on-ramp needs to be not so steep. Not sure what the solution is, I hope they find one.
Corey: As do I. I really want to thank you for taking the time to speak with me about so many of these things. If people want to learn more about what you’re up to, where’s the best place to find you?
Aidan: Twitter’s the best place. On Twitter, my username is @__Steele, which is S-T-E-E-L-E. From there, that’s where I’ll either—I’ll at least speculate on the latest releases or link to some of the silly things I put on GitHub. Sometimes they’re not so silly things. But yeah, that’s where I can be found. And I’d love to chat to anyone about AWS. It’s something I can geek out about all day, every day.
Corey: And we will certainly include links to that in the show notes. Thank you so much for taking the time to speak with me today. I really appreciate it.
Aidan: Well, thank you so much for having me. It’s been an absolute delight.
Corey: Aidan Steele, serverless engineer at Stedi, and shit poster extraordinaire. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an immediate request to correct the record about what I’m not fully understanding about AWS’s piss-weak security communications.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.