Transparency in Cloud Security with Gafnit Amiga
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. We’ve taken a bit of a security bent to the conversations that we’ve been having on this show over the past year or so and, well, today’s episode is no different. In fact, we’re going a little bit deeper than we normally tend to. My guest today is Gafnit Amiga, who’s the Director of Security Research at Lightspin. Gafnit, thank you for joining me.
Gafnit: Hey, Corey. Thank you for inviting me to the show.
Corey: You sort of burst onto the scene—and by ‘scene,’ I of course mean the cloud space, at least to the level of community awareness—back, I want to say in April of 2022 when you posted a very in-depth blog post about exploiting RDS and some misconfigurations on AWS’s side to effectively display internal service credentials for the RDS service itself. Now, that sounds like it’s one of those incredibly deep, incredibly murky things because it is, let’s be clear. At a high level, can you explain to me exactly what it is that you found and how you did it?
Gafnit: Yes, so, RDS is Amazon’s database service. It’s a managed service where you can choose the engine that you prefer. One of them is Postgres. There, I found the vulnerability. The vulnerability was in an extension, log_fdw (FDW stands for Foreign Data Wrapper). This extension is there for reading the engine’s logs directly, so you can query them using SQL queries, which should be simpler and easier to use.
And this extension enables you to provide a path. And there was a path traversal, but the traversal happened only when you dropped the validation of the wrapper. And this is how I managed to read local files from the database EC2 machine, which shouldn’t happen because this is a managed service and you shouldn’t have any access to the underlying host.
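To illustrate the class of bug described here, the following is a minimal Python sketch of a path check and of what happens when that check is switched off. It is not the log_fdw code and does not reflect how RDS is actually laid out: the directory `/var/log/engine`, the function name `resolve_log_path`, and the `validate` flag are all hypothetical names chosen for the example.

```python
import posixpath

# Hypothetical log directory for the sketch; not the real RDS filesystem layout.
LOG_DIR = "/var/log/engine"


def resolve_log_path(filename: str, validate: bool = True) -> str:
    """Join a caller-supplied filename onto LOG_DIR.

    With validate=True, any path that resolves outside LOG_DIR is rejected.
    With validate=False, the resolved path is returned unchecked, which is
    the situation a path traversal relies on.
    """
    # normpath collapses ".." segments, so the result is where a read would land.
    candidate = posixpath.normpath(posixpath.join(LOG_DIR, filename))
    if validate and posixpath.commonpath([LOG_DIR, candidate]) != LOG_DIR:
        raise ValueError(f"path escapes log directory: {filename!r}")
    return candidate


if __name__ == "__main__":
    print(resolve_log_path("postgresql.log"))
    print(resolve_log_path("../../../etc/passwd", validate=False))
```

With validation on, a filename such as `../../../etc/passwd` raises an error; with validation off, the same input resolves to a file outside the log directory, which is the kind of read a managed service is supposed to prevent.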
Corey: It’s always odd when the abstraction starts leaking, from an AWS perspective. I know that a friend of mine was on Aurora during the beta and was doing some high-performance work and suddenly started seeing SQL errors about /var/temp filling up, which is, for those who are not well versed in SQL, and even for those who are, that’s not the sort of thing you tend to expect to show up on there. It feels like the underlying system tends to leak in—particularly in RDS sense—into what is otherwise at least imagined to be a fully-managed service.
Gafnit: Yes, because sometimes they want to give you an informative error so you will be able to realize what happened and what caused the error, and sometimes they prefer not to give you too much information because they don’t want you to get to the underlying machine. This is why, for example, you don’t get a regular superuser; you have an RDS superuser in the database.
Corey: It seems to me that this is sort of a problem of layering different security models on top of each other. If you take a cloud-native database that they designed, start to finish, themselves, like DynamoDB, the entire security model for Dynamo, as best I can determine, is wrapped up within IAM. So, if you know IAM—spoiler, nobody knows IAM completely, it seems—but if you have that on lock you’ve got it; there’s nothing else you need to think about. Whereas with RDS, you have to layer on IAM to get access to the database and what you’re allowed to do with it.
But then there’s an entirely separate user management system, in many respects, of local users for either Postgres or MySQL or any other systems that we’re using, to a point where even when they started supporting IAM for authentication to RDS at the database user level, it was flagged in the documentation with a bunch of warnings of, “Don’t do this for high-volume stuff; only do this in development style environments.” So, it’s clear that it has been a difficult marriage, for lack of a better term. And then you have to layer on all the other stuff that, if God forbid, you’re in a multi-cloud style environment or working with Kubernetes on top of all of this, and it seems like you’re having to pick and choose between four or five different levels of security modeling, as well as understand how all of those things interplay together. How come we don’t see things like this happening four times a day as a result?
Gafnit: Well, I guess that there are more issues being found but not always published, and I think that this is what makes it more complex for both sides. Creating managed services with resources and third parties that everybody knows, and making them easy to use, requires a deep understanding of the existing permission model of the service that you want to integrate with your own permission model, and of how the combination works. So, you actually need to understand how every change is going to affect the restrictions that you want to have. So, for example, if you don’t want the database users to be able to read, write, or do network activity, you really need to understand the permission model of Postgres itself. So, it makes it more complicated for development, but it’s also good for researchers because they already know Postgres and they have a good starting point.
Corey: My philosophy has always been when you’re trying to secure something, you need to have at least a topical level of understanding of the entire system, start to finish. One of the problems I’ve had with the idea of microservices as is frequently envisioned is that there’s separation, but not real separation, so you have to hand-wave over a whole bunch of the security model. If you don’t understand something, I believe it’s very difficult to secure it. And let’s be honest, even if you do understand [laugh] something, it can be very difficult to secure it. And the cloud vendors with IAM and similar systems don’t seem to be doing themselves any favors, given the sheer complexity and the capabilities that they’re demanding of themselves, even for having one AWS service talk to another one, but in the right way.
And it’s finicky, and it’s nuanced, and debugging it becomes a colossal pain. And finally, at least those of us who are bad at these things, finally say, “The hell with it,” and they just grant full access from Service A to Service B—in the confines of a test environment. I’m not quite that nuts myself, most days. And then it’s the biggest lie we always tell ourselves is once we have something overscoped like that, usually for CI/CD, it’s, “Oh, todo: I’ll go back and fix that later.” Yeah, I’m looking back five years ago and that’s still on my todo list.
For some reason, it’s never been the number one priority. And in all likelihood, it won’t be until right after it really should have been my number one priority. It feels like in cloud security particularly, you can’t win, you can only not lose. I always found that to be something of a depressing perspective and I didn’t accept it for the longest time. But increasingly, these days, it started to feel like that is the state of the world. Am I wrong on that? Am I just being too dour?
Gafnit: What do you mean by you cannot lose?
Corey: There’s no winning in security from my perspective because no one is going to say, “All right. We won the security. Problem solved. The end.” Companies don’t view security as a value-add. It is only about a downside risk mitigation play.
It’s, “Yay, another day of not getting breached.” And the failure mode from there is, “Okay, well, we got breached, but we found out about it ourselves immediately internally, rather than reading about it in The New York Times in two weeks.” The winning is just the steady-state, the status quo. It’s just all different flavors of losing beyond that.
Gafnit: So, I don’t think it’s quite the case because I can tell that they always do active work on securing the services and their structure, because I went over other extensions before reaching the log foreign data wrapper, and they actually excluded high-risk functionalities that could have helped me achieve privileged access to the underlying host. And they do it with other services as well because they always do a security review before having it integrated externally. But you know, it’s an endless zone. You can always have something. Security vulnerabilities are always [arrays 00:09:06]. So everyone, whenever they can help and search and give their value, it’s appreciated.
Corey: I feel like I need to clarify a bit of nuance. When your blog post first came out talking about this, I was, well let’s say a little irritated toward AWS on Twitter and other places. And Twitter is not a place for nuance, it is easy to look at that and think, “Oh, I was upset at AWS for having a vulnerability.” I am not, I want to be very clear on that. Now, it’s certainly not good, but these are computers; that is the nature of how they work.
If you want to completely secure a computer, cut the power to it, sink it in concrete and then drop it in the ocean. And even then, there are exceptions to all of that. So, it’s always a question of not blocking all risk; it’s about trade-offs and what risk is acceptable. And to AWS’s credit, they do say that they practice defense-in-depth. Being able to access the credentials for the running RDS service on top of the instance that it was running on, while that’s certainly not good, it isn’t as if you’d suddenly had keys to everything inside of AWS and all their security model crumbles away before you.
They do the right thing and the people working on these things are incredibly good. And they work very hard at these things. My concern and my complaint is, as much as I enjoy the work that you do and reading these blog posts talking about how you did it, it bothers me that I have to learn about a vulnerability in a service for which I pay not small amounts of money—RDS is the number one largest charge in my AWS bill every month—and I have to hear about it from a third-party rather than the vendor themselves. In this case, it was a full day later, where after your blog post went up, and they finally had a small security disclosure on AWS’s site talking about it. And that pattern feels to me like it leads nowhere good.
Gafnit: So, transparency is a key word here. And when I wrote the post, I asked if they wanted to add anything from their side, and they told me that they had already reached out to the vulnerable customers and helped them to migrate to the fixed version. So, from their side, they didn’t feel it was necessary to add it over there. But I did mention the fact that they did the investigation and no customer data was hurt. Yeah, but I think that if there will be maybe a more organized process for any submission of any vulnerability, where all the steps are aligned, it will help everyone and anyone can be informed of everything that happens.
Corey: I have always been extraordinarily impressed by people who work at AWS and handle a lot of the triaging of vulnerability reports. Zack Glick, before he left, was doing an awful lot of that. Dan [Erson 00:12:05] continues to be one of the bright lights of AWS, from my perspective, just as far as customer communication and understanding exactly what the customer perspective is. And as individuals, I see nothing but stars over at AWS. To be clear, ‘Nothing but Stars’ is also the name of most of my IAM policies, but that’s neither here nor there.
It seems like, on some level, there’s a communications and policy misalignment, because I look at this and every conversation I ever have with AWS’s security folks, they are eminently reasonable, they’re incredibly intelligent, and they care. There’s no mistaking that they legitimately care. But somewhere at the scale of company they’re at, incentives get crossed, and everyone has a different position they’re looking at these things from, and it feels like that disjointedness leads to almost a misalignment as far as how to effectively communicate things like this to customers.
Gafnit: Yes, it looks like this is the case, but if more things will be discovered and published, I think that they will have eventually an organized process for that. Because I guess the researchers do find things over there, but they’re not always being published for several reasons. But yes, they should work on that. [laugh].
Corey: And that is part of the challenge as well, where AWS does not have a public vulnerability disclosure program. [unintelligible 00:13:30] HackerOne, they don’t have a public bug bounty program. They have a vulnerability disclosure email address, and the people working behind that are some of the hardest working folks in tech, but there is no unified way of building a community of researchers around the idea of exploring this. And that is a challenge because you have reported vulnerabilities, I have reported significantly fewer vulnerabilities, but it always feels like it’s a hurry-up-and-wait scenario where the communication is not always immediate and clear. And at best, it feels like we often get a begrudging, “Thank you.”
Versus all right, if we just throw ethics completely out the window and decide instead that now we’re going to wind up focusing on just effectively selling it to the highest bidder, the value of, for example, a hypervisor escape on EC2 for example, is incalculable. There is no amount of money that a bug bounty program could offer for something like that compared to what it is worth to the right bad actor at the right time. So, the vulnerabilities that we hear about are already we’re starting from a basis of people who have a functioning sense of ethics, people who are not deeply compromised trying to do something truly nefarious. What worries me is the story of—what are the stories that we aren’t seeing? What are the things that are being found where instead of fighting against the bureaucracy around disclosure and the rest, people just use them for their own ends? And I’m gratified by the level of response I see from AWS on the things that they do find out about, but I always have to wonder, what aren’t we seeing?
Gafnit: That’s a good question. And it really depends on their side if they choose to expose it or not.
Corey: Part of the challenge too, is the messaging and the communication around it and who gets credit and the rest. And it’s weird, whenever they release some additional feature to one of their big headline services, there are blog posts, there are keynote speeches, there are customer references, they go on speaking tours, and the emails, oh, God, they never stopped the emails talking about how amazing all of these things are. But whenever there’s a security vulnerability or a disclosure like this—and to be fair, AWS’s response to this speaks very well of them—it’s like you have to go sneak down into the dark sub-basement, with the filing cabinet behind the leopard sign and the rest, to even find out that these things exist. And I feel like they’re not doing themselves any favors by developing that reputation for lack of transparency around these things. “Well, while there was no customer impact, so why would we talk about it?”
Because otherwise, you’re setting up a myth that there never is a vulnerability on the side of—what is it that you’re building as a cloud provider. And when there is a problem down the road—because there always is going to be; nothing is perfect—people are going to say, “Hey, wait a minute. You didn’t talk about this. What else haven’t you talked about?”
And it rebounds on them with sometimes really unfortunate side effects. With Azure as a counterexample here, we see a number of Azure exploits where, “Yeah, turned out that we had access to other customers’ data and Azure had no idea until we told them.” And Azure does its statements about, “Oh, we have no evidence of any of this stuff being used improperly.” Okay, that can mean that either you’ve checked your logs and things are great or you don’t have logging. I don’t know that that’s necessarily something I trust.
Conversely, AWS has said in the past, “We have looked at the audit logs for this service dating back to its launch years ago, and have validated that it has never been used like this.” One of those responses breeds an awful lot of customer trust. The other one doesn’t. And I just wish AWS knew a little bit more how good crisis communication around vulnerabilities can improve customer trust rather than erode it.
Gafnit: Yes, and I think that, as you said, there will always be vulnerabilities. And I think that we are expecting to find more, so being able to communicate as clearly as you can and to expose things about maybe the fixes and how the investigation is being done, even at a high level, for all the vulnerabilities can gain more trust from the customer side.
Corey: DoorDash had a problem. As their cloud-native environment scaled and developers delivered new features, their monitoring system kept breaking down. In an organization where data is used to make better decisions about technology and about the business, losing observability means the entire company loses their competitive edge. With Chronosphere, DoorDash is no longer losing visibility into their applications suite. The key? Chronosphere is an open-source compatible, scalable, and reliable observability solution that gives the observability lead at DoorDash business, confidence, and peace of mind. Read the full success story at snark.cloud/chronosphere. That's snark.cloud slash C-H-R-O-N-O-S-P-H-E-R-E.
Corey: You have experience in your background specifically around application security and cloud security research. You’ve been doing this for seven years at this point. When you started looking into this, did you come at the RDS vulnerability exploration from a perspective of being deeper on the Postgres side or deeper on the AWS side of things?
Gafnit: So, it was both. I actually came to the RDS lead from another service where there was something [about 00:18:21] in the application level. But then I reached RDS and thought, well, it will be really nice to find something over here and to reach the underlying machine. And when I entered the RDS zone, I started to look at it with application security eyes, but you have to know the cloud as well because there are integrations with S3, and you need to understand the IAM model. So, you need a mix of both to exploit specifically this kind of issue. But you can also be a database expert because the payload is pure SQL.
Corey: It always seems to me that this is an inherent risk in trying to take something that is pre-existing as an open-source solution (Postgres is one example, but there are many more) and offer it as a managed service. Because I think one of the big misunderstandings is that, well, AWS is just going to take something like Redis and offer that as a managed service. It’s okay, I accept that they will offer a thing that respects the endpoints and then acts as if it were Redis, but under the hood, there is so much in all of these open-source projects that is built for optionality: wherever you want to run this thing, it will run there; whatever type of workload you want to throw at it, it can work. Whereas when you have a cloud provider converting these things into a managed service, they are going to strip out an awful lot of those things. An easy example might be, okay, there’s this thing that winds up having to calculate for the way the hard drives on a computer work, from a storage perspective.
Well, all the big cloud providers already have interesting ways that they have solved storage. Every team does not reimplement that particular wheel; they use in-house services. Chubby’s file locking, for example, over on Google’s side is a classic example of this that they’ve talked about an awful lot so every team building something doesn’t have to rediscover all of that. So, the idea that, oh, we’re just going to take this open-source thing, clone it off of GitHub, fork it, and then just throw it into production as a managed service seems more than a little naive. What’s your experience around seeing, as you get more [laugh] into the weeds of these things than most customers are allowed to get, what’s your take on this?
Do you find that this looks an awful lot like the open-source version that we all use? Or is it something that looks like it has been heavily customized to take advantage of what AWS is offering internally as underlying bedrock services?
Gafnit: So, from what I saw until now, they do want to save the functionality so you will have the same experience as when you’re working with the same service that’s not on AWS, because you are used to that. So, they are not doing dramatic changes, but they do want to reduce the risk in the security space. So, there will be some functionalities that they will not let you use. And this is because of the managed part. In areas where the full workload is deployed in your account and you can access it anyway, they will not have the same security restrictions because you can access the workload anyway. But when it’s managed, they need to prevent you from accessing the underlying host, for example. And they do make changes, but they’re really targeted at the specific actions that can lead you to that.
Corey: It also feels like RDS is something of a, I don’t want to call it a legacy service because it is clearly still very much actively developed, but it’s what we’ll call a ‘classic service.’ When I look at a new AWS launch, I tend to mentally bucket them into two things. There’s the cloud-native approach, and we’ve already talked about DynamoDB. That would be one example of this. And there’s the cloud-hosted model where you have to worry about things like instances and security groups and the networking stuff, and so on and so forth, where it basically feels like they’re running their thing on top of a pile of EC2 instances, and that abstraction starts leaking.
Part of me wonders if looking at some of these older services like RDS, they made decisions in the design and build out of these things that they might not if they were to go ahead and build it out today. I mean, Aurora is an example of what that might look like. Have you found as you start looking around the various security foibles of different cloud services, that the security posture of some of the more cloud-native approaches is better or worse or the same as the cloud-hosted world?
Gafnit: Well, so for example, in the several issues that were found, and also here in the RDS where you can see credentials in a file, this is not a best practice in security space. And so, definitely there are things to improve, even if it’s developed on the provider side. But it’s really hard to answer this question because in a managed area where you don’t have any access, it’s hard to tell how it’s configured and if it’s configured properly. So, you need to have some certification from their side.
Corey: This is, on some level, part of the great security challenge, especially for something that is not itself open-source, where they obviously have terrific security teams, don’t get me wrong. At no point do I want to ever come across as saying, “Oh, those AWS people don’t know how security works.” That is provably untrue. But there is something to be said for the value of having a strong community in the security space focusing on this from the outside of looking at these things, of even helping other people contextualize these things. And I’m a little disheartened that none of the major cloud providers seem to have really embraced the idea of a cloud security community, to the point where the one that I’m most familiar with, the cloud security forum Slack team, seems to be my default place where I go for context on things.
Because I dabble. I keep my hand in when it comes to security, but I’m certainly no expert. That’s what people like you are for. I make fun of clouds and I work on the billing parts of it and that’s about as far as it goes for me. But being able to get context around is this a big deal? Is this description that a company is giving, is it accurate?
For example, when your post came out, I had not heard of Lightspin in this context. So, reaching out to a few people I trusted, is this legitimate? The answer was, “Yes. It’s legitimate and it’s brilliant. That’s a company to keep your eye on.” Great. That’s useful context and there’s no way to buy that. It has to come from having those conversations with people in the [broader 00:24:57] sense of the community. What’s your experience been looking at the community side of the world of security?
Gafnit: Well, so I think that cloud security has a great community, and this is one of the things that we at Lightspin really want to increase and push forward. And we see ourselves as a security-driven company. We always do our best to publish posts, even detailed posts that are not about vulnerabilities but about how things work in the cloud and how things are being evaluated, and to release open-source tools that you can use to check your environment even if you’re not a customer. And I think that the community is always willing to explain and to investigate together. And it’s a welcome effort, but I think that the messaging should also be for all layers, you know, also for the DevOps and the developers because it can really help if it will start from this point from their side, as well.
Corey: It needs to be baked in, from start to finish.
Gafnit: Yeah, exactly.
Corey: I really want to thank you for taking the time out of your day to speak with me today. If people want to learn more about what you’re up to, where’s the best place for them to find you?
Gafnit: So, you can find me on Twitter and on LinkedIn, and feel free to reach out.
Corey: We will, of course, put links to that in the [show notes 00:26:25]. Thank you so much for being so generous with your time today. I appreciate it.
Gafnit: Thank you, Corey.
Corey: Gafnit Amiga, Director of Security Research at Lightspin. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, and if it’s on the YouTubes, smash the like and subscribe buttons, which I’m told are there. Whereas if you’ve hated this podcast, same story, like and subscribe and the buttons, leave a five-star review on a various platform, but also leave an insulting, angry comment about how my observation that our IAM policies are all full of stars is inaccurate. And then I will go ahead and delete that comment later because you didn’t set a strong password.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.