Throwing Houlihans at MongoDB with Rick Houlihan
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: The company 0x4447 builds products to increase standardization and security in AWS organizations. They do this with automated pipelines that use well-structured projects to create secure, easy-to-maintain, and fault-tolerant solutions, one of which is their VPN product built on top of the popular OpenVPN project which has no license restrictions; you are only limited by the network card in the instance. To learn more visit: snark.cloud/deployandgo
Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle’s Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it’s actually free. There’s no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free that’s snark.cloud/oci-free.
Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. A year or two before the pandemic hit, I went on a magical journey to a mythical place called Australia. I know, I was as shocked as anyone to figure out that this was in fact real. And while I was there, I gave the opening keynote at a conference that was called Latency Conf, which is great because there’s a heck of a timezone shift, and I imagine that’s what it’s talking about.
The closing keynote was delivered by someone I hadn’t really heard of before, and he started talking about single table design with respect to DynamoDB, which, okay, great; let’s see what he’s got to say. And the talk started off engaging and entertaining and a high-level overview and then got deeper and deeper and deeper and I felt, “Can I please be excused? My brain is full.” That talk was delivered by Rick Houlihan, who now is the Director of Developer Relations for Strategic Accounts over at MongoDB, and I’m fortunate enough to be able to get him here to more or less break down some of what he was saying back then, catch up with what he’s been up to, and more or less suffer my slings and arrows. Rick, thank you for joining me.
Rick: Great. Thanks, Corey. I really appreciate—you brought back some memories, you know, trip down memory lane there. And actually, interestingly enough, that was the world’s introduction to single table design. That was my dry-run rehearsal for re:Invent 2018, which is where I delivered that talk, and it has since become the most positive—
Corey: This was two weeks before re:Invent, which was just a great thing. I’d been invited to go; why not? I figured I’d see a couple of clients I had out in that direction. And I learned things like Australia is a big place. So, doing a one-week trip, including Sydney, Melbourne, and Perth. Don’t do that.
Rick: I had no idea that it took so long to fly from one side to the other, right? I mean, that’s a long plane [laugh] [crosstalk 00:02:15]—
Corey: Oh, yeah. And you were working at AWS at the time—
Rick: Absolutely.
Corey: —so I can only assume that they basically stuffed you into a dog kennel and threw you underneath the seating area, given their travel policy?
Rick: Well, you know, I have the—[clear throat] actually at the time, they just upgraded the policy to allow the intermediate seating, right? So, if you wanted to get the—
Corey: Ohhh—
Rick: I know—
Corey: Big spender. Big spender.
Rick: Yes, yes. I can get a little bit of extra legroom, so I didn’t have my knees shoved into the seatback. But it was good.
Corey: So, let’s talk about, I guess… we’ll call it the elephant in the room. You were at MongoDB, where you were a big proponent of the whole NoSQL side of the world. Then you went to go work at AWS and you carried the good word of DynamoDB far and wide. It made an impression; I built my entire newsletter pipeline production system on top of DynamoDB. It has the same data in three different tables because I’m not good at listening or at computers.
But now you’re back at Mongo. And it’s easy to jump to the conclusion of, “Oh, you’re just shilling for whoever it is that happens to sign your paycheck.” And at this point, are you—what’s the authenticity story? But I’ve been paying attention to what you’ve been saying, and I think that’s a bad take because you have been saying the same things all along since before you were on the Dynamo side of it. I do some research for this show, and you’ve been advocating for outcomes and the right ways to do things. How do you view it?
Rick: That’s basically the story here, right? I’ve always been a proponent of NoSQL. You know, what I took—the knowledge—it was interesting, the knowledge I took from MongoDB evolved as I went to AWS and I delivered, you know, thousands of applications and deployed workloads that I’d never even imagined I would have my hands on before I went there. I mean, honestly, what a great place it was to cut your teeth on data modeling at scale, right? I mean, that’s the—there is no greater scale.
That’s when you learn where things break. And honestly, a lot of the lessons I took from MongoDB, well, when I applied them at scale at AWS, they worked with varying levels of success, and we had to evolve those into the sets of design patterns, which I started to propose for DynamoDB customers, which had been highly effective. I still believe in all those patterns. I would never tell somebody that they need to drop everything and run to MongoDB, but, you know, again, all those patterns apply to MongoDB, too, right? A very—a lot—I wouldn’t say all of them, but many of them, right?
So, I’m a proponent of NoSQL. And I think we talked before the call a little bit about, you know, if I was out there hocking relational technology right now and saying RDBMS is the future, then everybody who criticizes anything I say, I would absolutely have to, you know, say that there’s some validity there. But I’m not saying anything different I’ve ever said. MongoDB announced Serverless, if you remember, in July, and that was a big turning point for me because the API that we offer, the developer experience for MongoDB is unmatched, and this is what I talk to people now. And it’s the patterns that I’ve always proposed, I still model data the same way, I don’t do it any different, and I’ve always said, if you go back to my earlier sessions on NoSQL, it’s all the same.
It doesn’t matter if it’s MongoDB, DynamoDB, or any other technology. I’ve always shown people how to model their data in NoSQL, and I don’t care what database you’re using; I’ve actually helped MongoDB customers do their job better over the years as well. So.
Corey: Oh, yeah. And looking back at some of your early talks as well, you passed my test for, “Is this person a shill?” Because you wound up in those talks addressing head-on when is a relational model the right thing to do? And then you put the answers up on a slide, and it didn’t distill down to, “If you’re a fool.”
Rick: [laugh].
Corey: Because there are use cases where if you don’t [unintelligible 00:05:48] your access patterns, if you have certain constraints and requirements, then yeah. You have always been an advocate for doing the right thing for the workload. And in my experience, for my use cases, when I looked at MongoDB previously, it was not a fit for me. It was very much a you-run-this-on-an-instance basis, you have to handle all this stuff. Like—you know, keeping it in triplicate in three different DynamoDB tables, the DynamoDB portion of my newsletter production pipeline, including backups and the rest, has now climbed to the princely sum of $1.30 a month, give or take.
Rick: A month. Yes, exactly.
Corey: So, there’s no answer for that there. Now that Mongo Serverless is coming out into the world, oh, okay, this starts to be a lot more compelling. It starts to be a lot more flexible.
Rick: I was just going to say, for your use case there, Corey, you’re probably looking at a very similar pricing experience now with MongoDB Serverless. Especially when you look at the pricing model, it’s very close to the on-demand table model. It actually has discounted tiering above it, which I haven’t really broken down yet against a provisioned capacity model, but you know, there’s a lot of complexity in DynamoDB pricing. And they’re working on this, they’ll get better at it as well, but right now you have on-demand, you have provisioned throughput, you have [clear throat] reserved capacity allocations. And, you know, there’s a time and place for all of those, but it puts the—again, it’s just complexity, right?
This is the problem that I’ve always had with DynamoDB. I just wish that we’d spent more time on improving the developer experience, right, enhancing the API, implementing some of these features that, you know, help. Let’s make single table design a first-class citizen of the DynamoDB API. Right now it’s a red—it’s a—I don’t want to say redheaded stepchild; I have two [laugh] I have two redheaded children and my wife is a redhead, but yeah. [laugh].
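To make the pricing complexity Rick is describing concrete, here is a minimal back-of-the-envelope sketch in Python comparing on-demand and provisioned-capacity costs for a hypothetical, perfectly steady workload. The per-unit rates are illustrative placeholders rather than current list prices, and the model ignores storage, free tiers, reserved capacity, and autoscaling.

```python
# Rough cost comparison for a hypothetical, perfectly steady workload.
# RATES ARE PLACEHOLDERS for illustration only; check current DynamoDB pricing.

ON_DEMAND_WRITE_PER_MILLION = 1.25   # $ per million write request units (illustrative)
ON_DEMAND_READ_PER_MILLION = 0.25    # $ per million read request units (illustrative)
PROVISIONED_WCU_HOUR = 0.00065       # $ per WCU-hour (illustrative)
PROVISIONED_RCU_HOUR = 0.00013       # $ per RCU-hour (illustrative)
HOURS_PER_MONTH = 730


def on_demand_monthly(writes_per_sec: float, reads_per_sec: float) -> float:
    """Pay-per-request: cost scales directly with request volume."""
    writes = writes_per_sec * 3600 * HOURS_PER_MONTH
    reads = reads_per_sec * 3600 * HOURS_PER_MONTH
    return (writes / 1e6) * ON_DEMAND_WRITE_PER_MILLION + (reads / 1e6) * ON_DEMAND_READ_PER_MILLION


def provisioned_monthly(wcu: int, rcu: int) -> float:
    """Provisioned throughput: you pay for the capacity whether you use it or not."""
    return (wcu * PROVISIONED_WCU_HOUR + rcu * PROVISIONED_RCU_HOUR) * HOURS_PER_MONTH


if __name__ == "__main__":
    # A steady 50 writes/sec and 200 reads/sec, provisioned to match exactly.
    print(f"on-demand:   ${on_demand_monthly(50, 200):,.2f}/month")
    print(f"provisioned: ${provisioned_monthly(50, 200):,.2f}/month")
```

The point is not the exact numbers; it is that the cheaper answer flips depending on how steady and predictable the traffic is, which is exactly the modeling burden the pricing menu pushes onto the developer.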
Corey: [laugh]. That’s—it’s—
Rick: That’s the way it’s treated, right? It’s treated like a stepchild. You know, it’s like, come on, we’re fully funding the solutions within our own umbrella that are competing with ourselves, and at the same time, we’re letting the DynamoDB API languish while our competitors are moving ahead. And eventually, it just becomes, you know, okay, guys, I want to work with the best tooling on the market, and that’s really what it came down to. As long as DynamoDB was the king of serverless, yes, absolutely; best tooling on the market.
And they still are [clear throat] the leader, right? There’s no doubt that DynamoDB is ahead in the serverless landscape, that the MongoDB solution is in its nascency. It’s going to be here, it’s going to be great, that’s part of what I’m here for. And that’s again, getting back to why did you make the move, I want to be part of this, right? That’s really what it comes down to.
Corey: One of the things that I know that was my own bias has always been that if I’m looking at my customer environments to see what’s there, I can see DynamoDB because it has its own line item in the bill. MongoDB is generally either buried in marketplace charges, or it’s running on a bunch of EC2 instances, or it just shows up as data transfer. So, it’s not as top-of-mind for the way that I view things through the lens of, you know, billing. So, that does inform my perception, but I also know that when I’m talking to large-scale companies about what they’re doing, when they’re going all-in on AWS, a large number of them still choose things like Mongo. When I’ve asked them why that is, sometimes you get the answer of, “Oh, legacy. It’s what we built on before.” Cool—
Rick: Sure.
Corey: —great. Other times, it’s a, “We’re not planning to leave, but if we ever wanted to go somewhere else, it’s nice to not have to reimagine the entire data architecture and change the integration points start to finish because migrations are hard enough without that.” And there is validity to the idea of a strategic exodus being possible, even if it’s not something you’re actively building for all the time, which I generally advise people not to do.
Rick: Yeah. There’s a couple things that have occurred over the last, you know, couple of years that have changed the enterprise CIO’s and CTO’s assessment of risk, right? Risk is the number one decision factor in a CTO’s portfolio and a CIO’s, you know, decision-making process, right? What is the risk? What is the impact of that risk? Do I need to mitigate that risk, or do I accept that risk? Okay?
So, right now, what you’ve seen is with Covid, people have realized that you know, on-prem infrastructure is a risk, right? It used to be an asset; now it’s a risk. Those personnel that have to run that on-prem infrastructure, hey, what happens when they’re not available? The infrastructure is at risk. Okay.
So, offloading that to cloud providers is the natural solution. Great. So, what happens when you offload to a cloud provider and IAD goes down, or you know, us-east-1 goes down—we call it IAD or we used to call it IAD internally at AWS when I was there because, you know, the regions were named by airport codes, but it’s us-east-1—how many times has us-east-1 had problems? Do you want to really be the guy that every time us-east-1 goes down, you’re in trouble? What happens when people in us-east-1 have trouble? Where do they go?
Corey: Down generally speaking.
Rick: [crosstalk 00:10:37]—well, if they’re well-architected, right, if they’re well-architected, what do they do? They go to us-west-2. How much infrastructure does us-west-2 have? So, if everybody in us-east-1 is well-architected, then they all go to us-west-2. What happens in us-west-2? And I guarantee you—and I’ve been warning about this at AWS for years—there’s a cascade failure coming, and it’s going to be coming because we’re well-architecting everybody to fail over from our largest region to our smaller regions.
And those smaller regions, they cannot take the load and nobody’s doing any of that planning, so, you know, sooner or later, what you’re going to see is dominoes fall, okay? [clear throat]. And it’s not just going to be us-east-1, it’s going to be us-east-1 failed, and the rollover caused a cascade failure in us-west-2, which caused a cascade—
Corey: Because everyone’s failing over during—
Rick: That’s right. That’s right.
Corey: —this event the same way. And also—again, not to dunk on them unnecessarily, but when—
Rick: No, I’m not dunking.
Corey: —us-east-1 goes down, a lot of the control plane services freeze up—
Rick: Oh, of course they do.
Corey: —like [unintelligible 00:11:25].
Rick: Exactly. Oh, we’re not a single point of failure, right? Uh-huh, exactly. There you go, Route 53—and that actually surprised me, that DynamoDB instead of Route 53 is your primary database. So, I actually must have had some impact on you—
Corey: I need to move one workload off of Dynamo to Route 53 [crosstalk 00:11:39] issue number because I have to practice what I preach.
Rick: That’s right. Exactly.
Corey: It was weird; it made the thing slower and a little bit less, uh—
Rick: [laugh]. I love it when [crosstalk 00:11:45]—yeah, yeah—
Corey: —and a little bit [crosstalk 00:11:45] cache-y. But yeah.
Rick: —sure. Okay, I can understand that. [laugh].
Corey: But it made the architecture diagram a little bit more head-scratching, and really, that’s what it’s all about. Getting a high score.
Rick: Right. So, if you think about your data, right, I mean, would you rather be running on an infrastructure that’s tied to a cloud provider that could experience these kinds of regional failures and cascade failures, or would you rather have your data infrastructure go across cloud providers so that when one provider has problems, you can just go ahead and switch the light bulb over on the other one and ramp right back up, right? You know? And honestly, you’re running active-active configurations and that kind of, [clear throat] you know, deployment, you know, design, and you’re never going to go down. You’re always going—
Corey: The challenge I’ve had—
Rick: —to be the one that stays up.
Corey: The theory is sound, but the challenge I’ve had in production with trying these things is that one, the thing that winds up handling the failover piece often causes more outages than the underlying stuff itself.
Rick: Well, sure. Yeah.
Corey: Two, when you’re building a workload to run in multiple cloud providers, you’re forced to use a lot of—
Rick: Lowest common denominator?
Corey: Lowest common denominator stuff. Yeah.
Rick: Yeah, yeah totally. I hear that all the time.
Corey: Unless you’re actively running it in both places, it looks like a DR Plan, which doesn’t survive the next commit to the codebase. It’s the—
Rick: I totally buy that. You’re talking about the stack, stack duplication, all that kind of—that’s an overhead and complexity, I don’t worry about at the data layer, right?
Corey: Oh, yeah.
Rick: The data layer—
Corey: If you’re talking about—
Rick: —[crosstalk 00:12:58]
Corey: —[crosstalk 00:12:58] data layer, oh, everything you’re saying makes perfect sense.
Rick: Makes perfect sense, right? And honestly, you know, let’s put it this way: If this is what you want to do—
Corey: What do you mean identity management and security handover working differently? Oh, that’s a different team’s problem. Oh, I miss those days.
Rick: Yeah, you know, totally right. It’s not ideal. But you know, I mean, honestly, it’s not a deal that somebody wants to manage themselves, is moving that data around. The data is the lock-in. The data is the thing that ties you to—
Corey: And the cost of moving it around in some cases, too.
Rick: That’s exactly right. You know, so you know, having infrastructure that spans providers and spans both on-prem and cloud, potentially, you know, that can span multiple on-prem locations, man, I mean, that’s just that’s power. And MongoDB provides that; I mean, DynamoDB can’t. And that’s really one of the biggest limitations that it will always have, right? And we talked about, and I still believe in the power of global tables, and multi-region deployments, and everything, it’s all real.
But these types of scenarios, I think this is the next generation of failure that the cloud providers are not really prepared for; they haven’t experienced it, they don’t know what it’s even going to look like, and I don’t think you want to be tied to a single provider when these things start happening, right, if you have a large amount of infrastructure deployed someplace. It just seems like [clear throat] that’s a risk that you’re running these days, and you can mitigate that risk somewhat by going with MongoDB Atlas. I agree, all those other considerations. But you know, I also heard—it’s a lot of fun, too, right? There’s a lot of fun in that, right?
Because if you think about it, I can deploy technologies on any cloud provider in ways that are going to be cloud provider agnostic, right? I can use, you know, containerized technologies, Kubernetes, I can use—hell, I’m not even afraid to use Lambda functions, and just, you know, put a wrapper around that code and deploy it both as a Lambda or a Cloud Function in GCP. The code’s almost the same in many cases, right? What it’s doing with the data, you can code this stuff in a way—I used to do it all the time—you abstract the data layer, right? Create a DAL. How about a CAL? A cloud [laugh] cloud access layer, right, you know? [laugh].
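A quick aside on the “DAL”/“CAL” idea Rick is joking about: the sketch below shows one way it could be shaped in Python. It is a hypothetical illustration, not code from either vendor’s documentation; the IssueStore interface, the class names, and the key layout are invented, and only the boto3 and pymongo calls themselves are real APIs.

```python
# Hypothetical sketch of a thin data access layer ("DAL") so application code
# never touches a vendor SDK directly. Names here are invented for illustration.
from typing import Optional, Protocol


class IssueStore(Protocol):
    """What the application actually needs from its data layer."""
    def get(self, issue_id: str) -> Optional[dict]: ...
    def put(self, issue: dict) -> None: ...


class DynamoIssueStore:
    """One possible implementation, backed by a DynamoDB table (via boto3)."""
    def __init__(self, table_name: str):
        import boto3  # assumes AWS credentials are configured
        self._table = boto3.resource("dynamodb").Table(table_name)

    def get(self, issue_id: str) -> Optional[dict]:
        resp = self._table.get_item(Key={"pk": f"ISSUE#{issue_id}"})
        return resp.get("Item")

    def put(self, issue: dict) -> None:
        self._table.put_item(Item={"pk": f"ISSUE#{issue['id']}", **issue})


class MongoIssueStore:
    """Another implementation, backed by a MongoDB collection (via pymongo)."""
    def __init__(self, uri: str, db: str = "newsletter"):
        from pymongo import MongoClient
        self._coll = MongoClient(uri)[db]["issues"]

    def get(self, issue_id: str) -> Optional[dict]:
        return self._coll.find_one({"_id": issue_id})

    def put(self, issue: dict) -> None:
        self._coll.replace_one({"_id": issue["id"]}, issue, upsert=True)


def publish(store: IssueStore, issue_id: str) -> None:
    """Application logic depends only on the interface, not the provider."""
    issue = store.get(issue_id)
    if issue:
        issue["status"] = "published"
        store.put(issue)
```

The trade-off Corey raises still applies: the interface tends toward the lowest common denominator of the stores behind it, which is exactly why Rick scopes the idea to the data layer rather than the whole stack.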
Corey: I wish, on some level, we could go down some of these paths. And someone asked me once a while back of, “Well, you seem to have a lot of opinions on this. Do you think you could build a better cloud than AWS?” And my answer—
Rick: Hell yes.
Corey: —took them a bit by surprise: “Absolutely. Step one, I want similar resources, so give me $20 billion to spend”—
Rick: I was going to say, right?
Corey: —”then I’m going to hire the smart people.” Not that we’re somehow smarter or better or anything else than the people who built AWS originally, but now—
Rick: We have all those lessons learned.
Corey: —we have fifteen years of experience to fall back on.
Rick: Exactly.
Corey: “Oh. I wouldn’t make that mistake again.”
Rick: Exactly. Don’t need to worry about that. Yeah exactly.
Corey: You can’t just turn off a cloud service and relaunch it with a completely different interface and API and the rest.
Rick: People who criticize, you know, services like DynamoDB—and other AWS services—look, any kind of retooling of these services is like rebuilding the engine on the airplane while it’s flying.
Corey: Oh, yeah.
Rick: And you have to do it with a level of service assurance that—I mean, come on. DynamoDB provides four nines out of the box, right? Five nines if you turn on global tables. And they’re doing this at the same time as they have pipeline releases dropping regularly, right? So, you can imagine what kind of, you know, unit testing goes on there, what kind of Canary deployments are happening.
It’s just, it’s an amazing infrastructure that they maintain, incredibly complex, you know? In some ways, these are lessons that we need to learn at MongoDB if we’re going to be successful operating a shared-backplane serverless, you know, processing fabric. We have to look at what DynamoDB does right. And we need to build our own infrastructure that mirrors those things, right? And in some ways, these things are there, in some ways, we’re working on them, in some ways, we’ve got a long ways to go.
But you know, I mean, this is the exciting part of that journey for me. Now, in my case, I focus on strategic accounts, right? Strategic accounts are big, you know, they have the potential to be our whale customers, right? These are probably not customers who would be all that interested in serverless, right? They’re customers that would be more interested in provisioned infrastructure because they’re the people that I talked to when I was at DynamoDB; I would be talking to customers who are interested in, like, reserved capacity allocations, right? If you’re talking about—
Corey: Yeah, I wanted to ask you about that. You’re doing developer advocacy—which I get—for strategic accounts.
Rick: Right.
Corey: And I’m trying to wrap my head around—
Rick: Why [crosstalk 00:17:19]—
Corey: [crosstalk 00:17:19] strategic accounts are the big ones, potentially spending lots on stuff. Why do they need special developer advocacy?
Rick: [laugh]. Well, yeah, it’s funny because, you know, one of the reasons why I started talking to Mark Porter about this, you know, was the fact that, you know, the overlap is really around [clear throat] the engagements that I ran when I was doing the Amazon retail migration, right? When Amazon retail started to move to NoSQL, we deprecated 3000 Oracle server instances, we moved a large percentage of those workloads to NoSQL. The vast majority probably just were lift-and-shift into RDS and whatnot because they were too small, too old, not worth upgrading, whatnot, but every single one of what we call tier-one services, right, every money-making service, was redesigned and redeployed on DynamoDB, right? So, we’re talking about 25,000 developers that we had to ramp. This is back four years ago; now we have, like, 75,000.
But back then we had 25,000 global developers, we had [clear throat] a technology shift, a fundamental paradigm shift between relational modeling and NoSQL modeling, and the whole entire organization needed to get up to speed, right? So, it was about creating a center of excellence, it was about operating as an office of the CTO within the organization to drive this technology into the DNA of our company. And so that exercise was actually incredibly informative, educational, in that process of executing a technology transformation in a major enterprise. And this is something that we want to reproduce. And it’s actually what I did for Dynamo as well, really more than anything.
Yes, I was on Twitter, I was on Twitch, I did a lot of these things that were kind of developer advocate, you know, activities, but my primary job at AWS was working with large strategic customers, enabling their teams, you know, teaching them how to model their data in NoSQL, and helping them cross the chasm, right, from relational. And that is advocacy, right? The way I do it is I use their workloads. [clear throat]. I use the customers’, you know, project teams themselves, I break down their models, I break down their access patterns. Essentially, with a whole day of design reviews, we’ll walk through 12 or 15 workloads, and when I leave, these guys have an idea: How would I do it if I wanted to use NoSQL, right?
Give them enough breadcrumbs so that they can actually say, “Okay, if I want to take it to the next step, I can do it without calling up and saying, ‘Hey, can we get a professional services team in here?’” right? So, it’s kind of developer advocacy and it’s kind of not, right? We’re kind of recognizing that these are whales, these are customers with internal resources that are so huge, they could suck our Developer Advocacy Team in and chew it up, right? So, what we’re trying to do is form a focused team that can hit hard and move the needle inside the accounts. That’s what I’m doing. Essentially, it’s the same work I did at [clear throat] AWS for DynamoDB. I’m just doing it for, you know—they traded for a new quarterback. Let’s put it that way. [laugh].
Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They’ve also gone deep in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That’s S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.
Corey: So, one thing that I find appealing about the approach maps to what I do in the world of cloud economics, where I—like, in my own environment, our AWS bill is creeping up again—we have 14 AWS accounts—and that’s a little over $900 a month now. Which, yeah, big money, big money.
Rick: [laugh].
Corey: In the context of running a company, no one notices or cares about that. And our customers spend hundreds of millions a year, pretty commonly. So, I see the stuff in the big accounts and I see the stuff in the tiny account here. Honestly, the more interesting stuff is generally on the smaller side of the scale, just because you’re not going to have a misconfiguration costing a third of your bill when a third of your bill is $80 million a year. So—
Rick: That’s correct. If you do then that’s a real problem, right?
Corey: Oh yeah.
Rick: [laugh].
Corey: It’s very much two opposite ends of a very broad spectrum. And advice for folks in one of those situations is often disastrous to folks on the other side of that.
Rick: That’s right. That’s right. I mean, at some scale, managing granularity hurts you, right? The overhead of trying to keep your costs, you know, it—but at the same time, it’s just different, a different measure of cost. There’s a different granularity that you’re looking at, right? I mean, things below a certain, you know, level stop being important when, you know, the budgets start to get to a certain scale or a certain size, right? Theoretically—
Corey: Yeah, for certain workloads, the things that I care about with my dollar-a-month Dynamo spend, if I were to move that to Mongo Serverless, great, but my considerations are radically different than a company that is spending millions a month on their database structure.
Rick: That’s right. Really, that’s what it comes down to.
Corey: Yeah, we don’t care about the pennies. We care about is it going to work? How do we back it up? What’s the replication factor?
Rick: And that—but also, it’s more than that. You know, for me, from my perspective, it really comes down to the fact that, you know, companies are spending millions of dollars a year in database services. These are companies that are spending ten times that, five times that, in, you know, developer expense, right? Building services, maintaining the code that the services run.
You know, the biggest problem I had with DynamoDB is the level of code complexity. It’s a cut after cut after cut, right? And the way I kind of describe the experience—and other people have described it to me; I didn’t come up with this analogy. I had a customer tell me this as they were leaving DynamoDB—“DynamoDB is death by a thousand cuts. You love it, you start using it, you find a little problem, you start fixing it. You start fixing it. You start fixing—you come up with a pattern. Talk to Rick, he’ll come up with something. He’ll tell you how to do that.” Okay?
And you know, how many customers did I do this with? And honestly, they’re 15-minute phone calls for me, but every single one of those 15-minute phone calls turns into eight hours of developer time writing the code, debugging it, deploying it over and over again, making sure it’s going the way it’s [crosstalk 00:23:02]—
Corey: Have another 15-minute call with Rick, et cetera, et cetera. Yeah.
Rick: Another 15—exactly. And it’s like, okay, you know—eventually, they just get tired of it, right? And I actually had a customer—a big customer—tell me flat out, “Yeah, you proved that DynamoDB can support our workload and it’ll probably do it cheaper, but I don’t have half a dozen Ricks on my team, right? I don’t have any Ricks on my team. I can’t be getting you in here every single time we have to do a complex data model overhaul, right?”
And this was—granted, it was one of the more complex implementations that I’ve ever done. In order to make it work, I had to overload the fricking table with multiple access patterns on the partition key, something I’d never done in my life. I made it work, but it was just—honestly, that was an exercise that taught me something. If I have to do this, it’s unnatural, okay?
And that’s—[laugh] you know what I mean? And honestly, there’s API improvements that we could have done to make that less of a problem. It’s not like we haven’t known since, I don’t know, I joined the company that a thousand WCUs per storage partition was pretty small. Okay? We’ve kind of known that for, I don’t know, since DynamoDB was invented. As a matter of fact, from what I know, talking to people who were around back then, that was a huge bone of contention back in the day, right? A thousand WCUs, ten gigabytes, there were a lot of the PEs on the team that were going, “No way. No way. That’s way too small.” And then there were other people that were like, “Nah, nobody’s ever going to need more than that.” And you know, a lot of this was based on the analysis of [crosstalk 00:24:28]—
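For readers wondering what working around that per-partition write ceiling looks like in code, here is a minimal write-sharding sketch with boto3. It is an assumption-laden illustration, not Rick’s actual customer design: the table name, key names, and shard count are invented, and the pattern shown is the commonly documented one of appending a shard suffix to a hot partition key so no single storage partition has to absorb all the writes.

```python
# Hypothetical write-sharding sketch: spread writes for one logical key across
# N shard suffixes so no single storage partition absorbs all of the WCUs.
# Table name, key names, and shard count are invented for illustration.
import random

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("events")  # hypothetical table
SHARDS = 8  # tune to the write rate a single logical key needs to absorb


def put_event(device_id: str, ts: str, payload: dict) -> None:
    """Write to a randomly chosen shard of the logical partition."""
    shard = random.randrange(SHARDS)
    table.put_item(Item={"PK": f"DEVICE#{device_id}#{shard}", "SK": ts, **payload})


def get_events(device_id: str) -> list:
    """Reads now have to fan out across every shard and merge the results."""
    items = []
    for shard in range(SHARDS):
        resp = table.query(
            KeyConditionExpression=Key("PK").eq(f"DEVICE#{device_id}#{shard}")
        )
        items.extend(resp["Items"])
    # Items are sorted per shard, not globally; re-sort by the sort key.
    return sorted(items, key=lambda item: item["SK"])
```

The extra read-side fan-out code is the point: every workaround like this is one more of the cuts the two of them are describing.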
Corey: Oh, nothing ever survives first contact from—
Rick: Of course.
Corey: —customer, particularly a customer who is not themselves deeply familiar with what’s happening under the hood. Like, I had this problem back when I was a traveling trainer for Puppet for a while. It was, “Great. Well, Puppet is obviously a piece of crap because everyone I talked to has problems with it.” So, I was one of the early developers behind SaltStack—
Rick: Oh nice.
Corey: —and, “Ah, this is going to be a thing of beauty and it’ll be awesome.” And that lasted until the first time I saw what somebody’s done with it in the wild. It was, “Oh, okay, that’s an [unintelligible 00:25:00] choice.”
Rick: Okay, that’s how—“Yeah, I never thought about that,” right? Happy path. We all love the happy path, right? As we’re working with technologies, we figure out how we like to use it, we all use it that way. Of course, you can solve any problem you want the way that you’d like to solve it. But as soon as someone else takes that clay, they mold a different statue and you go, “Oh, I didn’t realize it could look like that.” Right, exactly.
Corey: So, here’s one for you that I’ve been—I still struggle with this from time to time, but why would I, if I’m building something out—well, first off, why on earth would I do that? I have people for that who are good at things—but if I’m building something out and it has a database layer, why would someone choose NoSQL over—
Rick: Oh, sure.
Corey: —over SQL?
Rick: [crosstalk 00:25:38] question.
Corey: —and let me be clear here—and I’m coming at this from the perspective of someone who, basically me a few years ago, who has no real understanding of what databases are. So, my mental model of a database is Microsoft Excel, where I can fire up a [unintelligible 00:25:51] table of these things—
Rick: Sure. [laugh]. Hey, well then, you know what? Then you should love NoSQL because that’s kind of the best analogy for what NoSQL is. It’s like a spreadsheet, right? Whereas a relational database is like a bunch of spreadsheets, each with their own types of rows, right? So—[laugh].
Corey: Oh, my mind was blown with relational stuff [unintelligible 00:26:07] wait, you could have multiple tables? It’s, “What do you think relational meant there, buddy?” My map of NoSQL was always key and value, and that was it. And that’s all it can be. And sure, for some things, that’s what I use, but not everything.
Rick: That’s right. So, you know, the bottom line is, when you think about the relational database, it all goes back to, you know, the first paper ever written on the relational model, by Edgar Codd—and I can’t remember the exact title, but he wrote the distributed model, the data model for distributed systems, something like that. He discussed, you know, the concept of normalization, the power of normalization, why you would want this. And the reason why we wanted this, why he thought this was important, this actually kind of demonstrates how—boy, they used to write killer abstracts to papers, right? It’s like the very first sentence: this is why I’m writing this paper. You read the first sentence, you know: “Future users of modern computer systems must have a way to be able to ask questions of the data without knowing how to write code.”
I mean, I don’t know if those were the words, but that was basically what he said; that was why he invented the normalized data model. Because, you know, with the hierarchical management systems at the time, everyone had to know everything about the data in order to be able to get any answers, right? And he was like, “No, I want to be able to just write a question and have the system answer that.” Now, at the time, a lot of people felt like that’s great, and they agreed with his normalized model—it was elegant—but they all believed that the CPU overhead at the time was way too high, right? To generate these views of data on the fly, no freaking way. Storage is expensive. But it ain’t that expensive, right?
Well, this little thing called Moore’s Law, right? Moore’s Law balanced his checkbook for, like, 40 years, 50 years; it balanced the relational database checkbook, okay? So, as the CPUs got faster and faster, crunching the data became less and less of a problem, okay? And so we crunched bigger and bigger data sets, we got very, very happy with this. Up until about 2014.
At 2014, a really interesting thing happened. If you look at the top 500, which is the supercomputers, the top 500 supercomputing clusters around the world, and you look at their performance increases year-to-year after 2014, it went off a cliff. No longer beating Moore’s Law. Ever since, they’ve been—and per-core performance, you know, CPU, you know, instructions executed per second, everything. It’s just flattening. Those curves are flattening. Moore’s Law is broken.
Now, you’ll get people argue about it, but the reality is, if it wasn’t broken, the top 500 would still be cruising away. They’re not. Okay? So, what this is telling us is that the relational database is losing its horsepower. Okay?
Why is it happening? Because, you know, gate length has an absolute minimum, it’s called zero, right? We can’t have a logic gate that’s the—with negative distance, right? [laugh]. So, you know, these things—but storage, storage, hey, it just keeps on getting cheaper and cheaper, right?
We’re going the other way with storage, right? It’s gigabytes, it’s terabytes, it’s petabytes, you know, with CPU, we’re going smaller and smaller and smaller, and the fab cost is increasing. There’s just—it’s going to take a next-generation CPU technology to get back on track with Moore’s Law.
Corey: Well, here’s the challenge. Everything you’re saying makes perfect sense from where your perspective is. I reiterate, you are working with strategic accounts, which means ‘big.’ When I’m building something out in the evenings because I want to see if something is possible, performance considerations and that sort of characteristic do not factor into it. When I’m at a very small scale, I care about cost to some extent—sure, whatever—but the far more expensive aspect of it, in the ways that matter, is—the big expensive piece is—
Rick: We’ve talked about it.
Corey: —engineering time—
Rick: That’s what we just talked about, right?
Corey: —where it’s, “What am I familiar with?”
Rick: As a developer, right, why would I use MongoDB over DynamoDB? Because the developer experience [crosstalk 00:29:33]—
Corey: Exactly. Sure, down the road there are performance characteristics, and yeah, by the time I have this super-large, scaled-out, complex workload, yeah, but most workloads will not get to that.
Rick: Will not ever get there. Ever get there. [crosstalk 00:29:45]—
Corey: Yeah, so optimizing for [crosstalk 00:29:45], how’s it going to work when I’m Facebook-scale? It’s—
Rick: So, first of—no, exactly, Facebook scale is irrelevant here. What I’m talking about is actually a cost ratchet that’s going to lever on midsize workloads soon, right? Within the next four to five years, you’re going to see mid-level workloads start to suffer from significant performance cost deficiencies compared to NoSQL workloads running on the same. Now you—hell, you see it right now, but you don’t really experience it, like you said, until you get to scale, right? But in midsize workloads, [clear throat] that’s going to start showing up, right? This cost overhead cannot go away.
Now, the other thing here that you’ve got to understand is, just because it’s new technology doesn’t make it harder to use. Just because you don’t know how to use something, right, doesn’t mean that it’s more difficult. And NoSQL databases are not more difficult than the relational database. I can express every single relationship in a NoSQL database that I express in a relational database. If you think about modern OLTP applications, we’ve done the analysis, ad nauseam: 70% of access patterns are for a single object, a single row of data from a single table; another 20% are for a range of rows from a single table. Okay, that leaves only 10% of your access patterns involving any kind of complex table traversal or entity traversals. Okay?
And most of those are simple one-to-many hierarchies. So, let’s put those into perspective here: 99% of the access patterns in an OLTP application can be modeled without denormalization in a single table. Because single table doesn’t require—just because I put all the objects in one place doesn’t mean that it’s denormalized. Denormalization requires strong redundancies in the stored set. Duplication of data. Okay?
Edgar Codd himself said that the normalized data model does not depend on storage, that storage is irrelevant. I could put all the objects in the same document. As long as there’s no duplication of data, there’s no denormalization. I know, I can see your head going, “Wow,” but it’s true, right? Because as long as I can clearly express the relationships of the data without strong redundancies, it is a normalized data model.
That’s what most people don’t understand. NoSQL does not require denormalization. That’s a decision you make, and it usually happens when you have many-to-many relationships; then we need to start duplicating the data.
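As a concrete illustration of the claim that a one-to-many hierarchy can live in a single document without any duplication, here is a minimal, hypothetical pymongo sketch. The collection, field names, and connection string are invented for the example.

```python
# Hypothetical pymongo sketch: a one-to-many hierarchy stored in one document.
# Nothing is duplicated, so by the argument above this is still a normalized
# model, just not spread across separate tables. Names and URI are invented.
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["shop"]["customers"]

coll.insert_one({
    "_id": "CUSTOMER#42",
    "name": "Ada",
    "orders": [                      # the "many" side lives with its parent
        {"order_id": 1001, "total": 25},
        {"order_id": 1002, "total": 40},
    ],
})

# The ~70% case: fetch a single object by its key.
customer = coll.find_one({"_id": "CUSTOMER#42"})

# The ~20% case: a filtered lookup, still against a single collection.
big_spenders = coll.find({"orders.total": {"$gte": 40}})  # returns a cursor

# Many-to-many relationships (say, products referenced from many orders) are
# where you would start choosing between referencing and duplicating data.
```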
Corey: In many cases, at least in my own experience—because again, I am bad at computers—I find that the data model is not something that is set out—that you sit down and consciously plan very often. It’s rather something—
Rick: Oh yeah.
Corey: —happens to you instead. I mean—
Rick: That’s right. [laugh].
Corey: —realistically, like, using DynamoDB for this is aspirational. I just checked, and if I look at—so I started this newsletter back in March of 2017. I spun up this DynamoDB table that backs it, and I know it’s the one that’s in production because it has the word ‘test’ in its name, because of course it does. And I’m looking into it, and it has 8700 items in it now and it’s 3.7 megabytes. It’s—
Rick: Sure, oh boy. Nothing, right?
Corey: —not for nothing, this could have just as easily been, and probably less complex for my level of understanding at the time, a CSV file that I—
Rick: Right. Exactly, right.
Corey: —grabbed from a Lambda out of S3, do the thing to it, and then put it back.
Rick: [unintelligible 00:32:45]. Right.
Corey: And then from a performance perspective on my side, it would make no discernible difference.
Rick: That’s right because you’re not making high-velocity requests against the small object. It’s just a single request every now and then.
S3 performance would probably—it might even be less. It might even cost you less to use S3.
Corey: Right. And 30 to 100 of the latest ones are the only things that are ever looked at in any given week, the rest of it is mostly deadstock that could be transitioned out elsewhere.
Rick: Exactly.
Corey: But again, like, now that they have their lower-cost infrequent access storage, then great. It’s not item-level; it’s table-level, so what’s the point? I can knock that $1.30 a month down to, what, $1.10?
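For completeness, the CSV-in-S3 alternative Corey describes really is about this simple at that scale. The following is a hypothetical sketch of the Lambda-side read-modify-write; the bucket, key, and columns are invented, and there is no concurrency control, which is fine for a single occasional writer but would not be for anything busier.

```python
# Hypothetical Lambda handler: treat a small CSV in S3 as the whole "database".
# Bucket, key, and column names are invented for illustration.
import csv
import io

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "newsletter-data", "issues.csv"
FIELDS = ["issue", "title", "status"]


def handler(event, context):
    # Read the whole file; at a few megabytes this is effectively instant.
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(body)))

    # "Do the thing to it": here, append the new issue carried in the event.
    rows.append({"issue": event["issue"], "title": event["title"], "status": "draft"})

    # Write the whole file back to S3.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=out.getvalue().encode("utf-8"))
    return {"count": len(rows)}
```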
Rick: Oh well, yeah, no, I mean, again, Corey, for those small workloads, you know what? It’s like, go with what you know. But the reality is, look, as a developer, we should always want to know more, and we should always want to know new things, and we should always be aware of where the industry is headed. And honestly, I’ve heard through—I’m an old, old-school relational guy, okay, I cut my teeth on—oh, God, I don’t even know what version of MS SQL Server it was, but when I was, you know, interviewing at MongoDB, I was talking to Dan Pasette about the old Enterprise Manager, where we did the schema designer and all this, and we were reminiscing about, you know, back in the day, right?
Yeah, you know, the reality of things is that if you don’t get tuned into the new tooling, then you’re going to get left behind sooner or later. And I know a lot of people who that has happened to over the years. There’s a reason why I’m 56 years old and still relevant in tech, okay? [laugh].
Corey: Mainframes, right? I kid.
Rick: Yes, mainframes.
Corey: I kid. You’re not that much older than I am, let’s be clear here.
Rick: You know what? I worked on them, okay? And some of my peers, they never stopped, right? They just kind of stayed there.
Corey: I’m still waiting for AWS/400. We don’t see them yet, but hope springs eternal.
Rick: I love it. I love that. But no, one of the things that you just said, I think it really hit me: it’s like the data model isn’t something you think about. The data model is something that just happens, right? And you know what, that is a problem because this is exactly what developers today think. They think they know the relational database, but they don’t.
You talk to any DBA out there who’s come in after the fact and cleaned up all the crappy SQL that people like me wrote, okay? I mean, honestly, I wrote some stuff in the day that I thought, “This is perfect. There’s no way there could be anything better than this,” right? Nice derived table joins insi—and you know what? Then here comes the DBA when the server is running at 90% CPU and 100% memory utilization and page swapping like crazy, and you’re saying we’ve got to start sharding the dataset.
And you know, my director of engineering at the time said, “No, no, no. What we need is somebody to come in and clean up our SQL.” I said, “What do you mean? I wrote that SQL.” He’s like, “Like I said, we need someone to come and clean up our SQL.”
I said, “Okay, fine.” We brought the guy in. 1500 bucks an hour, we paid this guy, I was like, “There’s no way that this guy is going to be worth that.” A day and a half later, our servers are running at 50% CPU and 20% memory utilization. And we’re thinking about, you know, canceling orders for additional hardware. And this was back in the day before cloud.
So, you know, developers think they know what they’re doing. [clear throat]. They don’t know what they’re doing when it comes to the database. And don’t think just because it’s a relational database and they can hack it easier that it’s better, right? Yeah, there’s no substitute for knowing what you’re doing; that’s what it comes down to.
So, you know, if you’re going to use a relational database, then learn it. And honestly, it’s a hell of a lot more complicated to learn a relational database and do it well than it is to learn how to model your data in NoSQL. So, if you sit two developers down, and you say, “You learn NoSQL, you learn relational,” two months later, this guy is still going to be studying. This guy’s going to be writing code for seven weeks. Okay? [laugh]. So, you know, that’s what it comes down to. You want to go fast, use NoSQL and you won’t have any problems.
Corey: I think that’s a good place to leave it. If people want to learn more about how you view these things, where’s the best place to find you?
Rick: You know, always hit me up on Twitter, right? I mean, @houlihan_rick, that’s my—underbar rick, that’s my Twitter handle. And you know, I apologize to folks who have hit me up on Twitter and gotten no response. My Twitter as you probably have as well, my message request box is about 3000 deep.
So, you know, every now and then I’ll start going in there and I’ll dig through, and I’ll reply to somebody who actually hit me up three months ago if I get that far down the queue. It is a Last In, First Out, right? I try to keep things as current as possible. [laugh].
Corey: [crosstalk 00:36:51]. My DMs are a trash fire. Apologies as well. And we will, of course, put links to it in the [show notes 00:36:55].
Rick: Absolutely.
Corey: Thank you so much for your time. I really do appreciate it. It’s always an education talking to you about this stuff.
Rick: I really appreciate being on the show. Thanks a lot. Look forward to seeing where things go.
Corey: Likewise.
Rick: All right.
Corey: Rick Houlihan, Director of Developer Relations, Strategic Accounts at MongoDB. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an upset comment talking about how we didn’t go into the proper and purest expression of NoSQL non-relational data, DNS TXT records.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.