How Scaling Turns Rare Occurrences Into Common Ones with Jason Cohen
Jason: That's another thing that's true of engineering. You can do anything if you really want, and you can write stuff in any language if you really want. Doesn't mean you should, doesn't mean it's a good fit, but okay.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Periodically, I will have people from a variety of different companies, doing different things for different reasons, come on the podcast. Every once in a while I like to track down, okay, who's a vendor that I've used a lot of and often don't necessarily think about?
And when I started framing it that way, today's guest became relatively obvious. Jason Cohen is the founder of WP Engine. Jason, thanks for joining me.
Jason: It's great to be here.
Corey: I have a painful history with running websites at even small scale, then medium scale, then large scale. And WordPress has been sort of a thing that has taken over the world, and [00:01:00] it felt like the late nineties. Now there's still a disgusting percentage of the world that runs on top of WordPress.
I've run it myself. It was a terrific demo app for teaching people how to use Puppet. It touches a whole bunch of different things. And when it came time to decide that I should have a website that's useful to work with, the first iteration that I went with personally was building my own custom thing on serverless.
This was a bad idea. When it became an actual real business, I went with WordPress and figured, ah, who can I find to run this for me that isn't me? And the answer was WP Engine, pretty quickly. So I've been your customer for something like seven years now. Thanks for not going down during that time. It's appreciated.
Jason: Oh well. Sure. Thanks for giving us money. That's, that's what we like. Yeah. I mean, WordPress currently powers forty-three percent of every domain on earth, which as you say is a staggering and unbelievable number. But there's many different data sources who all point to that. [00:02:00] And, uh, yeah, it's, it, it's because it's open, it's because there's a community.
It's because- I think it is true that once you have some, some success and momentum there, it builds on it because people know it and then they build another site, just like you said. And so there's also a big set of, uh, design agencies that use WordPress. So they are almost essentially like a sales force for WordPress.
WordPress is free, so that's in quotes. But you know, hey, if you're gonna build your agency or freelancing business off of it, clearly you're gonna advocate for that. So I think you have this set of things like that, which made it so successful. Even 14 years ago when I started WP Engine, WordPress was already 11 or 12% of the web, which is already kind of infinity for a new company, to be able to sell to a 10th of the web.
That's huge. And so, yeah, it's only grown from there, which is hard to believe.
Corey: People love to talk smack about it in engineering circles. It's PHP. Who wants to work in PHP? Great. Cool. I don't want to think about PHP. I don't have to. I care about the content. What the [00:03:00] technology stack that powers my website is, is never gonna be a determining factor in whether my company succeeded or not, from where I sit.
Jason: Yeah, but here's the thing. Engineers hate whatever is popular. Whenever the language becomes too popular, then everyone hates it. So like at first Java was cool 'cause it was new and weird, even though it's slow and actually kind of bad. Then Java became the most popular language and everyone hates Java.
Okay. I mean, you just hate whatever it is. Whatever bug tracking system you use, you hate it. Like, I almost guarantee you hate it. Okay. I just feel like this is the standard thing that we do. Also, there's a general thing in engineering where it's not necessarily the highest-quality, best thing that wins.
There's other factors like it being easy to try, easy to troubleshoot, easy to understand, easy to dig in under the covers and so forth. So you have things like open source projects that have all those attributes. And are they as good as some of the commercial things? In many ways, no, but it has those attributes so it wins anyways. WordPress has [00:04:00] all of these things 'cause it's open source and it's easy and it's accessible for lots of people to use it and so on and so forth.
There's an old article called Worse is Better, uh, on this, and it shows stuff like some of these, uh, text formats for moving stuff around. They're inefficient. Like, we should use other formats. Like, I know, but the thing with text is you can write it, you can read it, you can look at it in your packet dumping stuff.
You can mess with it easily. You can use grep, you can dump it to a log, and with all that other stuff it's harder. And so, right, it's worse, but it's better because it's more accessible. It's more, you know, da-da-da-da. It's more observable and so forth. So I feel like there's a lot of things like that. And then when those things become popular for good reason, 'cause those other attributes are good, engineers like to say, "I hate it because," and then they list those other types of attributes, and they're not wrong that those other attributes are missing or are not optimized for.
But, uh, I just feel like this is very common in many things in engineering. So it doesn't bother me. In fact, maybe it means you're winning.
Corey: For me, the big reason to go WordPress is not because I have [00:05:00] some deep-seated love affair with it. If anything, just the opposite. Uh, before they wound up dying slash being absorbed,
I worked on large-scale hosted WordPress at MediaTemple for about a year, year and a half, and that was enough to teach me I didn't wanna run WordPress myself if I could possibly avoid it. Because running it on a laptop or in a container or, who knows, probably someone is running it on Kubernetes these days, is probably not that challenging, but running anything at scale introduces an entire series of separate problems.
Jason: Right. Yeah. We run it on Kubernetes. You can run it everywhere if you want, but of course, that's another thing that's true of engineering. You can do anything if you really want, and you can write stuff in any language if you really want.
Doesn't mean you should, doesn't mean it's a good fit, but okay. You know. But yes, uh, doing anything at scale obviously is, is hard.
Corey: Yeah. When I got started, I built a bunch of serverless stuff, uh, to run the website and power the blog and the newsletter. And then I realized that, you know, at least in the website piece of it, other people could be much better at doing a lot of these [00:06:00] things than I could.
And I didn't want all the engineering to be bottlenecked on, at the time, the four people on the planet who understood these technologies that came out last week. But with WordPress, you swing a dead cat, you'll hit 15 people who know how to work on this, basically no matter what room you happen to be in.
Jason: Yeah, that's part of why the spoils continue to go to the winner. Those kinds of things, like what you just said. Well, whatever it is, we definitely can hire people, full-time or contractors or part-time or flex or whatever. We can definitely do it. Also, will the cloud support the tech that's behind it? Yeah, of course.
It's 43% of the web. What, are they gonna not support it? It's crazy. So it's those kinds of things where you go, okay, well, let's just do that. And so, exactly as you say: is WordPress, or your marketing website in general, so incredibly core to your business that it has to be unique? And the answer is almost always no.
That's not what makes us unique. What CMS we use and how the marketers mess about with an article. Like that's, that, that is really far away from what makes the product unique for almost every company. Okay. So for almost every company, like you shouldn't, [00:07:00] you should spend the least amount of time on this and you should spend enough money only so that bad things don't happen.
Like the site goes down, the site is slow, the site is hacked. Okay. Yeah. We need to spend enough money to where that's not happening. 'cause that is bad. But beyond that, there's no additional benefit. Therefore, outsourcing it to us or a competitor of ours for, for that matter just simply makes sense. It's not, it's not where you're gonna get a comparative advantage, so why are you spending your time on it other than the core is needed for it to be functional and, and, and do its job.
Corey: One of the things that I think is lost on a lot of folks is the idea of scale as being its own particular skill set. As you say, you have an awful lot of competitors. You are not the only company in the world that provides managed WordPress hosting, and you are also, by a landslide, not the least expensive.
The trouble with just getting the result of all the various companies that do these things and sorting by price from low to high, is that there's a universe of folks out there who, well, I ran my own website for a couple of years on WordPress. Didn't seem that hard. Oh, it has a multi-tenancy [00:08:00] option.
I'm gonna go ahead and spin that up, and then I'll start making money by offering that to other folks. That starts to fall apart extremely quickly. I wanted to trust a company that has been there before, so that when something is going on and the website goes non-responsive for some reason,
okay, there are people who know what they're doing looking at this. It'll be a minute or two and it'll come back. As opposed to having to wake someone up in the middle of the night 'cause they didn't realize that that's how computers worked.
Jason: It's interesting. Uh, scale is an interesting topic. It's also interesting to be expensive.
If you're used to GoDaddy and you pay $2.90 per month for your website, then paying us $29 a month is 10 times as much. It sounds very expensive. Now, okay, you get what you pay for. You get service, it's fast, it's scalable, blah, blah. But it is expensive too, so, okay, fine. On the other hand, we have tens of thousands of larger customers.
For them, we are the low-cost alternative to what they see with their website development, which is [00:09:00] things like Adobe Experience Manager or Drupal or Sitecore, these kinds of things, which are millions and millions of dollars to build a website and then millions of dollars to host it and millions of dollars every time you're gonna do marketing campaigns.
So for them we are 10 times cheaper and we're the low-cost alternative, as opposed to the GoDaddy side of the market, the other end of the market, where we're the expensive one and we'd better be great at that price. And so it's very interesting, since it's the whole internet and we are at a scale, we have 200,000 customers.
So we're at a scale where we do see every kind of person. And so it's interesting like are we expensive? It depends how you, who you ask and what, what's going on. And, and it's interesting that, that, that there's that complexity to it. But yeah, the scale's interesting 'cause I think, uh, engineers who haven't done it before have this in mind.
They say, look, what we do is we write code, and we have these tools that help us automate things, in particular infrastructure. And so with CloudFormation or Ansible or, you know, Docker containers, we have all these tools to say, I want something that looks like this, or I push a button and it [00:10:00] creates a set of services that are connected.
And if I can do that once, then I can just keep pushing that button and do it 10 times, or write code that pushes the button. And now I have a thousand servers, 10,000 servers. I have to pay more money to allocate the physical resources, but the scale takes no effort, is the thought. So why is that wrong?
It is wrong, but it's not obvious why it's wrong. Like it's computers. I should just keep pushing the button. And it works.
Corey: It becomes super obvious the second time, but the first time it completely catches people by surprise.
Jason: Yeah, but why is it, what are we missing? So here's, here's the answer. Let's say you have a laptop, and let's say it's pretty high quality and pretty stable, and so it only crashes once every four years.
Not bad. It locks up and you're like, eh, you have to reboot it. Well, what was that? Who knows? Some really odd thing. Something crashed. The operating system has a bug. A cosmic ray hit it. Who knows? Something that rare, you're not gonna diagnose it. You don't even care to diagnose it, because you're like, whatever.
Okay. I reboot it every couple of years. Who cares? Like, this is pretty good. That [00:11:00] would be a high-quality laptop. Now, we have 17,000 servers. Okay? So let's say they're all this good, that they only crash once every four years. Randomly, unpredictably. Can't prevent it. Can't say when, 'cause it's some weird thing.
What happens when there's 17,000 of them? And by the way, our servers are doing way more stuff than your laptop, so by all rights they should crash a lot more than that. But let's suppose, right? Well, 17,000, you know, four years is what? Like 1200 days-ish, 1300 days. 17,000 servers. So you start doing the math and you're like,
we should have totally random, unpredictable, unpreventable crashes like 10 times a day. Oh, wait, what? Yeah, like crap is gonna be blowing up constantly. And we just said you could never predict or prevent it. Wait, what? Yeah. And so then you might say, okay, well fine. We'll reboot them. Yeah, I know you will.
And then how many customers will get mad about that? [00:12:00] Well, yeah, but, but it's only this tiny, tiny fraction of our customers, right? But let's say hundreds and hundreds of customers a day have downtime from your weird, unpredictable thingy. So what do they do? They all call support and you have a thousand support tickets a day.
Just from this one thing. Wait, what? Or how many go to Twitter and say, "You suck"? I don't know. Every day. What?
Corey: Very few people take to social media to say good things about companies, but when something goes wrong, oh, it's all over the place. It's the best outage detector I've ever found.
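The fleet arithmetic Jason walks through can be checked with a short sketch. It assumes every server fails independently at the same average rate (once per four years, his number) and, for the second function, that crashes follow a Poisson process; both are simplifying assumptions, and the function names are illustrative only. Four years is closer to 1,461 days than the 1,200 to 1,300 estimated on air, which puts the result near 11.6 crashes a day, in line with "like 10 times a day."

```python
import math

DAYS_PER_YEAR = 365.25  # average year length, including leap years


def expected_crashes_per_day(servers: int, years_between_crashes: float) -> float:
    """Expected crashes per day across a fleet, assuming every server
    fails independently at the same average rate."""
    return servers / (years_between_crashes * DAYS_PER_YEAR)


def prob_at_least_one_crash_per_day(servers: int, years_between_crashes: float) -> float:
    """Chance of one or more crashes on a given day, under the extra
    assumption that crashes follow a Poisson process."""
    rate = expected_crashes_per_day(servers, years_between_crashes)
    return 1.0 - math.exp(-rate)


if __name__ == "__main__":
    # One laptop, a small cluster, and the 17,000-server fleet from the episode.
    for fleet in (1, 100, 17_000):
        rate = expected_crashes_per_day(fleet, 4)
        p = prob_at_least_one_crash_per_day(fleet, 4)
        print(f"{fleet:>6} servers: {rate:8.4f} crashes/day, "
              f"P(at least one today) = {p:.4f}")
```

For a single machine the daily rate is well under one in a thousand, which is why nobody bothers to diagnose it; at 17,000 machines a crash-free day becomes the rare event.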
Jason: But we just agreed.
Well, we didn't agree, but I'm pretending like you're agreeing, that this is totally normal and expected. Like, we could be the greatest ever and this is just gonna happen. So, how do you summarize that? That's the story that shows why it is in fact true.
And you go, oh, okay. So I summarize that by saying rare things become common. Rare being hard to detect, hard to prevent, hard to, mm-hmm. And they become common automatically simply 'cause there's a lot of them. So [00:13:00] if you roll dice enough, then things happen, right? Kind of like million monkeys sort of thought.
We also see tens of billions of web requests per day across our platform. So what kind of quality percentage would you need to not see any errors? You know, it's like, I don't know. That's a lot of zeros. I don't know. Yeah, like something impossible. Clearly impossible. So impossible. That doesn't sound very nice.
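To put a number on "that's a lot of zeros," here is a rough sketch. It takes 10 billion requests per day as a stand-in for "tens of billions" (an assumption, the low end of that range) and treats every request as failing independently with the same probability; the function names are made up for illustration.

```python
import math


def expected_errors_per_day(requests_per_day: float, error_rate: float) -> float:
    """Expected failed requests per day at a given per-request error rate."""
    return requests_per_day * error_rate


def nines_for_at_most_one_error(requests_per_day: float) -> float:
    """How many 'nines' of success rate are needed so that the expected
    number of failed requests per day is at most one.

    An error rate of 1/requests gives exactly one expected error, and
    -log10(1/requests) equals log10(requests)."""
    return math.log10(requests_per_day)


if __name__ == "__main__":
    requests = 10_000_000_000  # assumed: 10 billion requests per day
    for nines in (3, 5, 7, 10):
        error_rate = 10.0 ** -nines
        errors = expected_errors_per_day(requests, error_rate)
        print(f"{nines:>2} nines of success: about {errors:,.0f} failed requests/day")
    print(f"nines needed for at most one failure/day: "
          f"{nines_for_at_most_one_error(requests):.1f}")
```

Under these assumptions, even "five nines" still leaves on the order of a hundred thousand failed requests a day, and getting down to one expected failure takes about ten nines.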
Now a couple of things to take away. One is, okay, so when we talk about quality, it's just a whole other level. By which I mean orders of magnitude different, really, really different. So is that gonna make us have very different development processes and procedures, and what does testing mean? And, and, and it blah, blah, blah, blah, blah.
Yes. It means those are gonna have to be quite different. Not because small companies are dumb. The small companies would actually be dumb if they implemented all that heavyweight process while they're small. That's wrong too, 'cause that's not a problem for you right now. But if you're at scale, it is.
And so the big companies that do all that [00:14:00] stuff, that's not dumb. It's mandatory because everything's multiplied by powers of 10. And so things appear that were there, you just didn't see them often enough to do anything about it. Rightly so. So, yeah, your processes have to get better 'cause the, you do need more, you know, percentages of, of quality or however you'd like to measure, you know, different ways of measuring that.
But the other thing is, it's never gonna be perfect, and at sufficient scale stuff's gonna happen. And so you also need this different mindset of, well, given that it's gonna happen for sure, then what? Oh, well then our reaction time has to be faster. The reaction has to be automated. Remember, it's like this kind of meta second layer.
Prevent, prevent, prevent. But knowing that complete prevention is impossible, and scale means that failures will be common, oh, what kinds of detection? So you start getting to these numbers like mean time to detect, mean time to recover, as opposed to how many incidents. Of course you do both. You do both.
But the number of incidents, you want smaller and smaller as a percentage of everything. But smaller as a percentage of something that's growing is still an absolute number [00:15:00] growing. And so you still need to know, like, do we detect and recover in like a minute or two, versus an hour and it takes a human?
That's a big difference, but detecting and recovering automatically is a totally different question than preventing it in the first place, which of course is quote-unquote better, but it's impossible for it to be good enough. So it's in how you allocate your time, or investment you might say, across these things.
And then we haven't even gotten to security, which is a whole ‘nother thing and often hurts things like performance and uptime, et cetera. So that's another thing that can be at odds with scale is security. So I, I don't mean to overcomplicate it, but it just goes to show. These are not, not only things that you don't think about at first, you shouldn't think about it at first, it would be a waste of your time. It would be premature optimization. So you shouldn't do that at first. But, um, on the other hand, if later you're not doing it, that's, that, that, that's bad.
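The trade-off between preventing incidents and shortening them can be sketched with a simple model: customer-facing impact per day is roughly the number of incidents times the time to detect plus the time to recover. The specific figures below (10 incidents a day, 10 minutes to detect, 50 to recover) are hypothetical; only the "a minute or two versus an hour" contrast comes from the conversation.

```python
def impact_minutes_per_day(incidents_per_day: float,
                           detect_minutes: float,
                           recover_minutes: float) -> float:
    """Total minutes of impact per day: each incident hurts for as long
    as it takes to detect it plus as long as it takes to recover."""
    return incidents_per_day * (detect_minutes + recover_minutes)


if __name__ == "__main__":
    # Hypothetical baseline: a human notices after 10 minutes and fixes it in 50.
    manual = impact_minutes_per_day(10, detect_minutes=10, recover_minutes=50)

    # Prevention only: halve the incident count, same manual response.
    fewer_incidents = impact_minutes_per_day(5, detect_minutes=10, recover_minutes=50)

    # Automation only: same incident count, detect and recover in about a minute each.
    automated = impact_minutes_per_day(10, detect_minutes=1, recover_minutes=1)

    print(f"manual response:        {manual:.0f} min/day")
    print(f"half as many incidents: {fewer_incidents:.0f} min/day")
    print(f"automated response:     {automated:.0f} min/day")
```

In this toy model, halving the incident rate halves the impact, while cutting detection and recovery from an hour to two minutes cuts it thirtyfold, which is one way to read "of course you do both."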
Corey: And it applies at all layers of the stack too. Easy example: you said a few minutes ago that you have 17,000 servers.
Okay, great. That is a significant point of scale. You can almost [00:16:00] certainly get some incredible discounts from Dell or HP or whoever's making servers these days; Supermicro has been on the rise for a while. But you are almost, if not entirely, based on AWS and Google Cloud. Based upon what I've seen over the years of various service offerings you have, I could sometimes pick which one of those two providers I'm hosted out of. Which, cool, fantastic.
I don't have a strong preference, believe it or not, for my corporate website. Why don't you run your own servers? You certainly have enough that people can do basic arithmetic and say, okay, if a server costs this much, a calculator tells me this much, and wow, that's a lot of money on instances.
Do you just hate money?
Jason: Uh, no. We love money, but Google and Amazon know that, and so they simply set their prices for us such that it would be more expensive to rebuild and move and manage. Ongoing management, let's not forget. It's not the price of the servers. Of course that's less. It's not that.
It's managing them. And as you say, everything I just said cross-applies to the physical layer, so you have to [00:17:00] be ready for that. Now, you could outsource that, I know, but all that's expense. So when you take the total cost of all of that stuff, what Google does is, they know that, and so they set their prices such that we go, okay, if we were to do that, maybe we could save this much per month, but we would have to do this and that, and it'd be this distraction.
And exactly what you just said about WordPress is how we feel about infrastructure. How exactly those SSDs get racked and powered does not affect our customers. It needs to exist and have high uptime. Beyond that, our customers don't care how that happens. So we'd do it if we could save tons of money doing it, but they just simply set the prices for us so that it's not worth it.
So as we spend more and more with them, they're like, uh, you know, it becomes more and more economical for us to do it on our own, and then they change the price so that it's not.
Corey: Discounting at scale is very much a thing. I've yet to find an AWS environment, of course, that's built out anywhere other than at a startup, where the infrastructure costs more than the people working on the infrastructure.[00:18:00]
It's not that it's hard to reliably replace SSDs at scale in a data center. It's that it's hard to be able to afford the people to be able to do that until you're at a certain inflection point. And again, you folks are terrific at running WordPress at scale. I don't know, for example, that you would be nearly as effective at remembering to do generator maintenance on a consistent schedule and only one at a time,
so you aren't taking down both power rails in various ways and causing site-wide outages. And I wish I could say I was making that one up.
Jason: Yeah, it's just not an expertise that we have, and so you could choose to build that expertise or perhaps acquire something, et cetera, et cetera.
But then you start asking the normal strategic questions. Is this good for our customers? Does this make us more differentiated in the market? Does this, uh, add some innovation that keeps us ahead of trends or, you know, does something valuable? And the answer is no to all of them. The best thing it could possibly do is save us money, which is a good reason.
That's, that is a good reason [00:19:00] to do something. It's just the least strategic thing you can do is save money. Anything you could do for your customers, whether you're charging them more for it or maybe accepting that value rather than in price by things like retention or advocacy. There's many, many ways to trade value with your customer.
Um, I like to say what you should do is create more value for the customer and decide how to split it with them. It could be higher price, you know, but like there's many ways. And anyway, just create more value. That's number one. And then split it. That's the business side. Fine. Um, saving money is none of that.
It's good for us and customers don't care, so we should do it. It's stupid to, as you say, it's stupid to burn money for nothing. But again, since the vendors know that, they just simply set the line such that, uh, that isn't, that isn't a good use of our time. You know, uh, we, you hear stories like, oh, with Dropbox they did this and that with disks, right?
'Cause at some point, at some level, at some scale, for some, uh, companies, it's a good idea. Of course. Of course. There's no such thing as a law of physics that's true everywhere.
Corey: There are a lot fewer companies with specific large scaled [00:20:00] out workloads that are running into capability barriers at that scale than there are people who look at that and say, yeah, we've got several hundred of these things now. We should definitely build a data center. No, please don't.
Jason: No, no, it's hyper-specialized to wanna do that. If you're Facebook, it makes sense to have data centers in Iceland, uh, for long-term storage. That does make sense, because at some scale, in some situation, it makes sense. But for almost all of us, including us, and we're a hosting company, so if anyone should, it's us,
right? It makes no sense. Um, again, because at best it's a cost savings, and P.S., it isn't.
Corey: And there's value to understanding the market you operate in. A couple years ago I was profiled in the New York Times, which was great, but when I called into WP Engine in advance to let you folks know, the response was not, okay, good for you.
You just called to gloat or what? No, there was none of that. They understood: oh great, so you're gonna potentially start seeing some scale. Here's what we can do to mitigate that and make sure the site doesn't go down at the worst possible moment, 'cause [00:21:00] they're not gonna run the profile a second time.
And there were processes and procedures set up. There was a migration from shared to dedicated for a four-hour span, but things still stayed up during that time. It was clearly communicated and everything just worked. That's the sort of thing you only learn to do really well by doing it really poorly a few times first.
Jason: Yeah, so we've done it so many times. You know, another thing that happens with scale is people. What does not scale is one person doing anything. What scales is teams of people who do things, and they have their policies and procedures and training and teaching each other and so on, so that the whole system is of higher quality.
And, uh, you have checklists, and if one person leaves the team, the team progresses anyway. That's the kind of thing that you build at scale with humans. And so that's what you're describing too, with service. So that's also true. And of course we can do that because it's amortized over 200,000 customers, and no one customer can do it,
'cause it's not amortized over 200,000 customers. It makes no [00:22:00] sense for 'em to try to become an expert. Like, it just doesn't make any sense. So this is true of many things. Like you said, it's true for us in the cloud. We treat Amazon like you're saying you treat us, right? We all treat the next layer down on the stack as a, oh,
uh, that's not my business. That's necessary. I need it to be high quality, but that's not my business. It's not my, what my customers are. That's not how I differentiate. It's not how I'm going to win. Therefore it's not strategic, so I need something good and that's it. It's like an SLO, like in Google SRE. Like, I need it to hit the SLO and then stop.
I don't need more. I don't need to pay more. I don't even need you to deliver more past the SLO. We're done here. If it goes below the SLO, we have a problem. But if it's above the SLO or at the SLO, then the whole point of having an SLO line is that when it's above the line, we all, you know, breathe a sigh of relief that there's no current problem.
Great. And we agree not to further invest, because that's not giving us value. We [00:23:00] need to invest in whatever does, which is company-specific, obviously. And so that's how we think about the cloud. It's how you're thinking about, you know, us as a WordPress provider, and it's correct. That's the correct attitude towards these things.
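The "hit the SLO and then stop" rule can be written down as a tiny decision function. This is only a sketch of the attitude described here, not anyone's actual tooling; the function name, the availability figures, and the 99.9% target are all hypothetical.

```python
def slo_decision(measured: float, target: float) -> str:
    """Apply the 'satisfy, don't maximize' rule to one service-level objective.

    Below the line is a problem worth spending on; at or above the line,
    further investment is deliberately withheld."""
    if measured < target:
        return "below SLO: we have a problem, invest here"
    return "at or above SLO: no current problem, do not invest further"


if __name__ == "__main__":
    target = 0.999  # hypothetical availability objective
    for measured in (0.9975, 0.999, 0.99995):
        print(f"measured {measured:.5f} vs target {target}: "
              f"{slo_decision(measured, target)}")
```

The point of the sketch is that 99.995% and 99.9% get the same answer: once the bar is met, extra headroom is not treated as something to keep buying.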
Another way to look at it is this. There are things in the company that you want to maximize, meaning there's no such thing as good enough. Revenue is one. Profit might be one, but revenue certainly is one. Gross margin is certainly one, but there's many kinds of things, like what kind of customer value you're delivering.
There's no such thing as too much. One of our core value propositions is performance, site performance. There's no such thing as a site that's too fast. Jeff Bezos famously talked about how no one will ever say the delivery was too fast. "I ordered and it came too quickly." No, like the faster, the better, probably.
You know, roughly speaking. So there are a few things, not a lot, but there's a few things in the company that you want to maximize again, 'cause it's strategic or important in some big way like that. Good. That's where you should be investing. That's kind of what that means. Most things in the company, even very important things [00:24:00] are things you wanna satisfy, not maximize.
Once they get to some threshold, some level, some whatever. Going beyond that is not that valuable either it's not valuable at all or just diminishing returns or otherwise, like it's not a good use of our time or money, whether 'cause the actual return is diminished, like diminishing return or simply the business value of it is not, is not enough.
Corey: If my website load time increases by 200 milliseconds from start to finish, great. I don't make another dime in consulting revenue. I don't get one more sign-up for the newsletter. None of it. At this point, it checks a box for me. One of the big values of going to you folks is that I come from a background where I used to run these things.
I do have the engineering mind where it's fun. On some level, I wanna set up WordPress and run it across this small cluster of things, but it adds zero value to my business and it's not what I need to focus on. So please take it off my plate.
Jason: Exactly. Fun's a whole different thing, right? Like, oh, fun, fun problem. You could throw away everything I just said if you wanna do [00:25:00] fun, you know.
Corey: So many of us learn this stuff on open source software in our spare time, in the evenings and weekends or when we're students. And then money is very dear and hard to come by and our time just fills down to basically free.
Then, in business, that turns on its head, and some people have trouble with that transition. I did when I started.
Jason: No, that's absolutely right. Time is certainly the most expensive thing, there's no doubt. So it's really important that you be working on the highest priority thing. What is that?
Obviously it's gonna be very dependent, but almost for sure. Um, screwing around with attaching storage is not it. So almost certainly not the top three, top one most important thing. And so almost everything in the business should be something you're satisfying in that model. And, and so, um, that means outsourcing to something good, like again, the, the, that threshold for what's good enough to be satisfied can be high.
You can set that up really high and say, for example, website speed does matter to me because I rank higher in SEO if my site's faster. And that does equal more dollars [00:26:00] at the end of the day. For a media company or an e-commerce company, or perhaps even for a consulting company. And there's a lot of data about e-commerce that shows that with faster sites, more people check out.
And even, and I don't know why this is, but they even have higher average transaction sizes, like people put more in the cart. I don't know why, but there's a lot of data, like lots of studies.
Corey: Yeah. We at least saw SEO improve when we did the analysis, uh, when we improved website speed by optimizing some things.
And then we checked that. We got it to a point where, yep, this is awesome. There is not believed to be any discernible benefit if we, okay, if we drop this performance yet further. We're already getting A's in all the grades and the tests that it spits out. It's, okay, is this where we wanna really focus our time?
Jason: That's right. So once you get to that, you might say, I have a high bar for performance, 'cause I've seen what happens when it's not. And it really does help our business when the performance is high. So I have a high bar. But then saying, like, I wanna spend 10 times more to push it a slight bit is like, well, no, not that.
Like, I'm setting a bar, and maybe the bar is low for some things in the [00:27:00] business; they just need to exist, but not be very good. Sometimes the bar is very high, no worries, but still it just needs to be satisfied and then we need to move on. And that should be most things in the business, 'cause none of us have the time and people to do more than that.
For most things in the business, the bars might be high or low, or wherever they are, but after we meet them, we need to move on to other things, especially the things where there is no limit on how good it is. And then it's okay if you pour forever and ever into that. Like Amazon pours forever and ever into delivery times or inventory and that kind of thing.
Corey: Yeah. My experience with my own website is that it is far slower than I would generally find acceptable. And the reason behind that is that whenever I'm logged into the admin portal and moving around the site, you do a whole bunch of cache busting.
It is going direct. Everything is slow and latent because one of the worst problems in the world is, oh yeah, you fixed the issue, but it's cached somewhere, so it looks like it wasn't. And then you mindlessly destroy your own website iteratively [00:28:00] trying to improve it. Been there, got the entire wardrobe, let alone the T-shirt, from those problems.
So it's like, yeah, this is slow. Why aren't people complaining? Wait, I'm logged in. Oh, okay. Just to test it, log out. Boom, things are loading almost before I click, and okay, good work. It's always fun when that catches you by surprise.
Jason: Right, right, right, right.
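The effect Corey describes, slow pages while logged in and fast pages once logged out, is typical of page caches that skip the cache for authenticated sessions. Below is a minimal Python sketch of that decision. It is not WP Engine's actual logic: it assumes a request counts as logged in when it carries a cookie whose name starts with `wordpress_logged_in_` (the prefix WordPress core uses for its login cookie), and `should_bypass_cache`, `serve`, and `render_page` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

# Prefix WordPress core uses for its logged-in cookie (a site hash follows it).
LOGGED_IN_COOKIE_PREFIX = "wordpress_logged_in_"


def should_bypass_cache(cookies: Dict[str, str]) -> bool:
    """Return True when the request appears to come from a logged-in user."""
    return any(name.startswith(LOGGED_IN_COOKIE_PREFIX) for name in cookies)


def serve(path: str, cookies: Dict[str, str], cache: Dict[str, str],
          render_page: Callable[[str], str]) -> str:
    """Serve a page, using the cache only for anonymous visitors.

    render_page is a hypothetical stand-in for the slow, uncached render.
    """
    if should_bypass_cache(cookies):
        # Logged-in users always get a fresh render, so edits show up at once.
        return render_page(path)
    if path not in cache:
        cache[path] = render_page(path)
    return cache[path]
```

Under these assumptions an anonymous visitor pays for the slow render at most once per page, while a logged-in editor pays for it on every request, which matches the surprise in the exchange above.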
Corey: I have one last topic I want to get into a bit.
You mentioned you were running WordPress entirely on top of Kubernetes. Uh, WordPress, the last time I looked at it in any seriousness, which was about 15 years ago, uh, was a product of its times, coming from the nineties and PHP. It had been an era of servers being physical things. Virtualization was looked at very skeptically in the few places it was deployed in.
It is one of the least cloudy packages I've seen in a while. It assumes you're gonna have permanent, named pet servers running this thing forever. Trying to get it to work in a cluster where it can sustain, uh, the outage of one of the nodes, uh, storing assets in object stores rather than on disk, [00:29:00] requires a whole bunch of ridiculous patching.
So my question for you, given that you are the authoritative experts on running this stuff in the modern era, in a cloud at scale. How vanilla is the WordPress that you folks deploy versus how heavily have you had to either patch or completely fork the thing in order to get it to do the stuff you want?
Jason: So in terms of the PHP code in WordPress, it's vanilla and there's no forking. Both for, you could say, selfish reasons in managing the thing, 'cause there's always changes and there's plugins and, like, there's all kinds of things that otherwise would break. But also, we are a product of the WordPress community.
We have benefited so much and always have from the WordPress community, and it's one of our core values actually to give back. That means several things to us. One of them is to give back to the WordPress community that gave us and continues to give us so much. It's part of what our DNA is, so we're not interested in forking it and, I don't know, somehow, whatever. We're interested in helping the community, which means the product as it is. [00:30:00]
However, everything outside of that is super custom. And of course that's our secret sauce. So that's what people are paying us for. And everyone else is free to do the same, by the way. It's not like, you know. Right.
Corey: Well, in my era, we had so many management scripts for WordPress that were all written in the most obfuscated Perl that you've ever seen.
It was awful. Not because we tried to, because we're bad at it and we needed something in a hurry.
Jason: Written in obfuscated Perl, or as we also say, Perl. Hmm.
Corey: Yeah, that's an unnecessary adjective. You can remove it. All the meaning is already implicit.
Jason: It really is. My joke is always like, no one ever admits they know Perl because then they're gonna be the one, oh, can you look at this?
And the answer is, no, no, no, no, no. Perl to me kind of looks like when your mom picked up the modem and then it went like... that's what a Perl script looks like to me. You know?
Corey: Yep. Suddenly your terminal's spraying out complete garbage. Yeah. It's the world's only write-only language.
Jason: Yeah, absolutely. Right, right. Write once, read never. Yeah. So, uh, yeah, so everything outside of that is hyper-customized. So you can do things in Kubernetes: you can mount [00:31:00] disk that's read-write. You can recover entire things. You can make a set of containers that, uh, act like, you know, sort of a VM, but still using containers for things like each of the processes, so that it's easier to manage and test and deploy and da, da, da, da.
All the normal reasons why one containerizes things. You can do that and have it move around like a little, um, I don't wanna say cluster, 'cause of course a Kubernetes cluster means something else. So this is possible. Now, you can also do what you said, which is to try to make WordPress much more natively 12-factor, is what I would say.
Right. That's also a good idea in terms of scale, but as you say, it takes a lot of effort and commitment on the part of the site owner. As you say, you have to write the site with that in mind, like object storage, really using cache, probably thinking about how the database works and how much you're gonna hit it, like how much you're gonna abuse it if it's not gonna be local.
Making sure, of course, that the disk is read-only and only used to deploy code and [00:32:00] doesn't have media.
Corey: Part of the problem with the WordPress plugin ecosystem is so many of them are written in ways that are disastrous if you implement them either at any kind of scale or in anything other than exactly the scenario that the plugin author was envisioning.
Jason: That's right. So, like, a lot of the plugins aren't available to you, so there's a lot of things you have to accept. If you accept them, then WordPress can be 12-factor. And there's plenty of sites that do that. Um, in fact, we also have a product line called Atlas, which is headless WordPress, explicitly. Like, we are running your Node in Kubernetes and also running your WordPress.
So that the whole thing is, you know, just what you'd hope, I guess you could say. Um, you know, the Node is running as fast as it can. Things are cached really intelligently, but it's also talking to WordPress, which is local to it. So it's just a very fast, very scalable thing that uses all the new things like Node.js and blah, blah, blah.
The new, new hotness and headless sites. So we have that too. So we have sort of the whole gamut between, like, you would say, kind of the old [00:33:00] style monolithic WordPress thing, which is running in Kubernetes, but in a situation where you're like, it's in Kube, but it's kind of like not, right? It is like, yes, that's exactly right.
It looks to WordPress like it's not, but it is in Kube. And you might say, like, what's the point of that? And there's actually a lot of points. And one of them is exactly what we were just talking about, mean time to recovery. So if a Kubernetes, uh, node goes down, of course Kubernetes will reconstitute the containers and move the traffic and blah, blah, blah.
And it does it pretty damn fast, much faster than any kind of VM thing with detection, and much more reliably and at scale better as well, especially with things like GKE or other things where that's managed for you. So, um, so even making a, you might say, VM-like thing on Kube gains you things like some of these benefits of scale.
You also have things like, there's all these advantages of containerization. You get those. There's various benefits anyway. Plus we have products, though, that go down the line to, okay, wait, are you willing to really make 12-[00:34:00]factor, like, headless apps? 'Cause if so, we've got a product for you.
That takes advantage of all that, so welcome. And so again, we have so many customers. So if it was a startup, you'd say you have to focus, you can't do all these things. That's crazy. But we have thousands of employees. We have been around for 14 years, and slowly we've built that from something simple and focused to, okay, we're gonna layer on this product line, but we're gonna have 50 people working on it.
We're gonna layer on this other kind of customer, as I mentioned, but we're gonna have hundreds of people between sales and marketing and support to, uh, pay attention to that customer segment, so that it's in addition to another customer segment, not instead of, or all amalgamated.
So if you're doing that in the right way, then you can layer on these other things and that's okay. So now we have all that stuff. Um, and it's all right. But, um, uh, yeah, you're right. It's one of those things. But once again, it's one of those things where it's hard and in some ways unnatural. But if we solve it, which we do, [00:35:00] then we have this competitive edge and we have a product that's useful.
Um, sometimes companies tackle things that are hard and there's not really a big advantage on the other side. It's just hard. And you go, oh, well, that sucks. That sounds like just a hard business. In this case, doing that hard thing earns us something. Oh, this is a high-uptime, high-speed, whatever, like you say, WordPress.
Oh, well, and then since it's hard, a lot of competitors won't be able to do it or won't be willing to do it, so it'll be somewhat, uh, differentiated, let's say. And you're like, oh, okay. So if we do the hard thing, there's these rewards. Oh, okay. That's worth doing the hard thing.
Corey: In that case, you've also nailed the pricing as well, in that you don't have one of those, uh, $4 a month website offerings that I've ever seen, and a lot of those very small-dollar customers tend to need an awful lot of handholding as they're getting something up and running.
When I was at Media Temple, one of the things that, uh, we finally started doing was letting go of the bottom X percent of our customers every year, just because [00:36:00] you're spending five bucks a month or 10 bucks a month, whatever it was at the time, and you're expecting 80 hours of engineering support in a month, and the juice and the squeeze don't align. It's about meeting the right customers and solving the right problems for 'em.
Jason: No, no, you're right. You're right. And there is this ironic inversion where, like, the less they pay, the more they want. Um, so I was like, wait a second. Although there's a U there, 'cause then if they pay a lot, they also wanna, like, literally be on the phone every week with your product managers.
And maybe you should, but there is this interesting, I wouldn't say it's an ethical dilemma, but it's a business dilemma, and it's not obvious what to do, which is, of course you're gonna have some customers that are more profitable and some customers that are medium and some that are unprofitable, like you're describing.
To what extent is that just, okay, it's the cost of doing business, and there's value in having a brand that just says we always help, no matter what? Even those people who are in that situation, they're gonna go and say that to others. And there's this momentum and [00:37:00] brand, reviews, Twitter, referenceability, case studies, there's all this stuff where the more people are happy with you, you almost wanna say karma works, not in a precise mathematical way,
but in just a hand-wavy way, it kind of does work. Okay. To what extent do you want to keep that magic, even though you can't measure that and you're never gonna have metrics? Like, I get that, but there's a truth to it. To what extent is it like, look, you're unprofitable? Well, some are unprofitable. Now if that gets too much, then we got a business model problem.
Fair enough. Or maybe some are so crazy on the edge, like so, so, so crazy that no, like, you know, you've broken the argument. Now, you know, that just has to be against our terms of service somehow. Like, we gotta write that into our AUP or something. Like, you just can't do that. Okay, so maybe there's this extreme.
Corey: Yeah. When you have people go, you're mature about it, you help them transition somewhere else, but you make it clear that you're not able to serve them in the way they need to be served.
Jason: And like even the consulting side, we have the same policy. We [00:38:00] found someone paying us a hundred dollars a month and they were the number one bandwidth user.
Okay, well you can't do that. You know, like, like there's a limit to where it's like you can't do that, right?
Corey: You're in cloud too. So bandwidth is very dear.
Jason: It's very dear. It was not good. So do you wanna trim the tail, those costs? If you imagine a graph, which we've made, and maybe you did too at Media Temple,
of basically the customers by their net profit per customer, uh, of course there's this tail that's bad, like you say. Okay, so the absolute, absolute worst ends of the tail, you trim. Fine. And trim doesn't necessarily mean you make them quit. In the case of this customer, for example, they had been doing something really dumb with their site.
We helped them fix it, and they could remain a customer, but you do have to fix this thing. You can't just, you know. So fix it, and maybe they have to leave, maybe not. Maybe they are willing to pay more and maybe they can fix it. So, okay. It's worth a conversation.
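The graph Jason describes, customers ordered by net profit per customer with a bad tail at one end, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the figures are made up, and `rank_by_net_profit`, `worst_tail`, and the `max_loss` threshold are hypothetical names, not anything WP Engine or Media Temple is known to use.

```python
from typing import Dict, List, Tuple

# Maps customer name -> (monthly revenue, monthly cost to serve). Made-up data.
Customers = Dict[str, Tuple[float, float]]


def rank_by_net_profit(customers: Customers) -> List[Tuple[str, float]]:
    """Return (name, net profit) pairs sorted from least to most profitable."""
    profits = [(name, revenue - cost) for name, (revenue, cost) in customers.items()]
    return sorted(profits, key=lambda pair: pair[1])


def worst_tail(customers: Customers, max_loss: float) -> List[str]:
    """Names of customers losing more than max_loss per month.

    These are candidates for a conversation (fix the site, pay more, or move),
    not an automatic cancellation list.
    """
    return [name for name, net in rank_by_net_profit(customers) if net < -max_loss]


if __name__ == "__main__":
    example = {
        "heavy-bandwidth": (100.0, 900.0),  # echoes the $100/month top bandwidth user
        "typical": (300.0, 50.0),
        "slightly-underwater": (25.0, 40.0),
    }
    print(rank_by_net_profit(example))
    print(worst_tail(example, max_loss=100.0))
```

The threshold only flags the extreme end of the tail; mildly unprofitable customers stay off the list, which is the "cost of doing business" region discussed above.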
Corey: It's a, it's a data point. It is not an answer in and of itself.
Jason: Yeah. It's like, Hey, we can't let this continue, but like there's lots of ways forward here kind of a thing. Right.
Corey: To my point, the only ones I was glad to see leave were the ones that were just [00:39:00] abusive to the support reps. That was awful. I made a point to, at least once a month, be on the support floor taking calls myself.
Just 'cause, let's experience what other people are actually seeing and talk to customers. Imagine that. It helps with things.
Jason: Yeah. I mean, abuse, you can't allow abuse. And the way I look at it is, if you do, sometimes you do that in the name of, we love our customers, we care about our customers, the customer's never wrong.
That kind of stuff. And while that's generally the right attitude, it's the right default attitude. Let's say like in the absence of other information, yes, but do we love our employees less? Do we respect and trust and love and care about our employees less than our customers? Well, if you allow abuse, the answer is yes, you do.
I think the right way to do business is, the answer is no. They're all people. And so none of them should be abused, neither our customers nor our employees. And so if you're abusing our employees, okay, we'll tell you, et cetera. But if you can't stop, that's not acceptable. I don't care what you're paying, don't care what your profit margin is.
You just can't do that, because it's not like we don't care about our employees, you know? [00:40:00] And so I think that's just the right attitude. That's just a good relationship. These are all human beings. Let's have a good, more or less respectful, more or less safe, more or less professional relationship, right?
I mean, that just seems like a good idea. Um, but how much do you trim? So you could argue, trim it all the way up to the people, you know, blah, blah, blah. And that's not a bad idea. Like, you'll have a profitable company for sure. And, uh, there's really nothing wrong with it, but I do think there's a magic there that you might wanna be careful with before you snip it, especially
'cause I don't know that you can measure it, at least I don't know how. And so I think there's an art there to, like, what happens over there? It's, uh, it's not obvious, but I think it's quite interesting.
Corey: Yeah, understanding customer bases is always incredibly important. I mean, especially when you're a hosting company.
You have a whole category of problems that many companies just don't realize exist: stolen credit cards, because effectively, [00:41:00] even before cryptocurrency came out, great, I can use this to send spam to 2 billion inboxes, and I can do this to run command and control and, uh, attack sites and all kinds of other nonsense.
And people will say all the right things. For better or worse, I have not yet found a way for my consulting projects to be turned into something actively harmful to the rest of the world. But I'm sure someone enterprising will come up with one sooner or later.
Jason: It's true. That is a constant, uh, a constant worry and a constant challenge.
For us, one way is bad guys taking control of other people's websites, and that could be tech or it could be social engineering, by the way. There's also, like you say, if you can run code, you can do all the things you said and more. So, you know, using a stolen credit card to get a website and then do stuff, 'cause you're uploading code and doing stuff.
And so that could be arbitrary stuff. Even just as simple as bouncing through the site to something else, just to cover your tracks some more. I mean, whatever, or going to a site and injecting something, like some JavaScript, in their site, so it's not quite [00:42:00] so obvious that they've been hacked, but it's doing whatever, click fraud or whatever the heck it's doing in there.
So, yeah. Uh, constant thing. So security has always been a critical thing that we've had to invest a lot in. And one of the reasons why people pay to be with us is that of course there's no such thing as quote-unquote perfect security. But there is such a thing as having layer after layer, thing after thing.
And you know, either you do or don't do all that stuff, so that you're at least not being negligent and you're doing as much as you can. That is true. So we have everything, and you know, we have a whole security department and we're SOC 2 and ISO, which is not exactly security, as any security person will tell you, but it does show that we're trying to be organized and thoughtful about our processes and access control and so on. But we do things like, every year everyone at the company has to go through a security training, including social engineering training, especially for sales and support,
where it's easier for that to happen. And of course, all of our code goes through all these reviews, and there's automation as well as humans on that. And I mean, just everywhere you look, there's like stacks of things that have to do with, [00:43:00] with security. So yeah, security's definitely like performance.
There's no such thing as we're done. We've done all the things that needed to be secure. Hooray, we're finished. But if you're like, well, I installed a firewall, but I've never thought about social engineering, that's negligence. I mean, when you're first starting out, it's not, it's whatever, but like at some scale it's something, or especially if you're promising that security is one of your features or benefits or whatever you'd like to call it.
Okay, well then if you're not doing social engineering training, then that is negligence. If you don't have keylogger detectors on the laptops, that's negligence. When we installed that, by the way, it must have been a decade ago, we found like 10% of the laptops had a keylogger on it. Oops. Like, it's there.
It's definitely like all this shit is happening. For sure. No doubt. The only question is like, are you looking? Do you know to look? Are you doing something about it? So if you're doing a ton of stuff like we are, and there's still some crazy side route where this thing happened, it's like, right. I mean, that's not good.
We wanna do something about it, but it's not negligence. You know? There's a big [00:44:00] difference, you know? Right.
Corey: You don't inadvertently take something down that's important just 'cause someone doesn't have a full understanding of what's going on. All this stuff is complicated at massive scale.
Jason: The way I look at it is if something happens, I want the story of how it happened to be ludicrous.
It's like they did this and then that, and they used this thing and we didn't even know about that. And, and then the customer wrote their own code and that's where they got in in the first place. And then, and it's like, okay, that again, that doesn't mean there's nothing we want to do about it. There may be things for us to do about it.
There may be lessons to learn. No worries there. But we can rest easy going, like, well, jeez, if that's what it takes, then we're doing our job. You know?
Corey: Yeah. We've successfully raised the bar of what's required to get into something. Okay. Now it requires active, unlikely misbehavior or choices made on behalf of customers.
Jason: Yeah. And, like, not obvious. So like, uh, there's a security bug in some library we use, and it was reported a month ago and we still haven't upgraded, and they got in that way. Okay, that's negligence. Why didn't we update it in a [00:45:00] certain timeframe? Right? Another way is a zero-day bug was just reported.
We patched it within 12 hours, but it was already exploited in hour one. Now, is that negligence? No. We're doing actually way more than any of our customers would've ever done for themselves. Far, far more. And we prevented it from hitting nearly every single customer.
Corey: And you have the telemetry and the organization around it to be able to track that now and not say, "but we have no evidence of any compromise."
Jason: Yeah. 'Cause you don't have logs turned on. Right, right, right. So it's like, well, we protected, you know, 99.996% of our customers from it, but a couple of them got hit before. And it's like, well, then, okay, that's back to it. We have a thing that we use in our values that, again, I keep coming back to, because we actually have them, and you know that because you keep referring to them and using 'em to make decisions.
So one of them I really like is called Do the Right Thing, which doesn't mean anything by itself. That's just nonsense. But what it says next, to define it, is: if it's right for the customer [00:46:00] and right for the company, and you're proud of your decision, then you've done the right thing. And that's what this discussion right now is.
Security is an interesting application of this. So when something happens, you look at it and you say, are we proud of this? Or are we kind of not proud about this? It's some crazy ass whatever thing. You're like, yeah, like, we're fine. Like I'm not, I don't, as opposed to, oh my God, you guys, we should have gotten that.
How did we miss? Like, that should not have happened. I'm not proud that that happened. So what's funny is, on the one hand, being proud of something sounds so subjective. How is that an objective measure? But what's funny is it isn't subjective at all.
Corey: You know immediately whether you're proud of something. You know, it's hard to codify in a handbook somewhere, in a way that you're gonna be able to distort into something that fits in a contract. But you know.
Jason: Exactly right. If it's in a contract, nevermind. Right. But like just from one person to another: are you proud of this? You know the answer immediately. And so it actually works really well. In [00:47:00] fact, when something happens, everyone's like, oh, and everyone knows. We're already agreeing here.
It is objective, you know. Not that there's no gray areas ever, ever, but you know, you get the idea. Um, so I'll give you another example where this is a fun application. Another thing you get in hosting is our customers put all sorts of stuff on the web. Is it okay? Are we okay with it being on the web?
How much are we looking? How much can we look? These are all kinds of things where you deploy these same kinds of questions. You have to be responsive to abuse reports, but that doesn't mean you're basically crawling everyone's website, uh, in your spare time to see what they've got.
Corey: Yeah.
Jason: So there's things like, you know, Cloudflare goes through this kind of famously all the time, where there's somebody and they're saying stuff and it's really offensive to some people. And some people say Cloudflare should shut 'em down because this is over the line. And some people say Cloudflare should not shut them down because, uh, it's the internet and they shouldn't make those decisions.
It shouldn't be up to Cloudflare. I think both of them have a point, hence the dilemma. You know, it's like they both have a point to make.
Corey: Cloudflare has a way of finding themselves in the middle [00:48:00] of that debate over and over and over and over and over again, in a way that other providers never seem to.
Jason: Well, so much of the internet goes through there and, you know, people use that to keep their websites up. So, I mean, we have that too. It's not quite as in the news, but we have that as well. So here's how, here's my attitude and of course you, you don't have to agree with me, but here's my attitude on that.
But let's say, as someone who has had to wrestle with such things, and we as a company have had to figure out procedurally how we wrestle with such things, and we've really struggled with it, it's this: I want to see the organization struggle. I want them to, I wanna see them go, on the one hand this, and we value that.
And on the other hand, this other thing, we value that too. We, you know, a lot of us don't like what they say, but that can't be the right way to do it. And, uh, um, we do believe in free speech, we do believe in the internet, blah, blah, blah, blah. And that makes sense. And yeah, who are we to make that decision?
And yet we do have to make the decision because we are here, we do have that. And yet, yeah. Should we even, and well, we have this in our terms of service [00:49:00] and maybe we should invoke it, but you could read it this other way. 'cause of course there's gray areas, there's, you know, some things are black and white, but some things are not.
Corey: None of these conversations are simple.
Jason: Yeah. And like. So I wanna see that. I wanna see them going, ah, ah, but, but we really value this. We value that. Yeah, I want to see the struggle. And then I don't care how they resolve it, I wanna see them because that means they're trying to do the right thing, whatever that means to them.
And none of us will always agree on what someone else ends up deciding in a thing like that. None of us will agree with each other a hundred percent. That can't be the metric, or that can't be the thing that decides whether they're trying to do the right thing, whether they're proud of that decision. So to me, if I see our team struggle and struggle and then come up with an answer, I'm proud of that.
I'm proud that we tried really hard and we came up with something we had some rationale for. Of course, not everyone will agree. I'm proud of that. I'm proud of that way of deciding. I think that has to be good enough. I mean, it has to be, of [00:50:00] course, genuine. Right? But like, if humans genuinely tried to figure it out and, like, are we getting a million of those a day?
Or have we improved and improved and improved our AUP such that only the hardest, craziest things are still hard and everything else is, uh, is known 'cause of how we wrote it? Again, like, if the answer's no, oh, well, then we're being negligent about having a good, uh, policy. But if the answer's yes,
yeah, almost everything is handled, and this is just one of those very few things that still fell in that gap, again, I'm proud of that. Then I'm proud of our policy. I'm proud that this is rare and, uh, then we struggle. Good. Like, I mean, at least that's my approach.
There was a, I don't remember when it was, but I remember one of these times with Cloudflare, they put out this big, uh, letter explaining their struggle, and I remember reading the letter and just thinking, there's the struggle. That's what I, personally, I'm like, see? So I'm happy [00:51:00] whichever way they went, because I feel like they're trying to do the right thing.
Corey: They cared enough for it to bother them.
Jason: Yeah. Yeah. And then they cared enough to tell everyone. And so of course some people are like, no. And some people are like, yay, you know, whatever. What can you do? So of course that's the outcome.
Corey: So, you know, I, I really wanna thank you for taking the time to speak with me.
If people wanna learn more about how you view this and so many other things, where's the best place for them to find you these days?
Jason: Sure. So for me personally, it's asmartbear.com, like the animal. We both like animals, I guess. Uh, my previous company was called Smart Bear. That's why it's called that, because it's an online identity from long ago.
But, uh, and then of course WP Engine is WPEngine.com.
Corey: And we'll of course put links to both of those things in the show notes.
Jason: Thanks for having me. And, and I hope you see I didn't duck any of the questions.
Corey: No, you did not. So it's appreciated. Thanks again for agreeing to do this. I really appreciate you taking the time.
Jason: It was fun. Great topics.
Corey: Jason Cohen, founder of WP Engine. I'm Cloud Economist, Corey [00:52:00] Quinn, and this is Screaming in the Cloud. If you enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that won't get published correctly because your platform of choice decided to run its own WordPress instance instead.
2021 Duckbill Group, LLC