Corey Screws Up Logstash For Everyone with Jordan Sissel
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by “you”—gabyte. Distributed technologies like Kubernetes are great, citation very much needed, because they make it easier to have resilient, scalable systems. SQL databases haven’t kept pace though, certainly not like NoSQL databases have, like Route 53, the world’s greatest database. We’re still, other than that, using legacy monolithic databases that require ever-growing instances of compute. Sometimes we’ll try and bolt them together to make them more resilient and scalable, but let’s be honest, it never works out well. Consider Yugabyte DB; it’s a distributed SQL database that solves basically all of this. It is 100% open source, and there’s no asterisk next to the “open” on that one. And it’s designed to be resilient and scalable out of the box so you don’t have to charge yourself to death. It’s compatible with PostgreSQL, or “postgresqueal” as I insist on pronouncing it, so you can use it right away without having to learn a new language and refactor everything. And you can distribute it wherever your applications take you, from across availability zones to other regions or even other cloud providers should one of those happen to exist. Go to yugabyte.com, that’s Y-U-G-A-B-Y-T-E dot com, and try their free beta of Yugabyte Cloud, where they host and manage it for you. Or see what the open source project looks like—it’s effortless distributed SQL for global apps. My thanks to Yu—gabyte for sponsoring this episode.
Corey: This episode is sponsored in part by our friends at VMware. Let’s be honest—the past year has been far from easy. Due to, well, everything. It caused us to rush cloud migrations and digital transformation, which of course means long hours refactoring your apps, surprises on your cloud bill, misconfigurations, and headaches for everyone trying to manage disparate and fractured cloud environments. VMware has an answer for this. With VMware multi-cloud solutions, organizations have the choice, speed, and control to migrate and optimize applications seamlessly without recoding, take the fastest path to modern infrastructure, and operate consistently across the data center, the edge, and any cloud. I urge you to take a look at vmware.com/go/multicloud. You know my opinions on multi-cloud by now, but there’s a lot of stuff in here that works on any cloud. But don’t take it from me. That’s vmware.com/go/multicloud, and my thanks to them again for sponsoring my ridiculous nonsense.
Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’ve been to a lot of conference talks in my life. I’ve seen good ones, I’ve seen terrible ones, and then I’ve seen the ones that are way worse than that. But we don’t tend to think in terms of impact very often, about how conference talks can move the audience.
In fact, that’s the only purpose of giving a talk ever—to my mind—is you’re trying to spark some form of alchemy or shift in the audience and convince them to do something. Maybe in the banal sense, it’s to sign up for something that you’re selling, or to go look at your website, or to contribute to a project, or maybe it’s to change the way they view things. One of the more transformative talks I’ve ever seen that shifted my outlook on a lot of things was at [SCALE 00:01:11] in 2012. Person who gave that talk is my guest today, Jordan Sissel, who, among many other things in his career, was the original creator behind logstash, which is the L in ELK Stack. Jordan, thank you for joining me.
Jordan: Thanks for having me, Corey.
Corey: I don’t know how well you remember those days in 2012. It was the dark times; we thought oh, the world is going to end; that wouldn’t happen until 2020. But it was an interesting conference full of a bunch of open-source folks, and it was my local conference because I lived in Los Angeles. And it was the thing I looked forward to every year because I would always go and learn something new. I was in the trenches in those days, and I had a bunch of problems that looked an awful lot like other people’s problems, and having a hallway track where you could ask, “Hey, how are you solving this problem?” was a big deal. I miss those days in some ways.
Jordan: Yeah, SCALE was a particularly good conference. I think I made it twice. Traveling down to LA was infrequent for me, but I always enjoyed how it was a very communal setting. They had dedicated hallway tracks. They had kids tracks, which I thought was great because folks couldn’t usually come to conferences if they couldn’t bring their kids or they had to take care of that stuff. But having a kids track was great, they had kids presenting. It felt more organic than a lot of other conferences did, and that’s kind of what drew me to it initially.
Corey: Yeah, it was my local network. It turns out that the Southern California tech community is relatively small, and we all lead different lives. And it’s LA, let’s face it; I lived there for over a decade. Flaking is a way of life. So yeah, well, “Oh, we’ll go out and catch dinner. Ooh, have to flake at the last minute.” If you’re one of the good people, you tell people you’re flaking instead of just no-showing, but it happens.
But this was the thing that we would gather and catch up every year. And, “Oh, what have you been doing?” “Wow, you work in that company now? Congratulations, slash, what’s wrong with you?” It was fun, just sort of a central sync point. It started off as hanging out with friends.
And in those days, I was approaching the idea of, “You know what? I should learn to give a conference talk someday. But let’s be clear. People don’t give conference talks; legends give conference talks. And one day, I’ll be good enough to get on stage and give a talk to my peers at a conference.”
Now, the easy, cynical interpretation would be, “Well, but I saw your talk and I figured, hey, any jackhole can get up there. If he can do it, anyone can.” But that’s not at all how it wound up impacting me. You were talking about logstash, which let’s start there because that’s a good entry point. Logstash was transformative for me.
Before that, I’d spent a lot of time playing around with syslog, usually rsyslog, but there are other stories here of when a system does something and it spits out logs—ideally—how do you make sure you capture those logs in a reliable way so if you restart a computer, you don’t wind up with a gap in your logs? If it’s the right computer, it could be a gap in everything’s logs while that thing is coming back up. And let’s avoid single points of failure and the rest. And I had done all kinds of horrible monstrosities, and someone asked me at one point—
Jordan: [laugh]. Guilty.
Corey: Yeah. Someone said, “Well, there are a couple of options. Why don’t you use Splunk?” And the answer is that I don’t have a spare princess lying around that I can ransom back to her kingdom, so I can’t afford it. “Okay, what about logstash?” And my answer was, “What’s a logstash?” And thus that sound was Pandora’s Box creaking open.
So, I started playing with it and realized, “Okay, this is interesting.” And I lost track of it because we have demands on our time. Then I was dragged into a session that you gave and you explained what logstash was. I’m not going to do nearly as good of a job as you can on this. What the hell was logstash, for folks who are not screaming at syslog while they first hear of it?
Jordan: All right. So, you mentioned rsyslog, and there’s—old is often a pejorative of more established projects because I don’t think these projects are bad. But rsyslog, syslog-ng, things like that were common to see for me as a sysadmin. But to talk about logstash, we need to go back a little further than 2012. So, the logstash project started—
Corey: I disagree because I wasn’t aware of it until 2012. Until I become aware of something it doesn’t really exist. That’s right, I have the object permanence of an infant.
Jordan: [laugh]. That’s fair. And I’ve always felt like perception is reality, so if someone—this gets into something I like to say, but if someone is having a bad time or someone doesn’t know about something, then it might as well not exist. So, logstash as a project started in 2008, 2009. I don’t remember when the first commits landed, but it was, gosh, it’s more than ten years ago now.
But even before that in college, I was fortunate to, through a network of friends, get a job as a sysadmin. And as a sysadmin, you stare at logs a lot to figure out what’s going on. And I wanted a more interesting way to process the logs. I had taught myself regular expressions and I wasn’t finding joy in it… at all, like pretty much most people, probably. Either they look at regular expressions and just… evacuate with disgust, which is absolutely an appropriate response, or they dive into it and they have to use it for their job.
But it wasn’t enjoyable, and I found myself repeating stuff a lot. Matching IP addresses, matching strings, URLs, just trying to pull out useful information about what is going on.
Corey: Oh, and the timestamp problem, too. One of the things that I think people don’t understand who have not played in this space, is that all systems do have logs unless you’ve really pooched something somewhere—
Jordan: Yeah.
Corey: —and it shows that at this point in time, this thing happened. As we start talking about multiple computers and distributed systems—but even on the same computer—great, so at this time there was something that showed up in the system log because there was a disk event or something, and at the same time you have application logs that are talking about what the application running is talking about. And that is ideally using a somewhat similar system to do this, but often not. And the way that timestamps are expressed in these is radically different, and so is the way that the log files themselves are structured. One might be timestamp followed by hostname followed by error code.
The other one might be hostname followed by a timestamp—in a different format—followed by a copyright notice because a big company got to it followed by the actual event notice, and trying to disambiguate all of these into a standardized form was first obnoxious, and secondly, very important because you want to see the exact chain of events. This also leads to a separate sidebar on making sure that all the clocks are synchronized, but that’s a separate story for another time. And that’s where you enter the story in many respects.
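As a rough illustration of the normalization problem described here, the following Python sketch maps two differently shaped log lines onto one structure with a UTC timestamp. Both line layouts, the hostnames, the `normalize` helper, and the sample lines are invented for this example, and it assumes every timestamp carries a UTC offset; it is not how logstash or any syslog daemon does it.

```python
import re
from datetime import datetime, timezone

# Two invented line layouts, loosely following the orderings described above.
# Layout A: ISO 8601 timestamp, hostname, error code.
LAYOUT_A = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<detail>\S+)$")
# Layout B: hostname, bracketed timestamp in another format, a notice, the event.
LAYOUT_B = re.compile(r"^(?P<host>\S+) \[(?P<ts>[^\]]+)\] \(c\) \S+ (?P<detail>.+)$")


def normalize(line):
    """Return a dict with a UTC ISO 8601 timestamp, or None if unrecognized."""
    try:
        match = LAYOUT_A.match(line)
        if match:
            when = datetime.fromisoformat(match["ts"])
        else:
            match = LAYOUT_B.match(line)
            if not match:
                return None
            when = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        # The line had the right shape but an unparseable timestamp.
        return None
    return {
        "timestamp": when.astimezone(timezone.utc).isoformat(),
        "host": match["host"],
        "detail": match["detail"],
    }


# Hypothetical sample lines: the same two seconds seen by two systems.
system_line = "2013-02-10T13:14:03-08:00 web01 E1042"
app_line = "db02 [10/Feb/2013:21:14:05 +0000] (c) BigCo connection reset"

# Once everything is in UTC, sorting gives the actual chain of events.
events = sorted(
    (event for event in map(normalize, [app_line, system_line]) if event),
    key=lambda event: event["timestamp"],
)
```

Sorting only gives a trustworthy order if the clocks on the machines agree, which is the clock synchronization sidebar mentioned above.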
Jordan: Right. So, my thought around what led to logstash is you can take a sysadmin or software IT developer—whatever—expert, and you can sit them in front of a bunch of logs and they can read them and say, “That’s the time it happened. That’s the user who caused this action. This is the action.” But if you try and abstract and step away, and so you ask how many times did this action happen? When did this user appear? What time did this happen?
You start losing the ability to ask those questions without being an expert yourself, or sitting next to an expert and having them be your keyboard. Kind of a phenomenon I call the human keyboard problem where you’re speaking to a computer, but someone has to translate for you. And so in around 2004, I was super into Perl. No shocker that I enjoyed—ish. I sort of enjoyed regular expressions, but I was super into Perl, and there was a Perl module called Regexp::Common which is a library of regular expressions to match known things: IP addresses, certain kinds of timestamps, quoted strings, and whatnot.
Corey: And this stuff is always challenging because it sounds like oh, an IP address. One of the interview questions I hated the most that someone asked me was, “Write a regular expression to detect an IP address.” It turns out that to do this correctly, even if you bound it to IPv4 only, the answer takes up multiple lines on a screen.
Jordan: Oh, for sure.
Corey: It’s enormous.
Jordan: It’s like a full page of—
Corey: It is.
Jordan: —of code you can’t read. And that’s one of the things that, it was sort of like standing on the shoulders of the person who came before; it was kind of an epiphany to me.
Corey: Yeah. So, I can copy and paste that into my code, but someone who has to maintain that thing after I get fired is going to be, “What the hell is this and what does it do?” It’s like it’s the blessed artifact that the ancients built it and left it there like it’s a Stargate sitting in your code. And it’s, “We don’t know how it works; we’re scared to break it, so we don’t even look at that thing directly. We just know that we put nonsense in, an IP address comes out, and let’s not touch it, ever again.”
Jordan: Exactly. And even to your example, even before you get fired and someone replaces you and looks at your regular expression, the problem I was having was, I would have this library of copy and pasteable things, and then I would find a bug, an edge case. And I would fix that edge case but the other 15 scripts that were using the same regular expression, I can’t even read them anymore because I don’t carry that kind of context in my head for all of that syntax. So, you either have to go back and copy and paste and fix all those old regular expressions. Or you just say, “You know what? We’re not going to fix the old code. We have a new version of it that works here, but everywhere else this edge case fails.”
So, that’s one of the things that drew me to the Regexp::Common library in Perl was that it was reusable and things had names. It was, “I want to match an IP address.” You didn’t have to memorize that long piece of text to precisely and accurately accept only IP addresses and reject things that are not. You just said, “Give me the regular expression that matches an IP.” And that library gave me the idea to write grok.
Well, if we could name things, then maybe we could turn that into some kind of data structure, sort of the combination of, “I have a piece of log data, and I as an expert, I know that’s an IP address, that’s the username, and that’s the timestamp.” Well, now I can apply this library of regular expressions that I didn’t have to write and hopefully has a unit test suite, and say, now we can pull out instead of that plain piece of text that is hard to read as a non-expert, now I can have a data structure we can format however we want, that non-experts can see. And even experts can just relax and not have to be full experts all the time, using that part of your brain. So, now you can start getting towards answering search-oriented questions. “How many login attempts happened yesterday from this IP address?”
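The idea of naming patterns and turning a log line into a data structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PATTERNS` table, the `compile_template` and `parse` helpers, and the sample log lines are all hypothetical, and the three patterns are far less thorough than a real pattern library. The `%{NAME:field}` notation is borrowed from grok.

```python
import re
from collections import Counter

# A tiny, hypothetical pattern library: each hard-to-read regex gets a name.
PATTERNS = {
    "IPV4": r"(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}"
            r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])",
    "USER": r"[A-Za-z0-9._-]+",
    "TIMESTAMP": r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z",
}


def compile_template(template):
    """Expand each %{NAME:field} reference into a named regex group."""
    def expand(ref):
        return "(?P<{}>{})".format(ref.group(2), PATTERNS[ref.group(1)])
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, template))


# The expert writes this once; nobody has to read the raw regexes again.
LOGIN = compile_template(
    "%{TIMESTAMP:when} Failed login for %{USER:user} from %{IPV4:ip}"
)


def parse(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = LOGIN.fullmatch(line)
    return match.groupdict() if match else None


# Invented sample lines; the last one has an invalid address and is rejected.
lines = [
    "2013-02-10T21:14:03Z Failed login for alice from 10.0.0.7",
    "2013-02-10T21:14:09Z Failed login for bob from 10.0.0.7",
    "2013-02-10T21:15:30Z Failed login for alice from 192.168.1.20",
    "2013-02-10T21:16:00Z Failed login for eve from 999.0.0.7",
]

# "How many login attempts came from each IP address?"
attempts = Counter(event["ip"] for event in map(parse, lines) if event)
```

Fixing an edge case in `PATTERNS["IPV4"]` fixes it for every template that references the name, which is the copy-and-paste problem described earlier in the conversation.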
Corey: Right. And back then, the way that people would do these things was Elasticsearch. So, that’s the thing you shove all your data into in a bunch of different ways and you can run full-text queries on it. And that’s great, but now we want to have that stuff actually structured, and that is sort of the magic of logstash—which was used in conjunction with Elasticsearch a lot—and it turns out that typing random SQL queries in the command line is not generally how most business users like to interact with this stuff, seems to be something dashboard-y-like, and the project that folks use for that was Kibana. And ELK Stack became a thing because Elasticsearch in isolation can do a lot but it doesn’t get you all the way there for what people were using to look at logs.
Jordan: You’re right.
Corey: And Kibana is also one of the projects that Elastic owned, and at some point, someone looks around, like, “Oh, logstash. People are using that with us an awful lot. How big is the company that built that? Oh, it’s an open-source project run by some guy? Can we hire that guy?” And the answer is, “Apparently,” because you wound up working as an Elastic employee for a while.
Jordan: Yeah. It was kind of an interesting journey. So, in the beginning of logstash in 2009, I kind of had this picture of how I wanted to solve log processing search challenges. And I broke it down into a couple of parts of visualization—to be clear, I broke it down in my head, not into code, but visualization, kind of exploration, there’s the processing and transmission, and then there’s storage and search. And I only felt confident really attempting a solution for one of those parts. And I picked log processing partly because I already had a jumpstart from a couple of years prior, working on grok and feeling really comfortable with regular expressions. I don’t want to say good because that’s—
Corey: You heard it here first—
Jordan: [laugh].
Corey: —we found the person that knows regular expressions. [laugh].
Jordan: [laugh]. And logstash was being worked on to solve this problem of taking your data, processing it, and getting it somewhere. That’s why logstash has so many outputs, has so many inputs, and lots of filters. And about I think a year into building logstash, I had experimented with storage and search backends, and I never found something that really clicked with me. And I was experimenting with Lucene, and knowing that I could not complete this journey because the problem space is so large, it would be foolish of me to try to do distributed log stores or anything like that, plus visualization.
I just didn’t have the skills or the time in the day. I ended up writing a frontend for logstash called logstash-web—naming things is hard—and I wasn’t particularly skilled or attentive to that project, and it was more of a very lightweight frontend to solve the visualization, the exploration aspect. And about a year into logstash being alive, I found Elasticsearch. And what clicked with me from being a sysadmin and having worked at large data center companies in the past is I know the logs on a single system are going to quickly outgrow it. So, whatever storage system will accept these logs, it’s got to be easy to add new storage.
And Elasticsearch’s first-day promise was it’s distributed; you can add more nodes and go about your day. And it fulfilled that promise and I think it still fulfills that promise that if you’re going to be processing terabytes of data, yeah, just keep dumping it in there. That’s one of the reasons I didn’t try and even use MySQL, or Postgres, or other data systems because it didn’t seem obvious how to have multiple storage servers collecting this data with those solutions, for me at the time.
Corey: It turns out that solving problems like this that are global and universal lead to massive adoption very quickly. I want to get this back a bit before you wound up joining Elastic because you get up on stage and you talked through what this is. And I mentioned at the start of this recording, that it was one of those transformative talks. But let’s be clear here, I don’t remember 95% of how logstash works. Like, the technology you talked about ten years ago is largely outmoded slash replaced slash outdated today. I assure you, I did not take anything of note whatsoever from your talk regarding regular expressions, I promise. And—
Jordan: [laugh]. Good.
Corey: But that’s not the stuff that was transformative to me. What was, was the way that you talked about these things. And that was the first time I’d ever heard the phrase that if a new user has a bad time, it’s a bug. This was 2012. The idea of empathy hadn’t really penetrated into the ops and engineering spaces in any meaningful way yet. It was about gatekeeping, it was about, “Read the manual, fool”—
Jordan: Yes.
Corey: —if people had questions. And it was actively user-hostile. And it was something that I found transformative of, forget the technology piece for a second; this is a story about how it could be different. Because logstash was the vehicle to deliver a message that transcended far beyond the boundaries of how to structure your logs, or maybe the other boundaries of regular expressions, I’m never quite sure where those things start and stop. But it was something that was actively transformative where you’re on stage as someone who is a recognized authority in the space, and you’re getting up there and you’re sending an implicit message—both explicitly and by example—of be nice to people; demonstrate empathy. And that left a hell of an impact. And—
Jordan: Thank you.
Corey: I wound up doing a spot check just now, and I wound up looking at this and sure enough, early in 2013, I wound up committing—it’s still in the history of the changelog for logstash because it’s open-source—I committed two pull requests, minutes apart, two submissions—I don’t know if pull requests were even a thing back then—but it wound up in the log. Because another project you were renowned for was fpm: Effing Package Manager if I’m—is that what the acronym stands for, or am I misremembering?
Jordan: [laugh]. We’ll go with that. I’m sure vulgar viewers will know what the F stands for, but you don’t have to say it. It’s just Effing Package Management.
Corey: Yeah.
Jordan: But yeah, I think I really do believe that if a user, especially if a new user has a bad time, it’s a bug, and that came from many years of participating at various levels in open-source, where if you came at it with a tinkerer’s or a hacker’s mindset and you think, “This project is great. I would like it to do one additional thing, and I would like to talk to someone about how to make it do that one additional thing.” And you go find the owners or the maintainers of that project, and you come in with gusto and energy, and you describe what you want to do and, first, they say, “What you want to do is not possible.” They don’t even say they don’t want to do it; they frame the whole universe against you. “It’s not possible. Why would you want to do that? If you want to make that, do it yourself.”
You know, none of these things are an extended hand, a lowered ladder, an open door, none of those. It’s always, “You’re bothering me. Go away. Please read the documentation and see where we clearly”—which they don’t—“Document that this is not a thing we’re interested in.” And I came to the conclusion that any future open-source or collaborative work that I worked on, it’s got to be from a place where, “You’re welcome, and whatever contributions or participation levels you choose, are okay. And if you have an idea, let’s talk about it. If you’re having a bad time, let’s figure out how to solve it.”
Maybe the solution is we point you in the right direction to the documentation, if documentation exists; maybe we find a bug that we need to fix. The idea that the way to build communities is through kindness and collaboration, not through walls or gatekeeping or just being rude. And I really do think that’s one of the reasons logstash became so successful. I mean, any particular technology could have succeeded in the space that logstash did, but I believe that it did so because of that one piece of framework where if a new user has a bad time, it’s a bug. Because to me, that opens the door to say, “Yeah, you know what? Some of the code I write is not going to be good. Or, the thing you want to do is undocumented. Or the documentation is out of date. It told you a lie and you followed the documentation and it misled you because it’s incorrect.”
We can fix that. Maybe we don’t have time to fix it right now. Maybe there’s no one around to fix it, but we can at least say, “You know what? That information is incorrect, and I’m sorry you were misled. Come on into the community and we’ll figure it out.” And one of the patterns I know is, on the IRC channel, which is where the logstash real-time community chat… I don’t know how to describe that.
Corey: No, it was on freenode. That’s part of the reason I felt okay talking to you. At that point, I was volunteer network staff. This is before freenode turned into basically a haven for Nazis this past year.
Jordan: Yeah. It was still called lilo… lilonet [crosstalk 00:20:20]—
Corey: No, the open freenode network, that predates me. This was—yeah, lilo—
Jordan: Okay.
Corey: —died about six years prior. But—
Jordan: Oh, all right.
Corey: Freenode’s been around a long time. What made this thing work was that I was network staff, and that means that I had a bit of perceived authority—it’s a chat room; not really—but it was one of those things where it was at least, “Okay, this is not just some sketchy drive-by rando,” which I very much was, but I didn’t present that way, so I could strike up conversations. But with you talking about this stuff, I never needed to be that person. It was just if someone wants to pitch in on this, great; more hands make lighter work. Sure.
Jordan: Yeah, for sure.
Corey: And for me, the interesting part is not even around the logstash aspects so much; it’s your other project, fpm. Well, one of your other projects. Back in 2012, that was an interesting year for me. Another area that got very near and dear to my heart in the open-source world was the SaltStack project; I was contributor number 15. And I didn’t know how Python worked. Not that I do now, but I can fake it better now.
And Tom Hatch, the guy that ran the project before it was a company, was famous for this, where I could send in horrifying levels of code, and every time he would merge it in and then ten minutes later, there would be another patch that comes in that fixes all the bugs I just introduced, and it was just such a warm onboarding. I’m not suggesting that approach and I’m not saying it’s scalable, but I started contributing. And I became the first Debian and Ubuntu packager for SaltStack, which was great. And I did a terrible job at it because—let me explain. I don’t know if it’s any better now, but back in those days, there were multiple documentation sources on the proper way to package software.
They were all contradictory with each other, there was no guidance as to when to follow each one, there was never a, “You know nothing about packaging; here’s what you need to know, step-by-step,” and when you get it wrong, they yell at you. And it turns out that the best practice then to get it formally accepted upstream—which is what I did—is do a crap-ass job, and then you’ll wind up with a grownup coming in, like, “This is awful. Move.” And then they’ll fix it and yell at you, and gatekeep like hell, and then you have a package that works and gets accepted upstream because the magic incantation has been said somewhere. And what I loved about fpm was that I could take any random repo or any source tarball or anything I wanted, run it through with a single command, and it would wind up building out an RPM and a Deb file—and I don’t know what else it supported; those are the ones I cared about—that I could then install on a system. I could put it in a repo and add that to a sources list on systems, and get it to automatically install so I could use configuration management—like SaltStack—to wind up installing custom local packages. And oh, my God, did the packaging communities for multiple different distros hate you—
Jordan: Yep.
Corey: —and specifically what you had built because this was not the proper way to package. How dare you solve an actual business problem someone has instead of forcing them to go to packaging school where the address is secret, and you have to learn that. It was awful. It was the clearest example that I can come up with of gatekeeping, and then you’re coming up with fpm which gets rid of user pain, and I realized that in that fight between the church of orthodoxy of, “This is how it should be done,” and the, “You’re having a problem; here’s a tool that makes it simple,” I know exactly what side of that line I wanted to be on. And I hadn’t always been previously, and that is what clarified it for me.
Jordan: Yeah, fpm was a really delightful enjoyment for me to build. The origins of that was I worked at a company and they were all… I think, at that time, we were RPM-based, and then as folks tend to do, I bounced around between jobs almost every year, so I went from one place that—
Corey: Hey, it’s me.
Jordan: [laugh]. Right? And there’s absolutely nothing wrong with leaving every year or staying longer. It’s just whatever progresses your career in the way that you want and keeps you safe and your family safe. But we were using RPM and we were building packages already not following the orthodoxy.
A lot of times if you ask someone how to build a package for Fedora, they’ll point you at the Maximum RPM book, and that’s… a lot of pages, and honestly, I’m not going to sit down and read it. I just want to take a bunch of files, name it, and install it on 30 machines with Puppet. And that’s what we were doing. Cue one year later, I moved to a new company, and we were using Debian packages. And they’re the same thing.
What struck me is they are identical. It’s a bunch of files—and don’t pedant me about this—it’s a bunch of files with a name, with some other sometimes useful metadata, like other names that you might depend on. And I really didn’t find it enjoyable to transfer my knowledge of how to build RPMs, and the tooling and the structures and the syntaxes, to building Debian packages. And this was not for greater publication; this was I have a bunch of internal applications I needed to package and deploy with, at the time it was Puppet. And it wasn’t fun.
So, I did what we did with grok which was codify that knowledge to reduce the burden. And after a few, probably a year or so of that, it really dawned on me that a generality is all packaging formats are largely solving the same problem and I wanted to build something that was solving problems for folks like you and me: sysadmins, who were handed a pile of code and they needed to get it into production. And I wasn’t interested in formalities or appeasing any priesthoods or orthodoxies about what really—you know, “You should really shine your package with this special wax,” kind of thing. Because all of the documentation for Debian packages, Fedora packages are often dedicated to those projects. You’re going to submit a package to Fedora so that the rest of the world can use it on Fedora. That wasn’t my use case.
Corey: Right. I built a thing and a thing that I built is awesome and I want the world to use it, so now I have to go to packaging school? Not just once but twice—
Jordan: Right.
Corey: —and possibly more. That’s awful.
Jordan: Or more. Yeah. And it’s tough.
Corey: This episode is sponsored in part by our friends at Jellyfish. So, you’re sitting in front of your office chair, bleary-eyed, parked in front of a PowerPoint and—oh my sweet feathery Jesus, it’s the night before the board meeting, because of course it is! As you slot that crappy screenshot of traffic-light-colored Excel tables into your deck, or sift through endless spreadsheets looking for just the right data set, have you ever wondered, why is it that sales and marketing get all this shiny, awesome analytics and insight tools, whereas engineering basically gets left with the dregs? Well, the founders of Jellyfish certainly did. That’s why they created the Jellyfish Engineering Management Platform, but don’t you dare call it JEMP! Designed to make it simple to analyze your engineering organization, Jellyfish ingests signals from your tech stack, including JIRA, Git, and collaborative tools. Yes, depressing to think of those things as your tech stack, but this is 2021. They use that to create a model that accurately reflects just how the breakdown of engineering work aligns with your wider business objectives. In other words, it translates from code into spreadsheet. When you have to explain what you’re doing from an engineering perspective to people whose primary IDE is Microsoft PowerPoint, consider Jellyfish. That’s Jellyfish.co, and tell them Corey sent you! Watch for the wince, that’s my favorite part.
Corey: And this gets back to what I found of—it was rare that I could find a way to contribute to something meaningfully, and I was using logstash after your talk, I’d started using it and rolling it out somewhere, and I discovered that there wasn’t a Debian package for it—the environment I was in at that time—or Ubuntu package, and, “Hey Jordan, are you the guy that wrote fpm and there isn’t a package here?” And the thing is is that you would never frame it this way, but the answer was, of course, “Pull requests welcome,” which is often an invitation to do free volunteer work for companies, but this was an open-source project that was not backed by a publicly-traded company; it was some guy. And of course, I’ll pitch in on that. And I checked the commit log on this for what it is that I see, and sure enough, I have two commits. The first one was on Sunday night in February of 2013, and my commit message was, “Initial packaging work for Deb building.” And sure enough, there’s a bunch of files I put up there and that’s great. And my second and last commit was 12 minutes later saying, “Remove large binary because I’m foolish.” Yeah.
Jordan: Was that you? [laugh].
Corey: Yeah. Oh, yeah, I’m sure—yeah, it was great. I didn’t know how Git worked back then. I’m sure it’s still in the history there. I wonder how big that binary is, and exactly how much I have screwed people over in the last decade since.
Jordan: I’ve noticed this over time. And every now and then you’d be—I would be or someone would be on a slow internet connection—which again, is something that we need to optimize for, or at least be aware of and help where we can—someone would be cloning logstash on an airplane or something like that, or rural setting, and they would say, “It gets stuck at 76% for, like, ten minutes.” And you would go back and dust off your tome of how to use Git because it’s a very difficult piece of software to use, and you would find this one blob and I never even looked at who committed it or whatever, but it was like I think it was 80 Megs of a JAR file or a Debian package that was [unintelligible 00:28:31] logstash release. And… [laugh] it’s such a small world that you’re like, yep, that was me.
Corey: Oh, yeah. Oh, yeah. Let’s check this just for fun here. To be clear, the entire repository right now is 167 Megs, so that file that I had up there for all of 13 minutes lives indelibly in Git history, and it is fully half of the size—
Jordan: Yep.
Corey: —of the entirety of the logstash project. All right, then. I didn’t realize this was one of those confess your sins episodes, but here we are.
Jordan: Look, sometimes we put flags on the moon, sometimes we put big files in Git. You could just for posterity, we could go back and edit the history and remove that, but it never became important to do it, it wasn’t loud, people weren’t upset enough by it, or it didn’t come up enough to say, “You know what? This is a big file.” So, it’s there. You left your mark.
Corey: You know, we take what we can get. It’s an odd time. I’ll have to do some digging around; I’m sure I’ll tweet about this as soon as I get a bit more data on it, but I wonder how often people have had frustration caused by that. There’s no ill intent here, to be very clear, but it was instead, I didn’t know how Git worked very well. I didn’t know what I was doing in a lot of respects, and sure enough in the fullness of time, some condescending package people came in and actually made this right.
And there is a reasonable, responsible package now because, surprise, of course there is. But I wonder how much inadvertent pain I caused people by that ridiculous commit. And it’s the idea of impact and how this stuff works. I’m not happy that people on a plane with a slow connection had to wait an extra minute or two to download that nonsense. It’s one of those things that is, oops. I feel like a bit of a heel for that, not for not knowing something, but for causing harm to folks. Intent doesn’t outweigh impact. There is a lesson in there for it.
Jordan: Agreed. On that example, I think one of the things… code is not the most important thing I can contribute to a project, even though I feel very confident in my skills in programming in a variety of environments. I think the number one thing I can do is listen and look for sources of pain. And people would come in and say, “I can’t get this to work.” And we would work together and figure out how to make it work for their use case, and that could result in a new feature, a bug fix, or some documentation improvements, or a blog post, or something like that.
And I think in this case, I don’t really recall any amount of noise for someone saying, “Cloning the Git repository is just a pain in the butt.” And I think a lot of that is because either the people who would be negatively impacted by that weren’t doing that use case, they were downloading the releases, which were as small as we can possibly get them, or they were editing files using the GitHub online edit the file thing, which is a totally acceptable, it’s perfectly fine way to do things in Git. So, I don’t remember anyone complaining about that particular file size issue. The Elasticsearch repository is massive and I don’t think it even has binaries. It just has so much more—
Corey: Someone accidentally committed their entire production test data set at one point and oops-a-doozy. Yeah, it’s not the most egregious harm I’ve ever caused—
Jordan: Yeah.
Corey: —but it’s there. The thing that, I guess, resonates with me and still does is the lessons I learned from you, I could sum them up as being not just empathy-driven—because that’s the easy answer—but the other layers were that you didn’t need to be the world’s greatest expert in a thing in order to credibly give a conference talk. To be clear, you were miles ahead of me and still are in a lot of different areas—
Jordan: Thanks.
Corey: —and that’s fine. But you don’t need to be the—like, you are not the world’s greatest expert on empathy, but that’s what I took from the talk and that’s what it was about. It also taught me that things you can pick up from talks—and other means—there are things you can talk about in terms of technology and there are things you can talk about in terms of people, and the things about people do not have expiration dates in the same way that technology does. And if I’m going to be remembered for impact on people versus impact on technology, for me, there’s no contest. And you forced me to really think about a lot of those things that it started my path to, I guess, becoming a public speaker and then later all the rest that followed, like this podcast, the nonsense on Twitter, and all the rest. So, it is, I guess, we can lay the responsibility for all that at your feet. Enjoy the hate mail.
Jordan: Uhh, my email address is now closed. I’m sorry.
Corey: Exactly.
Jordan: Well, I appreciate the kind words.
Corey: We’ll get letters on this one.
Jordan: [laugh].
Corey: It’s the impact that people have, and someti—I don’t think you knew at the time that that’s the impact you were having. It matters.
Jordan: I agree. I think a lot of it came from how do I want to experience this? And it was much later that it became something that was really outside of me, in the sense that it was building communities. One of the things I learned shortly after—or even just before—joining Elastic was how many folks were looking to solve a problem, found logstash, became a participant in the community, and that participation could just be anything, just hanging out on IRC, on the mailing list, whatever, and the next step for them was to get a better paying job in an environment they enjoyed that helped them take the next step in their career. Some of those people came to work with me at Elastic; some of them started to work on the logstash team at some point they decided because a lot of logstash users were sysadmins.
And on the logstash team, we were all developers; we weren’t sysadmins, there was nothing to operate. And a lot of folks would come on board and they were like, “You know what? I’m not enjoying writing Ruby for my job.” And they could take the next step to transition to the support team or the sales engineer team, or cloud operations team at Elastic. So, it was really, like you mentioned, it has nothing to do with the technology of—to me—why these projects are important.
They became an amplifier and a hand to pull people up to go the next step they need to go. And on the way maybe they can make a positive impact in the communities they participate in. If those happen to be fpm or logstash, that’s great, but I think I want folks to see that technology doesn’t have to be a grind of getting through gatekeepers, meeting artificial barriers, and things like that.
Corey: The thing that I took, too, is that I gave a talk in 2015 or ’16, which is strangely appropriate now: “Terrible ideas in Git.” And yes, checking large binaries in is one of the terrible ideas I talk about. It’s Git through counter-example. And around that time, I also gave a talk for a while on how to handle a job interview and advance your career. Only one of those talks has resulted in people approaching me even years later saying that what I did had changed aspects of their life. It wasn’t the Git one. And that’s the impact it comes down to. That is the change that I wanted to start having because I saw someone else do it and realized, you know, maybe I could possibly be that good someday. Well, I’d like to think I made it, on some level.
Jordan: [laugh]. I’m proud of the impact you’ve made. And I agree with you, it is about people. Even with fpm where I was very selfishly tickling my own itch, I don’t want to remember all of this stuff and I also enjoy operating outside of the boundaries of a church or whatever the priesthoods that say, “This is how you must do a thing,” I knew there was a lot of folks who worked at jobs and they didn’t have authority, and they had to deploy something, and they knew if they could just package it into a Debian format, or an RPM format, or whatever they needed to do, they could get it deployed and it would make their lives easier. Well, they didn’t have the time or the energy or the support in order to learn how to do that and fpm brought them that success where you can say, “Here’s a bunch of files; here’s a name, poof, you have a package for whatever format you want.”
Where I found fpm really take off is when Gem and Python and Node.js support were added. The sysadmins were kind of sandwiched in between—in two impossible worlds where they are only authorized to deploy a certain package format, but all of their internal application developer teams were using Node.js and newer technologies, and all of those package formats were not permitted by whoever had the authority to permit those things at their job. But now they had a tool that said, “You know what? We can just take that thing, we’ll take Django and Python, and we’ll make it an RPM and we won’t have to think a lot about it.”
And that really, I think—to me, my hope was that it de-stresses that sort of work environment where you’re not having to do three weeks of brand new work every time someone releases something internally in your company; you can just run a script that you wrote a month ago and maintain it as you go.
Corey: Wouldn’t that be something?
Jordan: [laugh]. Ideally, ideally.
Corey: Jordan, I want to thank you for not only the stuff you did ten years ago, but also the stuff you just said now. If people want to learn more about you, how you view the world, see what you’re up to these days, where can they find you?
Jordan: I’m mostly active on Twitter, at @jordansissel, all one word. Mostly these days, I post repair stuff I do on the house. I’m a stay-at-home full-time dad these days, and… I’m still doing maintenance on the projects that need maintenance, like fpm or xdotool, so if you’re one of those users, I hope you’re happy. If you’re not happy, please reach out and we’ll figure out what the next steps can be. But yeah. If you like bugs, especially spiders—or if you don’t like spiders and you want to like spiders, check me out on Twitter. I’m often posting macro photos, close-up photos of butterflies, bees, spiders, and the like.
Corey: And we will, of course, throw links to that in the show notes. Jordan, thank you so much for your time today. It’s appreciated.
Jordan: Thank you, Corey. It’s good talking to you.
Corey: Jordan Sissel, founder of logstash and currently, blissfully, not working on a particular corporate job. I envy him, some days. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment in which you have also embedded a large binary.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
2021 Duckbill Group, LLC