[00:00:00] Jonathan: Hello. Please come in, join me. I’m Jonathan Crowe, Director of Community at NinjaOne, and this is IT Horror Stories. Tonight, the lights are dimmed, the servers are humming suspiciously, and something sinister is lurking in the logs. This isn’t just any episode, this is the season finale of IT Horror Stories. And to mark the occasion, we’ve summoned a few especially haunting tales from our most fearless contributors: Listeners, just like you.
[00:00:35] We called upon a handful of courageous IT pros and sysadmins to relive their most harrowing tech terrors. Stories of outages, breaches, and configurations gone horribly wrong. But it’s not all doom and gloom. In the aftermath of disaster, there’s always a glimmer of hope. Today, we sit down with three IT leaders: Kallum Kyle, Senior Sysadmin at Storable; Josh Adcock, Director of Client Services at The Tech Doctor; and Adam Walter, Founder of Humanize IT.
[00:01:08] Before we get into our stories, I have three questions for you, dear listener. What’s scarier: Having to deal with end user problems, or having to rely on end user help as your only solution? Having vulnerable software on your network, or grinding down business critical operations without it? Losing your job because you automated your way out of it, or staying stuck in that job forever?
[00:01:36] So grab your flashlight, keep an eye on the firewalls, and whatever you do, don’t ignore that blinking red light.
Our First Story: Crowdsourcing solutions
[00:01:48] Jonathan: For our first story, everyone, I’ve got Kallum Kyle, Senior Systems Administrator at Storable. WinAdmins overlord?
[00:01:58] Kallum: Moderator, not quite an overlord. My name is still yellow. It’s not red for the admins yet.
[00:02:02] Jonathan: Not yet. Not yet. And Microsoft MVP, across several categories.
[00:02:08] Kallum: You say several, I say two. I’m a Microsoft MVP in Intune, and recently awarded PowerShell earlier this month.
[00:02:16] Jonathan: Congratulations for that. Kallum, I really appreciate you coming on to relive your horror story. And I understand that turnabout is fair play here because you recently did this to other people. So now I get to do this to you.
[00:02:31] Kallum: Yes. At the Midwest Management Summit or MMS, as a lot of people know it, that was in May of 2025, I hosted a session. I won’t say I presented a session because it was a lot of audience participation on IT horror stories – where we invited every single person in the audience to get up and tell their tale. And had an absolutely amazing time with it. And being able to share stories about the worst things that have happened to us professionally are… it’s extremely helpful for both feeling like, “Hey, I didn’t take down the government of Ottawa, that’s fine. My mistake’s not that bad.” And being able to meet people who have been in the trenches with you and make those connections and network in the way that is fun and not in the way that Cisco does it. So it’s great fun.
[00:03:26] Jonathan: You know, I think of the phrase, There’s always a bigger oops. You know.
[00:03:33] Kallum: Yep.
[00:03:34] Jonathan: Someone has always done something worse. I think that’s great. I think it’s also super cool that you put that together and you gave people an opportunity to get up on stage and practice feeling comfortable with doing that and public speaking.
[00:03:48] Talking about something scary for most, I think normal people, that’s a scary thing to do. So the fact that you’re able to do that and get them to talk about these horror stories, but help them overcome that fear. Get used to that and give ’em practice. That’s just really cool. And so good for you for doing that. Now let’s turn the tables on you.
[00:04:09] Kallum: Okay.
[00:04:10] Jonathan: Tell us about, tell us about your horror story. I understand, we talked about this a little bit. Of course you don’t just have one, you have many, but we’re gonna focus on one today, and I understand it’s one of the more recent ones. So maybe set the scene for us. What’s going on before we get into the hour where the real horror strikes?
Systems down
[00:04:31] Kallum: I had been working with this company for about a year, so I still felt like I was getting my legs under me as the primary like Microsoft person. I have a very small team and I am the Microsoft person who, if you’re, if you’re watching the video here, I have an Azure tattoo, so I am THE Microsoft person at my job. And we are a fully remote company. We’re split as to whether we’re on Macs or PCs. So about half of the organization is mine, that I am managing. This fleet is my baby. And then the first person that got affected was our senior security, IT security person, was the first person who reported an issue.
[00:05:24] Jonathan: So when you’re getting tickets and you see, I’m sure you probably have the people that, okay, yes, you got another ticket from them. That makes sense. When you see it coming in from your senior security person, does that hit a little bit different?
[00:05:39] Kallum: I will say in his defense, he did not open a ticket. He went into Slack and said, Hey, can I have my BitLocker recovery?
[00:05:44] Jonathan: He didn’t open a ticket.
[00:05:45] Kallum: So, he asked for his BitLocker recovery key and we’re like, okay, sure, Bob. We’ll call him Bob. Sure. Bob, here’s your BitLocker recovery key. Just, did you do anything that would make BitLocker trigger? No, not that I can think of. Okay.
[00:06:03] Hey, I am still not able to get in…Okay.
[00:06:08] And then that’s when we start seeing reports from other people who are not on the East Coast start rolling in as they’re starting and it’s like, oh no. This is a lot of people for an organization of our size to be having this problem. And that is about when I sort of popped into WinAdmins, which if you’re not familiar, WinAdmins is a Discord server with almost 15,000 users worldwide, to say like, Hey, what’s going on?
[00:06:47] And it was just in absolute flames, everybody trying to figure out what is going on. And if you feel like your team is too big, to get anything done, imagine that across countries.
[00:07:02] Jonathan: Well, in those moments too, those moments of panic, I mean, it’s, you know, cue the GIF from Community where, you know, Donald Glover is coming in and everything’s on fire.
[00:07:14] But seeing that, you know, there’s such a rush of everyone trying to find the information, trying to share as quickly as possible.
[00:07:43] Sometimes information is not always pointed in the same direction. It really turns into just an insane atmosphere to be operated in, in any state, let alone like a calm, cool, and collected rational state. Do you think those things, obviously it’s helpful, but do you think, you know, when you’re in that moment, this is also like, let’s talk more about like the remote aspect of this too, because.
[00:07:47] You know, as opposed to other offices where you see these reports kind of all happening within, you know, a department or within your building. Like, you’re seeing these things happening across, as you mentioned, different geos and then you’re going in and you’re seeing this. Okay, this is a huge widespread event. What’s going through your head there? How are you feeling about this?
[00:08:08] Kallum: I didn’t mess anything up. That’s the main thing that immediately went through my head when I saw colleagues in like Germany and Iceland in the UK also having this problem. It’s like, okay, this wasn’t my fault. Okay. So I don’t have to worry about that portion of it.
[00:08:27] Jonathan: You are, but you are powerful when it comes to your fleet, but you are not, that is not your –
[00:08:35] Kallum: But I did not do this. I have messed other things up, but this one was not me. But seeing that it’s, there are so many very highly intelligent, very, very technically minded people in the communities that I work in. And being able to say like, okay, I’m also not the only one who doesn’t know how to fix this. And being able to work together to sort of crowdsource something. One person finds an article and 70 companies make use of one person being able to find a piece of research, which is so, so nice to have.
[00:09:20] Jonathan: How are you, I guess, dividing your attention here in these moments where you’re drawn into these like WinAdmins, you’re drawn into finding out, okay, what’s going on? What is the world telling me right now?
[00:09:36] How much information can I get? You’re the person that your company’s going to for answers. I’m imagining too, you know, you’ve got people, what’s going on? What can I do? How do you kind of manage and start to triage in your mind where your attention’s going?
[00:09:50] Kallum: So we have a fairly good setup where I work, and that is that our manager runs comms. So I ignore anything if it is not from somebody that is tagged VIP on Slack. When we get into something like that. Because the VIPs are the people that are on my team, they will either be bringing me information or asking for information to disseminate because there are a lot of people asking them. So that way I only have to say something once and it gets spread. So it’s sort of like opposite phone tree at that point. There are six people who can talk to Kal right now, and only those six, anybody else will get ignored.
[00:10:38] Jonathan: Yeah, do not disturb. You go into a place where you know, you know, it’s a reliable place. People are gonna be sharing information that is lit up. People are starting to, starting to find things. They’re sharing helpful resources. What happens next?
[00:10:51] Kallum: At that point we were very much so in like there is a point of emergency where you just tell the company like, look, we know. We’ll let you know when we know something else. We are lucky in that we have these (holds up phone) now, so every single person on our team or in our company has to have a phone for MFA. Right. So for their multifactor, they have to use their phone. We don’t use YubiKeys or anything like that. So they’re signed into their Slack on their phone. We can still communicate with each other.
[00:11:33] We can let everybody know if something is down, without them having to be able to log into their company computer to see that. That’s the point that we get to, is we know there’s a problem. We’re investigating and we hate to see that on status pages, for like major service providers.
[00:11:52] Like Google says, Hey, there’s a problem with these, we’re investigating. It’s like, well, tell us what the problem is. We don’t know what the problem is. That’s sort of, we’re trying to figure it out. We’re working on it.
[00:12:02] Jonathan: The timeframe here, you know, it feels like forever. These things, compared to how they used to roll though, I mean, information gets around a lot quicker because of all these things we’ve been talking about. And so you start to get information about, okay, there, I’m trying to remind myself about the timeline of events here.
[00:12:22] Kallum: I know it was the day before I was supposed to go on vacation.
[00:12:25] Jonathan: Of course it was. Did you end up being able to go on this vacation? This is the spoiler.
[00:12:34] Kallum: Yeah. I made it.
[00:12:38] Jonathan: How did you make it? How did you all resolve this?
[00:12:43] Kallum: How did we resolve this? Well, there was the official guide for resolving it that got sent out the same day, thankfully. So everybody knows you boot into safe mode, delete one file outta the source folder, and when the definitions update on its next run, it pulls down the fixed version of that binary, right.
[00:13:04] So we wrote up documentation in our Confluence for our users as to how to do this themselves, and sent them an email with their BitLocker key, with like the full step-by-step documentation on it. And we opened the floodgates, because we had initially said, “Hey, we’re aware of this. Don’t send tickets on it.” We opened the floodgates and said, “Hey, if you have this problem, like this problem, here are screenshots of the problem, send us a ticket.” And we sent out the documentation and I think we had to handhold something like seven users total through it.
[00:13:54] Jonathan: That’s amazing. Chalk one up for having faith in humanity, right?
[00:13:58] Well, Kallum, thank you so much for taking time to share how you, your company, navigated that horror story that impacted so many people. Very cool approach. And just wanna say thanks again for doing everything you do, not just here sharing your story, being such a great community advocate and person who is really helping everyone, encourage everyone to be sharing their information. Everyone who’s listening, you can find Kallum and WinAdmins. Kallum, thanks so much for joining us. Really appreciate it.
[00:14:31] Kallum: Thanks for having me. Happy to be here.
Second Story: What happens when you nuke the wrong thing from orbit?
[00:14:35] Jonathan: We’re back with our second story, and this time we have guest Josh Adcock, Director of Client Services at The Tech Doctor. Josh, thanks for taking time to come here and relive a dark, scary moment of your past.
[00:14:51] Josh: Well thanks for having me, even though you’re dragging all the trauma out.
[00:14:57] Jonathan: You know, I will say when you and I talked about you coming on the show, you seem pretty excited and saying that you have quite a few of these stories, so, yes, we’re asking to relive dark times, but sounds like you’re having some fun with them.
[00:15:12] Josh: Oh definitely. Always.
[00:15:14] Jonathan: You gotta laugh, right?
[00:15:16] Josh: Hey, what else is there to do if you don’t collect stories of horribly timed things and horribly done things and reminisce on them other than just sit and stare at reports all day. So, you know, gotta have something to keep you entertained throughout the day.
[00:15:33] Jonathan: Well, let’s get into it. Before you share the details, let’s set the scene a little bit. Let’s lean in. The fire’s crackling. What’s going on in this day before, the bad thing strikes.
[00:15:46] Josh: This particular day was pretty simple and basic day. I was just kind of coming into work first thing in the morning, doing my normal reading emails, going through everything. And, you know, one of the things that I do on a pretty regular basis is go through the CISA emails and everything for, you know, different vulnerabilities and what may affect me and my environment or my clients.
[00:16:08] Just to kind of get an idea there of what’s going on. That’s what I was doing when I noticed that they released a pretty severe vulnerability for MSMQ. And later on in that same day, I happened to notice that it was enabled in almost every one of our environments with an on-prem server.
[00:16:28] So after doing a little, at the time, I considered diligent research, just kind of looked through and everything I found told me that MSMQ was horribly antiquated. Nobody really used it for anything anymore. It hasn’t been a standard tool in a really long time. So me, on a Monday morning decided to make the decision that I was just gonna go with that as the answer and just throw together a quick PowerShell script and run it across everything in our Ninja environments, which, you know, I know better than to do, but it’s Monday morning, what am I gonna do?
[00:17:04] Jonathan: It is awesome. I mean, hey, like Aliens, right? The safest thing to do, nuke it from orbit. Just get rid of it. Why run the risk?
[00:17:14] Josh: So, you know, we ran that and most of the day continued on as a normal day until I received a phone call from the one and only 24-7 clinic that we have. That is always very frantic when anything doesn’t work. Like if they can’t print it is an entire meltdown. So you can imagine how frantic they were when no one could access the one tool they use for literally everything in the entire business.
[00:17:44] I guess what really made this a nightmare more than anything was the thought that this is probably gonna be pretty simple. All I have to do is go in and turn MSMQ back on on the server and everything will be golden. And I quickly realized that was not the case. There was a whole lot more involved in re-enabling MSMQ and it ultimately ended in having to reinstall the application from the server to all 45 of their workstations throughout the whole thing, before any single workstation would work again. That was two very miserable days.
Good intentions, wrong execution
[00:18:24] Jonathan: So two days. So this one Monday morning decision, legacy software, outdated. No one’s using this thing. Let’s be safe. This came from a good place. You’re trying to keep your clients safe, right? And, lo and behold, so tell us a little bit about what those calls are like. You’re saying they’re frantic, right? It’s a 24-7 place. It needs to be online, needs to have access to things. Do you have multiple people yelling at you at this point?
[00:18:52] Josh: No, with this client particularly if anything goes wrong, it’s generally the owner of the business that is immediately on the phone. And then it’s one of those clients where if they don’t necessarily like our answer or our way of being expedient about it, it’s immediately a phone call to the person that owns our company.
[00:19:10] And then that’s when it really becomes super frantic. You know, it’s a very long time client. It’s been a client for a very long time. Whole owner to owner relationship between the business. And of course, you know, when that call gets made to the next level, it’s always far more, I don’t know if frantic is the word, but – we will go with frantic. Because it’s always far more frantic on that call, at the point the owner is calling another owner, than the initial call coming in. So, generally, you know, in this situation it was something that the owner was not necessarily aware of what had happened yet ’cause he wasn’t at the office while we were doing everything.
[00:19:55] So it turns up the intensity level quite a bit, when you receive that phone call. And no one is fully aware of the whole situation other than me stuck there trying to figure out how to put it all back together.
[00:20:10] Jonathan: And so you’re pretty much on your own fixing this thing. You’re able to do it. What changes after that? I mean, aside from, you’re gonna make extra sure, but yeah. Do you have a new policy now? Is that just a painful lesson learned? Is there anything else that changed after that?
[00:20:27] Josh: Yeah, that definitely generated some new guardrails, so to speak, being put together. We also generally, anytime we have any sort of major incident or big thing that occurs like that, that affects everybody and gets the higher levels involved at that level. We generally tend to, you know, sit down with the client afterwards and have kind of a debrief and let, you know, give a couple days for tempers and emotions and everything to just kind of go away and fade into the background so we can sit down and game plan in the future what is to occur. It’s funny ’cause that one particular clinic tends to be a great source of horror stories. Purely just because it is 24/7 and it’s, you know, they have one server on location, so you’re supposed to, you know, reboot that server relatively frequently, but you can’t really reboot the server because their whole operation is running on the server.
[00:21:25] And you know, we’ve had some instances where you try to reboot it, thinking it’s gonna be a normal 15, 20 minute thing, and five hours later the server’s still not coming back on and just stuck configuring an update. Lots of stories have come from there, but it’s also been something that has been very helpful for us overall to go through those trials and tribulations because they give us a guiding path for the future. And also kind of inspire you not to take that moment when you say to yourself, I know better than to do this, but it’s gonna be fine. Because I’ve, you know, bent this rule 150 times and the one time you don’t makes all the other 150 not worth it.
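The episode does not say what the new guardrails looked like. As a rough sketch only, one common shape is a dry-run planning step that runs before any fleet-wide removal script: hosts with a known dependency are held back for review, and the rest are split into a small pilot and later rings. Everything in the Python below (the `Host` fields, `plan_removal`, and the host names) is hypothetical and is not taken from Josh's environment or from NinjaOne.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    feature_enabled: bool    # the legacy feature is currently turned on
    known_dependency: bool   # an application is documented as needing it


def plan_removal(hosts, pilot_size=1):
    """Build a dry-run plan. Nothing is changed on any host."""
    enabled = [h for h in hosts if h.feature_enabled]
    hold = [h.name for h in enabled if h.known_dependency]
    clear = [h.name for h in enabled if not h.known_dependency]
    return {
        "hold_for_review": hold,          # never touched automatically
        "pilot": clear[:pilot_size],      # change these first, then wait
        "later_rings": clear[pilot_size:],
    }


# Hypothetical inventory, for illustration only.
inventory = [
    Host("clinic-srv01", feature_enabled=True, known_dependency=True),
    Host("office-srv02", feature_enabled=True, known_dependency=False),
    Host("office-srv03", feature_enabled=True, known_dependency=False),
    Host("lab-srv04", feature_enabled=False, known_dependency=False),
]

plan = plan_removal(inventory)
for group, names in plan.items():
    print(group, names)
```

The point of the sketch is the ordering, not the data model: the list of dependencies has to exist before the script runs, and the pilot group gives a bad assumption somewhere small to fail.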
[00:22:09] Jonathan: Yeah. Yep. Yep. Never let a good crisis go to waste. It sounds like you’ve taken a positive spin on it and you’re doing a, making a constructive path forward from it. Now I have one question for you, Josh, and that is after you’re able to get things restored. You know, we’re still dealing with this vulnerability. I’m just waiting for the sequel to drop. Is there another horror story? I mean, you can imagine that would be the icing on the cake, right?
[00:22:34] Josh: I mean, ultimately there’s not much. I mean, the piece of software is kind of required. It comes to some of the situations where like you may know what the correct move is, and it may be replacing something that someone is super used to, but at the end of the day, that’s not the decision I get to make.
[00:22:53] So, you know, you just kind of have to monitor and do what you can to try to make sure that that vulnerability isn’t being exploited. You know, firewall rules in place and just kind of keeping everything, keeping an eye on everything and paying a little extra attention to that particular environment.
[00:23:11] But yeah, in that particular situation, at least there really wasn’t a whole lot we could do, aside from just get it back to functioning and kind of go from there.
[00:23:21] Jonathan: It’s not unusual for folks to have more than one story. And you obviously are no exception. But it sounds like from this story too, it’s really more about how you’re able to respond to ’em. You’re not gonna prevent them all together. There’s going to be another one. It’s gonna be a surprise whenever it happens.
[00:23:40] Josh: Oh, for sure. There’s two things that you can almost guarantee in IT that I’ve learned. And that’s every day is gonna be a Monday. And, horror stories are gonna happen, especially in the MSP world. ’cause I mean, that’s essentially what we do. We get paid to deal with horror stories, and that’s honestly what makes me love what I do because it is something different every day.
[00:24:06] Jonathan: You know, ironically, it sounds like the mundane, the lack of these quote unquote horror stories is the real scary thing.
[00:24:13] Josh: In a way yeah, absolutely. There’s nothing worse than coming in for a whole, well, I say nothing worse. Sometimes it feels like there’s nothing worse than coming into like a whole eight to 10 hour shift and not one ticket comes in, the phone doesn’t ring one time, and you just kind of sit here trying to remember all those things that you’ve been trying to do because you’ve been so busy and could never get around to.
[00:24:37] Jonathan: Well, Josh, thank you for coming in and for reliving that tale. Really appreciate you and we’ll talk with you again soon.
[00:24:47] Josh: Absolutely appreciate it. Thanks for having me.
Our Final Story: Help! I worked myself out of a job.
[00:24:54] Jonathan: Moving on to our next IT horror story. We have Adam Walter, Co-founder of Humanize IT, also a frequent guest and DM on our own show, Backups and Bandwidth. Adam, thank you for being brave and coming here to share one of your many, many, many IT horror stories.
[00:25:15] Adam: Find me at any conference event and ask me for a horror story. I’ll give you a new one every time. I do have my lucky dice here, so maybe I’ll have people, you know, roll for a story.
[00:25:27] Jonathan: I like that.
[00:25:27] Adam: I’ll have an encounter sheet maybe, and tell people like, what kind of story do you want today?
[00:25:34] Jonathan: Give us a little context about your background because it’s interesting, you’ve seen, you have your, obviously your horror stories that you can share that show that you have the legit signs of a sysadmin, but you’ve also sat in a few different roles in IT.
[00:25:48] And what I think is interesting, you’ve had leadership roles, and now you’re working with MSPs, but you’ve been on the internal side too. Tell us a little bit about your background.
[00:25:57] Adam: Yeah. Bachelor of Science, Computer Science. I went to college on an art scholarship actually, and an athletic scholarship. So finding a nerd in computer science that did all three of those. I’d love to meet one other person. Though I worked for the state patrol as a desktop support technician.
[00:26:16] I was the only one for the entire state of Nebraska, so I had to cover all 500 miles by myself. We had a small team, you know, like server admins and things like that back then. These are NT4 days people. Corel WordPerfect, these should all resonate with you. Went from there to a sysadmin job.
[00:26:35] That actually will be the focus of my horror story today, for a marketing firm, actually an athletic marketing firm. And the next job after that, I worked myself out of a job there actually, which will be the result, which is a fun little hook there for the story. And the next job, I did cybersecurity and sysadmin work for a bank marketing firm.
[00:27:00] And that was really fun. I got some great horror stories there that are just on your toes kind of things. And then I went from there to consulting and contract work for major corporations. As a generalist, they wanted somebody who just knew every aspect of IT and could command teams. That was my first real leadership role.
[00:27:20] And then decided that I missed small business and came back and started a consulting company and then kind of flew into like in 2022, buying a software company that I was doing gap analysis with to build up what you now know as Humanize IT. A great software for account managers to manage their clients and show their value and find gaps in their technology stacks.
[00:27:43] Jonathan: So, what an amazing journey and starting off, adding more and more stakes to the oops that are happening. The size of the oops. And the oops can get bigger and bigger. Right.
[00:27:58] Jonathan: You’re finding what it looks like to do things right and wrong at these other levels. They’re getting bigger and bigger. And then finally you come back to working with, well, computers – technical stuff isn’t hard enough. You’re gonna start managing, dealing with people and providing leadership challenges, addressing those. And then, uh, really to cap it off, now you’re working with MSPs. The horror.
[00:28:21] Adam: Different breed, just a different, different group of people.
[00:28:24] Jonathan: I love, I love me some MSPs. Let’s go back to your horror story now that you wanna share with us.
[00:28:32] Adam: This one in particular was at the athletic marketing firm.
[00:28:35] Jonathan: Oh, at the athletic marketing firm.
[00:28:37] Adam: They did athletic materials for certifications, and they were more of an education slash marketing, kinda helping people do strength training.
[00:28:45] Jonathan: What was the role like? I mean, were you the only IT person? Were there others involved?
[00:28:50] Adam: Set the scene here, it’s 2007 people. If you’ve got more than one sysadmin, you must be, you know, just rolling in employees. Finding one sysadmin back then was really hard. You know, a lot of wannabes, a lot of people who just didn’t understand. So I came into a role where I was replacing a sysadmin who was moving back.
[00:29:11] So this happened a lot back then, is that, you know, you, your sysadmin left and you couldn’t find anybody to replace him, so you double their salary and try to make them come back. That happened here. That’s how this story starts. Adam Walter first job outside of government, I’d been moonlighting kind of as a small consultant for a while. Could not make the IT consultancy work full time. I couldn’t, like I did the math and it just didn’t work. And I realized that a lot of you did the same thing where you knew it wouldn’t work, but you did it anyways and you just didn’t sleep for three years.
[00:29:44] Jonathan: Yep. Yep. And so here we have fresh off that, you’re going into this job. You’re young, you got a bounce in your step. So how long have you been at this job before this horror story takes place?
[00:29:56] Adam: I wanna say it takes about a month into the job. Somewhere within that first month, I discovered this. I was let go from that job within eight months of being hired. What we’re about to hear here, lays the foundation for how I worked myself out of a job.
[00:30:13] Jonathan: Let’s hear it.
[00:30:14] Adam: And so yeah, I come right in. I have worked with some really great… so the people I worked under at the state patrol were really great server and network admins that I’d been learning from.
[00:30:24] Very, very talented people. And you just assume that all environments are like yours. So I walk into this next environment and I’m like, I’m a little intimidated, right? ’cause now I’m just the only guy with a couple developers. A SQL admin, a PHP developer and me. So I’m like, okay, there’s nothing to fall back on here. And like, it’s not like, it’s not today where you can like go out anywhere and ask questions. You go to Google and answer, but still kind of Experts Exchange is about it.
[00:30:50] Jonathan: You had Jeeves. That’s about it.
[00:30:52] Adam: Yeah, Lycos. So I start my job and I start getting my first tickets, and one of the first tickets was, Hey, we’ve had this issue going on for a couple months now. I just wanted to see, like, they email us and then they get a rejection notice after about three days. Like, Hey, your email was not allowed. And this is when smart filters just became like… I think we were using Proofpoint, like early Proofpoint this time. The previous IT admin had configured it and put it in place.
[00:31:27] Really cool, ’cause now instead of having an on-prem email filter you had a cloud-based – cloud wasn’t a thing back then – it was just a smart filter at Proofpoint that all your email would route through and then come down to you. And I loved it. It was a really cool concept. I’d never used one like that before. And so I’m chasing this ticket trying to figure out what’s going on.
[00:31:48] We must have some filters in there. By the way, we had a couple emails that were being blocked and I found out why, due to foul language. And because it was a strength and conditioning firm, the word snatch kept getting blocked, hang snatch. And so I was like, oh, cool, this must fix the issue. So I fixed the issue.
[00:32:12] Jonathan: Give yourself a pat on the back.
[00:32:13] Adam: Gave myself a pat on the back, sure. Those emails start coming through and they’re still getting rejection notices after about three days. And I’m like, gosh, what is this? Or is it, maybe it was four hours, some standard window of rejection.
[00:32:27] And they sent me the things like, we’ll try again in four hours, or we’ll try again in 12 hours. And they kept doing that and finally says, failed delivery. And so I’m like, okay, something is blocking. So I’m looking through my filters again, trying to figure out what’s going on, and I’m getting on with Proofpoint.
[00:32:44] I’m asking them all sorts of questions. And I start finding in logs that Proofpoint is the one that is receiving the email, then rejecting it. It’s not going anywhere. I’m like, okay, what’s the deal here? It’s being held up and never delivered. And something like, okay, we’re getting closer here. And Proofpoint can’t communicate with my Exchange server.
[00:33:08] I’m like, okay, something’s wrong with my Exchange. I’m going through my Exchange, figuring it out, troubleshooting, and my Exchange server never sees the connection request. This is what I figured out. I’m like, God, what is this? It’s gotta be a firewall issue. So I go to the firewall, right? And I look for my Proofpoint, my firewall rules.
[00:33:24] And, sure enough, there are Proofpoint allow rules in there. And I go and I check it out and I look at the Proofpoint, you know, allowed IPs that are, like, registered at Proofpoint. And I look at the ones that are on the firewall. They’re the same IP ranges we’re allowing through, but I’m seeing rejection notices on the firewall. The previous sysadmin put in the IP addresses 172.84.0.0 through 172.16.0.0. So he allowed the 16 IPs.
[00:34:12] He did not know how to read CIDR notation. It was supposed to be a slash 16. He put a rule for 16 IP addresses in, rather than a slash 16. So 90% of emails were getting rejected. Anything that came from an IP other than zero through 16 was getting rejected ’cause our firewall said these are not allowed. And it took me a second because that 16, zero through 16, not slash 16, zero through 16 was the rule that was created.
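The mix-up Adam describes can be sketched in a few lines of Python using the standard-library ipaddress module. This is only an illustration: the base address 172.16.0.0 is a placeholder rather than a real Proofpoint range, and the function and variable names are hypothetical.

```python
import ipaddress

# Placeholder base address for illustration only; not a real Proofpoint range.
BASE = "172.16.0.0"

# What the rule should have been: a /16 network, meaning the first 16 bits
# are fixed and the remaining 16 bits are free (65,536 addresses).
intended = ipaddress.ip_network(f"{BASE}/16")

# What was configured, as described in the story: a literal range of
# addresses ending in .0 through .16 (17 addresses when counted inclusively).
start = ipaddress.ip_address(BASE)
configured = [start + offset for offset in range(17)]


def allowed_by_intended_rule(ip: str) -> bool:
    """True if the address falls inside the /16 network."""
    return ipaddress.ip_address(ip) in intended


def allowed_by_misread_rule(ip: str) -> bool:
    """True only if the address is one of the few literally listed."""
    return ipaddress.ip_address(ip) in configured


if __name__ == "__main__":
    sender = "172.16.45.10"  # hypothetical relay address inside the /16
    print(intended.num_addresses, len(configured))
    print(allowed_by_intended_rule(sender), allowed_by_misread_rule(sender))
```

Under these assumptions, a sender anywhere in the /16 passes the intended rule, but only a sender in the first handful of addresses passes the misread one, which matches the pattern of most mail being rejected.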
[00:34:42] Jonathan: Yep.
[00:34:42] Adam: It took me a second to kind of correlate. So the next rule down was a zero through 24. The next one down was like a zero through 16. I’m like, why did you choose those ranges? And I realized he did not know how to read CIDR, and had no idea what CIDR notation was. And so I’m like, oh crap, where else is this happening? And so part of my job every day was to go in and troubleshoot email rejections and troubleshoot backup issues and make sure the backups have been going.
[00:35:11] Because they had a backup network that was on its own subnet that, not on a subnet, was on its own VLAN to avoid collisions and avoid noise. And so I’m like, if he didn’t know CIDR notation for Proofpoint, where else is this causing problems on the network? And that’s when Adam realized he wasn’t going home anytime soon. I went to the backup network, found out he’d done the same thing there. Except he not only did not know what CIDR notation was, he did not know the difference between a VLAN and a subnet. So everything on this network is on a flat network. I mean, it’s on one little, like 3Com switch in the back, and he had just created different scopes.
[00:35:53] On the same VLAN, and collisions are happening left and right all night long. Like these backups are fit, these are disk-based backups. So this little switch is barely keeping up in the first place, and it all happens to be on the same one. And it’s just dying every night. We’re getting a corrupted backup.
[00:36:09] So I fix that. Get another switch, put ’em on their own switch, put ’em on their own VLAN, fix this IP scoping, and all of a sudden backups are going smoothly. I no longer have to do backups every morning. Emails are going smoothly. Then my final thing as a sysadmin was to verify that the database upload to the website was happening.
[00:36:26] Took about 45 minutes to upload this database of activity to the website, and then if there was ever an error during upload, I had to figure out what the error was, work with the SQL administrator and get it submitted. And this was most of your day. This was like, you know, five hours of your day.
[00:36:44] Jonathan: Okay.
[00:36:44] Adam: Well, I realized that they only had a 1.5 meg connection. For an extra 50 bucks, we could move up to a 10 meg connection.
[00:36:52] Jonathan: No brainer.
Gone, but praised
[00:36:53] Adam: Now my upload took about a minute. And so we were able to troubleshoot and get the SQL upload every day in about five minutes to 10 minutes, after like figuring the errors out. The rest of the day I had nothing to do. So my boss actually said, you got an Alienware laptop, play WoW. Just be ready when we’re ready for you. And so this, this horror story of like all these things that kept unraveling because an overconfident sysadmin built a network without knowing what he was doing, with having just a little knowledge, just enough to be dangerous and had crafted a little like workload.
[00:37:35] And by the time I was done there, I had about a half an hour of work every morning. I felt like Office Space. And the rest of the time I’m just kinda sitting there just trying to help out around the company where I can, and eventually they’re like, why don’t we just hire a consultant to do this? We’d save a ton of money. So eight months into my job, I got let go for having no work to do, and got a great eight-week severance package with full benefits, with a huge apology and a glowing recommendation.
[00:38:03] Jonathan: I mean, I was gonna say, this is a true horror story. I mean, this is a thing that people worry about, especially with automation all the time. Are you going to do this so well that you work yourself out of a job? And that sounds like the true horror, but then, I don’t know, the way you’re phrasing this at the end sounds like this actually wasn’t the worst thing in the world.
[00:38:22] Adam: I mean, it was nice, but I loved that job. I liked the people I was working with, and the horror of it was like, you know, discovering that… like when my brain made the connection that he did not know what CIDR notation was. If you didn’t understand something this basic, I’m pulling at this thread and I’m like, this is a big sweater, guys.
[00:38:42] And like it’s unraveling. It’s unraveling, it’s getting worse and worse and worse. And at the end, like I’m just, I started having to come with the assumption of anything I see that’s built, I have to assume that somebody built it who is smart, but not smart enough to do research and understand.
[00:38:59] Jonathan: It’s, all of a sudden you don’t trust anything around you. I mean, it’s kind of like moving into a home and realizing that the prior occupants had known enough to do some handy work. And then you find out, oh, wait. They’ve done the, they’ve done all the wiring, they’ve gotten into the plumbing. What else can’t you trust? What am I living in here?
[00:39:21] Adam: I think the horror is knowing that there’s people like that out there. Is that you as, we’re all in one network, people. Every IT person is in one physical network, we’re all connected. Your neighbor might be that overconfident sysadmin, who doesn’t understand what they’re doing and causing problems for you. And so if your upstream routing is not working, you’re gonna have to call them and correct them. Like knowing that there are people out there… like I feel like today it’s such a much more mature field. We can filter out those guys, and those people that are, they don’t really know what they’re doing. But throughout my career, I kept finding them.
[00:40:06] And I don’t wanna make people feel like idiots, but I do want to enlighten them.
[00:40:10] Jonathan: So having, I mean the maturity of the space, having more, people sharing more and talking more together. Do you think that’s the check and balance against that or is there anything else?
[00:40:22] Adam: Being honest. Be honest and be a part of a community where you say, Hey, I built this thing. Anybody see any issues with this? Or, Hey, I’m having problems. Like this back and forth of, like, you don’t have to be the person who knows everything. And when you’re in a silo and you’re the only sysadmin or you’re the only person in your role, you can get a bit of an ego going because there’s no one to call you out. And if you haven’t screwed up in the past year, you’re probably either, one, delusional, or two, you probably need to have some people around you to call you out.
[00:40:55] Jonathan: I love it.
[00:40:58] Adam: You just don’t know you’ve screwed up yet. ’Cause in IT, there’s too many ways to mess up and that’s why we have the fail fast methodology.
[00:41:05] Jonathan: Adam Walter, co-founder of Humanize IT. Thank you so much for taking time to share your story and your wisdom.
[00:41:12] Adam: What little of it there is.
[00:41:16] Jonathan: Adam, thank you again. Really appreciate your time and I can’t wait to hear another one of those stories.
Closing
[00:41:24] Jonathan: And just like that, we’ve reached the end of this week’s episode and the end of season one of IT Horror Stories. If you’re up to the task though, you can relive every terrifying tale from this season on Spotify, Apple Podcasts, or at www.ninjaone.com/it-horror-stories. We can’t thank you enough for tuning in week after week and joining us on this chilling ride through all sides of IT, but our journey doesn’t end here.
[00:41:50] You got a tale of your own? Share it with us on LinkedIn or creep into our Discord community. You might just find yourself featured in season two. Until then, lock your doors, patch your systems, and remember, whatever your horror story, you’re never in it alone. We’ll be back with more soon. In the meantime, stay safe out there.