Ep. #102 The CodeCov Breach with Jerrod Engelberg and Eli Hooten

[00:02:00] Guy Podjarny: Hello, everyone, welcome back to The Secure Developer. Today we have a very special episode: we’re going to dig into the CodeCov breach, which you might have heard of, and we’re very fortunate to have the people who handled it and are willing to share their experience doing it: Jerrod Engelberg, who is the CEO of CodeCov, and Eli Hooten, who is the CTO. Eli, Jerrod, thanks for coming onto the show, and for being willing to share your learnings and experiences, even if not all of them were maybe the pinnacle of joy on this journey.

[00:02:37] Jerrod Engelberg: Absolutely, Guy. Thanks for having us on the podcast, I’ve listened to many of the episodes. If we can help at all, giving some of our learning and knowledge back to the community, I think we’ve done a good thing here.

[00:02:48] Eli Hooten: Yeah, thanks for having us, Guy.

[00:02:50] Guy Podjarny: So before we dig in, let’s get the listeners educated a little bit about CodeCov in the first place, to set the stage a bit. Can you tell us a little bit about what CodeCov is? What do you do? And maybe a bit about how you integrate into customer environments, since that kind of sets the stage for what happened?

[00:03:09] Jerrod Engelberg: Sure, I can talk about CodeCov and then I’ll pass it over to Eli to talk about how we integrate. CodeCov, fundamentally, is a code coverage tool, but really we think of ourselves as helping customers improve systematic testing during the change management process. A lot like Snyk, if you all know the product that Guy founded, right? We’ve brought code coverage into the PR, or pull request or merge request, flow, where we’re looking not just at what your code coverage is in absolute terms, code coverage being a metric of how well tested your code is, but rather at how that testing is changing from commit to commit, from PR to PR, and it helps our customers understand at a more fundamental level.

Hey, am I making my code base, my product, my application more or less well tested as we go, by testing the more important parts of my application, etc.? We’re used by over a million people. We came from open source roots, but we’re also used by large enterprise customers now. I think that gives a pretty good starting point to hand over to Eli to talk about how we integrate.

[00:04:10] Eli Hooten: Yeah, the integration pathway is pretty straightforward when you consider the pantheon of developer tools that are out there today to help customers gain more insights into their code quality and this kind of thing. Generally, as you run your tests, you generate a coverage report. This is a machine-readable file that you can then pass along to CodeCov, using something we’re going to talk about later in great detail, our uploader.

Once we receive that report, we process it and give you coverage information. The most logical place to do this handoff of generating the coverage report and passing it to CodeCov is in your continuous integration provider. The reason being, you’re likely running every commit, or the majority of your commits, through continuous integration. You’re passing those coverage reports to CodeCov, and this gives you a really nice time series history of your code coverage and allows us to tell you things, like when code coverage is meeting certain thresholds or that it’s increasing or decreasing, and we pass this information via CI into the repository provider.

GitHub, GitLab, or Bitbucket, in the form of a status check on a pull request that can tell the reviewer or the engineer, “Hey, your coverage is too high, your coverage is too low. Use this information to make whatever engineering decision you see fit. Here’s what’s happening with your code coverage.” That’s how we integrate.
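To make the commit-to-commit comparison concrete, here is a minimal Python sketch of a coverage delta turned into a status message. It is only an illustration: the `Report` class, the `status_check` function, and the `max_drop` threshold are hypothetical names chosen for this example, not CodeCov's actual implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical summary of one coverage report."""
    covered_lines: int
    total_lines: int

    @property
    def percent(self) -> float:
        # Multiply before dividing so whole-number percentages stay exact.
        if self.total_lines == 0:
            return 0.0
        return 100 * self.covered_lines / self.total_lines


def status_check(base: Report, head: Report, max_drop: float = 1.0) -> str:
    """Compare the head commit against its base and describe the change.

    The check fails only when coverage drops by more than `max_drop`
    percentage points (an assumed policy for this sketch).
    """
    delta = head.percent - base.percent
    state = "failure" if delta < -max_drop else "success"
    return f"{state}: coverage {head.percent:.2f}% ({delta:+.2f}% vs base)"
```

For example, `status_check(Report(80, 100), Report(85, 100))` describes a five-point improvement, which is the kind of signal a reviewer would see on the pull request.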

[00:05:20] Guy Podjarny: Got it. Yeah, thanks both. Clearly, Jerrod, you mentioned a million people using it. Those are almost entirely developers, right? Building it in, making it part of their CI process and its natural build. It’s running and it’s executing or collecting data as part of the build process, within proximity to the source code and other areas. I guess we’ll come back to that.

So, thanks for setting the stage here a bit. Let’s talk about the meat of it. So in January of this year, 2021, you’ve had a security breach. Can we just start off by you giving a bit of a description of what is it that has happened?

[00:05:58] Eli Hooten: Yeah, a little context initially is talking about this uploader, right. In my last answer, I talked about how you send us a machine-readable coverage report using an uploader. Well, what this was at the time of the security incident was a Bash script. You would curl the Bash script from our servers, you would pipe it into Bash, it would find your coverage reports, and it would upload them to our servers. Specifically, this is curl pipe to Bash, which is the way you hear this referred to in the industry. There are a lot of conflicting opinions on curl pipe to Bash. There are a lot of really great things out there, and I encourage your listeners to go read about developer opinion on curl pipe to Bash. Suffice to say, at the time of this incident, that’s how our uploader functioned.

This uploader was hosted in a CDN bucket. That was a bucket that was private write, but public read. It needed to be public read so customers could request it and get the actual Bash script. The security incident itself, the nature of this attack, was that an attacker was able to extract a credential from a compressed layer of our CodeCov Enterprise Docker image that we make available on Docker Hub for our self-hosted, on-premises users. They extracted this image, and they determined that that credential gave them access to this bucket where our Bash uploader was stored. Then they were able to use that credential to make alterations to the Bash script, potentially malicious alterations.

Once they were able to do this, since they could alter the Bash script in place in the CDN, when it was requested, that altered script could be pulled into a CI pipeline. That was the fundamental nature of this attack, referred to in the industry as a supply chain attack, right? You have CodeCov in place in your CI, in your supply chain, and an attacker has made a malicious alteration, or a potentially malicious one, and now you are suddenly executing that alteration in your supply chain. That’s the 30,000-foot view of the hack and what happened here.

[00:07:37] Guy Podjarny: This bash script you’re describing, what does it look like? Is it five lines of code? Is it 5000 of them?

[00:07:46] Eli Hooten: Bash scripts can vary pretty widely. Speaking for this particular Bash script, it was quite complex. The reason being, the Bash script has been around in CodeCov’s history for a long time. It’s evolved to meet the needs of the customers that pull it down. Over time, it added more and more bells and whistles to work easily in more and more varied workflows. As a result, you end up with a Bash script that is quite long. I think, several hundred lines of code, and it’s all Bash, right. So the script itself was big, evolved, and had a lot of moving parts, and as a result, I think that made it easier potentially to change or alter it in a way that a customer, or we, might not be able to tell immediately.

[00:08:26] Guy Podjarny: Got it. So the attacker manages to get a hold of the credential, which in turn, I guess, gives them access to the S3 bucket where the uploader script sat. They have the access, they modify the file, inject something into that file. I guess, from a customer perspective, what did that imply? You’ve touched on this, but maybe drill in a step deeper. This is how the CodeCov system was compromised and how the attacker manipulated it. Then on the customer side, can you elaborate a bit more?

[00:08:59] Eli Hooten: Yeah, so what the attacker did essentially was enumerate the environment where the CI was running, basically running an env command to see environment variables printed. The issue that arises is that in some CI pipelines, these environment variables may represent secrets or credentials that access other parts of a company’s infrastructure, right? Perhaps, for example, there is an environment variable that’s a token to access AWS or some infrastructure, or maybe there’s a Google Cloud Platform service account credential, right, or something in there. That can then be extracted from the environment and used in a follow-on attack, specifically targeted at that company.

The env was taken and it was piped to a third-party server, where the attacker could then review it later at their leisure. So how that impacted customers depended on the configuration of their CI, what they had in their environment, how things were set up. They could leak credentials to this attacker.

[00:10:01] Guy Podjarny: Got it. Yeah, these secrets, I guess they can vary a fair bit, depending on some of those potential customers – that would move around?

[00:10:09] Eli Hooten: Of course, if you’re an open source library, and you’re used to your CI being totally open and everything being visible, you may not have anything of consequence in there, and there might not really be any real reason to worry about it. If you’re in a closed source case, or you’re potentially putting a lot of secrets in your CI, or your CI process is doing a lot of work across your infrastructure or your stack, there could potentially be quite a few sensitive secrets or environment variables there that would be concerning if leaked.
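One way to reason about this exposure is to look at what a third-party step in CI can actually see. The following is a minimal Python sketch, under the assumption that secret-bearing variables follow common naming habits; the `SECRET_NAME` pattern and both function names are hypothetical and would need tuning for a real pipeline.

```python
import os
import re

# Assumed naming conventions for secret-bearing variables; not exhaustive.
SECRET_NAME = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|API_?KEY|PRIVATE_?KEY)",
    re.IGNORECASE,
)


def secret_like_names(env):
    """Return the sorted names (never the values) that look like secrets."""
    return sorted(name for name in env if SECRET_NAME.search(name))


def scrubbed_env(env, allow=("PATH", "HOME", "LANG")):
    """Build a minimal environment containing only explicitly allowed names."""
    return {name: value for name, value in env.items() if name in allow}


if __name__ == "__main__":
    # Audit the current process: anything listed here is readable by every
    # script that runs in this environment.
    for name in secret_like_names(os.environ):
        print(name)
```

A dictionary such as the one returned by `scrubbed_env(os.environ)` can be passed as the `env` argument of `subprocess.run`, so that a tool which does not need credentials never inherits them.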

[00:10:32] Guy Podjarny: Stealing environment variables may be more or less severe, depending on the surroundings, but correct me if I’m wrong, it’s not the worst thing that an attacker could have done running in this scenario, is it? Did we just get a little bit lucky that they were stealing environment variables? In theory, could the attacker have also sent the source code or things like that, or were there other mitigating controls that prevented it?

[00:10:54] Eli Hooten: Right. I’ll be very clear, just for the sake of your listeners, that what happened in our case was enumerating environment variables to a third party. In the grand scheme of things you could do in a CI, if you’re executing Bash you’ve modified in a CI environment, you basically have access to whatever that machine knows, right. If you’ve cloned source code onto the machine, it’s possible that a script could be written to attempt to find it and extract it, depending on how it’s cloned, where it is, what you’re doing with it, this kind of thing. If you’re building assets or secondary assets in your CI, they could perhaps be taken. While this is possible, my own opinion is, it can be difficult, because what you’re having to do is pull things that may be specific to the CI or specific to the company, and at the point that you’re making modifications to this Bash script, you don’t know those things, right? They’re specific to the CI configuration themselves. Yes, there’s more that could happen, definitely, but in our case, it was the environment variables being curled to the server.

[00:11:47] Guy Podjarny: Got it. I think, just to allude to another supply chain security problem that happened around the same time, which was the SolarWinds hack, I guess, where there was actual manipulation of the output of the build system to introduce malware, but fortunately, we didn’t have that in this case. If I backtrack a second there, you mentioned this secret credential that gets stolen out of the container image. Is there anything more to share about that? Was that just a local concern?

[00:12:15] Eli Hooten: The method with which the credential was extracted is perhaps interesting, perhaps unique to us, but may apply to others, depending on how they’re issuing their software. CodeCov has an on-premises solution, CodeCov Enterprise or CodeCov self-hosted, and customers that are interested in running CodeCov in their infrastructure can use that product. How that product is shipped is as a series of Docker images that you can orchestrate together and run in your infrastructure.

These images are themselves public. You can go to Docker Hub, and you can see them, and the customer can pull them easily. This is to make pulling from Docker Hub easy, to make sure your infrastructure stays up to date and this kind of thing. You can pull that image, you can attempt to extract layers, you can use certain tools to look at the intermediate layers of any Docker image, and you can attempt to find things in those intermediate layers that you can then exploit.

This is actually, and I’m sure I’m butchering this with a little bit of speculation at this point, a more common problem than you may think in Docker images themselves. So if I had any advice for individuals who may be shipping code this way, it’s to look at tools like dive, or other layer analysis tools, just to get a really good picture of everything that’s happening in their Docker build chain, to understand what they’re distributing. In this case, the attacker was able to get our credential by basically going from image, to layer, to tarball of the layer, to decompiling the code that happened to exist in that tarball, to then digging in and finding a particular variable of interest.

There were quite a few steps there, but this is a particular line of compromise that could impact anyone that’s shipping their software in this way depending on their Docker image build process and what they’re shipping.
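The image-to-layer-to-tarball walk can be illustrated with a short Python sketch using only the standard library. It assumes the image has been exported to a single archive (for example with `docker save`), in which each layer is itself a nested tarball, and it only searches for plain-text strings matching a hypothetical `PATTERN`; it does not decompile anything, so it is far simpler than what the attacker did.

```python
import io
import re
import tarfile

# Assumed pattern for secret-looking assignments; tune for real use.
PATTERN = re.compile(rb"(?i)(secret|token|password|api_key)\s*[=:]\s*\S+")


def scan_layer(layer_bytes, max_size=1_000_000):
    """Yield (path, matched_text) for secret-looking strings in one layer."""
    # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as layer:
        for member in layer:
            if not member.isfile() or member.size > max_size:
                continue
            data = layer.extractfile(member).read()
            for match in PATTERN.finditer(data):
                yield member.name, match.group(0).decode("utf-8", "replace")


def scan_image(image_tar):
    """Scan every nested layer tarball inside an opened image archive."""
    findings = []
    for member in image_tar:
        if not member.isfile():
            continue
        blob = image_tar.extractfile(member).read()
        try:
            findings.extend(
                (member.name, path, text) for path, text in scan_layer(blob)
            )
        except tarfile.ReadError:
            continue  # Not a tarball, e.g. a JSON manifest or config file.
    return findings
```

Usage would look like `with tarfile.open("image.tar") as img: print(scan_image(img))`, where `image.tar` is a placeholder filename. A file deleted in a later layer still shows up here, because each layer is scanned independently of the final filesystem.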

[00:13:54] Guy Podjarny: Yeah, absolutely. Basically, if I go back over the segments here, you have a secret leak at the beginning, I suppose as one step of the chain. Then following that, you have this little curl to Bash, which we can maybe dig a little bit into in a second, about whether you still like that approach or not, and subsequently we have the environment variables leaking out of it, which I guess isn’t one size fits all: for some customers that can be disastrous, for others that might not be that terrible. It’s probably never pleasant for them, because it’s built into the system. At a minimum, it would allow some sort of access to the platform.

[00:14:35] Eli Hooten: Yeah, I think there you have it. There definitely are, as we indicated, quite a few steps here that you had to go through to get from beginning to end.

[00:14:41] Guy Podjarny: Yeah, I think sometimes in the world of security, we get too enamored with the hacker scripts of, “Hey, I’m in.”

[00:14:47] Eli Hooten: Yeah.

[00:14:48] Guy Podjarny: We forget that attacks almost always include some form of lateral movement, some sequence of weaknesses that lined up in a row to allow an actual data leak and an actual breach to take place.

Well, thanks for this. I guess maybe we move on. So this is the technical side, and I think it’s pretty clear what the weakness was and what the chain was. Let’s shift gears and talk a bit about the process, what happened when this actually occurred in terms of the customers. For starters, I just want to take a moment and pause and say that, again, I really appreciate you sharing all of these details. Right now it might feel obvious to people listening to it, but it’s really quite uncommon for people to share these details and help others learn from their mistakes.

Sometimes it’s easier to hide your scars or those learnings, but we’re not going to get better as an industry if we don’t do that. I’ll probably find myself echoing these things later on, but it is much appreciated. I think with that, let’s talk a little bit about the timeline of the disclosure. This was the weakness and this was the attack. What happened here? How did you discover this? What’s the sequence of events that ensued?

[00:15:57] Jerrod Engelberg: Sure. As part of our distribution of this Bash script, we also include an optional shasum check, or checksum. When that hash check did not line up, a customer reported the hash check was off, and that allowed us to understand that this bucket had been compromised. That was on, believe it or not, April 1 of this year.
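The kind of check that surfaced the tampering can be sketched in a few lines of Python. This is a generic illustration, not CodeCov's published procedure: the function names are hypothetical, and it assumes the expected digest is pinned somewhere other than the bucket that serves the script, since an attacker who can rewrite the script could otherwise rewrite the digest too.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_pinned_digest(data: bytes, pinned_hex: str) -> bool:
    """Check downloaded bytes against a digest obtained out of band."""
    expected = pinned_hex.strip().lower()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_hex(data), expected)
```

The design point is ordering: download the script to memory or disk, verify it with `matches_pinned_digest`, and only then hand it to an interpreter, rather than streaming it straight from the network into a shell.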

[00:16:27] Guy Podjarny: Was there ever a moment where you thought someone was pranking you when you got that e-mail?

[00:16:32] Jerrod Engelberg: I did not think it was a prank. I don’t know. [Inaudible 00:16:34].

[00:16:35] Eli Hooten: Not necessarily that it was a prank. It was more that you read that and you think, surely this person is mistaken, right? Like we introduced a change and it just got missed or something. You, of course, take an e-mail seriously, right? You dig into what they reported and what they saw, and you just have this growing realization that, “Oh, wait, something happened here.” So it creeps up on you as you unravel the details and understand what happened.

[00:16:57] Jerrod Engelberg: Yeah, that was certainly an all-hands-on-deck moment for us as a team. By that same day, just within a few hours, we had patched the uploader and locked down both our bucket as well as the server where data was being exfilled to, and also within those first 24 or 48 hours had forensic firm teams all in place to go do what was a very deep dive. We’re a relatively small team. We also reported to relevant federal authorities what was going on. Then it was effectively a dead sprint to get this in front of customers as quickly as we could.

[00:17:35] Guy Podjarny: How did you know what to do? You got the report from a customer. How did you figure out that you need to engage a forensic firm and who to reach out to?

[00:17:46] Eli Hooten: I can speak to the first part of that, which is how do you find out or how do you know what to do. Some of the early steps, the internal steps, are fairly obvious, right? A customer’s reporting something, this could be a problem. Let’s flex our internal muscles here and dig in and see what we know. I think you hit one of two points. One is, “Oh, this is definitely a problem. It’s time to sound the alarm and reach out the way we need to,” which Jerrod can talk about. The second is, “I don’t know what this is.”

Then even in that case, it’s time to reach out and sound the alarm and find somebody. So I think, unless you’ve definitively shown it’s nothing, and you can definitively prove that, all steps lead to some form of engagement with individuals that are very well versed in this world. Jerrod can talk about the reach out and how he did that.

[00:18:31] Jerrod Engelberg: Yeah, and I think many people will highlight aspects of, for example, SOC 2 Type II and its weaknesses, and I’d like to talk about that later in the podcast, but one thing we had done by requisite policy was actually do dry runs, like what do we do when this thing happens? How does it work? Who do we reach out to? What are the phone numbers? What’s our cyber insurance policy?

One piece of advice that I’ll keep coming back to through this conversation, in case there are any other executives, especially at small companies, who maybe are not as well versed in these types of experiences, is try everything you can to not be flat-footed in the moment that you make this discovery. Prior to this point, you have time, you can think about things. In this moment, every second, every hour matters a lot. We did have to figure out some things on the fly, but there were a few places where we did have some dominoes that were able to start falling very quickly, and that helped us a lot.

[00:19:30] Guy Podjarny: Yeah, I mean it’s great to hear that the dry run kicked in. I think we often get to think about these things as just the necessary evil to get a sale done, and forget that there’s an actual security value to them. We’ll come back to that when we get to the learnings. Coming back to the flow of it: you got the disclosure, you dug in, you understood it was a real problem, you knew a few of the steps, like informing the authorities and the forensic firm right away. How did you decide what to do in terms of outreach, in terms of public disclosure, user communications?

[00:20:02] Jerrod Engelberg: Yeah. Eli can talk about exactly how we reached our users, but just from an ethical consideration perspective, I will tell you that in these moments, there are different ways to approach these incidents. I’d say, we had advisors and counsel in our corner, and not everyone’s going to tell you the same thing. The only thing that I would say is try to make those ethical considerations prior to these moments as well, right. One of our core values as a company is transparency, frankly, not just internal transparency, but transparency to our stakeholders.

One thing that was clear, going back to the fact that there were environment variables in play here: we needed affected customers to take action. It wasn’t enough just to tell them that this thing had happened and to have checked the box on disclosure. If there were printed environment variables in the CI environment, we needed customers to roll all those variables, right? Or we very much wanted to work with them to do so. I think one thing that Eli and I would say to each other at those moments was, if even one customer is not able to hear from us, and doesn’t take the appropriate action, that’s one customer too many, right?

So we just optimized for the idea of, how do we get this information in front of the customer so they can take the action? That was the ethos or the core thesis of how we approached disclosure, and then Eli, I’ll let you talk about how we actually went about doing that.

[00:21:23] Eli Hooten: Yeah. Before I get into that, I want to echo Jerrod’s point about following your ethics and having these strong values. I think you’ll know you’re doing it right when the answer is obvious, even if the way ahead is painful or difficult, right? You know what you have to do. That decision was made long before the security incident, you just have to march forward and do it. I think in that respect, CodeCov was in a very good place. Now, how we went about reach out was unique to the way CodeCov works. Since you sign in through a social sign-in or OAuth through GitHub, GitLab, or Bitbucket, in those platforms you can opt to not share your email via those authentication mechanisms, or authorization mechanisms.

So for a lot of our users, or for some number of our users, we may not have even had their email. So we couldn’t just email blast everybody. We had to really tackle it on all fronts. So every email we had of impacted users, we emailed. We had public disclosures that we promoted, tried to point everybody to, tried to really make announcements out of this. The tech reporting media helped here by signal boosting that stuff and getting more people to see it. The other thing we did is just dump notifications in the application. If you log into CodeCov, you’re going to see this was a problem, and you need to be directed to this disclosure, and you need to know what to do next.

So it wasn’t a question about how bad is this going to hurt? How embarrassing is this going to be to tell our users this happened? It’s like, “No, we’re doing it.” By any means necessary, we’re going to do our level best to reach every single customer we can. That’s what we did.

[00:22:51] Guy Podjarny: I think that’s very much the case, and I guess it’s worth also highlighting that you knew that their information was being leaked, right? It wasn’t a weakness, it wasn’t a vulnerability, it was a breach, and you’d seen the line. You don’t know what the attacker did with those environment variables, but you know the environment variables were sent out. I think it was definitely the right move to do it. Were there, I guess, voices in the room, and not to point any fingers, but was there a lot of pull? I’d imagine on the legal side or such, there must be some temptation to say no, let’s not take any accountability, not admit failure almost. Did that happen here? Did you have to resist that pull? You talk about your ethics, it sounds like, as opposed to some other pull or desire.

[00:23:36] Jerrod Engelberg: Without pointing fingers in any way. There were times where we received some amount of counsel to at least consider not going public. I’m not saying it was necessarily even bad counsel, per se, but just be aware that if and when you go through this, you probably will either feel something internally pulling you that way or possibly externally. That’s why I say, try to make that decision ahead of time what you’re going to do, because it’s so much hairier in the moment to think through that.

[00:24:05] Guy Podjarny: When the emotions are running high.

[00:24:07] Eli Hooten: Sure, my perspective when we were going through that was, I’ll go back to the strong ethics, because when you’re in the room, there are people, perfectly well-meaning stakeholders, who are very interested in trying to make sure that the company comes through this in the best way possible, right. So they may have some opinion or input that might run counter to your values or ethics as a company.

So once again, if you have those strong ethics, you can enter those conversations and say, “Look, this is the way we’re doing it. How do we do it in the way that best suits the company? How do we do it in a way that’s best for us and our users, but according to our values, this is the way we have to approach this.” It takes all that contention off the table, it becomes more of a question of how do we solve this problem? How do we communicate this way? Instead of should we communicate this way? I think that’s a much easier problem to solve.

[00:24:53] Jerrod Engelberg: To add one more thing, he brought up the legal framework. I actually do feel very lucky to exist in a Western legal framework where, according to our lawyers and I think many others, disclosure is a strong legal standard, right? It actually helps, right? I think where I’ve seen massive blowback against incidents in the public eye, it seems to be when people try to sweep something under the rug, right? Yeah, maybe you get very lucky and no one ever finds out, but it’s just a really strange way to approach this collective thing that we’re all building, the open internet.

[00:25:30] Guy Podjarny: That’s definitely good. I would say legislation is probably headed that way more and more, with GDPR and extreme penalties on it. I think you’re spot on, Jerrod, saying you have to exercise this a little bit or have conversations about how you will react without the emotions running high. I guess one aspect of it is sticking to the facts: sure, you disclose things, but you try to avoid any opinions or statements and be very factual. Is that an element of such a disclosure?

[00:25:59] Jerrod Engelberg: Yeah, I looked at some other breach disclosures from history. I think developers and security folks and secure developers collectively can very much sniff out PR talk, press talk. There is a connection that you draw with your user when you are quite direct and literal, don’t try to paper over the facts, and give them what they need as quickly as possible. We tried to aspire to that. We were inspired on that by some other breaches that we looked at. We were like, “Wow, that’s a great way to word that. I feel like the person who wrote that disclosure understands me as a developer, not me as a press person or something like that.”

[00:26:43] Guy Podjarny: So we’re going to get a little bit into remedial actions and learnings and making this more resilient, but maybe before we do that, let’s just talk a little bit about the emotions in the moment. I mean, this is not the best moment of your lives, probably. Anything to share? How does it feel to do this, and maybe what helped in terms of not getting sucked into the bummed-out sensation, but pushing forward?

[00:27:09] Eli Hooten: The first step is just to sit with it, you may not know how it happened, you may not know that pathway, that’s a hard problem to solve and you have to figure it out, but once you understand the impact, I think it’s okay to sit with it and get sucked into that for a little bit, and just give yourself space to feel very bad about that. Then you get to work. Yeah, there’s dread. Yeah, there’s stress. There’s also this underlying current of having a solid, very, very challenging technical problem, to figure out how this happened.

Earlier in this conversation, we talked about all the steps it took to get to where this attack was successful for the attacker, right? None of that was free. None of that was given to us, right? We just saw the end result and had to work backwards and fill in every one of the blanks. It was difficult and challenging, and you feel the dread, and you’re moving as fast as you can, because you think about your customers. So you just do your best to balance that. For me, it was sitting with it at the beginning: “Hey, this was bad. Okay, let’s move on, try to solve it.” Then you just move forward and solve the problem. Yeah, it’s incredibly challenging in the moment.

[00:28:06] Jerrod Engelberg: The other thing that was always powering me through those hardest moments, two things. One was the fact that, yes, I got into this industry, working on CodeCov to make CodeCov successful, but also, really, because I wanted to make this industry successful. I thought there was an edge or opening to build this product that can be widely used and loved. So if you take that thought process, then the best thing that you can do for that same industry that you love for the same developers that you try to serve, is just step forward fearlessly, right. So that definitely helped me power through those moments.

The other thing I would call out is, there are humans on both sides of this equation. Yes, there are legal corporations and all these things that govern these relationships, but every time you get on a call with one of your customers, the human empathy that was there, person to person, was extremely helpful. I think the security professionals at large deserve a lot of thanks. At least for me, I saw some really thoughtful, helpful advice from massive companies. I can’t name names, but you know who you are, if you’re listening.

The fact that they also saw the bigger picture, even I saw security professionals working together on this, that were direct competitors, where their companies would never be working together, but the security professionals that are in each of those respective corporations are able to really collaborate in a way that’s very, very cool. So those two pieces really I think, helped with the emotional as you said, dread of, hey like, this hurts, right? This was on both sides.

[00:29:46] Guy Podjarny: Yeah, it’s great to see growing empathy indeed, in the security and I guess, tech scene as a whole. And I think they’re both good tips on how to cope with the moment.

Let’s maybe switch to the rise after. So maybe starting more tactically, so what have you changed or applied from a security perspective in CodeCov post-breach?

[00:30:10] Eli Hooten: Sure. In terms of remediation, we immediately revoked the credential and found any other evidence of credentials that could accidentally get into source code or whatever. All of that’s removed and gone. The next step is you immediately put active monitoring on the bucket holding your Bash uploader, such that if there’s any change, you’re getting an alert, and you can look at that alert and go, “Oh, yeah, we made that change,” or “Oh, what is this?” That was table-stakes stuff to do right after this happened.
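The monitoring idea can be expressed as a small drift check: keep a record of the digests the team has approved, compare them against what the bucket currently serves, and alert on any difference. The sketch below is a hedged illustration in Python; the `Alert` class, `detect_drift` function, and the in-memory dictionaries standing in for the bucket are all hypothetical, and fetching objects from a real storage service is deliberately left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """One discrepancy between the approved record and the live bucket."""
    key: str
    expected: str
    actual: str


def detect_drift(approved, current):
    """Compare approved digests against current object contents.

    `approved` maps object key -> approved SHA-256 hex digest.
    `current` maps object key -> bytes currently served for that key.
    """
    alerts = []
    for key, digest in approved.items():
        body = current.get(key)
        if body is None:
            actual = "<missing>"
        else:
            actual = hashlib.sha256(body).hexdigest()
        if actual != digest:
            alerts.append(Alert(key, digest, actual))
    # Objects that nobody approved are also worth a look.
    for key in sorted(current.keys() - approved.keys()):
        actual = hashlib.sha256(current[key]).hexdigest()
        alerts.append(Alert(key, "<unapproved>", actual))
    return alerts
```

An empty result corresponds to “Oh, yeah, we made that change” (the approved record was updated as part of the release), while any returned `Alert` is the “Oh, what is this?” case that needs a human to investigate.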

In longer term, we took a look at the Bash uploader, and it had actually been on our agenda for a long time to replace it for reasons that aren’t even necessarily inherent to security. For example, Bash is tough to maintain, it’s hard to test, it’s hard to modularize, there were just a lot of issues with working with Bash generally that a modern language would help overcome. So we were already undergoing efforts to rewrite this Bash uploader into more modern languages, because it’s easier to test, easier to take contributions on, and this kind of thing. So we expedited that work to get it across the finish line even faster.

We ended up changing the distribution mechanism from [inaudible] to Bash to a compiled binary that we can distribute, that is signed and shasum verified and all this, so that you could be much more certain of the integrity of what you were pulling from us, if you chose to do it. We could have done it with Bash as well. In fact, the Bash uploader had a checksum. It didn’t have signature verification, but we could have added those pieces. The reason we chose to go with the new uploader anyway is we were already writing it and doing that work, so we put those security pieces there. We’ve also started a 12-month deprecation window for the Bash uploader. We’re actively in the process of moving our customers from Bash to this uploader as well.

[00:31:48] Guy Podjarny: I guess a couple of questions on this, one is, how much do you attribute to Bash? I mean, does it really matter if this was Bash or some other tool that was included in it?

[00:31:58] Eli Hooten: I think there’s generally a pretty negative stigma around curl pipe to Bash. I think earlier in this conversation I mentioned that listeners could Google it, read about it, understand the problems. Those problems are overcome, or at least partially overcome, with things like shasum verification or having a signature to verify that, “Oh, hey, CodeCov signed this. So I know it’s their change,” and this kind of thing. Is that enough to overcome a curl pipe to Bash stigma? For some people, maybe; for some people, maybe not.

I think in our case, the move away from Bash, like I mentioned, was already happening. We had already decided there were problems here. Not even just security problems, but problems from a modern software development aspect that just made it challenging to work on. So we’re moving to a new language; in our case, we’re building this off NodeJS.

[00:32:38] Guy Podjarny: That was precisely my next question, which is like, what language did you consider –

[00:32:42] Eli Hooten: Yeah, yeah, so we chose Node. There were some business reasons for that as well, I can get into it if you’re curious, but we ended up choosing Node. That was more a consequence of it just being really hard to work on Bash in a team setting. Layering on the security pieces we could have easily done with Bash, but we chose to add them to the new uploader, push that out, and move everyone towards that.

[00:32:59] Guy Podjarny: If I think about this from a customer side, you wrote the relevant functionality in NodeJS as opposed to in Bash, but that also implies users consume it in a different fashion, right? They don’t curl into a NodeJS script, they – another [inaudible 00:33:15].

[00:33:15] Eli Hooten: Yeah, we actually distribute this as binaries for all the major CI platforms. If you’re building on macOS, you’re building on Windows, Linux, Alpine, we have a current Node binary that you can wget and pull into your CI. Then, in our documentation, we encourage our users: you can shasum check this for its integrity, you can signature verify it to make sure that we provided it. Then you use it with an interface very similar to the Bash uploader. After you bring it in and verify it, the actual use is fairly similar, at the end of the day.

[00:33:47] Guy Podjarny: How do you get users to actually apply that checksum? Fundamentally, you can wget without doing checksum verification as well. That binary can also be tampered with. It’s a bit harder than Bash, but it can also be tampered with. How do you get users to actually run the right line?

[00:34:04] Eli Hooten: Yeah, of course. So ultimately, it becomes an education problem, right? I can’t step into your CI and write these lines of code for you to do the verification for this thing. In our public documentation, where we provide the new uploader and talk about it, immediately on that page is, “Hey, here’s how you verify. Here’s how you shasum check. It’s exactly how you should do it, we highly recommend you do it.” But at the end of the day, I can’t make you, and so if a customer chooses not to, there’s not a lot we can do.
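
As a rough illustration of the checksum step Eli describes, here is a minimal Python sketch. It is not CodeCov's documented procedure: the function names and the file name in the usage note are hypothetical, and it assumes the publisher lists a SHA-256 digest that you obtain separately from the download itself.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str) -> bool:
    """True only if the file's digest equals the published digest.

    `expected_hex` is assumed to come from the publisher's documentation
    or release notes, not from the same place as the downloaded file.
    """
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A CI step might call `verify_download(Path("uploader-binary"), published_digest)` and fail the build when it returns `False`. A checksum only shows that the file matches the published digest; showing who published it is the job of the signature check mentioned in the conversation.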

[00:34:29] Jerrod Engelberg: However, whenever we can, we wrap our own uploader. So for example, with GitHub Actions or CircleCI Orbs, we can actually add these things out of the box for the user as well. But make no mistake, Guy, our industry has a distribution problem, right? Until this is zero trust, and I don’t want to step on a future question, there is always this handshake, right? We can make the handshake more and more sophisticated, but it’s definitely something that I think a lot about as we move forward and what can come next.

[00:34:58] Eli Hooten: Yeah, to echo that, in the parts of the distribution of the uploader we can control, specifically our GitHub Action or CircleCI Orb [inaudible] step, we do the shasum checking and the signature checking of the binary uploader that those things pull in before customers use them. So in those cases, we do control it and ensure it happens, but if you’re bringing down the uploader yourself, in your own CI, it falls to an education problem for us to make sure that you know you can do these things and why you should do them.

[00:35:21] Guy Podjarny: We’ve talked about the immediate actions, the monitoring and removing the secrets, and the change from the Bash uploader to NodeJS. Any other notable technical learning, I guess, or tools added?

[00:35:34] Eli Hooten: Yeah, there’s the Docker side of this, right. Ultimately, what kicked this off was the ability to find something meaningful in a Docker layer. Docker has some experimental support for squashing that may become part of the mainline release. You can flatten your Docker image and not add those intermediate layers. That’s one thing. We did that. We also moved to multistage builds, such that all the source used to build what you’re packaging into the image is in one stage, but what you’re distributing is a secondary image that just contains your built binary.

The other thing we did is, we forked a project called Dive, which is a great open source project that can programmatically explore Docker images for these layers. We forked it to specifically add rules to make sure that artifacts that are very commonly in our source code, and should be there, aren’t present in the built image. So that’s one final thing that we worked into our CI, such that whenever we want to push a new release of self-hosted out, that check has to pass and we know there’s nothing present that an attacker could potentially use.
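
To make the idea of a layer check concrete, here is a small Python sketch. It does not use Dive or reproduce CodeCov's rules. It assumes each image layer is already available as a tar archive (how you extract the layers is outside the sketch), and the deny-list patterns are hypothetical examples.

```python
import fnmatch
import tarfile
from typing import BinaryIO, Iterable, List

# Hypothetical deny-list; real rules depend on what lives in your own repository.
FORBIDDEN_PATTERNS = ["*.pem", "*.env", "*.git/*"]


def forbidden_paths_in_layer(
    layer_tar: BinaryIO, patterns: Iterable[str] = FORBIDDEN_PATTERNS
) -> List[str]:
    """Return the sorted member paths in one layer archive that match a pattern."""
    patterns = list(patterns)
    hits = set()
    with tarfile.open(fileobj=layer_tar, mode="r:*") as archive:
        for name in archive.getnames():
            # Some archives prefix every member with "./"; drop only that prefix.
            path = name[2:] if name.startswith("./") else name
            if any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns):
                hits.add(path)
    return sorted(hits)


def check_layers(layers: Iterable[BinaryIO]) -> List[str]:
    """Collect forbidden paths across every layer, not just the final one."""
    found = set()
    for layer in layers:
        found.update(forbidden_paths_in_layer(layer))
    return sorted(found)
```

Checking every layer matters because a file deleted in a later layer is still present in the earlier layer that added it. A CI gate could fail the release whenever `check_layers` returns a non-empty list.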

[00:36:33] Jerrod Engelberg: Then, speaking to the compliance or SOC 2 policy side of this: one, key generation and key rotation. We actually built our own in-house key management solution, more than what you get from a vault. Like, what is the metadata around this key: what is it used for, when was it most recently generated, when should we regenerate it next, so you just keep those tokens cycling fresh. Two is, as I said, there were some things that we had momentum on, that we had practiced for an incident like this, but there were some that we had not. I think learning from these points of the process, where it’s like we could have been more prepared in X, Y or Z place, and actually recording those and writing those in.

Lastly, staffing changes, just really building our dedicated security muscle here, even as a small company, because we are widely used, right? We need to respect the role that we have in our industry’s software supply chain. The last thing I’d say about SOC 2 is, it’s like a gym, like a fitness routine. It’s only as good as you actually make the routine. Yeah, you can go to the gym twice a week, and you can go for a run, but only you know your body. I think some of the tweaks that we made were very specific to CodeCov, and it felt really good to level up how we test ourselves, beyond what is the bare minimum.
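
The key metadata Jerrod describes above (what a key is used for, when it was generated, when it should be regenerated) could be modeled roughly like this. This is a sketch under assumptions, not CodeCov's in-house system: the `KeyRecord` type, its fields, and the rotation intervals are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class KeyRecord:
    """Hypothetical metadata kept alongside a credential (never the secret itself)."""

    name: str
    purpose: str
    generated_at: datetime
    rotate_every: timedelta

    @property
    def next_rotation(self) -> datetime:
        """The point in time at which this key should be regenerated."""
        return self.generated_at + self.rotate_every

    def is_due(self, now: datetime) -> bool:
        """True once the rotation deadline has been reached."""
        return now >= self.next_rotation


def keys_due_for_rotation(records: Iterable[KeyRecord], now: datetime) -> List[str]:
    """Return the names of all keys whose rotation deadline has passed, sorted."""
    return sorted(record.name for record in records if record.is_due(now))
```

A scheduled job could run `keys_due_for_rotation` daily and open a ticket, or trigger regeneration, for each name it returns.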

[00:37:50] Guy Podjarny: Yup, definitely great to apply the practice. I hear you loud and clear and fully agree: we tend to just do the minimum when it comes to some of these regulations, SOC 2 included, and boy, can they come in handy at a time of need. Yeah, definitely understood.

Well, these are all really good learnings, it sounds like: from company culture and fire drills, to learnings on the technical systems. I would even put the key rotation in that bucket, although it is about just cleaning the house regularly, along with everything you described around the pipeline security components and then the users. How about the business side of this? Everything we’ve focused on so far is security and securing users. If I may ask, what was the impact on CodeCov’s business, for starters of the breach, and then also of your response to it?

[00:38:52] Jerrod Engelberg: I would be lying if I said there weren’t some customers that said, “Hey, we can’t use you at this time.” Or champions of our product who said, “Hey, I love using your product, but the moratorium has come down and I can’t use this product.” It did happen, and it hurts to know that you now have to break that relationship with that customer, but honestly, more often than not, there was consideration, there was thoughtfulness, there was re-auditing. I think that, as much time and focus as we put into the remediation effort itself, we also put into thinking about things going forward, some of the points that we just walked through about what we changed about our system. We did re-audit with customers. I’m very thankful that a lot of those customers have actually retained their business with us.

I think if we weren’t transparent, if we weren’t very forthright about where this came from, and those customers found out, the chance of retaining them, I think, starts to converge towards zero very quickly. Going back to that human element, most people understand that most companies will go through some severe vulnerability or breach at some point. It doesn’t make it feel any better in the moment.

I definitely feel like we got ours relatively early in the lifecycle of our business, but from a customer relationship perspective, I’m very grateful, frankly, that people gave us a chance. They were at least willing to look at CodeCov with fresh eyes and say, “Hey, if we were looking at this business for the first time, based on all the controls that they have and things they have put into place, where would we land on a security audit?” Versus, “Hey, they had an incident. Therefore, we’re out.” Again, I’m not blaming someone for feeling that way, but it was cool. It was very first-principles oriented, I guess you’d say, that people looked at this with fresh eyes and said, “Hey, is this a security approach and measurement that we can sign up for?”

[00:40:39] Guy Podjarny: Yeah. It’s great to hear that understanding. I imagine that your transparent handling of it has played a big role in that approach. I have a bunch more questions, but I think we’re getting a bit tight on time. We’ve already been going a bit long. Maybe let’s shift a little bit to a straight education phase. You’ve gone through this process and learned from it. I would love to spend the next few minutes talking a little bit about some advice, and then maybe level up a bit to the industry level, and talk about what we should do there.

For starters, you gave a bunch of good advice here already around how you handled it that people can learn from. Straight up, if you’re talking to a fellow CEO or CTO of a company, what would your primary advice be, having experienced what you did?

[00:41:26] Eli Hooten: One of the most important things I learned from this process, and we’ve talked about it a bit, is that your most important tool in this process, in my opinion, is people. Lean on your network, lean on your team, lean on individuals that are legitimately interested in creating a more secure software ecosystem.

Jerrod mentioned earlier on in this conversation that he saw security teams, who may be at direct competitors, working together to try to help solve these problems and understand what’s happening. I think that’s just a great story for how the human side of security works. Everyone legitimately wants to help you. No one wants these problems to exist. No one wants companies or people to be dramatically impacted by things like this.

Don’t be afraid to talk about it. Don’t be afraid to reach out. Don’t be afraid to have honest conversations with people that can work hard to help you. That applies both externally and internally. One of the things that we had to do inside the team was split focus, right? I mentioned earlier that this is a hard problem; figuring out how all this happened on the technical side was very difficult.

We immediately split the team and said, “Hey, half of us are going to work on this. Everybody else, you’ve got to keep the business running forward. You’ve got to keep doing the day-to-day and getting things across the finish line.” Having that trust internally, I think, is what helped us to have the outcome we did. Then there was the trust externally, with other individuals in the security space giving us good advice, being helpful, and doing their best to make sure we’re successful. We couldn’t have gotten to where we were without it. So if anything, trust people. They’ll help you in this case.

[00:43:01] Jerrod Engelberg: My advice is for a CEO or CTO going through this, especially one who is at the earlier stage of growth of their company. You may not have a CISO or some super senior security professional on your team who’s going to be the champion of this. Like with all hard problems in the business, it ends up falling to the executives if no one else can take it. First of all, just internalize the fact that this is going to be on your shoulders, right? That’s okay, but knowing that that will happen, what is everything that you can have in place ahead of time? Again, I’m sorry to tell you, but most companies will go through some type of severe vulnerability or incident.

So just think about, hey, in that eventuality, who’s the first phone call I’m going to make? If you don’t have a CISO, maybe you have a great security advisor, right, who’s gone through this. I don’t want to name names, because we haven’t announced this investment yet, but we have some great security folks on our cap table that honestly made a huge difference. It took days down to hours in making the right contacts, getting the right advice, jumping through some of these things. Knowing who your first phone call is, knowing what that process is, having the cyber insurance policy in place (please do it, for so many reasons): that’s everything before. Then, in that moment, stick to your guns and think about the reasons that you’re doing what you’re doing.

Stay super customer-centric, because I think the best thing that you can do for the longevity of your business is just keep your customers absolutely front of mind. Then, coming out of the event, ask a ton of questions. I mean, a lot of the changes that we made to our systems, yes, we had some good ideas, but we also talked to a ton of customers and said, “Hey, what do you want to see here? How would you want to stand back from this? Where were your concerns?” Some amazing, very senior security folks at large companies gave us great feedback on our actual post mortem and how we’re changing things, like, “Hey, that’s great, but have you thought about X?” You’re just not going to know all of this in one go. So finding that advice and counsel all the way through, and being very humble in that discovery process and truth seeking, I think, really is what got us through here.

[00:45:11] Guy Podjarny: I think that’s all sound advice, and I guess it all comes back to: don’t try to sweep it under the rug. If you actually engage in the trust and you open up, you’ll reach a better outcome. Talking a bit industry-wide here, you were saying the industry has a bit of a distribution problem, or maybe not a bit, I guess. How do we think about these types of – I still would say CodeCov is a great tool, a good solution. It’s used very, very broadly, and probably a big part of the waves or news that it made were related to that prevalence. I think that’s also true for Heartbleed and the like. Yes, the vulnerabilities or the breaches themselves are noteworthy, but so is the widespread nature of the solutions. I don’t know if you agree with that or not. How do we tackle this as an industry? How do we get better? What are our problems that might want to get actions – we should?

[00:46:01] Jerrod Engelberg: I do believe in a zero trust future for our industry. I believe that we will find a way to get there, or at least very proximate to zero trust, because we’ve taken the steps, we’ve shown the IDs, we’ve taken the phone calls, right, to prove we are who we are, and all the things that happen within that. And we’re close now, with projects like Sigstore, which I think is really promising. I can talk more about that, but fundamentally, I’m a huge believer still in open source software. I’m guessing that’s part of the reason that you do what you do as well, Guy. I think this open source movement, the dependencies that we have all around us, has been a huge driver of this huge innovation curve that we’ve experienced over the last few decades.

So instead of trying to turn our backs on that, and becoming insular, the question is, how can we, in a maximally safe way, interact with one another, right? Often, I think, probably in the open, that can bring – that can try new sites.

[00:47:03] Guy Podjarny: Can you actually say a few more words about Sigstore? I’m familiar with it, but not everybody is, so maybe say a few words about that.

[00:47:10] Eli Hooten: One of the things we asked ourselves after this happened is, how do we ensure the integrity and safety of what we are contributing to the supply chain, right? In this case, the CodeCov Bash uploader. We looked around for solutions, and we ended up settling on the shasum checking and signing I talked about, but there’s an emerging project called Sigstore that has a lot of industry backers. Its specific purpose is to code sign the things that you’re shipping and verify the things that you’re using in the supply chain. It does this through the use of a public open ledger. I’m really hand waving it to get it across to your listeners, but there’s this idea of a public open ledger that basically you could use to say, “Hey, CodeCov made the latest change to the uploader at this time, and here’s proof that it was them that did it. This is just publicly viewable and verifiable by anyone.” Then the second piece would be around that code verification: hey, here’s a SHA or some calculated digest that tells us this is exactly what CodeCov meant to distribute.

So they distributed it, and what you’re receiving is what they distributed. It’s similar to what we did in a more bespoke fashion, and to what you see other companies that operate in the space do. I think Sigstore is moving to make that more official, more standardized, more accessible. I think all those things are good. So entrants like Sigstore, and others that are trying to solve this problem, I think are great for helping to ensure the integrity of the supply chain. That solves one side of the problem, which is that the individuals or companies placing things in the supply chain for you to use are taking these steps to make sure that what they’re placing there is what they say it is.

The other side of this is the consumers themselves. If I’m building a CI pipeline, if I’m cobbling these tools together to solve some problem for me, I need to understand what I’m using. I should verify, I should go through the steps they provide to make sure these things are verified and correct. I should probably audit what I’m bringing in to make sure that, “Hey, is this dependency worth it for our organization, for this project, etc.”

I think we’re only going to solve this problem if it occurs in a two-sided fashion, where those that are contributing are providing ways to verify the integrity of what they’re doing, and the consumers of these things are ensuring that the integrity is there by doing the verification. I think taking those steps helps get us to the zero trust future that Jerrod mentioned, and I think with time, we can make it there, and that would be a net benefit for everyone operating in this space.
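
The "public open ledger" idea from this answer can be illustrated with a toy append-only log. To be clear, this is not Sigstore's design or API, and all names below are hypothetical. It is a plain hash chain in Python that only shows why a public, append-only record makes later tampering evident; it has no signatures, so on its own it does not prove who published an entry.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class LogEntry:
    publisher: str
    artifact_sha256: str
    previous_hash: str
    entry_hash: str


def _hash_entry(publisher: str, artifact_sha256: str, previous_hash: str) -> str:
    """Hash an entry's contents together with the hash of the entry before it."""
    payload = json.dumps([publisher, artifact_sha256, previous_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(log: List[LogEntry], publisher: str, artifact: bytes) -> List[LogEntry]:
    """Return a new log with one more entry recording the artifact's digest."""
    previous = log[-1].entry_hash if log else GENESIS
    artifact_sha256 = hashlib.sha256(artifact).hexdigest()
    entry_hash = _hash_entry(publisher, artifact_sha256, previous)
    return log + [LogEntry(publisher, artifact_sha256, previous, entry_hash)]


def chain_is_intact(log: List[LogEntry]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    previous = GENESIS
    for entry in log:
        expected = _hash_entry(entry.publisher, entry.artifact_sha256, entry.previous_hash)
        if entry.previous_hash != previous or entry.entry_hash != expected:
            return False
        previous = entry.entry_hash
    return True
```

A consumer would compare the digest of what they downloaded against the latest logged `artifact_sha256`, and anyone can rerun `chain_is_intact` over the public log to check that history has not been rewritten.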

[00:49:25] Guy Podjarny: That’s great advice, and actually a great note to end our episode on. Jerrod, great job actually reacting to the breach, being transparent about it, and helping users overcome it. Again, a big thanks for sharing the story, being transparent about it, and helping all of us learn from it. When inevitably, or at least with high probability, more people find themselves in this situation, they’ll be a bit better equipped at handling it. Thanks a lot for coming on.

[00:49:55] Eli Hooten: Thank you.

[00:49:56] Jerrod Engelberg: Thank you Guy.

[00:49:57] Guy Podjarny: Thanks, everyone, for tuning in. I hope you found this educational and helpful, and I hope you don’t find that you need it soon. I hope you’ll join us for the next one.

[END OF INTERVIEW]

[00:50:10] ANNOUNCER: Thanks for listening to The Secure Developer. That’s all we have time for today. To find additional episodes and full transcriptions, visit thesecuredeveloper.com. If you’d like to be a guest on the show, or get involved in the community, find us on Twitter at @DevSecCon. Don’t forget to leave us a review on iTunes if you enjoyed today’s episode.

Bye for now.

[END]

Hosted by Guy Podjarny