Inside The Matrix Of Container Security: A Deep Dive Into Container Breakout Vulnerabilities

[INTRODUCTION]

[00:00:00] Guy Podjarny: Hello, everyone. Thanks for tuning back in. I’m Guy Podjarny, the host of The Secure Developer podcast. But today we’re going to hear from a couple of different people. We’re going to do a deep dive into some really interesting security research that Snyk’s security research team has performed that uncovered some pretty high-impact container breakout vulnerabilities. And to do that, my colleague, Liran Tal, will interview Rory McNamara, who’s one of the key researchers that worked on this project. And Liran and Rory will dig into the security research process: how the vulnerabilities were uncovered, what they are, what they mean, and how you as a security leader should respond to keep yourself safe. I’m sure you will enjoy this conversation. And on to Liran and Rory.

Rory McNamara: “Even I was surprised when I was working on this. Docker is a very large software solution. It has a lot of functionality. It includes a lot of code. It’s a very big project. But it’s what happens when you have something that’s as flexible and as usable as Docker: you end up with a lot of inherent complexity, and dependencies and things that are linked all over the place.”

[00:01:07] ANNOUNCER: You are listening to The Secure Developer, where we speak to leaders and experts about DevSecOps, Dev and Sec collaboration, cloud security and much more. The podcast is part of the DevSecCon Community, found on devseccon.com, where you can find incredible Dev and security resources and discuss them with other smart and kind community members.

This podcast is sponsored by Snyk. Snyk’s developer security platform helps developers build secure applications without slowing down, fixing vulnerabilities in code, open source, containers and infrastructure as code. To learn more, visit snyk.io/tsd.

[INTERVIEW]

[00:01:54] Liran Tal: Hello, everyone. And thank you for tuning in to The Secure Developer podcast. My name is Liran Tal. And I’m a developer advocate at Snyk. With me today is Rory McNamara. Thank you so much for joining us, Rory. And I guess we all now know that container technology isn’t entirely the isolated boundary we thought it was, thanks to your container escape security discoveries. Rory, please introduce yourself and we’ll get started with diving into the details of the security research.

[00:02:20] Rory McNamara: Thank you for having me. I’m Rory McNamara. I am a staff security researcher here at Snyk Security Labs where we focus on this kind of research that impacts developers. Quite long history in security. I’m very passionate about this kind of research, Linux Userland. I’ve been doing this kind of thing for a very long time mainly in kind of bug bounty projects. But, yeah, this is my favourite kind of research. I’m glad that it’s been successful and we’ve got something to share.

[00:02:44] Liran Tal: Yeah. Definitely. And since you mentioned Linux and your background, I’m going to do some quick myth fact-checks about hackers. I’ve got two questions for you. One is, do you sit in the dark with a black hoodie when you are running the proof-of-concept exploits?

[00:02:59] Rory McNamara: Unfortunately not. Middle of the day, sun shining in the window.

[00:03:04] Liran Tal: At least a cup of coffee maybe?

[00:03:06] Rory McNamara: Not so much. No. I don’t drink that much coffee. But I’m not quite the stereotype, unfortunately.

[00:03:13] Liran Tal: Gotcha. I also had a dad joke on container escaping in vi. Maybe I’ll skip that for the audience. Rory, what’s the TLDR of the research findings? What are we going to look into?

[00:03:25] Rory McNamara: Sure. Starting at kind of the least impactful and going to the most impactful. The focus of the research was building Docker images and running Docker images. And what we found is, in one instance, if you build my Docker image from a Dockerfile that I provide, I can delete whatever files I like from your file system.

In the remaining cases, we’ve got a total of four vulnerabilities. That was one. The remaining three, if you build a Docker image with my Dockerfile, I can break out of that Docker build process and compromise your host operating system. So, where you’re building it. And in one of those cases, the resultant Docker image that has been built, if you then run that at any time in the future, that will also be able to break out of the container and compromise your host operating system. Including when you run it on your Kubernetes infrastructure. When you run it on your Docker Swarm infrastructure. Wherever you run that resultant image, it can break out and compromise your host.

[00:04:17] Liran Tal: Well, I mean, this is kind of like a very regular workflow for me, right? As a developer, what you just mentioned is actually pretty common for me, to like use some image from the cloud. From Node or something. Or maybe clone a repository. For me, cloning a repository and using someone else’s Dockerfile is potentially a pretty common scenario for developers and engineers. This seems to be a very vital issue within the workflow of developers and engineers building on top of containers.

[00:04:44] Rory McNamara: Yes. Precisely. And it comes into the same category as anything else you’re downloading and running. Really, you ought to be checking what you’re executing. You want to know what you’re working with before running arbitrary things.

[00:04:54] Liran Tal: Yep. Great job. You mentioned a lot of container breakouts. Can you briefly explain what container breakout is and kind of like its significance on the whole domain of container orchestration and container technology like Docker and Kubernetes? How do those fit into this?

[00:05:09] Rory McNamara: Sure. Taking, I guess, a little bit of a step back, containers are generally used to bundle up software. You write your code and then you put it inside a container and then you run it somewhere. Containers are kind of an isolated bit of infrastructure, I guess, that then can be run. And they’re generally – in your big organization, they’ll be built in a specific place. They’ll also be built locally on your developer machine during testing. And, yes, you build it in your CI/CD environment. And then you’ll send it off and it’ll be run in your shared production environment, for example.

The interest in breakouts is essentially when building in a CI/CD environment or when running in your production, Kubernetes infrastructure, for example, that’s a shared environment. There are going to be the things that you have to work with as an application. So, your set of secrets. Your database access, for example. But then there’s a lot of other things running potentially on the same machine, the same host. Other people’s applications. Other build pipelines with access to different sets of things. Different credentials. Different database access. Et cetera.

The ability to break out of your container means that you can go from having access to the things that you were intentionally provisioned with, potentially given, to having access to anything that anyone has been given. And that’s a very high risk. People put a lot of effort into making sure that their containers are given the minimum of secrets or access required to complete the job. Being able to kind of ignore that, in essence, by breaking out and accessing anything is a very high risk. There’s a leak of secrets. Leak of access to sensitive systems. Other compromises that shouldn’t be expected of a containerization environment.

[00:06:46] Liran Tal: It’s basically compromising the host itself that may be running like my containers and other people’s containers. Potentially, a shared hosting kind of environment.

[00:06:55] Rory McNamara: Precisely. Yes.

[00:06:57] Liran Tal: Okay. Yeah. That seems dangerous if it happens. Maybe before we dive into the very technical details of all the container vulnerabilities that you found, or some of them, which we’ll touch on here, maybe let’s have a baseline here and explain what this Dockerfile is. And how does that relate to part of the vulnerabilities that you found? What role does it play? What is it made of? Where does it fit with the vulnerabilities that you basically found?

[00:07:24] Rory McNamara: A Dockerfile is kind of the base configuration file essentially for a Docker image. It’s a plain text file. It’s got a handful of single lines. And it defines what you want your resultant container image to look like. You start at the top where the line says FROM. And that will tell Docker what the kind of upstream image is – do you want it to be based on Ubuntu? Do you want it to be based on Debian or Alpine, et cetera? There are thousands of base images you could use.

And then the rest of the Dockerfile is going from that base image and turning it into an image that you want to work with. Installing your build time dependencies potentially. Installing your runtime dependencies for your application. If you’re using Go, for example, you might want to install a Go toolchain to be able to build your binary. And then there’s a lot of other kind of smaller, or I guess smaller in scope, commands as part of the Dockerfile. Things like COPY to get your application inside the image.

And the result of this Dockerfile after you’ve built it is your kind of reusable image. You can put it into a registry and then download it onto a different host and rerun it there or run it locally for testing purposes. And that’s the kind of main aim. It’s turning a configuration file into a reusable block of infrastructure you can then pass around.

[00:08:36] Liran Tal: It’s funny because it’s both entertaining and scary at the same time. Because the vulnerability you found is related to one of the configuration directives within the Dockerfile. And I think it’s one of the last ones that I would think of as an issue. It’s the WORKDIR thing, right?

[00:08:52] Rory McNamara: Yes. Precisely. One of the configuration parameters in a Dockerfile is called WORKDIR. What it does is it will specify the working directory, somewhat unsurprisingly, of your code when it runs. Generally, it’s set to the root or it’s set to your application directory. When you just run my app, it will be relative to wherever you put your app. It’s mainly for copying things around and making sure that you know where you are in the image file system. But as we identified, there are some potential issues with that specific configuration item.

[00:09:22] Liran Tal: That is a scary part. Because when I build images and I talk about security of Dockerfiles and containers, I think of making sure that your RUN to install dependencies is npm ci or something instead of npm install, so you get deterministic builds. It’s about having a user defined. It’s about least privilege. It’s everything else basically except for the WORKDIR, which is the thing that I would imagine is like the last thing that would add problems, let alone security issues, to me building container images.

[00:09:53] Rory McNamara: Yeah. Precisely. It was a bit of a surprise to me as well. It’s one of those ones that’s kind of small. It’s in the core. You don’t really think about it very often. But, yeah. It can lead to some quite high impacts.

[00:10:00] Liran Tal: I’m going to say great job at finding the least thing to worry about and making sure we do worry about it, Rory. Great job. Okay. Cool. Let’s talk about the vulnerability itself. How did you first identify these vulnerabilities in Docker?

[00:10:16] Rory McNamara: I have kind of a standardized workflow for when I’m looking at services on Linux or Userland Linux in general. The kind of first step is you instrument the thing you want to look at. In this case, I use the Docker engine, Docker daemon. And in this case, I use strace. Because I’m very comfortable with it. And then I just ran it and I played around. I exercised the functionality of Docker. Essentially, I built some images. I ran some images. And then I looked at the output. And there’s quite a lot of, I guess, depth to this. That’s why I’m going to shout out the deep dive blog post that we’ll be publishing alongside this podcast I believe, which goes into the nitty-gritty of how you use strace to find this.

But the kind of long and the short of it is strace will list every single syscall. And a syscall, it’s essentially anything interesting that a program does. It will log when you open files, when you write to files, when you make network connections, when you interact with the file system in general. When you’re interacting with anything of interest on a Linux system, it probably is a system call. And strace is a very handy tool to just log them and the parameters, et cetera. It gives you a lot of output. But if you know what you’re looking for, you can breeze through it. You can see ordering of things. You can see things that aren’t checking things the way they should be.

[00:11:28] Liran Tal: It’s like a process detective.

[00:11:30] Rory McNamara: Essentially, yes. And, yeah, you can use that to kind of ignore all the complexity of how things work and you just focus on what’s happening in that. I noticed, “Oh, this thing is out of order. And this thing doesn’t check the parameters the way that it should.” I’m being intentionally vague here because it’s very – you’ve got to see the quotes for it to make sense. For that, I would suggest, if you’re interested, go look at the blog post.

But, yeah. Essentially looking at that. Looking at the output. Noticing things. Staring off into the distance for some time to try and focus and figure out how to reorder these things and make them exploitable. And then we ended up with some proofs of concept.

[00:12:02] Liran Tal: Gotcha. Basically, doing a lot of – or employing a lot of tools, like strace itself, for you to be like a detective to find all those vulnerabilities. And the way that the program, or Docker, or some other aspect of that behaves within the operating system.

[00:12:18] Rory McNamara: Precisely. Yes.

[00:12:20] Liran Tal: Amazing. Okay. I bet that was a lot of fun just following through.

[00:12:25] Rory McNamara: It’s my favourite kind of vulnerability research. There’s a lot of scope of interesting things to see.

[00:12:30] Liran Tal: It’s like an unfolding novel when you read it, right? There’s bits and pieces of maybe what could go wrong. Maybe then you find out that it’s okay. And then you get some clues to the areas and you start unfolding them. Like an onion, piece by piece. And then you go and discover it.

[00:12:47] Rory McNamara: Yeah. Precisely. Yeah, there’s a lot of comparing lots of different files. And, “Oh, this timestamp was a few milliseconds before this time stamp.” That’s potentially interesting there. Yeah.

[00:12:55] Liran Tal: All right. Cool. Are all of the vulnerabilities actually exclusive to Docker? What were the findings kind of like related to?

[00:13:02] Rory McNamara: Of the four, three of them are Docker exclusive. They happen due to things that are specific to Docker engine. But the final one, the one that we’ve been discussing and the one that is the most interesting, the WORKDIR one, actually, it turns out, is in a specific component of Docker known as runC.

RunC is a dependency that is quite widely used. And its job is essentially to do the actual nitty-gritty of bringing up a container. There’s a lot that goes into running a container. You’ve got your container root file system, which is from the image. And you’ve got the configuration of that. Things like the WORKDIR. You’ve got things like shared mounts. So, when you want to mount a file system, or a directory rather, inside a container, runC is in charge of that. And then other resource management. So, limits on CPU usage. Limits on memory usage. Those kinds of things are all handled by runC.

Docker will take the image or the Dockerfile and convert that internally essentially into a format that runC understands. And then runC will create your container accordingly essentially. And it turned out that the root cause of the WORKDIR issue was part of runC, due to the way it was handling some of these parameters essentially.

[00:14:10] Liran Tal: RunC is a component that Docker uses.

[00:14:15] Rory McNamara: Yes. It is a dependency. It’s also used by other things. It’s used by containerd, which is another component which is used by other things. Quite often, in a Kubernetes environment, the underlying containerization solution will be containerd, although Kubernetes is quite flexible in this regard. And then containerd itself will use runC to actually spin up the containers, which means the impact is quite wide for this one. It’s not just Docker engine. It’s anything that uses runC. And, therefore, anything that uses containerd. Kubernetes, for example.

[00:14:45] Liran Tal: Again, I mean, I find that pretty wild. Because it’s just like WORKDIR, where a lot of us wouldn’t suspect that as being an issue. I think probably a lot of Docker users. Maybe not heavy Ops and [inaudible 00:14:57] and stuff like that who are super knowledgeable in that space maybe. But definitely, end users of Docker. Developers like myself. We may not understand or know that there’s like a runC component somewhere underlying that would be potentially problematic, or something that we should even kind of take care of, or understand that it’s potentially a security issue.

[00:15:16] Rory McNamara: Yeah. And when you dig into it, even I was surprised when I was working on this. Docker is a very large software solution. It has a lot of functionality. It includes a lot of code. It’s a very big project. But it’s what happens when you have something that’s as flexible and as usable as Docker: you end up with a lot of inherent complexity, and dependencies and things that are linked all over the place.

[00:15:39] Liran Tal: Yeah. 100%. Maybe let’s kind of now spend a bit of time deep-diving into the technical details of the findings. I know there’s several CVEs and vulnerabilities in the works. But let’s dive into one of them. Maybe the WORKDIR one. And talk about how it works. How did you find it? What did you find there? How would the attackers exploit it? And just generally, the story around this WORKDIR directive.

[00:16:02] Rory McNamara: A couple of items of background I guess before we get into the actual vulnerability. On Linux, when you’re interacting with files or directories, et cetera, you’ll open them and then your process will have what is known as a file descriptor. In practice, it’s just a number. But it’s a special number. It’s tracked by the kernel, et cetera.

For our purposes, there is a number and there’s a certain directory in the proc file system, which is like an introspection file system. You can see a lot of information about your own process. About other processes. And there’s a lot of detail there. But for our purposes, there is a directory in there where you can list all of the file descriptors for your own process. It’s used for things like automatically closing anything that you don’t need open. Or validating certain security expectations by reusing an open file descriptor. And it’s very useful for us as we’ll see in a second.

Another item that’s I guess worth discussing is these file descriptors are inherently privileged, especially when it comes to containers. Anything that you open will then stay open for all of your child processes. Or in this case, containerized processes. File descriptors are inherited. But the access requirements, or, sorry, the access provided stays the same no matter who inherits it.

RunC, as the kind of relevant component here, makes sure that by the time the container is running and the control is handed over to whatever process is running in the container, which is usually user-supplied. Your main entry point process in your image. RunC will ensure that all of these file descriptors are closed. If they’re not closed, it’s a file descriptor leak and the processes inside the container could then potentially do things with those file descriptors, which is potentially bad.
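
[EXAMPLE]

The introspection directory described above can be sketched in a few lines of Python. This is only an illustration, assuming a Linux host with the proc file system mounted; the helper name list_open_fds is hypothetical and is not taken from the research or from runC.

```python
import os


def list_open_fds():
    """Return {fd_number: target_path} for this process, read from /proc/self/fd."""
    fds = {}
    for name in os.listdir("/proc/self/fd"):
        try:
            # Each entry is a symbolic link whose name is the descriptor number.
            fds[int(name)] = os.readlink(f"/proc/self/fd/{name}")
        except OSError:
            # The descriptor that listdir itself used is already closed again.
            continue
    return fds


if __name__ == "__main__":
    # Open a directory so there is something besides stdin/stdout/stderr to see.
    fd = os.open("/", os.O_RDONLY | os.O_DIRECTORY)
    try:
        for number, target in sorted(list_open_fds().items()):
            print(number, "->", target)
    finally:
        os.close(fd)
```

Pointing the same listing at /proc/<pid>/fd for another process, with appropriate privileges, is the kind of inspection the conversation returns to later.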

[00:17:45] Liran Tal: This is how everything connects together, right? The file descriptors are the stuff that you see with the strace that you mentioned before. That’s how you connect.

[00:17:52] Rory McNamara: Yeah. Exactly. Exactly. And there are a lot of kind of numbers in the strace output that are small numbers. And you kind of have to follow them through and say, “Oh. Well, there’s a seven here. There’s also a seven here.” And that’s kind of how you follow it.

Yes. The way runC will close these is implicitly. There’s a specific flag that you can set. It’s like metadata on the file descriptor. And you can say, “Don’t let this specific file descriptor be inherited.” It’s called cloexec. It’s close-on-exec. When you perform the exec, when you execute the process inside the container, the file descriptors with this flag are closed and they’re not inherited.

This is fine. This works as expected. This is great. When you use runC in the normal way or anything that uses runC, by the time you get into your container and you have control over the code that’s running, all these file descriptors are closed.
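
[EXAMPLE]

A rough Python sketch of the close-on-exec behaviour just described, assuming a Linux host. It uses the standard fcntl flag FD_CLOEXEC; the helper names mark_close_on_exec and fd_survives_exec are made up for this illustration, and none of this is runC’s own code. The point it shows is that setting the flag does not close anything straight away: the descriptor stays usable in the current process and only disappears in the program that gets exec’d.

```python
import fcntl
import os
import subprocess
import sys


def mark_close_on_exec(fd):
    """Set FD_CLOEXEC on fd. The descriptor stays open in this process."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def fd_survives_exec(fd):
    """Exec a child Python process and report whether fd is open inside it."""
    probe = "import os, sys; os.fstat(int(sys.argv[1]))"
    child = subprocess.run(
        [sys.executable, "-c", probe, str(fd)],
        close_fds=False,            # do not let subprocess close extra descriptors itself
        stderr=subprocess.DEVNULL,  # hide the child's traceback when fd is closed
    )
    return child.returncode == 0


if __name__ == "__main__":
    low = os.open("/", os.O_RDONLY | os.O_DIRECTORY)
    # Duplicate to a high number so the child cannot reuse it by accident.
    # F_DUPFD returns a descriptor without the close-on-exec flag set.
    fd = fcntl.fcntl(low, fcntl.F_DUPFD, 100)
    os.close(low)
    print("inherited before flag:", fd_survives_exec(fd))
    mark_close_on_exec(fd)
    os.fstat(fd)  # still open here: the flag only takes effect at exec time
    print("inherited after flag:", fd_survives_exec(fd))
    os.close(fd)
```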

The interesting thing that I noticed again with strace, when I was looking at the specific bring-up of a process and the way it kind of executes, I was looking for arguments to syscalls that I control as the writer of the Dockerfile essentially. As an attacker. I want to see where my data is going. I want to see what syscalls I can influence.

And what I was seeing was runC was setting up all of the environment as expected. Doing everything normally. It would set all of the file descriptors to be closed on execute, this flag that I’ve mentioned, to make sure that they’re gone. And then immediately after that, the directory change as defined by the WORKDIR argument. The chdir syscall, which essentially sets your working directory. It’s like cd on the command line. It’s the same kind of principle. That was then immediately executed afterwards.

And the interesting thing to know and what I kind of had noticed at this point is, when that flag is set, when the cloexec flag is set, it’s an implicit close. At that point in time, the file descriptors are not closed. They will be closed. They will be closed before we get to anything interesting. But at that specific point in time when the chdir is executed, and there’s a handful of other kind of syscalls around that time period, the file descriptors are still open at that point. And that’s potentially interesting.

There’s an action happening in the chdir with a parameter that I control, defined by my WORKDIR argument in the Dockerfile, at a time when there are things available that should not necessarily be available. In this case, the open file descriptors.

At this point, I had a hunch. I thought, “Well, there might be something here.” There are potentially a lot of file descriptors open. RunC, and Docker as a wrapper project, deals with a lot of things. It runs with high privileges. It has access to privileged things. There might be files here that we could play with. There might be something interesting that we want to do.

What I did, I spent some time figuring out how to do this. Because it’s quite complicated. This is discussed in detail in the blog post as well. But I used strace again to pause the container when the chdir was happening. And why I wanted to do that was because of this proc file system I discussed earlier. Anyone can look in with appropriate privileges. You can go and look in that directory and see what’s happening. You don’t have to be really fast typing urgently on the command line to go and have a look at the right time. You can just stop the process and leisurely go in and look at the open file descriptors for that process.

I could stop my container at the point where the chdir was happening. Go into the directory for that specific process and look at the file descriptors that were still there. A lot of them weren’t interesting. There are a lot of kind of system management file descriptors for things like events, and timers and locks. And there’s nothing much you can do in this context.

But what I did notice was there was a specific file descriptor which had the host file system directory open. And this was immediately potentially interesting. Because the file descriptors inside the proc file system are essentially real. They’re pointers to the directory or the file as a symbolic link. But they’re special symbolic links because they also work across mount namespaces. And this is very complicated.

Mount namespaces are what is used in containers to make the root file system look different to the host. You can have whatever you want. You can have an Alpine container running on top of an Ubuntu host machine. It doesn’t matter. It’s fine. Because it’s isolated. But file descriptors in the proc file system can ignore that in certain situations.

You can read files that are not in your mount namespace. You can look in directories that are not in your mount namespace. And that one specifically is what we have here. There’s a directory that we know is in the root or I knew because I could validate that is in the root file system. And there’s a reference to that directory open as a file descriptor inside the process that we are running.

What we can do now is obviously immediately test. Because just reading strace, there’s only so far you can get. Immediately then, you can take the WORKDIR argument of a Dockerfile and you can say, “Well, I want to set my current working directory to be what is essentially /proc/self/fd/ and then the number.” And this is the path to that file descriptor in the current process.

And then I ran it. I noticed that it’s successful. It’s working. And what that had done is in the time window between when the file descriptors are marked to be closed and when they’re actually closed, when the chdir happens, that chdir can be used to enter into the file descriptor that is still open for the small time being. And that, essentially, in a way, keeps the file descriptor open.

When the file descriptor is closed implicitly as we discussed on execute with the flag, our current working directory is still that same location. And that’s not explicitly closed, or reset, or anything. Because it’s our configuration. We asked to be there. Why would it undo that?

And at that point, even though all of the file descriptors have been closed by the time our own code is executing, we have managed to kind of retain access to outside of our container in a host file system. The specific directory is the cgroups V2 directory, which is used as part of the bring-up of the container to set some limits, to set some grouping of processes. To keep track essentially. Very internal, nitty-gritty. But quite interesting.

But crucially, the real host. The real host file system. And from there, our code, as long as it only ever uses relative paths, can traverse up the directory structure using dot dot, specifically on Linux. And if you go far enough up, in this case, it was three directories up, you are in the real root file system of the host OS. We’ve successfully retained access to a file descriptor that’s only open for a little while by going into it. And we then have access to the root file system.

In general, in the kind of standard default case of a Docker engine running on Linux, Docker will be running as root. This access is therefore as root. Because your process is root. There’s a lot of configuration here. You can have rootless containers. You can have a specific user on the container itself. But in general, default settings, you’ll be there. You’ll then have root access to the root file system. And you’ve got a lot of options. You can change /etc/shadow. You can put SSH keys somewhere. You can put a cron job somewhere. There are as many options as you can think of essentially to then break out of your container fully and have a full process executing in the host operating system.
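
[EXAMPLE]

The underlying Linux behaviour, that a working directory entered through a /proc/self/fd/<n> path outlives the descriptor it was entered through, can be shown harmlessly in Python on a temporary directory. This is a sketch under stated assumptions: it needs a Linux host with /proc mounted, the helper name cwd_after_closing_fd is hypothetical, and it involves no container, no runC and no host directory; it only illustrates the ordering effect described above.

```python
import os
import tempfile


def cwd_after_closing_fd(directory):
    """chdir via /proc/self/fd/<n>, close the descriptor, then look around.

    Returns (current working directory, sorted listing of its parent).
    """
    saved = os.getcwd()
    fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Enter the directory through the descriptor's /proc entry.
        os.chdir(f"/proc/self/fd/{fd}")
    finally:
        # Closing the descriptor does not undo the directory change.
        os.close(fd)
    try:
        # Relative paths such as ".." are still resolved from that directory.
        return os.getcwd(), sorted(os.listdir(".."))
    finally:
        os.chdir(saved)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        inner = os.path.join(root, "a", "b")
        os.makedirs(inner)
        cwd, parent_listing = cwd_after_closing_fd(inner)
        print("cwd after close:", cwd)
        print("listing of '..':", parent_listing)
```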

That’s kind of the top to the bottom of it. Very complex. And I would encourage, if you’re interested, to go and read the blog post. Because it’s more well-written than I have just spoken it. And it’s the same detail.

[00:25:13] Liran Tal: Yeah. It’s mind-blowing just understanding or hearing here the walkthroughs that you have to do between the different details and operating procedures of like what you mentioned before. Docker is like actually a large software stack beneath it. There’s like a lot of running assets behind the scenes. And I think you mentioned the importance of time a lot. Is this sort of like a time of check, time of use type of vulnerability? Is it resembling something like that?

[00:25:43] Rory McNamara: Not really. Prior to the fix obviously, the fix does do some checks. But prior to the fix, in the vulnerable case, the directory is not checked. There is a timing component but not in a race condition kind of way. It’s more that the order that things happen in means that there’s a specific window where we can exploit this vulnerability, or when we can perform this action essentially and get inside the file descriptor. But the window is kind of constant for our purposes. Always done in the same order. And there are no checks at that point in the vulnerable condition.

It’s just that the time of use is possibly a little earlier than you’d expect it to be. And incidentally, the fix checks the directory after it’s been entered. It’s still not a kind of race condition. It’s an ordering component rather than timing.

[00:26:33] Liran Tal: Gotcha. Maybe you should update your LinkedIn title then to like a time travel detective, with all the timing stuff and the pausing of containers while they run here.

[00:26:42] Rory McNamara: If you are particularly interested in race conditions explicitly, some of the other vulnerabilities are more race conditiony. And this is kind of the gist of the long blog post. We’ll cover three of the vulnerabilities becoming more and more actual race conditions. The last one is a complete race condition. As you say, time of check, time of use race condition. This one, the WORKDIR one we discussed, isn’t really. But it’s still a jumping-off point. Because the complexity is quite low and the complexity increases as you get more race conditiony. Yeah. It’s an interesting read, I think. I would encourage people to read it.

[00:27:13] Liran Tal: Oh, yeah. I’m pretty sure it is. I mean, especially the fact – I was thinking about it. And the more I was thinking about this type of race conditions and timings, those kinds of vulnerabilities are mostly unfamiliar, or like they are uncommon or not even in the OWASP Top 10. I’m trying to look at it not from the lens of a security researcher but from that of end users. And timing-related stuff sounds like something incredibly rare to happen, let alone to find. There’s like a lot of stuff going into play here.

[00:27:47] Rory McNamara: I think the issue, it’s not as rare as people might like it to be. I think the issue is it’s quite difficult to detect. It’s not the kind of thing that you can scan for particularly.

[00:27:57] Liran Tal: More reassuring thoughts from Rory.

[00:28:01] Rory McNamara: You can be careful. If you build things using best practices, generally you’ll be fine. But it is very much a case of bringing in different areas of the same process and making sure they happen in the right order in time as well as with the appropriate checks. And it’s not the kind of thing that you can just look at and know for sure.

As a result, it’s hard to find. It’s found less. But in my career, I’ve found quite a lot of this kind of thing. Race conditions on Linux is something of a speciality of mine. And, yeah, they exist. You’ve just got to look for them.

[00:28:33] Liran Tal: The immediate thing I was thinking of as a developer when I was thinking of timing attacks here and all of that was the insecure comparison of passwords, hashed passwords or something, where you have the equal, equal, equal sign. And there’s like the timing attack aspect to it. So, people can come up with it. But even then, the more that I think of it, it’s like those things are even – I think are even hard to exploit over the network because there’s already a boundary of the actual back and forth with the packet hops over something that’s not a LAN. But that is like the most hardcore thing I can think of. I mean, I know you’ve probably seen with this vulnerability and others that you’ve found that there’s even more gritty details for timing attacks.
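The comparison issue described here can be sketched in Python, used for illustration in place of the JavaScript `===` operator mentioned in the conversation. The helper names and the PBKDF2 parameters are assumptions chosen for the example; `hmac.compare_digest` is the standard-library function intended for this purpose.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2 (parameters are illustrative)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def verify_naive(candidate: bytes, stored: bytes) -> bool:
    # Ordinary equality may return as soon as a differing byte is found,
    # so the time taken can leak how much of the value matched.
    return candidate == stored


def verify_constant_time(candidate: bytes, stored: bytes) -> bool:
    # compare_digest is designed so that its running time does not depend
    # on where the two inputs differ.
    return hmac.compare_digest(candidate, stored)
```

Both functions return the same answers; the difference is only in how much the running time reveals to someone measuring it.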

[00:29:16] Rory McNamara: Yes. Definitely. There’s a lot of scope for making it more complicated. Because sometimes the window is tiny and you’ve got to spend some time making the window bigger. And then there’s a lot of complexity, as you say, with network and network effects and those kinds of things. They can get very complicated very quickly, and trying to make them into a usable exploit can be a challenge.

[00:29:34] Liran Tal: As we jump off from the vulnerability detail itself to kind of the impact of this, how does this discovery challenge the core premise of containerization technology, like isolation and everything that comes on top of that?

[00:29:48] Rory McNamara: It is a concern. It is a concern. A lot of people use this kind of thing as a security solution. You want processes that run independently that can’t be influenced by other processes. But I think there is kind of a bit of a disconnect here. Because when – at least from my perspective knowing the way Docker works, the isolation guarantees of Docker are more around isolation of environment, isolation of your specific application with respect to other applications. Less so isolation in terms of a pure security solution. And I respect that’s a very small distinction.

But as an example, there’s still quite a lot of shared resources with containers in the standard case when we’re talking about Docker, et cetera. And the big one is the kernel. All Docker containers on the same host use the same kernel. The same running kernel, which is not just like a copy. It is the kernel. All you need, and it is quite a big ask, is a kernel vulnerability, and you can necessarily exploit it from a Docker container because they have access to the same kernel.

There are some restrictions. Things like SELinux can complicate things. But it’s still the same host. I think if you want strong security guarantees out of containerization, I think there’s more things that you should do or could do to improve isolation as a security feature rather than isolation as an environmental separation feature. I think it doesn’t really break the idea of containerization. I think containerization is still going to be very well used. I think it should be. It’s very valuable. But I think people should be aware of the security boundaries and what their attack surface actually is when they’re using these containerization technologies. But I think we will discuss this in a little bit more detail.

[00:31:30] Liran Tal: For anyone tuning in, this is another breaking all illusions of container security with Rory. Shattering all pieces of confidence we had in containers up until now. Thanks again, Rory.

[00:31:43] Rory McNamara: Sure. Sure. I wouldn’t be doing my job if I didn’t make people’s lives difficult.

[00:31:47] Liran Tal: 100%. What do you think are the broader implications of this on software supply chain security? Because I think you mentioned before how this might be impacting us as like people who consume images of Docker containers from the broader public registry or whatever. Does that play a role? Or is it a different aspect as well?

[00:32:08] Rory McNamara: Yeah. In these specific cases, or at least the WORKDIR case specifically, there’s quite a few different attack vectors. The one we’ve already mentioned is if you build my container with the Dockerfile, then you can exploit this vulnerability. But there’s a lot more to it. There are some options within Dockerfiles that apply when someone builds with your image as an upstream image. As I mentioned, the FROM base image. There are some options for running things at the new build time. So, you can tell the child container, “Please also run these things when you build with me.” And that’s an attack vector in this space. You can exploit this vulnerability in the same way. And there’s also just straight running of arbitrary containers. You can exploit this vulnerability in the same way.
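The option for running things when a child image is built appears to correspond to the Dockerfile `ONBUILD` instruction, which registers a trigger that executes in any later build that uses the image as its base. As a rough sketch of how a reviewer might surface those triggers, the Python function below lists them from Dockerfile text. The function name and the sample Dockerfile are illustrative assumptions, and this is a simple line scan rather than a full Dockerfile parser.

```python
def onbuild_triggers(dockerfile_text: str) -> list:
    """Return the instructions a base image asks child builds to run."""
    triggers = []
    # Join backslash-continued lines so each instruction is one logical line.
    logical = dockerfile_text.replace("\\\n", " ")
    for raw in logical.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].upper() == "ONBUILD":
            triggers.append(parts[1].strip())
    return triggers


# Hypothetical base-image Dockerfile used only for illustration.
EXAMPLE_BASE = """\
FROM alpine
# Runs in every image that is built FROM this one
ONBUILD RUN echo "executed in the child build"
ONBUILD COPY . /app
CMD ["/app/start"]
"""
```

Calling `onbuild_triggers(EXAMPLE_BASE)` returns the two trigger instructions, which is the part of an upstream image that a downstream build would run without it appearing in the downstream Dockerfile.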

[00:32:53] Liran Tal: Transitive dependencies of languages, is that similar in that sense?

[00:32:58] Rory McNamara: Yes. Yes. It’s very much like that. Any container in the chain going all the way up to the first scratch one, which is the base level that you can have a container. Any one of them can, in theory, exploit your vulnerabilities when you build or when you run with that hierarchy.

[00:33:13] Liran Tal: I’m pretty sure that is a very unknown fact or not something people straight up think about what we just mentioned.

[00:33:21] Rory McNamara: Possibly not. And, yeah, it does bring up, as you say, a lot of questions about supply chain security. And I think that fact, and also these vulnerabilities, should bring more light to validating your images. Having a kind of company internal registry of known safe images. There’s no reason to believe that Docker Hub is hosting vulnerable images. I’m not saying they’re exploiting you. They very likely aren’t. But you don’t know, right? And you don’t know today if the image is the same as the one you pull tomorrow.

And I think there’s a lot of value in taking an image from today, having a copy of it internally and then only using that copy and saying, “Well, we’ve looked at this one. We know this one’s all right.” And that kind of attestation of known safe images should be more of a focus rather than just pulling whatever your devs want to pull.

Not that the devs are unreliable. But they don’t know. They don’t necessarily have the time to go and check very deeply. A known good set of base images can, in theory, protect you against this kind of thing. And it’s just an extension of what people currently do. You trust your devs. You have your code review process. Your pull process. Every line of code is going to be checked. But are you checking what happens when the Dockerfile is built? Not necessarily. But if your Dockerfile is FROM internal registry/standardized Go base build, then it’s been looked at. It’s okay. And I think that can really have a good impact on supply chain security where you don’t necessarily know what is in the upstream.

[00:34:50] Liran Tal: Yeah. I mean, the more you talk about it, the more I think of it from a developer perspective. It’s like a very transferable risk that we have with packages. We will pin packages to specific version numbers. And even here, the more I think about it, it’s like if I would tell developers to avoid doing FROM node:latest just because of having non-deterministic builds and stuff like that, don’t know what breaks or whatever, here it’s more about you probably want to reuse a known tag that doesn’t change too often. Just because, also, now this impact – potentially, you can backdoor images. Or, again, not saying that some company did that on purpose. But, potentially, that happened. They suffered a breach. And they could be now kind of distributing malware or something like that through one of those images. And you now trust it or you use the latest versions all the time of Postgres or whatever thing that you use. And, potentially, you’re now getting that backdoor or that trojan, that malware or whatever that is from this kind of – just by using it.
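The advice about internal registries and not building from a moving tag can be sketched as a small check over Dockerfile text. This is an illustrative Python sketch only: the registry host `registry.internal.example` is hypothetical, the function name is an assumption, and the check is a line scan that does not resolve `ARG` substitutions in `FROM` lines.

```python
import re

# Hypothetical internal registry host; substitute your own.
TRUSTED_REGISTRY = "registry.internal.example"

# An image reference pinned by content digest ends with @sha256:<64 hex chars>.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def check_base_images(dockerfile_text: str) -> list:
    """Return findings for FROM lines that are untrusted or unpinned."""
    findings = []
    stage_names = set()
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=... that may precede the image.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # Remember "AS name" so later stages can refer to this one.
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2].lower())
        # Earlier build stages and the empty scratch image need no pinning.
        if image.lower() in stage_names or image == "scratch":
            continue
        if not image.startswith(TRUSTED_REGISTRY + "/"):
            findings.append(f"{image}: not from {TRUSTED_REGISTRY}")
        if not DIGEST_RE.search(image):
            findings.append(f"{image}: not pinned by digest")
    return findings
```

Pinning by digest rather than by tag is the stricter form of the point made here: a tag such as `latest` can point at different content tomorrow, while a digest identifies one specific image.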

[00:35:51] Rory McNamara: Yeah. Precisely. Precisely. It’s exactly the same.

[00:35:54] Liran Tal: I see. Is this like a transferred risk towards cloud vendors? Because I assume a lot of people just use managed Kubernetes or just managed infrastructure in the cloud instead of running that on-premise for their own infrastructure?

[00:36:08] Rory McNamara: In a lot of cases, and we looked into this and the way that kind of the main cloud providers and their main products manage this kind of thing. When you spin up, as an example, a Kubernetes cluster, the actual cluster is running on virtual machines inside your tenant. The risks are not really – it’s not shared tenant infrastructure at that point. If you compromise – if someone compromises a Kubernetes cluster built in this way, they’re not going to be able to get across from that tenant to your Kubernetes cluster. There may be some other internal services that we can’t necessarily see that are more shared tenant. But the most popular application or – sorry, the most popular services are kind of single-tenant infrastructure.

What that does mean is that there’s likely going to need to be actions taken if you use these services. And we can see there’s a good example in 2019. Incidentally, also in runC, there was a similar vulnerability. RunC did that patch. Went out to the vendors and they did their patches, et cetera. All good. But in this case, the vendors posted advisories. The AWS one is the one that I’ve been looking at recently. And it basically says, “If you use this service, you need to do this. If you use that service, you need to do this.” In the case of the managed Kubernetes, it’ll be you need to update. You need to click a button and it’ll update for you. There may still need to be actions taken.

We haven’t seen the advisories yet from the vendors for these vulnerabilities. I can’t say what they have in them yet. But the likelihood is it will be even if you use managed infrastructure, you may need to do something. I guess watch this space. Check with your vendors. But you can’t assume necessarily that you are fully safe just because you have a managed service in this scenario.

[00:37:42] Liran Tal: Yeah. It’s a good segue maybe to kind of discuss protecting and securing against this set of vulnerabilities that you found. What’s the current status kind of across the board for this disclosure? Is it fixed? What do consumers need to know or need to do to protect themselves at this point?

[00:37:59] Rory McNamara: At the time we’re recording, the patches are all in flight. They’re all in progress. At the time of I guess release of this, all the patches will be published. The vendors, Docker and runC, have been very proactive. They’ve responded quickly. It’s been great to work with them. In terms of what you need to do:

If you self-host, if you’re using Docker, if you’re using Kubernetes that you run yourself, check their advisories. They’ll publish something probably to say update. It’s not a complicated change. It’s not like a breaking change for these patches. It’s just a case of make sure you’re running the patched versions. And if you have managed services or hosted services using a cloud provider, as I mentioned, check their advisories. The [inaudible 00:38:40] for 2019 was very thorough. Covered off what you needed to know, and I would expect those also to have been released before the release of this podcast. And all of our publications about these vulnerabilities will have all these applicable links. You should be able to come to our blog posts and find the advisories that you’re looking for.

But the big thing, the big action item is you will need to update something in general. Updating, it’s the right thing to do in this case. There aren’t really any easy mitigations to be done. They’re quite core components. In a holistic sense, there might be things you can do. If you’ve got code review on your Dockerfiles when you check them into your CI/CD, the vulnerabilities are going to be obvious. They’re going to be cut and dry. That’s clearly weird. Why are you doing that? If you can’t justify it, it doesn’t get merged.

In the case when you can’t update, which sometimes happens, or maybe your concern outweighs your active ability to update right this second, which in big companies is going to happen, we have built a couple of tools to maybe assist with this a bit.

In one case, we’ve got a static tool. And what the static tool will do is it’ll look at your Dockerfiles and it will detect if the vulnerable functionality is being used. For the four vulnerabilities, we’ve got WORKDIR. The WORKDIR checks are a bit more complicated because a lot of people use WORKDIR. It will try and check if it’s using the vulnerable context of WORKDIR.

But then there are other scenarios. We’ve got a couple of cases of using the mount parameter to the RUN command, which before this research I didn’t even know existed. It’s not super common. But people use it. And there’s also an additional one in the syntax stanza, which is in the weeds a little bit. But this tool will detect the usage of these functionalities, which are not always used. And it will say, “Yes, you’re potentially using this.” And in that case, you can go and do a manual review and say, “Well, this is what you’d expect. This is fine. Not an issue.” Or more interestingly, or more usefully I should say, it can say, “No. You’re not using that functionality.” And therefore, you can – given your risk profile, you could consider that you are not being exploited there. If you’re not touching the vulnerable functionality, you can’t be exploited that way. That’s one of the tools we’ve got.
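To make the idea of a static check concrete, here is a rough Python sketch that flags the three Dockerfile features named above: `WORKDIR`, `RUN --mount`, and the `# syntax=` directive. It is not the tool described in the conversation and does not reproduce its logic; in particular it flags every `WORKDIR` rather than trying to decide whether the use is in a vulnerable context. The function name and sample Dockerfile are illustrative assumptions.

```python
import re

RUN_MOUNT_RE = re.compile(r"^RUN\s+.*--mount=", re.IGNORECASE)
SYNTAX_RE = re.compile(r"^#\s*syntax\s*=", re.IGNORECASE)


def flag_features(dockerfile_text: str) -> dict:
    """Map each feature of interest to the 1-based lines where it appears."""
    hits = {"WORKDIR": [], "RUN --mount": [], "syntax directive": []}
    for number, raw in enumerate(dockerfile_text.splitlines(), start=1):
        line = raw.strip()
        if SYNTAX_RE.match(line):
            hits["syntax directive"].append(number)
            continue
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        # An ONBUILD prefix wraps another instruction; look at the inner one.
        if parts[0].upper() == "ONBUILD" and len(parts) == 2:
            line = parts[1].strip()
            parts = line.split(None, 1)
        if parts[0].upper() == "WORKDIR":
            hits["WORKDIR"].append(number)
        elif RUN_MOUNT_RE.match(line):
            hits["RUN --mount"].append(number)
    return hits


# Hypothetical Dockerfile used only for illustration.
SAMPLE = """\
# syntax=docker/dockerfile:1
FROM alpine
WORKDIR /app
RUN --mount=type=cache,target=/root/.cache echo build
RUN echo done
"""
```

An empty result for a feature is the useful case described above: if a Dockerfile never touches the functionality, that path is not in play, and anything flagged goes to manual review.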

And the other tool is a bit more – a bit of a deeper tool. And it’s a dynamic instrumentation tool based on eBPF. You run it wherever you run Docker. You can run it on your CI/CD hosts or your Kubernetes hosts, et cetera, and it will instrument the processes and will attempt to flag suspicious behaviour that matches the patterns of the vulnerabilities.

Now there’s a lot of flexibility in the vulnerabilities. These tools are not going to be 100%. But they’re quite good. The team that worked on them have done a good job there. If you can’t update, and I would strongly suggest that you do update, but you still want to do something, maybe look into these tools. See if they fit your requirements. See if they mitigate some of your risk profile while you get to being able to update.

[00:41:30] Liran Tal: Yeah. I mean, that sounds helpful. I imagine most people would probably rush to update. But there’s always a long tail of some of those instances that people have a problem updating whether that’s resources or anything else. It sounds like those tools are going to be super handy for some people.

[00:41:48] Rory McNamara: Hopefully so. Yeah.

[00:41:48] Liran Tal: Would you say – we touched base on this a little bit before. But would you say – how effective are kind of the current security measures for containers against this kind of vulnerabilities that you discovered? We kind of talked about if I’m running containers as non-root, like I’m using a lower-privileged user for the container, or the environment itself runs the container with a read-only file system, do any of those kind of help reduce the attack vector or not?

[00:42:16] Rory McNamara: Because of the way the vulnerabilities work, the mitigations, or at least as far as we looked – we looked at non-privileged containers, we looked at read-only file systems, we looked at things like AppArmor, which has good connections with Docker – they can’t really mitigate the exploitation of the vulnerability. It’s quite tricky. In the WORKDIR example, it’s tricky to stop someone entering that directory because of the way it’s being done, because of the timing at which it’s being done and the context under which it’s being entered. The security measures are not really going to stop it. Non-privileged containers can still enter that directory.

Where these security measures do help or will help potentially is in the actual kind of post-exploitation phase. By that, I mean turning the fact that you’ve exploited the vulnerability and now have access to the root file system into something interesting. It’s going to depend a lot on your environment, unfortunately. If you are running, as an example, read-only file systems, it will stop things dropping SSH keys, modifying /etc/shadow, dropping cron jobs. But another attack vector, and it’s one of the ones that I used as part of the proofs of concept, is you can just ask Docker for a more privileged container. You can navigate the root file system and call the Docker Unix socket, “Please run a new image for me with full privileges, with full capabilities.”

And at that point, the read-only file system hasn’t really helped. Because that’s still necessarily a thing that Docker can do. Things like unprivileged containers or containers – there’s two kind of sides to that. There’s running Docker as not root in the first place. And then there’s setting a user on your Docker image. They both can have some impact.

Again, it depends on your environment. If everything is running as one specific user, does it matter if you can’t become root? Because you can access everything anyway. These security measures are great. They’re a good idea to have. Anything to decrease privileges is going to help and it’s going to make exploitation harder. And it’s going to make drive-by exploitation even harder. If you can’t use the one standard payload that drops a cron job, then you’re not going to be attacked by that specific payload. But it’s not 100%. You’re not going to be fully secure.
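As a hedged illustration of checking for the hardening settings discussed here, the Python sketch below inspects one entry of `docker inspect` JSON output and reports whether the container runs as root, has a writable root file system, or is privileged. The function name is an assumption, and the field names (`Config.User`, `HostConfig.ReadonlyRootfs`, `HostConfig.Privileged`) are those I understand `docker inspect` to emit; verify them against your Docker version. As the conversation notes, these settings do not prevent the vulnerabilities themselves; they limit what can be done afterwards.

```python
import json


def audit_container(inspect_entry: dict) -> list:
    """Return hardening findings for one `docker inspect` entry."""
    findings = []
    config = inspect_entry.get("Config") or {}
    host = inspect_entry.get("HostConfig") or {}

    # Config.User may be "", "uid", "name", or "uid:gid".
    user = (config.get("User") or "").split(":")[0]
    if user in ("", "0", "root"):
        findings.append("runs as root (no non-root user set)")
    if not host.get("ReadonlyRootfs", False):
        findings.append("root file system is writable")
    if host.get("Privileged", False):
        findings.append("container is privileged")
    return findings


def audit_inspect_output(output: str) -> dict:
    """Audit every container in the JSON array printed by `docker inspect`."""
    return {
        entry.get("Name", "<unnamed>"): audit_container(entry)
        for entry in json.loads(output)
    }
```

A typical use would be to pass the text printed by `docker inspect` for one or more containers to `audit_inspect_output` and review any container with a non-empty list.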

In terms of other kind of tangential technologies, there’s things like micro VMs, which are becoming more and more popular as a kind of drop-in wrapper for containers. The big one, I guess the most well-known potentially, is Firecracker by AWS, which is an open-source project. And it’s built into things like Kata Containers. What that will do is rather than running your built container image as a container per se, it runs it as a virtual machine and gives you an additional set of security guarantees.

If you are running this – if you try to exploit these vulnerabilities in a micro VM, it wouldn’t work. The attack class will not work in the case of a micro VM. As a security measure, micro VMs are going to be very useful. But they are a lot more infrastructure. There’s a lot more work that goes into it. It’s not just clicking a button and saying, “Yes. Do this please.” You’ve got to architect around it. But I think, again, it’s something that’s going to become more popular for people who are concerned about the security guarantees of containerization.

[00:45:35] Liran Tal: I really appreciate that take really because I think it’s something that is very much on topic with security speak, right? About having layers of defences. Having defence in depth. And it’s not securing one thing. It’s not about the silver bullet of using non-privileged containers or read-only file systems. You have to think about the whole thing. You have to architect the whole security solution from the ground up and kind of put guardrails wherever you can. When something breaks apart, the attack, the blast radius is very much still contained. Sounds like this is what you were alluding to. And I appreciate that all. I think that makes a lot of sense.

[00:46:12] Rory McNamara: Yeah. Yeah. That’s exactly what I mean. Yeah, the harder you make it, every single extra layer makes it more and more hard. And at some point, you just can’t feasibly do it even if it may technically be possible. Yeah, defence in depth is amazing. It’s one of my favourite things of how to make a difference in security.

[00:46:28] Liran Tal: As we can like – I think go about wrapping it up, this has been super insightful. I think very useful both as a deep dive and as an understanding of just adjacent concepts of container technology and just security practices in general. Learning about strace, and tools and how this whole discovery and research works. Would you have some takeaways for application developers on mitigating this kind of thing, what practices they should take to minimize risk? Or even recommendations for organizations and what they could do to protect themselves from things like this in the future?

[00:46:58] Rory McNamara: At both an individual developer and an organization level, I think a very useful one for known vulnerabilities is going to be to have the ability to update quickly. In the case of an individual developer, it may not be too difficult to just update Docker or Docker Desktop, whatever you’re using. It’s more complicated in the case of a full-size company. But have a plan of action. Being able to make progress on an update as soon as this kind of thing becomes public in future is going to be the most valuable.

You can’t protect against things you don’t know about. Worrying about all of the zero days that ever exist may not be the best use of time. But plans in place to be able to action things quickly are going to be great. As this comes public, I’m sure people will try and use it potentially maliciously. If you’re already patched, you’re not worried anymore.

And then on a kind of more ongoing basis, the principle of least privilege is very valuable. And I know that’s not something to just drop. But in this context, thinking about CI/CD environments, and production environments and where your code is running, building it in a way where it has the minimum viable set of access. Your CI/CD does not need a full root cloud account to do everything that it’s doing. It maybe needs a key to push images sometimes. Making sure that everything has the least access that it can get away with means that if this kind of thing does happen in future and you’re not able to update, you’re minimizing, as you mentioned earlier, the blast radius. Making sure that the least can be achieved by exploiting these vulnerabilities. And also, the other security measures we discussed, which is kind of defence in depth/principle of least privilege. Things like rootless Docker or setting users as well. Micro VMs, as we mentioned, are worth looking into. And then more procedural things can also be very beneficial.

In this specific case, if you’ve got proper code review on your Docker images, you’re not going to be able to get exploited realistically. The vulnerabilities are not subtle. As you’ll see, as they come out and when the proofs of concept are available, they’re very obviously weird behaviours. They’re very obviously doing something suspicious. If you’ve got a second developer checking your first developer’s pull requests, they’ll not be able to slip such a thing in. And that’s going to massively kind of improve your risk profile in terms of how this kind of thing could be exploited. Those are the main things really. I guess it’s kind of two categories: making what you’ve got secure and making the way it’s used also secure.

[00:49:21] Liran Tal: I like how you split that up into two big things to do.

[00:49:26] Rory McNamara: Easy. Just do the two things. Just a morning’s sort of work.

[00:49:31] Liran Tal: So simple. Cool. I bet you have some upcoming CVEs and more research you probably can’t disclose too much about with us.

[00:49:40] Rory McNamara: Yes. Unfortunately, you have to wait and see. Subscribe to us where you can. Follow what we’re doing.

[00:49:45] Liran Tal: Subscribe to Rory.

[00:49:47] Rory McNamara: Well. No. Snyk. I’m part of a team. It’s not just me. We all work together on this. And shout out to my team for being great and working with me. And, yeah, we have plans for future work and we’ll hopefully be publishing lots of blog posts and interesting work in the near future.

[00:50:02] Liran Tal: I can’t wait. That’s going to be awesome, I’m pretty sure. Rory, we’re going to wrap up. Thank you so much for coming to the show and sharing all of your technical know-how with us. Thank you so much. And thanks for tuning in, everybody else. And we’ll meet again in the next episode.

[00:50:20] Rory McNamara: Thank you very much for having me.

[OUTRO]

[00:50:25] ANNOUNCER: Thank you for listening to The Secure Developer. You will find other episodes and full transcripts on devseccon.com. We hope you enjoyed the episode. And don’t forget to leave us a review on Apple iTunes or Spotify and share the episode with others who may enjoy it and gain value from it.

If you would like to recommend a guest, or topic, or share some feedback, you can find us on Twitter @DevSecCon and LinkedIn @thesecuredeveloper. See you in the next episode.

[END]
