Ep. #101, Running and Expanding a DevOps Team with DJ Schleen

[00:01:51] Guy Podjarny: Hello, everyone. Welcome back to The Secure Developer. Thanks for tuning back in. Today we’re going to talk about the broad state of software security, and maybe the slightly crappy state of security, and to do that, we have a very experienced software connoisseur here with us in DJ Schleen, who is the VP of Infrastructure and Developer Operations at VillageMD. DJ, thanks for coming on the show. It’s been a long time coming.

[00:02:16] DJ Schleen: Yeah. Thanks, Guy. It’s definitely great to be here. We’ve had some stops and starts schedule-wise, but busyness and lulls come into play, and here we are.

[00:02:25] Guy Podjarny: Awesome. We’re here to share some of the views. So, DJ, you’ve had a rich history. You’ve gone in and out of dev and security and vendors and suppliers. Tell us a little bit about what you do and how you got into it. How did you get into security? Maybe a brief tale of the journey to here.

[00:02:44] DJ Schleen: Oh, man. The tale goes a long way back. I won’t go too far back with the years because that’ll date me. But back when computers were just coming out and the internet first started surfacing, I was pretty interested in how everything worked. So, the whole hacker idea, to tinker, came into play. Back then, security wasn’t something that anyone really thought about. Since there was none, it was pretty easy to get into websites and do some crazy things and get free phone calls, and all that kind of stuff. I started off as a hacker and eventually got into web design and formed one of the first web design companies in Canada back in the ‘90s.

That brought a big influx of software development into my career history. Security, again, wasn’t a career-oriented thing at the time. We were just developing software, pushing it out as fast as we could, and putting this little phrase in our Visual Basic code, “On Error Resume Next,” right? So, I think that was the motto of the ‘80s and ‘90s. But fast forward to the past couple of years: I’ve been heavily involved in DevOps since we first started realizing that what we were doing was DevOps. My career brought me into the healthcare industry, where I was a security architect over at Aetna, CVS, so deep into the healthcare industry. That’s when I really started getting into the security aspect.

I joined Aetna after doing a lot of ethical hacking, breaking and entering into buildings, and red teaming. The hacker came back, and then I realized that security could be a career. So, I joined Aetna, CVS. Then I joined United Healthcare and a company called Rally over there. Now, I find myself at VillageMD. I started off here as a DevSecOps director, and after months of convincing people that we had to break down silos, now I’m the VP of infrastructure and developer operations, and we are responsible for DevOps, cloud technologies, the SRE group, DBA, network, physical network and voice. So, lots of changes have happened, but by bringing infrastructure together with what we do from a DevOps perspective, it’s come full circle and now we’re just DevOps.

[00:05:02] Guy Podjarny: The security component is still a part of this, though? It’s a blend of the security aspects of DevOps, of that platform and that infrastructure, as well as the non-security attributes of it?

[00:05:14] DJ Schleen: Yeah, absolutely. DevSecOps as a term, I evangelized it quite a bit, both when I was working at Aetna and when I was in the vendor space. One of the descriptions or definitions of DevSecOps is that security is everyone’s responsibility. By that definition, the term DevSecOps becomes irrelevant. One of the things that I really wanted to do was break down the silos. What I noticed at the past couple of organizations I’ve been at was that we ended up having a DevSecOps group, which was basically AppSec turning into DevSecOps, and then a DevOps group, and nobody talked to each other. We were back in the same situation we were in 10 years ago, when people started saying, “Hey, communication and collaboration and flow, feedback and experimentation have to come into play.” So, that’s why I destroyed the DevSecOps silo, and security is just embedded in everything we do. You can call it DevSecOps, DevOps, rainbow monkey, unicorn pony, whatever you want. But it’s programming, it’s development, and security has to be an integral part of that, or else it’s not a quality piece of software you’re providing, right?

[00:06:22] Guy Podjarny: Yeah, that makes a lot of sense. I guess the same thing happened, and is still happening, with DevOps teams: DevOps is all about breaking down the silos, and yet you have DevOps teams. The intent is positive. It’s saying it’s a dev-oriented ops team, a modern approach, one that takes a different philosophy than maybe an IT ops team would. It’s interesting to see this with DevSecOps as well, and I definitely see that title creeping up, sometimes with the proper mentality behind it, and sometimes just because it’s a catchy or newer thing, but if you don’t –

[00:07:00] DJ Schleen: People have gotten behind it. One of the ways I like to think about it is that in DevOps, we create a highway that developers and engineers deploy their code on. We control the guardrails, which are the security, and the asphalt, the route and where things go, which are our pipelines. We have speed checks all along the way just to make sure that things are going right. You’ve got unlimited speed in some places, hovering over the road. But with security as part of that, we provide the route for people to deploy safer software sooner. So, you’ve got DevOps teams versus DevOps embedded into teams. I think both work, and the bigger the company, the more you end up having that dedicated DevSecOps or DevOps team.

[00:07:48] Guy Podjarny: Yeah, I like that analogy. It’s a slight abuse of the paved road, but I actually think it really holds in terms of saying, “Yeah, you pave the road, but you also set the guardrails and the conditions.”

[00:07:59] DJ Schleen: Yeah. The engineers are the ones driving whatever car they want. If they want to take a Porsche, they can drive one of those. They can drive a Fiat, whatever, it doesn’t matter.

[00:08:08] Guy Podjarny: Yeah, absolutely. So, thanks for that. You’re running this broader infrastructure team. So, let’s dig into the meat of it a little bit. You’ve had these different perspectives, and we slightly glossed over the fact that a couple of times you’ve been at software vendors, providing tooling for this. We’re on this journey, we’re building out these tools, and you’ve consumed them and you’ve evangelized them. When you look at the state of the security industry, the software security industry, what’s the score? What would you say is the current state?

[00:08:46] DJ Schleen: Well, it depends on where you are. For the overall industry state in tooling, I think we’re there, or getting there. We’re in the A’s and B’s, right? Ten years ago, when DevOps started coming around, there really weren’t good pipelines until CI/CD tools arrived; people had to cobble things together. I remember at Aetna, we built our own deployment pipelines on some open source software. We cobbled things together to try to make this a reality. When it comes to security tools, DevSecOps is six, seven years old, maybe, and there haven’t been many tools. And by tools I mean the implementation of good technique by good people: people, process, and then tooling. But the tools never caught up, or rather, they’re only finally catching up. Looking at some of the traditional security scanning tools six years ago, you might as well have watched paint dry. I remember a time when I was in architecture class in Ottawa, since I’m a historical building architect by education. We would draw a building, bring it to the plotter, and wait probably a good hour for that thing to come out. That’s where static analysis tools and all these tools were back in the day. I’ve always said that anything less than two minutes is where we want to be from a scanning perspective. Now, we’re down to milliseconds and seconds in some places with some of the tooling coming out.

So, I think tooling-wise, we’re getting there. Implementation-wise, I think we as a whole, in society in general, are at D’s and C’s. I still don’t think security is taken seriously by a lot of organizations. It’s something that people know they have to do, but there’s always another priority, and always an excuse not to do something like this, security-wise.

[00:10:36] Guy Podjarny: So, all in all, you’re describing a terrible state, not in terms of the tools or the arsenal that is at our disposal, but in terms of how they’re put to use. If you think about this challenge, whether it’s at VillageMD, where you are today, or at the best companies, how do you battle that? The tools exist in the ecosystem, but the team is not embracing them correctly. What can be done?

[00:11:00] DJ Schleen: Well, we’ve tried it a couple of different ways in the past. One is the command from on high, from the top of the security mountain: thou shalt put all this security tooling in place. That was the sort of five-years-ago approach, where we have these tools and we put them in. It’s The Phoenix Project completely in reality, where security ends up breaking builds or breaking systems, and people start yanking the tools out. The communication is not there. So, that was one way that I tried to implement it, and it didn’t work. So, epic failure of DevSecOps there. Now, I look at it more as an engineering-first approach where you almost have to make security invisible. Not saying that it’s not there, but keeping it right in the developer’s area, in their line of sight, so it’s an integral part of what they do. And really, you treat it as an attribute of quality. So, quality infrastructure code, quality application code, quality SQL code, whatever it is, with quality being linting, complexity, KPIs, and then the security of it as well. Also, you treat any kind of security vulnerability as just a work item or a task.

For example, in our backlogs, or in our tracking systems, we have epics and tasks. We don’t track bugs, we don’t track vulnerabilities separately; each is just something you have to do in a collection of things. So, we really try to be essentialist about it and simplify things. So, a couple of different ways. I think the engineering-first approach is really hitting; it’s starting to resonate with people. We’re going to continue to expand on it, work on it, and see what the results are.

[00:12:41] Guy Podjarny: I like the kind of assimilated approach of having these tasks viewed alongside the rest. Oftentimes, one of the challenges is prioritizing. Fine, it’s all in the same backlog. But there’s nothing I really feel like I gain when I actually take that task on, versus that product capability where, if I did it, there would be upside. How do you battle that?

[00:13:04] DJ Schleen: So, going back to my ITIL roots, you have impact and severity, so you have that matrix. And then if you add likelihood into it, things start sorting a little bit differently in your backlog: the likelihood of an impact to the business, or the likelihood of a breach or attack, or something to that effect. So, you can use it as a magnifier, really, on your impact and severity matrix. You’re right on there when it comes to security sometimes being a nonfunctional attribute. But again, it’s like a quality attribute.

So, developers are technical artists, right? And they love creating things. We have to make security part of the paint, an ingredient of the paint that they use. So, they can paint whatever picture they want, but there’s this resilience, security resilience, that’s baked into the paint. It’s an analogy. But again, it’s a cultural thing where you treat functional and nonfunctional requirements as equals. The business is always going to try to convince you to prioritize features over that. And security is always going to press you from the other side and say, “Hey, we have unpatched servers,” or “We have these vulnerabilities in our code.” It’s a delicate balancing act. That’s one of the reasons I pulled all of these disparate teams together, so that we can really have a consistent way of dealing with security across all of our infrastructure, code, and otherwise.
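The impact-and-severity matrix with likelihood as a magnifier can be sketched in a few lines of Python. This is a minimal illustration, not VillageMD’s actual model: the 1 to 5 scales, the multiplicative scoring, the `WorkItem` class, and the backlog entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    title: str
    impact: int        # 1 (low) .. 5 (high), assumed scale
    severity: int      # 1 (low) .. 5 (high), assumed scale
    likelihood: float  # 0.0 .. 1.0, estimated chance of breach or business impact

    def base_score(self) -> int:
        # Classic ITIL-style matrix: impact x severity
        return self.impact * self.severity

    def weighted_score(self) -> float:
        # Likelihood acts as a magnifier on the matrix score
        return self.base_score() * self.likelihood


def prioritize(items):
    """Sort a mixed backlog (features, fixes, vulnerabilities) by weighted score."""
    return sorted(items, key=lambda item: item.weighted_score(), reverse=True)


# Hypothetical backlog entries
backlog = [
    WorkItem("Patch internal admin tool", impact=5, severity=5, likelihood=0.1),
    WorkItem("Fix vulnerable public API dependency", impact=4, severity=4, likelihood=0.9),
    WorkItem("Harden staging bucket policy", impact=3, severity=3, likelihood=0.5),
]

for item in prioritize(backlog):
    print(f"{item.weighted_score():5.1f}  {item.title}")
```

With likelihood applied, the high-likelihood dependency fix rises to the top even though the admin-tool patch has the largest raw impact × severity score, which is the kind of re-sorting described above.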

[00:14:30] Guy Podjarny: What’s the scale the team operates at today at VillageMD? What are the rough team sizes on the engineering and security side?

[00:14:38] DJ Schleen: Oh, man. Just to put this into perspective, last October, I think we had 1,500 employees, including physicians as well as IT and business. Now, I think we’re up over 3,500, almost 4,000. Next year, we’ll probably be up around 10,000. So, things are sort of growing exponentially, infrastructure-wise, with that. Right now, I’m seriously trying to expand the team, hiring some senior and junior folks in the DevOps and DevSecOps fields. I think one of the struggles is trying to keep up with the growth. In my organization, there are probably upwards of 30 to 40 people. I believe that innovation can help mitigate some of the growing pains. Again, it’s keeping up with the business needs, the acquisition of clinics, and what we’re doing putting our clinics into Walgreens stores over the next couple of years. It’s daunting. It’s definitely daunting.

But security-wise, we do have a GRC team, and AppSec is rolled into what I’m doing now. We also have our SOC. So, it’s quite a big organization, just with the security and infrastructure teams. And then we have the engineering teams as well. So, I couldn’t put a finger on exactly how many people, but there are quite a few.

[00:15:55] Guy Podjarny: Yeah. But it’s a good size and it’s growing. So, back to your previous point about these issues: the business might naturally pull towards more functional capabilities versus security. How do you track this? How do you know that you’re not falling behind, that the teams are not overly lured into that business bias, especially if it’s all in the backlog and people just need to track it and try to budget for it? Is there a measurement or KPI that you expose the team to? How do you keep everybody honest, for their own good? How do you keep everybody on the right path?

[00:16:37] DJ Schleen: Well, there are a couple of things there. One is the way I think about it: there are 24 hours in a day, 8 hours have to be for work, 8 hours for sleeping and 8 hours for play, and if there’s any imbalance in that, something suffers. So, there has to be a balance in life. The second part is that people need to work 36 hours a week, and then 4 hours have to be for experimentation. So, 10%, which goes back to my grandmother’s saying that 10% of every dollar you make is yours to keep. It’s the same with time. That has to be experimentation time, and that’s the true essence of where you want to be with DevOps.

Now, that leads into a couple of philosophies I believe in. One is essentialism, which is the relentless pursuit of less: the ability to choose what is essential to do versus what can be delayed or put off. Another is effortlessness: when everything becomes essential, how do you make it effortless? In my opinion, that’s automation and good processes and procedures. And then there are the laws of subtraction: taking away time to make people more innovative with the time they use. Those cultural changes are really difficult, because people think, “Hey, I have to work 40 hours a week, and there’s so much I have to do.” But you start taking away time, and then people start valuing that improvement time, and it helps with the rest, and they make sure that they balance.

Now, from a tracking-work perspective, getting back to your question, we do one-week sprints. The reason is that otherwise everything’s always a priority: we have to do it now, we have to do it now, no matter where it comes from. With a one-week sprint, we can ask people, “What’s the priority?” and gauge its importance. If it’s an all-hands-on-deck, five-alarm fire, yeah, we’ll do it now. We’ll move work around. But 9 times out of 10, when you ask somebody if we can plan it into next week’s sprint or after, they say, “Yes.”

Really, what we’re doing is protecting our developers’ workstreams by managing the work that’s coming in and prioritizing it at the same time. One of the ways we manage that work is with a single planning session or stand-up on Monday to plan out the week and figure out what’s in that sprint. Then we do all of our daily stand-ups inside messaging platforms like Teams or Slack, because we don’t want people to join a 10-minute meeting and then have 20 minutes of re-engagement time, where they’re trying to figure out, “Okay, what was I working on before this?”

So, it’s really protecting people’s time, giving them the ability to self-organize, understand the work that they have, and manage the work that’s coming in, because there’s always a ton of it. And then prioritizing things, whether they’re security or not.

[00:19:10] Guy Podjarny: I think you’re very consistent here, in that security is just an aspect of all this other work; you just build it in. One of the challenges I oftentimes relate to around security is that it is invisible, but not in the good sense that it’s done and therefore invisible. Rather, it’s invisible when you don’t do it. So, within that, what’s an alarming threshold? In any one of these given sprints, you have a certain amount of security backlog, typically more than you’re really able to do. What’s the minimum? Maybe the right way to ask is, what KPIs are you using to track how you’re doing as far as security goes?

[00:19:49] DJ Schleen: So, there are a few things we do. From a security perspective, we rely heavily on our tools to tell us how many vulnerabilities we have and what the severities are. And not just vulnerabilities in infrastructure code, but also the configuration. So, as you start looking at our Kubernetes clusters or cloud environments, we harden those systems, and then we track those findings over time. We want to see a consistent downward trend in the vulnerabilities that we have. One of the things I like using are Shewhart charts, or quality control charts. For example, we implemented Snyk at one of my last companies, and we noticed that we had 10,000 vulnerabilities. I’m just picking a number here.

And then, as we started remediating these, we wrote a couple of different add-ins for GitHub where we would auto-create Jira tickets that mirrored the pull requests and the vulnerabilities that were found. And then when the pull request or the vulnerability was gone, the ticket would auto-close. So, we had those zero-touch kinds of tickets that would just fall off the backlog. But from a quality perspective, you’re never going to have zero, and I think as an industry we have to accept the fact that, with vulnerabilities, you’re only as good as your last scan. So, you have to continuously scan for security vulnerabilities. And that’s really continuous quality monitoring.
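The zero-touch ticket flow could be approximated as a reconciliation loop: open a ticket for each new finding, and close it once the finding no longer appears in a scan. The sketch below is a hypothetical, in-memory stand-in; it does not call the GitHub or Jira APIs, and the `Tracker` class, the `SEC-` key format, and the finding IDs are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    key: str
    finding_id: str
    status: str = "Open"


@dataclass
class Tracker:
    """Hypothetical in-memory tracker standing in for a real issue system."""
    tickets: dict = field(default_factory=dict)  # finding_id -> Ticket
    next_number: int = 1

    def sync(self, open_findings: set) -> None:
        # Open (or reopen) a ticket for every finding reported by the latest scan.
        for finding_id in sorted(open_findings):
            ticket = self.tickets.get(finding_id)
            if ticket is None:
                self.tickets[finding_id] = Ticket(f"SEC-{self.next_number}", finding_id)
                self.next_number += 1
            elif ticket.status == "Closed":
                ticket.status = "Open"  # the finding came back after being fixed
        # Auto-close tickets whose finding is no longer reported: zero-touch.
        for finding_id, ticket in self.tickets.items():
            if finding_id not in open_findings:
                ticket.status = "Closed"


tracker = Tracker()
tracker.sync({"finding-a", "finding-b"})  # first scan: two findings
tracker.sync({"finding-b"})               # finding-a was remediated via a merged PR
for t in tracker.tickets.values():
    print(t.key, t.finding_id, t.status)
```

Treating the latest scan as the source of truth is what keeps the loop zero-touch: nobody closes a ticket by hand, and a regression simply reopens it.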

So, you figure out an upper control limit and a lower control limit. As soon as you start getting above the upper boundary, then you know that you have to pay more attention to vulnerability remediation. That might go into a certain sprint as a sprint theme. If you’re under the lower control limit, then you’re in a good place. So, it’s really knowing the range of what normal is, keeping everything in that range, and relying on the tools to tell us that, but then relying on the KPIs, metrics, and graphs to help with feedback and with prioritizing how we address some of these issues.
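A rough sketch of the control-limit idea, assuming limits set at the baseline mean plus or minus three standard deviations, a common Shewhart convention. The weekly counts are hypothetical, and using the population standard deviation is a simplification; individuals charts often estimate sigma from the average moving range instead.

```python
from statistics import mean, pstdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) limits from a baseline of counts."""
    center = mean(baseline)
    spread = pstdev(baseline)
    lower = max(0.0, center - sigmas * spread)  # a vulnerability count can't be negative
    upper = center + sigmas * spread
    return lower, center, upper


def classify(count, limits):
    lower, _, upper = limits
    if count > upper:
        return "above UCL: consider a remediation sprint theme"
    if count < lower:
        return "below LCL: in a good place"
    return "within the normal range"


# Hypothetical weekly open-vulnerability counts for a baseline period.
baseline = [120, 115, 118, 122, 119, 121, 117, 120]
limits = control_limits(baseline)
for week_count in (119, 131, 110):
    print(week_count, classify(week_count, limits))
```

Recomputing the baseline periodically lets "normal" shift downward as remediation pays off, so the limits keep tracking the trend rather than a fixed target of zero.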

[00:21:44] Guy Podjarny: Yeah, it sounds like there’s a heavy emphasis on doing better than we did yesterday, having the graph trend in the right direction, so you’re not accumulating and you’ll get to a good place. That, plus a correct sorting algorithm, as you pointed out before, around likelihood and other parameters, to bump the right items to the top of the list.

[00:22:06] DJ Schleen: Yeah, the biggest kicker is having tools that don’t have many false positives. As high-velocity organizations, we just don’t have time to deal with triaging tens of thousands of issues. So, I rely a lot on the accuracy of the tools. And again, the tools are getting better and more mature and faster. When it comes to security, there’s a reasonable level of effort to secure your systems. If you look at compliance requirements, and I’m not really sure exactly which ones, there are two things most of them have, which are SAST and DAST. Okay, where’s your risk in that? As we mentioned, I was at a vendor, we were doing software supply chain work, and we found that only 3% to 15% of any codebase is developed by the company itself; the other 85% to 97% is all open source components. So, where’s your risk if you’re only using SAST? This is where you have to use a combination of tools and try to identify these things really fast. And if there are fixes, binary-compatible fixes, implement them. It’s almost like a zero-touch, easy thing for engineers to do. Yeah, I digress there.
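“Binary-compatible fix” is not defined precisely here; one common approximation is semantic versioning, where a fix is assumed safe to auto-apply if it stays within the current major version (or within the current minor version for 0.x releases). The helper below is a hypothetical sketch of that rule for plain `MAJOR.MINOR.PATCH` strings; real ecosystems have pre-release tags and their own compatibility conventions that it ignores.

```python
def parse_version(version: str) -> tuple:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_fix(current: str, fixed: str) -> bool:
    """Hypothetical rule: treat a newer version as a drop-in fix if semver
    says it should not break callers."""
    cur, fix = parse_version(current), parse_version(fixed)
    if fix <= cur:
        return False  # not an upgrade at all
    if cur[0] == 0:
        return fix[:2] == cur[:2]  # in 0.x, minor bumps may break the API
    return fix[0] == cur[0]  # same major version


print(is_compatible_fix("1.2.3", "1.4.0"))  # candidate for a zero-touch PR
print(is_compatible_fix("1.2.3", "2.0.0"))  # needs an engineer's attention
```

A rule like this only decides which upgrades are candidates for a zero-touch pull request; the existing build and test workflow still has to pass before anything merges.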

[00:23:17] Guy Podjarny: Yeah, but it’s important. So, I have two tool-related questions for you. One is maybe not about tools, but rather risks. What’s top of mind for you right now? Maybe for yourself, but also as an industry, what would you say are the security concerns that people should be most mindful of in this context of application and product security, DevSecOps?

[00:23:38] DJ Schleen: Well, I think the biggest one I’m concerned with right now is infrastructure as code, and checking it for security issues, configuration issues, and errors. Everyone’s going to Kubernetes, the buzzword of the day. It’s actually a wonderful tool, but a fool with a tool remains a fool. Buckminster Fuller said that back in the day. So, we really need to make sure that our perimeters are known and that they’re configured properly. And again, that’s part of the security picture. If we look at unpatched servers or configuration, some of those external controls that we have could be compensating controls for an application vulnerability. But application vulnerabilities are still part of that, and with the software supply chain scanning that we’ve got now, we have the ability to find and remediate vulnerabilities, or at least know an upgrade path to get to a cleaner, higher-quality component. We have the technology to do that.

We have static analysis testing tools that can look at the 3% to 15% that is our code, search really fast through it, look through the deltas, and we’re no longer waiting two hours for it. But the infrastructure as code domain is still fairly new, and that concerns me. Part of it is continuous policy enforcement and configuration checking, but you also start getting into continuous verification, which is: do our systems work as they’re expected to work? So, that’s next on my horizon: how do we ensure that our systems are operating how they should be? But the current problem for me is, how do we ensure that things are configured like they should be?

So yeah, from a tool perspective, that’s where I see my current state and where we need to be. If someone’s just starting with security and DevSecOps, or DevOps with security added and sprinkled in, or whatever you want to call it, your biggest risk is going to be around the software supply chain, where you have no idea what’s coming into your systems. The dependencies of those dependencies can cause issues.

A quick anecdote about that. I once wrote an article titled “0 plus 2 equals 154.” Adding two components, two little import statements, into a Go application brought in 154 sub-dependencies, and of those, nine were vulnerable. So, the problem right there, I think, is understanding the things that we don’t know.
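The “0 plus 2 equals 154” effect comes from transitive dependencies: each direct import pulls in its own dependencies, and theirs in turn. A breadth-first walk over a dependency graph shows how the count grows and how vulnerable packages can hide several layers down. The graph and the vulnerable set below are hypothetical; a real tool would read them from the module graph and an advisory database.

```python
from collections import deque


def transitive_closure(graph: dict, direct: list) -> set:
    """Return every package reachable from the direct imports, including them."""
    seen = set()
    queue = deque(direct)
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # shared dependencies are only counted once
        seen.add(package)
        queue.extend(graph.get(package, []))
    return seen


# Hypothetical graph: package -> the packages it imports directly.
graph = {
    "direct-one": ["lib-a", "lib-b"],
    "direct-two": ["lib-b", "lib-c"],
    "lib-a": ["lib-d"],
    "lib-c": ["lib-d", "lib-e"],
}
direct = ["direct-one", "direct-two"]
known_vulnerable = {"lib-d", "lib-z"}  # hypothetical advisory data

everything = transitive_closure(graph, direct)
sub_dependencies = everything - set(direct)
print(len(sub_dependencies), "sub-dependencies")
print("vulnerable:", sorted(known_vulnerable & everything))
```

Here two direct imports yield five sub-dependencies, and the only vulnerable package sits two levels below anything the developer imported explicitly.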

[00:26:12] Guy Podjarny: Yeah, definitely. I love the catchy title there; I’m clearly well familiar with it. So, I guess, for my closing question: you talked about tools evolving, and you mentioned a few of their attributes, namely accuracy and speed. What else is on your list when you’re talking about a tool being fit for purpose in the high-velocity environment you’re working in?

[00:26:40] DJ Schleen: That, I think, gets back to the engineering-first approach. The security organization is going to look into the UI. They’re going to find out some information, aggregate data, that kind of thing, and get KPIs and metrics out of it. But for the tools to really be successful, they have to integrate into git, where people are working, for example, GitHub, and have all the information there.

So, if a vulnerability is detected in a third-party component and it can be fixed, the fix is just sitting there waiting for a pull request approval, right? And then that can go through your existing workflows for that system. Even if you can’t remediate it, you could potentially have a pull request there as well, saying, “This pull request is broken, because it won’t actually work, but this is what you have to do.” Developers tend to work better there, or with tickets that reflect it, but low impact, low touch. Accurate findings and ease of remediation, if they can be remediated, even correcting code. For example, you might have a RegEx in your code that doesn’t catch something. Having the tool say, “Hey, this RegEx is wrong, and this is what you have to do if this is your intent.” Sort of intelligent tooling, not to the point where it’s AI, but things that actually detect quality.

So, security being an attribute of quality, for me, that’s really important: keeping it in almost the QA checks of the build process, right? I think that’s where tools are going to be most successful. The IDE is nice. The web interface is nice. But when it comes to being really effective, there’s nothing like a command line or a pull request.

[00:28:24] Guy Podjarny: Kind of being well integrated in. I love the consistent analogies, thinking about this as an aspect of quality, as just part of software. This has been great, DJ. I have all sorts of other questions to ask, but I think we’re kind of at time. So, before I let you go, one question I like to ask every guest coming on the show: if you took out your crystal ball and looked five years out at someone sitting in your shoes, not necessarily at VillageMD, but in this type of role, what do you think would be most different about their reality?

[00:28:56] DJ Schleen: That’s a deep question. If I could look into a crystal ball, I think the reality would be that they have a single view of the overall security and quality state of their applications as they deploy. So, the utopia for me is to get to a point where your system gets deployed, with multiple microservices, UI, data, whatever it is, and you have a holistic view of all the vulnerabilities or issues, whether around the supply chain, code quality, or infrastructure-as-code configuration, all in one.

So, I think five years from now, we’re going to get to that. I think that’s where things are starting to go, or where the business is looking. “I want a number. I want to know what my risk is for this app.” Because everything, again, comes back to that. You start looking at, “Gosh, I have some certification books here.” This is the Certified Information Security Manager. It’s great. It does talk about software. So, it’s fantastic. We can track locks on doors and stuff like that. But there’s nothing in there about actual software. If we take the risk from there, we can start running through a risk calculation and really get an idea of what our organization’s risk exposure is from a monetary perspective, or a non-monetary, legal, or compliance perspective. I think people won’t have a problem getting to that information.
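No specific risk formula is named here. One classic quantitative approach taught in security-management curricula is annualized loss expectancy (ALE): the single loss expectancy (asset value × exposure factor) times the annualized rate of occurrence. The sketch below applies that swapped-in technique to roll findings up into a single per-app number; the `Risk` class and every figure in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    asset_value: float      # value at stake, in dollars (hypothetical)
    exposure_factor: float  # fraction of the asset lost if the event occurs, 0..1
    annual_rate: float      # expected occurrences per year (ARO)

    def single_loss_expectancy(self) -> float:
        return self.asset_value * self.exposure_factor

    def annualized_loss_expectancy(self) -> float:
        return self.single_loss_expectancy() * self.annual_rate


def app_risk(risks: list) -> float:
    """One number per app: the sum of the ALEs of its open risks."""
    return sum(r.annualized_loss_expectancy() for r in risks)


# Hypothetical findings for a single application.
risks = [
    Risk("vulnerable open source dependency", 2_000_000, 0.25, 0.1),
    Risk("misconfigured storage bucket", 500_000, 0.5, 0.2),
]
print(f"Estimated annual exposure: ${app_risk(risks):,.0f}")
```

The inputs (exposure factors and occurrence rates) are estimates, so the output is best read as a relative signal for comparing apps over time rather than a precise dollar figure.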

[00:30:26] Guy Podjarny: I think that’s a great optimistic kind of a crystal ball.

[00:30:32] DJ Schleen: I’d love to see it.

[00:30:32] Guy Podjarny: I’m with you. I relate to it. I hope it happens.

[00:30:37] DJ Schleen: I need to make sure it happens for us. Talking with people in the industry who are building tools, and just seeing the amazing creativity they come up with to solve issues we never even knew we had, I think we’re just going to have safer software. I hope the tables turn a little bit on the attackers, so that we’re not reactive anymore, we’re more proactive about security, and the results show.

[00:31:02] Guy Podjarny: Yeah. Definitely the only path to go really, the only way to kind of raise security. DJ, this has been great. If people want to bug you on the Internet, how can they find you?

[00:31:14] DJ Schleen: LinkedIn, DJ Schleen. Twitter, @DJSchleen. If you want to look at some cool code snippets, we’ve got some interesting stuff coming for Snyk in the future, doing some pre-commit hooks. DevOps Kung Fu, DKFM, we’re on github.com/DevOps-kung-fu, and you can check out some of our stuff there. But definitely feel free to reach out to me if anyone has any questions, or just to connect on LinkedIn. I’m always willing to socialize and see what the industry is up to.

[00:31:45] Guy Podjarny: Excellent. Or maybe try to find a job in that expanding team.

[00:31:49] DJ Schleen: Oh, man. Yeah, I’m definitely looking for some people who really want to change the world, work hard, play hard, grow in the rapidly growing industry and company that we’re in, and really solve problems with a holistic vision where security is just an attribute of quality.

[00:32:06] Guy Podjarny: Indeed. Cool. Thanks, DJ. Thanks for joining on. Thanks, everybody, for tuning in. I hope you join us for the next one.

[00:32:14] DJ Schleen: Appreciate your time. Thanks.

[END OF INTERVIEW]


Hosted by Guy Podjarny