“Bebugging” is a concept that originated in the 1970s. Also known as “fault injection” or “error seeding”, the premise is simple: knowingly introduce a bug or a vulnerability into a system to understand how your technology and team respond. Bebugging helps you find gaps in your security measures, detection tools, and incident response practices, and allows you to iterate and improve before a real threat or incident occurs.
Organizations face a challenge when allocating responsibility between centralized security and DevSecOps teams. For decades, security was seen as a separate responsibility, sitting within a separate team. With the rise of “you build it, you run it” services, DevSecOps became the trend for organizations enabling autonomy across the software development lifecycle. Unfortunately, it’s not as simple as saying, “security is everyone’s responsibility” or, “the security team will take care of it.” In reality, organizations need both approaches to ensure a comprehensive security posture at scale.
Bebugging promotes combining expertise from a central security team and DevSecOps practitioners to achieve something greater than either team could alone. Bebugging also encourages teams to address the question of which team bears responsibility to drive technical or process-oriented security improvements.
In general, it’s useful for a central security team to operate vulnerability management programs for finding, exploiting, and fixing vulnerabilities. An incident response team can provide global, 24/7 monitoring, without adding burden to on-call DevSecOps engineers. Advanced organizations may employ a red team to perform white hat hacking attempts to improve the company’s security posture. Centralizing these responsibilities helps build a body of context from across the engineering organization, without siloing it into distinct DevSecOps teams. This allows teams to learn from incidents across applications or services that they don’t own, and ensures common failure modes are preempted.
Disseminating knowledge requires maintaining a level of awareness and trust between the security team and members of the engineering organization. To facilitate trust, understanding, and knowledge transfer, it helps to iterate, practice, and improve incident and vulnerability management practices prior to a successful red team operation or public-facing incident. Bebugging provides practice with a real-world scenario, but in a controlled environment.
How to choose a bebugging target
Because bebugging requires a fair amount of dedicated time (though less than an actual incident), it’s important to select an appropriate target for maximum return-on-investment. To choose a bebugging target, consider the following criteria:
- Criticality of a system
Is it used across multiple applications and services? Does it allow root access or multi-system access if compromised? Can it bring down the core operations of the company?
- Likelihood of attack
Is this type of system widely-known to be targeted in our industry? Has a new exploit been exposed in the black hat community that affects this system?
- Ability to detect, prevent, and/or respond
Are the existing secure default libraries or monitoring in place, or were the security controls written from scratch? Have we recently tested incident response against this service or product?
Bebugging best practices
Once the team chooses a target, it’s time to start bebugging.
- Choose a bug to introduce
The best bugs mimic real-world hacks or top results from vulnerability management metrics. If you have an existing threat model, you can use one of those scenarios and attack vectors as the bug to introduce. A healthy dose of intuition around what could most realistically go wrong helps!
When identifying and introducing a bug, don’t get too fancy – keep the changes simple. Often this requires nothing more than commenting out a user-input validation function, or adding an “or True” to an authentication or authorization check.
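A seeded bug really can be this small. The sketch below is illustrative – the function name and roles are hypothetical, not from any particular codebase – but it shows how a one-token change defeats an authorization check:

```python
# Hypothetical authorization check; names are illustrative only.
def is_authorized(user_roles, required_role):
    # Original check:
    #   return required_role in user_roles
    # Seeded bug: "or True" makes every request pass.
    return required_role in user_roles or True  # BEBUG: remove before merge!

# Any caller now gets access, regardless of role.
print(is_authorized(["viewer"], "admin"))  # True – the seeded bug in action
```

The diff is tiny and easy to revert, which is exactly what you want for a controlled exercise.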
- Set up protections so the bug doesn’t escape
Intentionally introducing a security vulnerability or bypassing an existing security control feels extremely dangerous. It is wise to take precautions to ensure this code never deploys beyond a local or development test environment into a main branch. Adding a feature flag or environmental condition provides an extra layer of assurance.
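One way to add that layer of assurance is to gate the seeded bug behind an environment check, so it can only activate in a development environment with an explicit opt-in flag. A minimal sketch, assuming hypothetical `APP_ENV` and `BEBUG_EXERCISE` environment variables:

```python
import os

# The seeded bug only activates in development AND with an explicit opt-in flag.
BEBUG_ENABLED = (
    os.environ.get("APP_ENV") == "development"
    and os.environ.get("BEBUG_EXERCISE") == "1"
)

def validate_input(payload: str) -> bool:
    if BEBUG_ENABLED:
        # Seeded vulnerability: skip validation entirely during the exercise.
        return True
    # Real validation path stays intact everywhere else.
    return payload.isalnum() and len(payload) <= 64
```

In production neither condition holds, so the vulnerable branch is dead code even if it accidentally merges – though you should still strip it out after the exercise.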
- Facilitate learning
The most vital element of this practice is the opportunity to collaborate with the other team. With a purposefully-vulnerable test system in place, bring the dedicated security team together with the DevSecOps team in real-time. Allocate a few hours – this is going to require nuance, skill, and lots of screen sharing.
The DevSecOps team who owns the service demonstrates the newly-introduced bug to the security team, and the security team demonstrates reconnaissance and attack techniques to find out just how much damage a single vulnerability enables. This is the real gold – both teams learn how the other team thinks, which tools they use, and how they approach security! During the exercise, both teams observe metrics, logs, and system behavior to start tabulating indicators of compromise, malicious behavior, and potential security controls to stop it. These tabulated factors translate into alerts and mitigations to provide defense-in-depth; a resilient system survives even if a single security control fails.
There are, of course, much more sophisticated and automated mechanisms to perform bebugging or fault injection – such as randomly mutating code blocks and then ensuring sufficient security unit and integration tests exist to detect and mitigate these mutations. We find that manual efforts provide a sufficient level of insight and resulting improvements; a fully automated effort would likely require too much overhead and investment for marginal further gains.
Parsing the results
Save time at the end of the bebugging exercise to create follow-up tasks and tickets, or nominate a scribe to do so during the exercise. The DevSecOps team uses the identified defense-in-depth improvements and adds them as stories to a backlog. The central security team improves global monitoring and incident response. However, not all improvements will be so cut and dried. Prevention and detection capabilities may need to be split between central security teams and service owners.
How do you decide which items to assign to each team?
- Does this affect a single codebase or multiple codebases?
Some results help scale security across multiple products, tech stacks, or code bases. In large companies, improving a single service or product via bebugging represents a missed opportunity to impact the organization at scale. Instead, the central security team applies secure defaults to the libraries, platforms, and testing protocols to prevent, catch, and mitigate vulnerabilities before they’re released to production.
- Can the protections eliminate an entire class of vulnerabilities?
Occasionally, a mitigation applies to many types of vulnerabilities beyond the single injected fault. For example, to address an entire array of information disclosure vulnerabilities, the central security team may set up global monitoring for “honey tokens”, which are values (like API tokens) that should never show up in production systems. The DevSecOps team introduces these values in test data, and if they show up in the logs or production, you know your security mechanisms have failed.
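A honey-token monitor can be very simple: a scan over log output for values that should never appear there. The token values and log lines below are purely illustrative:

```python
# Hypothetical honey tokens: planted values that should never reach production.
HONEY_TOKENS = {"AKIA-HONEY-0001", "sk_test_honeytoken_42"}

def find_honey_tokens(log_lines):
    """Return (line_number, token) pairs wherever a honey token leaks into logs."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        for token in HONEY_TOKENS:
            if token in line:
                hits.append((lineno, token))
    return hits

logs = [
    "INFO user login ok",
    "DEBUG request headers: Authorization=AKIA-HONEY-0001",
]
print(find_honey_tokens(logs))  # [(2, 'AKIA-HONEY-0001')]
```

A real deployment would wire this into centralized log pipelines and alerting, but the detection principle is the same: any hit at all means an information-disclosure control has failed.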
- Who knows the codebase best?
The teams who built the application or maintain the service know the codebase better than the central security team, so when possible, they should own implementing mitigations. For example, an engineer designing an input validation function knows exactly which types and schemas of data belong in a particular workflow. A security engineer may only be able to determine how to validate to prevent injection attacks in general, not provide a context-specific allow-list of legitimate values.
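The difference between a generic check and an owner-written allow-list is easy to see in code. In this sketch the field and its legitimate values are hypothetical – the point is that only the owning team knows the exact set:

```python
# Hypothetical: the owning team knows the only legitimate values for this field.
ALLOWED_REGIONS = {"us-east-1", "us-west-2", "eu-central-1"}

def validate_region(value: str) -> bool:
    # A security engineer working alone could only reject suspicious
    # characters in general; the service owners can enforce the exact
    # allow-list of values their workflow accepts.
    return value in ALLOWED_REGIONS

print(validate_region("us-east-1"))     # True
print(validate_region("us-east-1'--"))  # False: not on the allow-list
```

An allow-list like this rejects injection payloads as a side effect, because anything outside the known-good set fails the check.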
From bebugging to belonging
At this point, each team should have a greater understanding of their peers, processes, and technologies. DevSecOps engineers grow their security muscle through hands-on practice. Security teams learn real-world context and nuance that informs future requirements and reviews. You should find that these sorts of exercises are fun and popular; it can be difficult to find the time to conduct as many as people want!
We find these exercises invaluable not only for driving security protections, but also for establishing trust and empathy. A shared understanding and mutual respect saves precious moments in real incidents, and facilitates security ownership across an organization.