How the API Graph Can Help Bridge the Security/Developer Gap – SecAdvent Day 16

December 16, 2020

Jean Yang

Introduction

When I was starting Akita, my team and I spent a lot of time talking with software developers and security engineers about how to improve data security in modern web applications.

Almost every time, it came up that a security team’s success depended on its ability to change developer behavior, and that communicating clearly about technical objectives went a long way toward this.

This raised another question: especially given the popularity of “shifting left” in security, where can more automation help improve communication and align goals? Answering it led Akita to become the API dev tools company we are today.

In this post, I’ll talk about what it would look like for security tooling to be more aligned with developer interests and workflows and why mapping out the API graph will get us there.

What developers want

What software engineers need from security teams makes a lot of sense, and is in fact what Snyk has been doing for open-source vulnerabilities, config files, and more:

Getting a prioritized list of issues. Both the security teams and software teams told us that instead of knowing all possible ways a system can go wrong, getting a smaller set of alerts that are high-priority makes it more likely that any fixes occur.
Getting an actionable list of issues. The easier it is for a developer to know why something is happening, for instance why an API is releasing access tokens when it shouldn’t be, the more likely it is that the developer will fix this issue.
Being informed of issues as soon as possible. The longer a bug is in production before being fixed, the more expensive it is to fix: the developer is more likely to have lost context of the original code; the fix is likely to impact the development schedule more; the fix is likely to involve additional actions like scraping logs.

It turns out, however, that it gets pretty hard to do this for data security issues such as stray passwords and personally identifiable information (PII). We’ll get into this next.

Why it’s hard to give developers what they want

Let’s walk through a concrete example to illustrate why scanning code, configs, or the network is often not enough when it comes to data security.

Let’s say you’re working on an API that returns information about users. To comply with regulations, you omit user phone numbers from the response to prevent callers of your API from storing this information in a scattered fashion that makes it hard to process service deletion requests.

For example, your API might look like:

type User struct {
  ID string `json:"id"`

  // Don’t return phone number for regulation reasons!
  // The "-" tag tells encoding/json to skip this field.
  Phone string `json:"-"`
}

func main() {
  ...
  http.HandleFunc("/users/json", func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    // myUsers is serialized according to the json struct tags above.
    enc := json.NewEncoder(w)
    enc.Encode(myUsers)
  })

  ...
}

One day, your colleague adds a new version of the endpoint that returns YAML instead of JSON because they want to introduce some competition in the data serialization market. A very reasonable implementation of the YAML endpoint produces a PR like this:

  http.HandleFunc("/users/yaml", func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/x-yaml")
    w.WriteHeader(200)
    enc := yaml.NewEncoder(w)
    defer enc.Close()
    enc.Encode(myUsers)
  })

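A code reviewer without insider knowledge is very likely to gloss over the fact that the YAML encoder’s Encode(…) behaves differently from the JSON encoder’s. The json:"-" tag only tells the JSON encoder to skip Phone; a YAML encoder that reads only its own yaml tags never sees it. So when someone actually uses this new endpoint, they now get users’ phone numbers in the response! 🙊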
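To see the mismatch without pulling in a YAML library, here is a minimal sketch that swaps in the standard library’s encoding/xml as a stand-in for the YAML encoder. That substitution is an assumption made so the example runs on the standard library alone; the point is only that an encoder which does not read json tags will serialize a field the JSON encoder skips. The encodeBoth helper and the sample user are hypothetical.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// User mirrors the struct from the post: Phone is hidden from JSON only.
type User struct {
	ID    string `json:"id"`
	Phone string `json:"-"`
}

// encodeBoth (hypothetical helper) serializes the same value with two
// encoders. encoding/xml stands in for the YAML encoder here: like it,
// encoding/xml does not look at json struct tags.
func encodeBoth(u User) (string, string) {
	j, _ := json.Marshal(u) // honors json:"-", so Phone is omitted
	x, _ := xml.Marshal(u)  // ignores json tags, so Phone is included
	return string(j), string(x)
}

func main() {
	j, x := encodeBoth(User{ID: "u1", Phone: "555-0100"})
	fmt.Println(j) // {"id":"u1"}
	fmt.Println(x) // <User><ID>u1</ID><Phone>555-0100</Phone></User>
}
```

Nothing in either call site hints at the difference, which is why a reviewer looking only at the new handler is unlikely to spot it.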

There are a few reasons bugs like this one defy existing tooling:

Issues like these are hard to prioritize. Whether this change leads to a data leak depends on whether the phone numbers end up somewhere they are not supposed to go, like logs, API responses, or calls to third-party SaaS. This is hard to know without having the bigger picture of how different services are interacting.

Issues resulting from changes like these are often not actionable. The data leak that this change causes may not have to do with this service at all, making it difficult to identify the root cause using only tools that scan logs, APIs, and third-party calls.

Issues like these are hard to catch before running the code. It takes a careful reviewer to notice that a new string encoding could cause phone numbers to leak. Catching this issue with a linter or static analysis requires either a rule that matches exactly the change we described or a rule that flags every change to string marshalling. The former is unlikely to exist; the latter would produce reports so noisy that they would almost certainly be ignored.

As a result, data leaks like this one are often caught in production, long after they have been introduced, without context about what led to them.

How the API graph will help

At Akita, we believe that mapping the graph of API interactions is the key to improving communication and aligning goals across security and software engineering teams. The API graph will show 1) what endpoints are actually getting called, 2) what data is being sent to those endpoints, and 3) the bigger picture of how endpoints depend on each other. Being able to capture the API graph allows us not only to get visibility into a system at a particular point in time, but also to track changes at fine granularity. With that context on data flows, we can detect changes over time and identify potential data security issues with high precision.

When we first set out to build the API graph, we wanted something that was 1) blackbox and language-agnostic, and 2) easy to integrate with existing systems. Over the last couple of years, we have figured out how to do this by watching API traffic, building dynamic models of API behaviors that we capture as API specs, and connecting the dots on how APIs across a system depend on each other. Check out our docs here.

Help Akita help you!

We believe that, in order to unlock the next level of automated tooling for data security, there need to be tools that give more visibility over the API graph. If you’re interested in helping us build this out, please try out our private beta!

About Jean Yang

Jean Yang is the founder and CEO of Akita Software, an API dev tools company. Jean was previously an Assistant Professor of Computer Science at Carnegie Mellon University.
