SECADVENT DAY 14

The Misadventures of One Cloud Function

“It WORKS!!”

You know that sweet, sparkling feeling when something you’ve been debugging for days finally works?

As a platform engineer who accidentally became the resident GCP expert, I’ve been working closely with our data engineers throughout the last year. I help with their architecture, define – or help them define – infrastructure-as-code, remind them to not test in production and to consider security. Speaking of which…

The more we go into cloud technologies, embrace microservices and venture into the land of serverless, the more data is floating around, ready to be used – or breached. For those of us whose workloads are entirely cloud-native, the data analytics landscape is bright and full of wonders: AWS Lake Formation and Redshift, GCP Dataflow and BigQuery, pockets of ML and NLP tools – everything to build a data platform suited just for your company.

And so much to secure.

Here is what a data flow architecture might look like if you chose to build on GCP:

“But Natalie,” you say, “most of those services are public APIs, how do you make sure an intruder – or a malicious insider – can’t leak sensitive data from your BigQuery datasets?”

VPC Service Controls

Security in the cloud is probably the most hotly debated claim that cloud providers have had to prove. In this post I will focus on security in the context of GCP and, in particular, VPC Service Controls. The main idea, restricting access to your sensitive services to designated entities only, stays the same for any tool or cloud provider.

VPC Service Controls is a set of tools which you can apply to restrict access to Google’s services within your GCP projects. You can think of it as a firewall on steroids: you define which projects you want to protect by putting them into a Service Perimeter; decide which APIs you want to restrict, and who can access those APIs from where.

When using a data platform similar to the one above, the list of APIs you want to protect would include:

[
  "bigquery.googleapis.com",
  "storage.googleapis.com",
  "logging.googleapis.com",
  "monitoring.googleapis.com",
  "notebooks.googleapis.com",
]

Restricting an API means that the resources you create in your protected project are no longer reachable from the Internet through the usual API endpoint: requests from outside the perimeter are rejected. Traffic from your own networks is instead directed to a dedicated domain, restricted.googleapis.com, which in turn points to a set of special IPs that Google manages.
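As a rough illustration, a perimeter that restricts the APIs listed above could be declared in Terraform along these lines. This is a minimal sketch, not a complete setup: the variables `access_policy_id` and `protected_project_number` and the perimeter name `data_platform` are assumptions made for this example, and an Access Context Manager policy is assumed to exist already.

```hcl
# Assumed inputs for this sketch; neither appears in the original setup.
variable "access_policy_id" { type = string }         # numeric ID of the org's access policy
variable "protected_project_number" { type = string } # project NUMBER, not project ID

resource "google_access_context_manager_service_perimeter" "data_platform" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/data_platform"
  title  = "data_platform"

  status {
    # Perimeters reference projects by number.
    resources = ["projects/${var.protected_project_number}"]

    # APIs that can only be reached from inside the perimeter
    # or by requests that match an access level.
    restricted_services = [
      "bigquery.googleapis.com",
      "storage.googleapis.com",
      "logging.googleapis.com",
      "monitoring.googleapis.com",
      "notebooks.googleapis.com",
    ]
  }
}
```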

You then define access controls, called Access Levels, where you will typically list the internal IPs, corporate device settings, or some special service accounts. Only requests that match at least one of those filters will reach the protected resources.
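A hedged sketch of such an access level in Terraform might look like the following. The level name `trusted_callers`, the variable `access_policy_id`, and the IP range (taken from a block reserved for documentation) are placeholders invented for this example. The `OR` combining function reflects the "at least one of those filters" behaviour described above.

```hcl
# Assumed input for this sketch: numeric ID of the org's access policy.
variable "access_policy_id" { type = string }

resource "google_access_context_manager_access_level" "trusted_callers" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/accessLevels/trusted_callers"
  title  = "trusted_callers"

  basic {
    # A request passes if it matches ANY of the conditions below.
    combining_function = "OR"

    conditions {
      # Placeholder range; replace with your own egress IPs.
      ip_subnetworks = ["203.0.113.0/24"]
    }

    conditions {
      # Placeholder identity allowed through the perimeter.
      members = ["serviceAccount:special-service-account@your-project.iam.gserviceaccount.com"]
    }
  }
}
```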

Can we use Cloud Functions now?

Of course! Serverless is all the rage these days. What’s your use case?

Let’s say we have an external supplier submitting reports to a Cloud Storage bucket. You can create a Cloud Function that will listen to a particular path in the bucket, and once a file is written, will read the data and write some transformed version of it to a BigQuery table.

You will need to run the function as a special service account you create for it, as well as protect the Cloud Functions API by adding it to the list of restricted APIs in the service perimeter.
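For illustration, a function like this could be declared roughly as follows. Treat it as a sketch under assumptions: the region, runtime, entry point, bucket names and archive object are all hypothetical, and only the service account name is borrowed from the log excerpt later in this post. Note that a storage trigger fires for the whole bucket, so any filtering on a particular path would have to happen in the function code itself.

```hcl
# Dedicated runtime identity for the function.
resource "google_service_account" "function_runtime" {
  project      = "your-project"
  account_id   = "special-service-account"
  display_name = "Runtime identity for the report-loading function"
}

resource "google_cloudfunctions_function" "load_reports" {
  project     = "your-project"
  region      = "europe-west1"          # assumption
  name        = "load-supplier-reports" # hypothetical name
  runtime     = "python311"             # assumption
  entry_point = "load_report"           # hypothetical handler in the source archive

  # Hypothetical location of the zipped function source.
  source_archive_bucket = "your-source-bucket"
  source_archive_object = "load-reports.zip"

  # Fire whenever a new object is finalized in the supplier bucket.
  event_trigger {
    event_type = "google.storage.object.finalize"
    resource   = "supplier-reports-bucket" # hypothetical bucket name
  }

  # Run as the dedicated account instead of the default one.
  service_account_email = google_service_account.function_runtime.email
}
```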

Then, usually, services running in the same project should be able to communicate just fine…

ERROR: VPC Service Controls: Request is prohibited by organization's policy.

Ah, right, okay – perhaps we need to add the Compute Engine default service account to the Access Level; try again?

VPC Service Controls: Request is prohibited by organization's policy.

Hmm. RTFM! Google uses a special service called Cloud Build that will build your function in a separate, invisible, so-called “shadow” project, then make it available for execution in your actual project. GCP uses a lot of such shadow projects to do all sorts of magic. That makes a lot of managed services very easy to use, but it also makes debugging harder.

Anyway, let’s add the Cloud Build service account to the service perimeter’s access level.

VPC Service Controls: Request is prohibited by organization's policy.

Okay okay. Let’s see whether the logs will give us some clarity. Which principal is being passed with the request?

"authenticationInfo": {
  "principalEmail": "special-service-account@your-project.iam.gserviceaccount.com",
  "serviceAccountDelegationInfo": [
    {
      "firstPartyPrincipal": {
        "principalEmail": "service-098765432345@gcf-admin-robot.iam.gserviceaccount.com"
      }
    }
  ]
},

Now THAT is confusing. The function is running as the designated service account, but there is also a firstPartyPrincipal, which is neither the Compute Engine nor the Cloud Build account?!

The solution

Now that we have sufficient data to analyse how Cloud Functions actually call other GCP services, we know what we need to do:

  1. Restrict the Cloud Functions API within the service perimeter
  2. Add the Cloud Functions, Compute Engine, and Cloud Build robot service accounts to the access levels
  3. Add the runtime service account that we create for the function to the access levels

If you are building on AWS or Azure, your setup will be different. Remember to follow the principles of least privilege when configuring IAM, and use VPC endpoints for private connections in AWS.

Having to manage all those service account lists can become cumbersome. While the robot service accounts always follow the same naming pattern and can be predicted from project numbers, the arbitrary custom service accounts that we create per function cannot.

The whole reason for having a separate service account per function is security: the default Compute Engine SA gets the Editor role on the project, which is way too open.

The compromise here is to create a Cloud Functions SA dedicated to that GCP project, as opposed to a single Cloud Function. This simplifies the setup, makes this SA predictable – and automatable, yet you can restrict its access to the bare minimum.
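To make "bare minimum" a little more concrete, here is a hedged sketch that grants a project-level runtime account access to just one dataset and one bucket instead of a project-wide role. The account ID `cf-runtime`, the dataset `supplier_reports` and the bucket `supplier-reports-bucket` are hypothetical names, and the two roles are one plausible choice for the report-loading use case rather than a definitive list. Depending on how the function writes to BigQuery, it may need further roles.

```hcl
locals {
  # Hypothetical project-level runtime account for Cloud Functions.
  cf_runtime_member = "serviceAccount:cf-runtime@your-project.iam.gserviceaccount.com"
}

# Write access to a single dataset rather than the whole project.
resource "google_bigquery_dataset_iam_member" "cf_runtime_writes_reports" {
  project    = "your-project"
  dataset_id = "supplier_reports" # hypothetical dataset
  role       = "roles/bigquery.dataEditor"
  member     = local.cf_runtime_member
}

# Read-only access to the bucket the supplier writes to.
resource "google_storage_bucket_iam_member" "cf_runtime_reads_reports" {
  bucket = "supplier-reports-bucket" # hypothetical bucket
  role   = "roles/storage.objectViewer"
  member = local.cf_runtime_member
}
```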

Automation calls for infrastructure as code! Here is how the above can be achieved with Terraform.

Package all the resources for setting up a project into a module

Pro Tip: Using modules makes your code reusable! 

module "your-project" {
  source = "../shiny-modules/gcp-project"
  name   = "your-project"
  # we will base certain resources on this list of services
  services = ["bigquery", "cloudfunctions"]
}

GCP-project module

# Add a label if Cloud Functions are requested - we will use it later
locals {
  cloudfunctions = contains(var.services, "cloudfunctions") ? "enabled" : "disabled"
  # merge the caller's labels with the Cloud Functions marker label
  labels = merge(var.labels, { cloudfunctions = local.cloudfunctions })
}

resource "google_project" "project" {
  name       = var.name
  project_id = var.name # required argument; assumes the project ID matches the name
  labels     = local.labels
}

# Create a project-dedicated SA to use in Cloud Functions
# Give it a fixed account_id so that its email is predictable
# (cf-runtime@<project-id>.iam.gserviceaccount.com) and you can automate!
resource "google_service_account" "sa" {
  count        = contains(var.services, "cloudfunctions") ? 1 : 0
  project      = google_project.project.project_id
  account_id   = "cf-runtime"
  display_name = "Cloud Functions runtime service account"
}

VPC Service Controls config – make it auto-deploy after the projects are deployed 😉

Find out all project numbers where Cloud Functions are enabled

data "google_projects" "cf_enabled_prj" {
  filter = "labels.cloudfunctions=enabled lifecycleState:ACTIVE"
}

# Look up each matching project to get its project number
data "google_project" "cf_enabled_prj_numbers" {
  count      = length(data.google_projects.cf_enabled_prj.projects)
  project_id = data.google_projects.cf_enabled_prj.projects[count.index].project_id
}

Parse and compile a list of SAs you need to add to the access levels

locals {
  # Robot accounts are keyed by project number
  cf_sa_list               = formatlist("serviceAccount:service-%s@gcf-admin-robot.iam.gserviceaccount.com", data.google_project.cf_enabled_prj_numbers[*].number)
  cloudbuild_sa_list       = formatlist("serviceAccount:%s@cloudbuild.gserviceaccount.com", data.google_project.cf_enabled_prj_numbers[*].number)
  cloudbuild_agent_sa_list = formatlist("serviceAccount:service-%s@gcp-sa-cloudbuild.iam.gserviceaccount.com", data.google_project.cf_enabled_prj_numbers[*].number)
  # The runtime account is keyed by project ID
  runtime_cf_sa_list = formatlist("serviceAccount:cf-runtime@%s.iam.gserviceaccount.com", data.google_projects.cf_enabled_prj.projects[*].project_id)
}

And add them to an access level corresponding to your service perimeter

resource "google_access_context_manager_access_level" "cloud_functions_access_level" {
  ...
  basic {
    conditions {
      # concat() already returns a flat list, so it is passed as-is
      members = concat(local.cf_sa_list, local.cloudbuild_sa_list, local.cloudbuild_agent_sa_list, local.runtime_cf_sa_list)
    }
  }
}
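The access level only takes effect once a perimeter references it. A possible way to wire the two together is sketched below. It assumes the data sources and the access level shown above are in the same configuration. The variable `access_policy_id` and the perimeter name `cloud_functions` are made up for this example, and the restricted services are shortened to the ones this walkthrough touches.

```hcl
# Assumed input for this sketch: numeric ID of the org's access policy.
variable "access_policy_id" { type = string }

resource "google_access_context_manager_service_perimeter" "cloud_functions" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/cloud_functions"
  title  = "cloud_functions"

  status {
    # Protect every project labelled as using Cloud Functions.
    resources = formatlist("projects/%s", data.google_project.cf_enabled_prj_numbers[*].number)

    restricted_services = [
      "bigquery.googleapis.com",
      "storage.googleapis.com",
      "cloudfunctions.googleapis.com",
    ]

    # Let the robot and runtime service accounts through the perimeter.
    access_levels = [
      google_access_context_manager_access_level.cloud_functions_access_level.name,
    ]
  }
}
```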

Well, now you can go off and use Cloud Functions securely! That was quite a ride. 

Oftentimes, when building something new we get to discover corners of the technology that are not very well described or documented. Bleeding edge, we like to call it? 

But the tools are there for you – use them, work with your teams, adapt to your workflows – and you will find just the right solution.
