Embarrassingly easy private certificate management for VMs on AWS, GCP, and Azure

Mike Malone

Today’s release of step and step-ca (v0.11.0) adds support for cloud instance identity documents (IIDs), making it embarrassingly easy to get certificates to workloads running on public cloud virtual machines (VMs). If you want to use TLS to secure service-to-service communication in the cloud (and you should), this will make your life 100% easier.


See the release announcement to find out what else is new in v0.11.0.

This post introduces IID-based authentication with step and step-ca, and notes some interesting architectural and security details. Here’s how it works once everything is set up:

[Figure: terminal window showing an example of using step certificates with instance identity documents to get a certificate]

All of the nitty-gritty details of key pair & CSR generation, and IID-based authentication, are handled for you. With three simple commands you get a certificate and private key for use with TLS:

$ step certificate inspect --short foo.crt
X.509v3 TLS Certificate (ECDSA P-256) [Serial: 4555...1939]
  Subject:     foo.local
  Issuer:      My Intermediate CA
  Provisioner: AWS IID Provisioner [ID: 8074...3263]
  Valid from:  2019-07-08T21:39:40Z
          to:  2019-07-09T21:39:40Z

Background & motivation

step-ca is a private certificate authority (CA) you can run yourself. step is a step-ca API client. Together, they let you automate internal certificate management.

More concretely:

  • step-ca is a server that issues certificates via a JSON/HTTPS API
  • step integrates with step-ca’s API to get, renew, and revoke certificates

They’re both open source. See our getting started guide to get them up and running.

Certificates are the only broadly interoperable option for defense-in-depth and for connecting system components across the public internet. You can use certificates and TLS to connect to databases, authenticate & encrypt requests inside a VPC, or connect securely across the public internet without a VPN.

Running a private CA reduces reliance on trusted third parties and lets you control certificate details and naming, satisfy SLAs, and generally own your internal security.

But running a CA and managing certificates is hard. step and step-ca fill a tooling gap to make these tasks much easier. IID-based authentication makes integrating these tools into cloud environments turnkey.

IID-based authentication

Instance identity documents (IIDs) are simply credentials that identify an instance’s name and owner. By presenting an IID in a request, a workload can prove that it’s running on a VM instance that you own.

The major clouds have different names for IIDs: AWS calls them instance identity documents, GCP calls them instance identity tokens, and Azure calls them access tokens. The “metadata” included in an IID also differs between clouds, along with many other implementation details. Abstractly, though, they’re all the same: signed bearer tokens that identify, minimally, the name and owner of a VM.

IIDs are returned from a metadata API exposed via a non-routable IP address (the link-local address 169.254.169.254). This magic is orchestrated by the hypervisor, which identifies the requesting VM and services IID requests locally. The upshot is: IIDs are very easy to get from within a VM via one unauthenticated HTTP request. Barring any security issues, they’re impossible to get from anywhere else.
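To make that one HTTP request concrete, here is a rough Python sketch of how it differs per cloud. The GCP request mirrors the curl example in this post; the AWS and Azure endpoints are taken from those providers' public metadata documentation rather than from this post, so treat them, along with the helper names iid_request and fetch_iid, as assumptions.

```python
import urllib.parse
import urllib.request

# Link-local metadata address; only reachable from inside a VM.
METADATA_IP = "169.254.169.254"


def iid_request(cloud, audience="step-cli"):
    """Return (url, headers) for fetching an IID on the given cloud.

    The AWS and Azure paths are assumptions based on each provider's
    metadata documentation; the GCP request mirrors the curl example.
    """
    if cloud == "aws":
        # The document's signature lives at a sibling path (.../signature).
        # Newer AWS setups may additionally require a session token.
        url = f"http://{METADATA_IP}/latest/dynamic/instance-identity/document"
        return url, {}
    if cloud == "gcp":
        query = urllib.parse.urlencode({"audience": audience, "format": "full"})
        url = ("http://metadata/computeMetadata/v1/instance/"
               f"service-accounts/default/identity?{query}")
        return url, {"Metadata-Flavor": "Google"}
    if cloud == "azure":
        query = urllib.parse.urlencode(
            {"api-version": "2018-02-01", "resource": "https://management.azure.com/"}
        )
        url = f"http://{METADATA_IP}/metadata/identity/oauth2/token?{query}"
        return url, {"Metadata": "true"}
    raise ValueError(f"unsupported cloud: {cloud}")


def fetch_iid(cloud, audience="step-cli", timeout=2.0):
    """Fetch the raw IID; this only succeeds when run on a cloud VM."""
    url, headers = iid_request(cloud, audience)
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping the request construction separate from the network call makes the per-cloud differences easy to see and to check without a VM.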

For fun, let’s fetch an IID on GCP. Since GCP encodes IIDs as JWTs and makes their public keys available at a well-known endpoint, we can easily verify and decode a GCP IID on the command line using step:

$ curl -s -H "Metadata-Flavor: Google" \
     'http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience=step-cli&format=full' \
  | step crypto jwt verify \
     --jwks https://www.googleapis.com/oauth2/v3/certs \
     --aud step-cli \
     --iss https://accounts.google.com
{
  "header": {
    "alg": "RS256",
    "kid": "afde80eb1edf9f3bf4486dd877c34ba46afbba1f",
    "typ": "JWT"
  },
  "payload": {
    "aud": "step-cli",
    "azp": "117354011164720418655",
    "email": "259112794408-compute@developer.gserviceaccount.com",
    "email_verified": true,
    "exp": 1562780090,
    "google": {
      "compute_engine": {
        "instance_creation_timestamp": 1557864897,
        "instance_id": "5404647754959331152",
        "instance_name": "foo",
        "project_id": "step-instance-identity-test",
        "project_number": 259112794408,
        "zone": "us-west1-b"
      }
    },
    "iat": 1562776490,
    "iss": "https://accounts.google.com",
    "sub": "117354011164720418655"
  },
  "signature": "AV8vZiNjOJNkWhWp5oy9R_WgGu3-tePyM4pKyHoela2SMVyWfpq4fPlSUxSPdzmfCT_akrUXrw-mDq7eLByqDOp3A4sGEZn9bY4Vmt5h9QYMVIo_60LRtC7c7QoBFZp2u3tNrPaI8ZhoINgHCsTdbfGEUDnCA8aH1mygd8b3kUEXcMCHrgUayPEVSMih8OYfmHUdecyTt0qOw6Ima16lX1jmM6lSoj8VNFmee36qFn58qULchB89lqviv-E0VzS5NthlqaM2_ukYNtKac-MdQdIlE86a-2YtgyXo4OVCpb87Svf2Rw9VaFsCKt4wFlRsnz4B3rx3I2bM2mXsQZY38Q"
}

Note that, since IIDs are signed by your cloud provider, verification is a local signature check: once the verifier has the provider’s public keys, it doesn’t require any additional API requests. So, in addition to being easy for clients, IID-based authentication is scalable, performant, and highly available.
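To make the token structure concrete, here is a minimal Python sketch that splits a JWT-encoded IID into its header and payload and checks the same aud and iss claims passed to step crypto jwt verify above. It deliberately does not verify the signature, so it is useful for inspection only; the function names are hypothetical.

```python
import base64
import json
import time


def b64url_decode(segment):
    """Decode an unpadded base64url segment, as used in JWTs."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_unverified(token):
    """Return (header, payload) dicts WITHOUT checking the signature."""
    header_b64, payload_b64, _signature = token.strip().split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload


def claims_ok(payload, audience, issuer, now=None):
    """Check audience, issuer, and expiry; signature checking is out of scope."""
    now = time.time() if now is None else now
    return (
        payload.get("aud") == audience
        and payload.get("iss") == issuer
        and payload.get("exp", 0) > now
    )
```

A real verifier would check the signature against the provider's published keys before trusting any of these claims.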

In summary, by simply fetching an IID from the metadata API and presenting it in an HTTPS request, a workload can prove that it’s running on a VM under your control. That’s precisely how step and step-ca use IIDs.

Getting a certificate from step-ca using IID-based authentication

The basic architecture of step-ca’s IID-based authentication is pretty simple:

[Figure: Step Certificates IID-Provisioner Architecture]

To get a certificate from step-ca using an IID we need to:

  • Generate a key pair and a certificate signing request (CSR) with the workload’s name
  • Obtain an IID from the metadata API to authenticate to step-ca
  • Submit the CSR & IID to step-ca via HTTPS POST to obtain a certificate
  • Store the certificate and private key somewhere our workload can find it

While this is all standards-based and simple in theory, in practice there’s a lot of arcane implementation detail to get right. Luckily, the step command line utility works seamlessly with step-ca to do all of this for us.

To demonstrate, assume we have step-ca running on AWS with hostname ca.local (see getting started).

To enable IID-based authentication we’ll simply configure step-ca, adding an AWS-type provisioner.

Find your AWS account ID, which we’ll use to restrict access to your VMs:

[Figure: AWS Account ID]

On the host running step-ca add an AWS provisioner to your configuration by running:

$ step ca provisioner add "AWS IID Provisioner" \
      --type AWS \
      --aws-account 123456789042

(Re)start step-ca to pick up this new configuration:

$ step-ca $(step path)/config/ca.json

With the step-ca server configured and running, let’s use step ca bootstrap to configure a new VM instance to trust and connect to it:

$ step ca bootstrap \
       --ca-url https://ca.local \
       --fingerprint f501ed49263c1369bd490a85660ddd4388d4175e0337100a11d4e82eae496499
The root certificate has been saved in ~/.step/certs/root_ca.crt.
Your configuration has been saved in ~/.step/config/defaults.json.

The --fingerprint is for the root certificate used by step-ca. Find it by running step certificate fingerprint $(step path)/certs/root_ca.crt on your CA.

After bootstrapping we’re ready to get a certificate. If we pass our AWS IID provisioner name to step ca certificate (via --provisioner), step will automatically use IID-based authentication to get a certificate:

$ step ca certificate foo.local foo.crt foo.key \
       --provisioner "AWS IID Provisioner" 
✔ Key ID: AWS IID Provisioner (AWS)
✔ CA: https://ca.local
✔ Certificate: foo.crt
✔ Private Key: foo.key

The first positional argument to step ca certificate specifies our workload’s name – the certificate subject – in this case, foo.local. The next two positional arguments specify the files to write the certificate and private key to, respectively.

We can use step certificate inspect to check our work:

$ step certificate inspect --short foo.crt
X.509v3 TLS Certificate (ECDSA P-256) [Serial: 4555...1939]
  Subject:     foo.local
               ip-172-31-65-180.us-east-1.compute.internal
               172.31.65.180
  Issuer:      My Intermediate CA
  Provisioner: AWS IID Provisioner [ID: 8074...3263]
  Valid from:  2019-07-08T21:39:40Z
          to:  2019-07-09T21:39:40Z

Congratulations, that’s it. Tell your workload to use foo.crt and foo.key, configure clients to trust certificates signed by $(step path)/certs/root_ca.crt, and you’re good to go. You now have a strong standards-based mechanism to authenticate workloads and encrypt traffic using TLS.
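For a workload written in Python, wiring these files in might look roughly like the following. This is a sketch using the standard ssl module; the function names are ours, and the file paths are whatever you passed to step ca certificate and step ca bootstrap.

```python
import ssl


def server_context(cert_file, key_file, client_ca_file=None):
    """TLS context for a service presenting the certificate issued by step-ca.

    If client_ca_file is given, clients must present a certificate that
    chains to that root (mutual TLS).
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(cert_file, key_file)
    if client_ca_file is not None:
        ctx.verify_mode = ssl.CERT_REQUIRED
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


def client_context(root_ca_file):
    """TLS context for a client that trusts only the private root CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies hostnames by default
    ctx.load_verify_locations(cafile=root_ca_file)
    return ctx
```

A service would call server_context("foo.crt", "foo.key") and wrap its listening socket with the result, while a client would pass the path of root_ca.crt to client_context and connect using the name in the certificate, here foo.local.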

Example configurations for GCP and Azure are available in the step certificates docs. Instead of account IDs, the GCP IID implementation restricts access by project and/or service account. Azure restricts access by tenant. The step CLI abstracts away the remaining differences.

Depending on your situation and religion, you might toss these commands into a startup script, bake them into an AMI, or use configuration management for the last mile of automation. Speaking of religion: if you’re using Kubernetes then IID-based authentication isn’t right for you. Use autocert or our cert-manager integration instead.
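As one way to handle that last mile, the bootstrap and certificate commands from this post could be wrapped in a small Python script that runs on a fresh instance. The sketch below uses only the step subcommands and flags shown above; the helper names, and the idea of running it at boot, are illustrative assumptions.

```python
import subprocess


def bootstrap_cmd(ca_url, fingerprint):
    """Build the `step ca bootstrap` command shown earlier."""
    return ["step", "ca", "bootstrap",
            "--ca-url", ca_url, "--fingerprint", fingerprint]


def certificate_cmd(name, crt_path, key_path, provisioner):
    """Build the `step ca certificate` command shown earlier."""
    return ["step", "ca", "certificate", name, crt_path, key_path,
            "--provisioner", provisioner]


def provision(ca_url, fingerprint, name, crt_path, key_path, provisioner,
              run=subprocess.run):
    """Trust the CA, then request a certificate; raises if either step fails."""
    commands = (
        bootstrap_cmd(ca_url, fingerprint),
        certificate_cmd(name, crt_path, key_path, provisioner),
    )
    for cmd in commands:
        run(cmd, check=True)  # check=True turns a non-zero exit into an exception
```

Passing run as a parameter keeps the command construction easy to exercise without step installed.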

A few more features

Subject Names

In the example above we got a certificate identifying our workload using the logical service name foo.local. Standard HTTPS implementations require this name to appear in the authority portion of a URL. That is, this certificate will only work if we connect to the service using a URL like https://foo.local/some/path.

The name foo.local doesn’t appear anywhere in the IID. There’s no check to ensure that this particular VM should be allowed to obtain a certificate with this particular name. In fact, by default, step-ca will allow your VMs to obtain a certificate binding any name. This is a pretty broad privilege – a VM that’s supposed to run bar.local could maliciously impersonate foo.local – but this isn’t a problem if you trust all of the processes and users that have access to your VMs.

If you’d rather, you can add "disableCustomSANs": true to your provisioner configuration to lock down step-ca so that it will only issue certificates binding a cloud-specific instance identifier (e.g., i-05e360db6e50b5eda), internal hostname (e.g., ip-172-31-65-180.us-east-1.compute.internal), and internal IP address (e.g., 172.31.65.180):

ubuntu@ip-172-31-65-180:~$ NAME=$(curl -s http://169.254.169.254/1.0/meta-data/instance-id)
ubuntu@ip-172-31-65-180:~$ step ca certificate $NAME vm.crt vm.key \
  --provisioner "AWS IID Provisioner" 
✔ Key ID: AWS IID Provisioner (AWS)
✔ CA: https://ca.local
✔ Certificate: vm.crt
✔ Private Key: vm.key

Here, we’ve fetched the instance ID (i.e., i-05e360db6e50b5eda) from the metadata API and passed it to step ca certificate as the certificate’s subject (its common name). With custom subject alternative names (SANs) disabled, step-ca automatically extracts the instance’s internal hostname and IP from the IID and includes them as SANs in the certificate:

ubuntu@ip-172-31-65-180:~$ step certificate inspect --short vm.crt
X.509v3 TLS Certificate (ECDSA P-256) [Serial: 9391...0237]
  Subject:     i-05e360db6e50b5eda
               ip-172-31-65-180.us-east-1.compute.internal
               172.31.65.180
  Issuer:      My Intermediate CA
  Provisioner: AWS IID Provisioner [ID: 8074...3263]
  Valid from:  2019-07-08T19:00:48Z
          to:  2019-07-09T19:00:48Z

With custom SANs disabled for an IID provisioner, this is the only sort of certificate step-ca will sign. That is, step-ca will only sign certificates binding your VM’s instance identifier, hostname, and IP address. This is much more restrictive, and will require that you connect to workloads using URLs like https://ip-172-31-65-180.us-east-1.compute.internal or https://172.31.65.180.
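The effect of disableCustomSANs can be sketched as a simple allow-list check. The Python below illustrates the rule described above and is not step-ca's actual code; it assumes the AWS identity document exposes instanceId, privateIp, and region fields, and it derives the internal hostname using the pattern seen in the example output.

```python
def iid_names(doc):
    """Names derivable from an AWS identity document (assumed field names)."""
    ip = doc["privateIp"]
    hostname = f"ip-{ip.replace('.', '-')}.{doc['region']}.compute.internal"
    return [doc["instanceId"], hostname, ip]


def request_allowed(requested_names, doc, disable_custom_sans):
    """Allow any name by default; otherwise only names derived from the IID."""
    if not disable_custom_sans:
        return True
    allowed = set(iid_names(doc))
    return all(name in allowed for name in requested_names)
```

With the flag off, a request for foo.local passes; with it on, only the instance ID, internal hostname, and internal IP do.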

If you’re interested in more sophisticated enrollment and access control, please open an issue or let us know on Gitter so we can understand your requirements.

Multiple Certificates

If you try to get two certificates on the same VM you’ll find that the CA returns Unauthorized:

$ step ca certificate foo.local foo.crt foo.key
✔ Key ID: AWS IID Provisioner (AWS)
✔ CA: https://ca.local
✔ Certificate: foo.crt
✔ Private Key: foo.key

$ step ca certificate bar.local bar.crt bar.key
✔ Key ID: AWS IID Provisioner (AWS)
✔ CA: https://ca.local
Unauthorized

This is another security feature. Since any user or process on a VM can obtain an IID, they can also obtain a certificate. To prevent malicious certificate requests, the CA will only issue one certificate per VM by default. If you obtain a genuine certificate early in your VM’s life cycle (e.g., in a startup script), any subsequent requests will fail, so a process that is compromised later can’t use the IID to get a certificate of its own.

Unfortunately, this also makes it impossible to use IIDs to issue different certificates to two or more workloads running on the same VM. To turn this feature off, set "disableTrustOnFirstUse": true in your provisioner configuration and restart step-ca.
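Conceptually, the one-certificate-per-VM rule is a trust-on-first-use check keyed on the instance. A toy Python model of it (in-memory only, and not how step-ca actually tracks this state) might look like:

```python
class FirstUseGate:
    """Toy model: allow one certificate request per instance ID."""

    def __init__(self, disable_trust_on_first_use=False):
        self.disabled = disable_trust_on_first_use
        self.seen = set()

    def authorize(self, instance_id):
        """Return True if a request from this instance should be served."""
        if self.disabled:
            return True  # every valid IID may be used repeatedly
        if instance_id in self.seen:
            return False  # the CA would answer Unauthorized
        self.seen.add(instance_id)
        return True
```

With the default settings the second call for the same instance ID is refused, which matches the Unauthorized response in the transcript above; constructing the gate with disable_trust_on_first_use=True removes that restriction.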

Renewal & Revocation

You can renew and revoke a certificate that was issued using IID-based authentication the same way you would any other certificate issued by step-ca, with step ca renew and step ca revoke:

$ step ca renew --force foo.crt foo.key
Your certificate has been saved in foo.crt.

$ step ca revoke --cert foo.crt --key foo.key
Certificate with Serial Number 4555...1939 has been revoked.

$ step ca renew --force foo.crt foo.key
error renewing certificate: Unauthorized

This handy incantation of step ca renew can address a couple of obvious operational considerations, too:

$ step ca renew --daemon --exec "kill -HUP $SOME_PID" foo.crt foo.key
INFO: 2019/07/18 14:22:18 first renewal in 15h50m43s

Here we’ve told step to stay running, renewing your certificate periodically, and notifying $SOME_PID after each renewal by sending it a SIGHUP. If the CA goes down for a period of time and a renewal fails, step ca renew will simply try again.
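The “first renewal in 15h50m43s” message for a 24-hour certificate is consistent with renewing roughly two-thirds of the way through the certificate’s lifetime, with some jitter. That fraction is an assumption here rather than something stated above; under it, the schedule and the retry behavior can be sketched in Python as:

```python
import random
import time
from datetime import datetime, timedelta, timezone


def next_renewal(not_before, not_after, fraction=2 / 3, jitter=timedelta(0)):
    """Pick a renewal time `fraction` of the way through the validity period.

    A random offset of up to `jitter` is subtracted so that many VMs do not
    all renew at the same instant.
    """
    lifetime = not_after - not_before
    offset = timedelta(seconds=random.uniform(0, jitter.total_seconds()))
    return not_before + lifetime * fraction - offset


def renew_with_retry(renew, attempts=3, sleep=time.sleep, delay=60):
    """Call renew() until it succeeds or attempts run out; return success."""
    for _ in range(attempts):
        try:
            renew()
            return True
        except Exception:
            sleep(delay)  # the CA may be down briefly; wait and try again
    return False


if __name__ == "__main__":
    issued = datetime(2019, 7, 8, 21, 39, 40, tzinfo=timezone.utc)
    expires = issued + timedelta(hours=24)
    print(next_renewal(issued, expires, jitter=timedelta(minutes=20)))
```

Renewing well before expiry leaves room for several retries if the CA is temporarily unavailable.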

For more thoughts and best practices around renewal & revocation see our blog post on passive revocation.

Summary

So there you have it. IID-based authentication makes it easier than ever to get certificates to cloud VMs. All you need to do is spin up an instance running step-ca and add a few shell commands to your CD pipeline. The hard stuff is handled for you by your cloud provider, step, and step-ca. You can get up and running in an afternoon.


Once you have certificates issued it’s easy to use TLS (or mutual TLS) to authenticate workloads, for defense-in-depth, and to connect across clouds without a VPN. This is a powerful capability that every distributed system should have. Your only excuse – that it’s “too hard” – is no longer true. Automated certificate management with step and step-ca is free, open, and embarrassingly easy. Give it a try and let us know how it goes!