If there is one thing I’ve learned in my career, it’s that IAM and RBAC have more in common with dabbling in the Dark Arts than engineering.

Kubernetes is no exception.

“Just grant admin!” are the tempting words of the devil himself — a Faustian bargain that trades short-term gratification for future time spent in the seventh level of hell: security remediation.

I’m writing this so that you might be saved from that fate worse than death.

Simple Requirements

My requirement is simple. I want to be able to run some code inside a pod that monitors the cluster. This means that I need a pod that:

  1. Can access the API control plane
  2. Has read-only access to all resources in the cluster. That includes resources in the pod’s namespace, resources in all other namespaces, and non-namespaced (cluster-scoped) resources.

Easy, right?

Actually, yes. But not without some dabbling in the dark arts.

Control Plane Access

There are a few things to understand about accessing the control plane from within the cluster:

  1. Kubernetes exposes the control plane endpoint as a service that is reachable from within any pod.
  2. Pods run as a service account named default unless you specify otherwise.
  3. The service account named default will not have any permissions (we verify this below).
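
You can check point 3 for yourself without running anything in a pod, using kubectl’s impersonation support. The output below is what I’d expect on a fresh cluster; yours may differ if someone has already granted the default account permissions:

$ kubectl auth can-i list pods \
  --as=system:serviceaccount:default:default
no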

So what does this mean?

It’s easiest to demonstrate.

Create a Test Pod

Let’s create a dummy pod that we’ll use for the example:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    image: alpine
    args:
    - sleep
    - "1000000"
EOF

We can confirm that it has started:

$ kubectl get pod test

NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          1m

Cluster-Internal API Endpoint

Let’s take a look at the environment variables that are set in our pod’s container. We only want to look at a subset:

$ kubectl exec test -- env | grep KUBERNETES_SERVICE

KUBERNETES_SERVICE_HOST=172.20.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
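
These variables aren’t magic: they come from the built-in kubernetes service in the default namespace, which fronts the API Server. You can see the correspondence from outside the pod (the values below match the env vars above; your cluster IP and ages will differ):

$ kubectl get service kubernetes
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   21d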

We can use these env vars to access the API endpoint. However, the alpine image doesn’t ship with curl, so let’s install it.

$ kubectl exec test -- apk add curl

Now that we have curl, let’s GET the API Server endpoint, substituting the host and port from the env vars above:

$ kubectl exec test -- curl -sk https://<IP>:<PORT>
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

This tells us that our pod can communicate with the API Server, but curl doesn’t supply any credentials, so the API Server rejects the request.
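
If you don’t want to copy the IP and port by hand, you can let the shell inside the container expand the env vars instead (the single quotes keep your local shell from expanding them first):

$ kubectl exec test -- sh -c \
  'curl -sk https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/'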

Let’s install kubectl so that we can send an authenticated request easily:

$ kubectl exec test -- curl -o /bin/kubectl \
  https://storage.googleapis.com/kubernetes-release/release/v1.12.0/bin/linux/amd64/kubectl
$ kubectl exec test -- chmod +x /bin/kubectl

Now we have kubectl available. We can try to list the pods in the current namespace:

$ kubectl exec test -- kubectl get pods
Error from server (Forbidden): pods is forbidden:
User "system:serviceaccount:default:default" cannot list pods in the namespace "default"

Note that while this operation fails, we can see that the API server has identified us as system:serviceaccount:default:default, which is different from the error message we got with the anonymous curl request above.

We can confirm this by looking at the serviceAccount our pod is using:

$ kubectl get pod test -o yaml | grep serviceAccount
   serviceAccount: default
   serviceAccountName: default

Create Service Account

Let’s create a new service account named test-sa. We will run our pods under this service account.

$ kubectl create sa test-sa
serviceaccount/test-sa created
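
In clusters of this vintage, creating a service account also creates a token secret for it automatically. You can see the reference on the account itself (the metadata is elided here, and the secret’s random suffix will differ on your cluster):

$ kubectl get sa test-sa -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-sa
  namespace: default
  ...
secrets:
- name: test-sa-token-56rht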

Use Service Account

Kubernetes doesn’t allow us to change the service account of a running pod. So let’s delete the pod we just created:

$ kubectl delete pod test

Now we’ll re-create the pod using the test-sa service account that we just created.

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  serviceAccountName: test-sa
  containers:
  - name: test
    image: alpine
    args:
    - sleep
    - "1000000"
EOF

Now we can verify that the pod is running under the test-sa service account:

$ kubectl get pod test -o yaml | grep serviceAccount
   serviceAccount: test-sa
   serviceAccountName: test-sa

Let’s re-install curl:

$ kubectl exec test -- apk add curl

And kubectl:

$ kubectl exec test -- curl -o /bin/kubectl \
  https://storage.googleapis.com/kubernetes-release/release/v1.12.0/bin/linux/amd64/kubectl
$ kubectl exec test -- chmod +x /bin/kubectl

If we run kubectl in the new pod, we can see that it is now running as test-sa. However, Kubernetes will still reject the request:

$ kubectl exec test -- kubectl get pods
Error from server (Forbidden): pods is forbidden:
User "system:serviceaccount:default:test-sa" cannot list pods in the namespace "default"
command terminated with exit code 1

To fix this, we need to create a ClusterRole with the permissions we need and bind it to our service account.

Role vs ClusterRole

It’s important to understand the difference between a Role and a ClusterRole.

A Role is a namespace-scoped object that applies only to a given namespace. If you create a Role in the namespace foo that has permissions to list pods, it will be able to list pods in the foo namespace, but not in any other namespace.
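
For concreteness, such a Role might look like this (a hypothetical pod-reader confined to the foo namespace; we won’t use it here):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: foo
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]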

This is great for most application services, but in our case we want our pod to be able to see all objects in all namespaces, as well as cluster-scoped resources.

We need to use a ClusterRole to accomplish this.

Create ClusterRole

Let’s create a ClusterRole named test-read-only.

$ cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  name: test-read-only
rules:
- apiGroups:
  - ""
  resources: ["*"]
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources: ["*"]
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources: ["*"]
  verbs:
  - get
  - list
  - watch
EOF

Now we can see that this role has been created:

$ kubectl get ClusterRole test-read-only
NAME             AGE
test-read-only   2m46s

Bind ClusterRole to ServiceAccount

The test-read-only role that we just created isn’t yet attached to any users or service accounts. We need to bind it to our service account.

$ cat <<EOF | kubectl apply -f -
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: test-binding
subjects:
- kind: ServiceAccount
  name: test-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: test-read-only
  apiGroup: rbac.authorization.k8s.io
EOF
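
Before testing from inside the pod, we can dry-run the new binding from the outside with the same impersonation trick as before:

$ kubectl auth can-i list nodes \
  --as=system:serviceaccount:default:test-sa
yes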

Try It!

Now we can try running kubectl inside the Pod’s container:

$ kubectl exec test -- kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          27m

Let’s verify that we can see pods in other namespaces:

$ kubectl exec test -- kubectl get pods --namespace kube-system
NAME                       READY   STATUS    RESTARTS   AGE
aws-node-pkbgm             1/1     Running   0          21d
kube-dns-7cc87d595-l855w   3/3     Running   0          21d
kube-proxy-99lsh           1/1     Running   0          21d

And let’s verify that we can see cluster-resources that do not live in any namespace:

$ kubectl exec test -- kubectl get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-200-3-140.us-west-2.compute.internal   Ready    <none>   21d   v1.10.3
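
And since we granted only the read verbs, anything that writes should still be refused. A quick sanity check (expected output, give or take exact wording):

$ kubectl exec test -- kubectl delete pod test
Error from server (Forbidden): pods "test" is forbidden:
User "system:serviceaccount:default:test-sa" cannot delete pods in the namespace "default"
command terminated with exit code 1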

That’s great, but how did this work?

Service Account Credentials

Let’s first see how credentials to access the control plane were provided to the Pod’s container.

By convention, a volume containing the service account’s credentials is mounted at /run/secrets/kubernetes.io/serviceaccount. (The canonical path is /var/run/secrets/kubernetes.io/serviceaccount, but on alpine /var/run is a symlink to /run, which is how it shows up in the mount table.)

We can verify this:

$ kubectl exec test -- mount | grep secrets
tmpfs on /run/secrets/kubernetes.io/serviceaccount type tmpfs (ro,relatime)

Let’s look at what’s there:

$ kubectl exec test -- ls -al \
  /run/secrets/kubernetes.io/serviceaccount
total 0
drwxrwxrwt 3 root root 140 Nov 12 02:56 .
drwxr-xr-x 3 root root 28 Nov 12 02:56 ..
drwxr-xr-x 2 root root 100 Nov 12 02:56 ..2018_11_12_02_56_01.670959907
lrwxrwxrwx 1 root root 31 Nov 12 02:56 ..data -> ..2018_11_12_02_56_01.670959907
lrwxrwxrwx 1 root root 13 Nov 12 02:56 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root root 16 Nov 12 02:56 namespace -> ..data/namespace
lrwxrwxrwx 1 root root 12 Nov 12 02:56 token -> ..data/token

If you pick apart the token file, you can see that it is a JWT: three base64url-encoded segments separated by dots, of which the first two decode to JSON (the third is a signature). Here is the first:

$ kubectl exec test -- cat \
  /run/secrets/kubernetes.io/serviceaccount/token \
  | awk -F. '{ print $1 }' | base64 -D - | jq
{
  "alg": "RS256",
  "kid": ""
}

And the second:

$ kubectl exec test -- cat \
  /run/secrets/kubernetes.io/serviceaccount/token \
  | awk -F. '{ print $2 }' | base64 -D - | cat - <(echo '}') | jq
{
  "iss": "kubernetes/serviceaccount",
  "kubernetes.io/serviceaccount/namespace": "default",
  "kubernetes.io/serviceaccount/secret.name": "test-sa-token-56rht",
  "kubernetes.io/serviceaccount/service-account.name": "test-sa",
  "kubernetes.io/serviceaccount/service-account.uid": "58e12c64-e625-11e8-ba42-060fea22bae6",
  "sub": "system:serviceaccount:default:test-sa"
}

The trailing } is missing because JWT segments are base64url-encoded without padding; base64 -D silently drops the final partial block when the input length isn’t a multiple of four, which is why the command above patches the brace back with cat - <(echo '}').
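
If you’d rather decode it properly than patch it, you can restore the padding (and map the base64url alphabet back to standard base64) yourself. A minimal sketch, assuming the same macOS-style base64 -D used above:

$ P=$(kubectl exec test -- cat \
  /run/secrets/kubernetes.io/serviceaccount/token \
  | awk -F. '{ print $2 }' | tr '_-' '/+')
$ case $(( ${#P} % 4 )) in 2) P="$P==" ;; 3) P="$P=" ;; esac
$ printf '%s' "$P" | base64 -D - | jq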

In any case, you can see what kubectl is doing internally. It is just following standard conventions:

  1. Using KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to locate the API Server endpoint.
  2. Using the contents of /run/secrets/kubernetes.io/serviceaccount to provide authentication credentials to the API Server (sketched below).
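
Here is roughly the same request made by hand, with the token and CA cert pulled from the mounted volume. This is a sketch of the convention, not of kubectl’s actual code:

$ kubectl exec test -- sh -c \
  'curl -s --cacert /run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $(cat /run/secrets/kubernetes.io/serviceaccount/token)" \
  https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api/v1/namespaces/default/pods'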

Mystery be gone.

Back to the ClusterRole

Now back to the ClusterRole:

$ kubectl get clusterrole test-read-only -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: 2018-11-12T02:00:31Z
  name: test-read-only
  resourceVersion: "2508274"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/test-read-only
  uid: bc27992c-e61e-11e8-a908-0ade80478eb0
rules:
- apiGroups:
  - ""
  resources:
  - '*'
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - '*'
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - '*'
  verbs:
  - get
  - list
  - watch

This is the part that starts to dabble in the Dark Arts.

There are three dimensions in play here:

  1. API Group ("", extensions, and apps)
  2. Resource (we use '*' to make sure we match everything)
  3. Verbs (get, list, and watch are the read-only verbs)

Using the empty string to refer to the core API group (the one that holds pods, services, nodes, and the other built-in resources) could hardly be more confusing, but there it is.
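
If you’re ever unsure which group a resource belongs to, kubectl api-resources will tell you; the APIGROUP column is blank for core resources. Trimmed, illustrative output (your cluster will list many more):

$ kubectl api-resources
NAME          SHORTNAMES   APIGROUP   NAMESPACED   KIND
pods          po                      true         Pod
nodes         no                      false        Node
deployments   deploy       apps       true         Deployment
...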

Conclusion

This is all straightforward and reasonably well designed.

To review, we need to:

  1. Create a ClusterRole that defines the RBAC permissions that we need
  2. Create a ServiceAccount that the pods will use
  3. Create a ClusterRoleBinding that attaches the ClusterRole to the ServiceAccount
  4. Specify the serviceAccountName attribute on the pod spec to reference the ServiceAccount.

That’s it.
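
For reference, the whole setup from this article fits in a single manifest (same names as above; the three read-only rules are collapsed into one since they share resources and verbs):

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: test-read-only
rules:
- apiGroups: ["", "extensions", "apps"]
  resources: ["*"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: test-binding
subjects:
- kind: ServiceAccount
  name: test-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: test-read-only
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  serviceAccountName: test-sa
  containers:
  - name: test
    image: alpine
    args: ["sleep", "1000000"]
EOF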

Hopefully this article demystifies some of the magic. Ironically, it probably took me longer to write this article than to figure all this out. But hopefully this makes it a little bit easier for you.

P.S. When I say “you”, I really mean “me” because I’ll probably be referring to my own blog post soon enough.