If there is one thing I’ve learned in my career, it’s that IAM and RBAC have more in common with dabbling in the Dark Arts than engineering.
Kubernetes is no exception.
“Just grant admin!” are the tempting words of the devil himself — a Faustian bargain that trades short-term gratification for future time spent in the seventh level of hell: security remediation.
I’m writing this so that you might be saved from that fate worse than death.
Simple Requirements
My requirement is simple. I want to be able to run some code inside a pod that monitors the cluster. This means that I need a pod that:
- Can access the API control plane
- Has read-only access to all resources in the cluster. That includes resources in the pod’s namespace, resources in all other namespaces, and cluster-scoped resources that belong to no namespace.
Easy, right?
Actually, yes. But not without some dabbling in the dark arts.
Control Plane Access
There are a few things to understand about accessing the control plane from within the cluster:
- Kubernetes provides the endpoint to the control plane as a service that is accessible from within a pod.
- Pods run as a service account named default unless you specify otherwise.
- The service account named default does not have any permissions.
So what does this mean?
It’s easiest to demonstrate.
Create a Test Pod
Let’s create a dummy pod that we’ll use for the example:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    image: alpine
    args:
    - sleep
    - "1000000"
EOF
We can confirm that it has started:
$ kubectl get pod test
NAME READY STATUS RESTARTS AGE
test 1/1 Running 0 1m
Cluster-Internal API Endpoint
Let’s take a look at the environment variables that are set in our pod’s container. We only want to look at a subset:
$ kubectl exec test -- env | grep KUBERNETES_SERVICE
KUBERNETES_SERVICE_HOST=172.20.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
We can use these env vars to access the API endpoint. However, the alpine image doesn’t have curl installed, so let’s install it:
$ kubectl exec test -- apk add curl
Now that we have curl, let’s GET the API Server endpoint, substituting the host and port from the env vars above:
$ kubectl exec test -- curl -sk https://<IP>:<PORT>
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
This tells us that our pod can communicate with the API Server, but curl doesn’t supply any credentials, so the API Server rejects the request.
Let’s install kubectl so that we can send an authenticated request easily:
$ kubectl exec test -- curl -o /bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.12.0/bin/linux/amd64/kubectl
$ kubectl exec test -- chmod +x /bin/kubectl
Now we have kubectl available. We can try to list the pods in the current namespace:
$ kubectl exec test -- kubectl get pods
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:default:default" cannot list pods in the namespace "default"
Note that while this operation fails, the API server has identified us as system:serviceaccount:default:default, which is different from the anonymous identity in the curl request above.
We can confirm this by looking at the serviceAccount our pod is using:
$ kubectl get pod test -o yaml | grep serviceAccount
  serviceAccount: default
  serviceAccountName: default
Create Service Account
Let’s create a new service account named test-sa. We will run our pods under this service account.
$ kubectl create sa test-sa
serviceaccount/test-sa created
Use Service Account
Kubernetes doesn’t allow us to change the service account of a running pod. So let’s delete the pod we just created:
$ kubectl delete pod test
Now we’ll re-create the pod using the test-sa service account that we just created.
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  serviceAccountName: test-sa
  containers:
  - name: test
    image: alpine
    args:
    - sleep
    - "1000000"
EOF
Now we can verify that the pod is running under the test-sa service account:
$ kubectl get pod test -o yaml | grep serviceAccount
  serviceAccount: test-sa
  serviceAccountName: test-sa
Let’s re-install curl:
$ kubectl exec test -- apk add curl
And kubectl:
$ kubectl exec test -- curl -o /bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.12.0/bin/linux/amd64/kubectl
$ kubectl exec test -- chmod +x /bin/kubectl
If we run kubectl in the new pod, we can see that it is now running as test-sa. However, Kubernetes will still reject the request:
$ kubectl exec test -- kubectl get pods
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:default:test-sa" cannot list pods in the namespace "default"
command terminated with exit code 1
To fix this, we need to create a ClusterRole with the permissions we need and bind it to our service account.
Role vs ClusterRole
It’s important to understand the difference between a Role and a ClusterRole.
A Role is a namespace-scoped object that applies only to a given namespace. If you create a Role in the namespace foo that has permission to list pods, it will be able to list pods in the foo namespace, but not in any other namespace.
This is great for most application services, but in our case we want our pod to see all objects in all namespaces, as well as cluster-scoped resources. We need a ClusterRole to accomplish this.
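The scoping difference can be sketched as a toy Python check (my own illustration, not how the RBAC authorizer is actually implemented): a RoleBinding confines its role to one namespace, while a ClusterRoleBinding applies it everywhere, including to cluster-scoped resources.

```python
def binding_covers(binding: dict, namespace: str) -> bool:
    # A ClusterRoleBinding grants its role cluster-wide; a RoleBinding
    # grants it only within the binding's own namespace.
    if binding["kind"] == "ClusterRoleBinding":
        return True
    return binding["namespace"] == namespace
```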
Create ClusterRole
Let’s create a ClusterRole named test-read-only.
$ cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  name: test-read-only
rules:
- apiGroups:
  - ""
  resources: ["*"]
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources: ["*"]
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources: ["*"]
  verbs:
  - get
  - list
  - watch
EOF
Now we can see that this role has been created:
$ kubectl get ClusterRole test-read-only
NAME AGE
test-read-only 2m46s
Bind ClusterRole to ServiceAccount
The test-read-only role that we just created isn’t yet attached to any users or service accounts. We need to bind it to our service account:
$ cat <<EOF | kubectl apply -f -
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: test-binding
subjects:
- kind: ServiceAccount
  name: test-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: test-read-only
  apiGroup: rbac.authorization.k8s.io
EOF
Try It!
Now we can try running kubectl inside the Pod’s container:
$ kubectl exec test -- kubectl get pods
NAME READY STATUS RESTARTS AGE
test 1/1 Running 0 27m
Let’s verify that we can see pods in other namespaces:
$ kubectl exec test -- kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
aws-node-pkbgm 1/1 Running 0 21d
kube-dns-7cc87d595-l855w 3/3 Running 0 21d
kube-proxy-99lsh 1/1 Running 0 21d
And let’s verify that we can see cluster-scoped resources that do not live in any namespace:
$ kubectl exec test -- kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-200-3-140.us-west-2.compute.internal Ready <none> 21d v1.10.3
That’s great, but how did this work?
Service Account Credentials
Let’s first see how credentials to access the control plane were provided to the Pod’s container.
By convention, a volume is mounted at /run/secrets/kubernetes.io/serviceaccount. We can verify this:
$ kubectl exec test -- mount | grep secrets
tmpfs on /run/secrets/kubernetes.io/serviceaccount type tmpfs (ro,relatime)
Let’s look at what’s there:
$ kubectl exec test -- ls -al /run/secrets/kubernetes.io/serviceaccount
total 0
drwxrwxrwt 3 root root 140 Nov 12 02:56 .
drwxr-xr-x 3 root root 28 Nov 12 02:56 ..
drwxr-xr-x 2 root root 100 Nov 12 02:56 ..2018_11_12_02_56_01.670959907
lrwxrwxrwx 1 root root 31 Nov 12 02:56 ..data -> ..2018_11_12_02_56_01.670959907
lrwxrwxrwx 1 root root 13 Nov 12 02:56 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root root 16 Nov 12 02:56 namespace -> ..data/namespace
lrwxrwxrwx 1 root root 12 Nov 12 02:56 token -> ..data/token
If you pick apart the token file, you can see that it is a JWT: base64url-encoded segments separated by dots, the first two of which are JSON. Here is the first:
$ kubectl exec test -- cat /run/secrets/kubernetes.io/serviceaccount/token | awk -F. '{ print $1 }' | base64 -D - | jq
{
"alg": "RS256",
"kid": ""
}
And the second:
$ kubectl exec test -- cat /run/secrets/kubernetes.io/serviceaccount/token | awk -F. '{ print $2 }' | base64 -D - | cat - <(echo '}') | jq
{
"iss": "kubernetes/serviceaccount",
"kubernetes.io/serviceaccount/namespace": "default",
"kubernetes.io/serviceaccount/secret.name": "test-sa-token-56rht",
"kubernetes.io/serviceaccount/service-account.name": "test-sa",
"kubernetes.io/serviceaccount/service-account.uid": "58e12c64-e625-11e8-ba42-060fea22bae6",
"sub": "system:serviceaccount:default:test-sa"
}
The JSON in the second portion is missing its trailing } most likely because JWT segments strip their base64 padding, and base64 -D quietly discards the final, incomplete group of characters; that’s why we piped in the closing brace by hand.
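If you want to decode segments without hand-patching braces, restore the padding before decoding. Here’s a small Python helper (my own sketch, not part of any Kubernetes tooling):

```python
import base64
import json

def decode_jwt_segment(segment: str) -> dict:
    # JWT segments strip their trailing "=" padding; restore it so a
    # strict base64 decoder doesn't drop the final bytes.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

With this, you could split the token on "." and decode the first two segments directly.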
In any case, you can see what kubectl is doing internally. It is just following standard conventions:
- Using KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to locate the API Server endpoint.
- Using the contents of /run/secrets/kubernetes.io/serviceaccount to provide authentication credentials to the API Server.
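Those two conventions can be sketched in Python (the helper names are mine; this is a simplification of what any in-cluster client does, kubectl included):

```python
# Where the service-account credentials are mounted by convention.
SA_DIR = "/run/secrets/kubernetes.io/serviceaccount"

def api_server_url(env: dict) -> str:
    # The injected service env vars locate the in-cluster endpoint.
    return "https://{}:{}".format(
        env["KUBERNETES_SERVICE_HOST"], env["KUBERNETES_SERVICE_PORT"]
    )

def auth_headers(token: str) -> dict:
    # The mounted service-account token is presented as a bearer token.
    return {"Authorization": "Bearer " + token.strip()}
```

A real client would pass os.environ for env, read the token from SA_DIR/token, and trust SA_DIR/ca.crt for TLS verification.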
Mystery be gone.
Back to the ClusterRole
Now back to the ClusterRole:
$ kubectl get clusterrole test-read-only -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: 2018-11-12T02:00:31Z
  name: test-read-only
  resourceVersion: "2508274"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/test-read-only
  uid: bc27992c-e61e-11e8-a908-0ade80478eb0
rules:
- apiGroups:
  - ""
  resources:
  - '*'
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - '*'
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - '*'
  verbs:
  - get
  - list
  - watch
This is the part that starts to dabble in the Dark Arts.
There are three dimensions in play here:
- API Groups: "", extensions, and apps.
- Resources: we use '*' to match every resource in those groups.
- Verbs: get, list, and watch are the read-only verbs.
Using the empty string to refer to the core API group could hardly be more confusing, but there it is.
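Conceptually, a request’s (API group, resource, verb) triple is checked against each rule. A toy Python version (my own simplification; real RBAC also handles resourceNames, subresources, and non-resource URLs) looks like this:

```python
def rule_allows(rule: dict, group: str, resource: str, verb: str) -> bool:
    # A request is allowed if every dimension matches the rule,
    # where "*" in a list matches anything.
    def matches(patterns, value):
        return "*" in patterns or value in patterns
    return (
        matches(rule["apiGroups"], group)
        and matches(rule["resources"], resource)
        and matches(rule["verbs"], verb)
    )

# One rule from the test-read-only ClusterRole above:
core_read_only = {
    "apiGroups": [""],  # "" is the core API group
    "resources": ["*"],
    "verbs": ["get", "list", "watch"],
}
```

For example, listing pods in the core group matches this rule, while deleting them, or reading apps-group resources, would need another rule.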
Conclusion
This is all straightforward and reasonably well designed.
To review, we need to:
- Create a ClusterRole that defines the RBAC permissions that we need.
- Create a ServiceAccount that the pods will use.
- Create a ClusterRoleBinding that attaches the ClusterRole to the ServiceAccount.
- Specify the serviceAccountName attribute on the pod spec to reference the ServiceAccount.
That’s it.
Hopefully this article demystifies some of the magic. Ironically, it probably took me longer to write this article than to figure all this out. But hopefully this makes it a little bit easier for you.
P.S. When I say “you”, I really mean “me” because I’ll probably be referring to my own blog post soon enough.