alexander brand
March 19, 2019

Why is my kubelet listening on a random port? A closer look at CRI and the docker CRI shim

Today I learned something new about the kubelet. I was analyzing the default EKS AMI, with the goal of understanding how EKS deploys and configures the kubelet.

One of the things I was most interested in was understanding which ports the kubelet was opening. I used netstat to list open ports, and everything looked as expected: the kubelet was listening on the default ports 10248 (the healthz endpoint) and 10250 (the kubelet API). I was surprised, however, to notice that the kubelet was listening on a third port that I hadn’t seen before.

$ netstat -tupln | grep kubelet
tcp        0      0 127.0.0.1:35751         0.0.0.0:*               LISTEN      10107/kubelet
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      10107/kubelet
tcp6       0      0 :::10250                :::*                    LISTEN      10107/kubelet

Port number 35751 was not specified anywhere in the kubelet’s configuration, and GitHub/Google searches returned zilch. I suspected that the kubelet picked the port randomly, and I confirmed this by restarting the kubelet and verifying that the port had changed.

I was intrigued. I started querying the endpoint to see if I could get any meaningful response that would help me figure out what it was. Everything I tried, however, resulted in a “Page Not Found” error.

$ curl localhost:35751/metrics
404: Page Not Found
$ curl localhost:35751/healthz
404: Page Not Found
$ curl localhost:35751/api
404: Page Not Found

This approach was taking me nowhere, so I decided to switch strategies. I looked through the kubelet’s configuration reference documentation but found no port flag that defaulted to “random.” I then used tcpdump to capture the traffic destined for that port, but nothing was coming through.

Finally, I started searching through the kubelet code to find all places that started an HTTP server. It took me a while, but I landed on a kubelet package that seemed promising: pkg/kubelet/server/streaming/server.go. After skimming through the code, I noticed that the server had handlers for /exec, /attach, and /portforward.

// pkg/kubelet/server/streaming/server.go:125 (git tag v1.11.8)
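// Each URL embeds a short-lived, single-use token that is issued when the
// kubelet first requests an operation; the token lets the server match an
// incoming connection back to the cached exec/attach/port-forward request.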
endpoints := []struct {
    path    string
    handler restful.RouteFunction
}{
    {"/exec/{token}", s.serveExec},
    {"/attach/{token}", s.serveAttach},
    {"/portforward/{token}", s.servePortForward},
}

Could this be it? Let’s find out. I started another tcpdump capture and used kubectl exec to execute a command in a pod running on the node. Aha! Traffic was coming through, and tcpdump was lighting up. The commands I was sending and the resulting output were being sent through the third kubelet port.

How it works

Once I discovered that the third port was handling the kubectl exec command, I wanted to understand the entire flow better. I used GitHub’s blame feature to find the pull request that introduced this code. It turned out that this was implemented as part of the Container Runtime Interface (CRI) effort.

After reading through pull requests, issues and design documents, I now have a better understanding of why the kubelet opens up this extra port. It all boils down to the way CRI supports the exec, attach and port-forward operations.

To support a variety of container runtimes, the kubelet offloads to the container runtime (or CRI shim) the implementation details of how to execute these operations against a given pod. The only thing the kubelet expects is for the runtime to provide the URL of a streaming server that is capable of completing the operations. Does the runtime stand up a single server for all pods, or one server per pod? Is the server started on the node or within the pod (e.g., to support the “pods as VMs” use case)? All of these decisions are up to the runtime.
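
Concretely, the CRI runtime API models exec, attach, and port-forward as RPCs that return a URL instead of carrying the stream themselves. Here is a minimal kubelet-side sketch, assuming the v1alpha2 CRI API from around v1.11; the socket path and container ID are illustrative:

package main

import (
    "context"
    "fmt"
    "log"

    "google.golang.org/grpc"
    runtimeapi "k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1alpha2"
)

func main() {
    // Dial the shim's CRI socket (path is illustrative).
    conn, err := grpc.Dial("unix:///var/run/dockershim.sock", grpc.WithInsecure())
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    rs := runtimeapi.NewRuntimeServiceClient(conn)

    // Exec does not run the command; it only answers with the URL of a
    // streaming server that will.
    resp, err := rs.Exec(context.Background(), &runtimeapi.ExecRequest{
        ContainerId: "abc123", // hypothetical container ID
        Cmd:         []string{"date"},
        Stdout:      true,
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp.Url) // e.g. http://127.0.0.1:35751/exec/<token>
}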

In the case of the docker CRI shim (which is built into the kubelet), the streaming server is started as part of the kubelet’s initialization. This streaming server is the one listening on that third port we saw in the netstat output above.
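
For a rough idea of what that initialization looks like, here is a hedged sketch built on the same streaming package the docker shim uses (pkg/kubelet/server/streaming); startStreamingServer is a made-up helper, and the port-0 bind is my reading of where the random port comes from:

package shim

import (
    "log"

    "k8s.io/kubernetes/pkg/kubelet/server/streaming"
)

// rt is whatever type the shim uses to implement streaming.Runtime, i.e.,
// Exec, Attach, and PortForward against real containers.
func startStreamingServer(rt streaming.Runtime) (streaming.Server, error) {
    config := streaming.DefaultConfig
    config.Addr = "127.0.0.1:0" // port 0 = let the kernel pick a free port

    server, err := streaming.NewServer(config, rt)
    if err != nil {
        return nil, err
    }
    go func() {
        // stayUp=true keeps the server alive across individual streams.
        if err := server.Start(true); err != nil {
            log.Fatal(err)
        }
    }()
    return server, nil
}

If that reading is right, the random port is simply the kernel’s ephemeral port assignment, which is consistent with the port changing on every restart.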

Now, once the kubelet requests the URL of the streaming server from the runtime and receives the response, it needs to use that URL to complete the operation. There are two ways the kubelet can do this. The first, and the default, is for the kubelet to proxy the request from the API server through to the streaming server and back. The advantage of this is that the streaming server can listen on localhost, and thus the request is secured by the authentication mechanisms that already exist between the API server and the kubelet.

The other approach is enabled by setting the --redirect-container-streaming kubelet configuration flag (undocumented, but found here) to true. In this case, the kubelet returns an HTTP redirect to the API server, instructing it to send the request to the streaming server’s URL directly. This approach takes the kubelet out of the flow, which can improve performance. The downside, however, is that there is no authentication in place between the API server and the streaming server, which is accessible from outside the node (there is an open issue to address this limitation).
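
From the kubelet’s point of view, the fork between the two modes is small: redirect means answering the API server with an HTTP 302 that points at the streaming URL, while proxy means staying in the data path. A simplified, hypothetical sketch (the real kubelet uses an upgrade-aware proxy, since exec streams ride on SPDY/WebSocket upgrades, not a plain reverse proxy):

package shim

import (
    "net/http"
    "net/http/httputil"
    "net/url"
)

// serveStream is hypothetical; it only illustrates the branch.
func serveStream(w http.ResponseWriter, r *http.Request, streamURL *url.URL, redirect bool) {
    if redirect {
        // --redirect-container-streaming=true: answer with a 302 and step
        // out of the data path entirely.
        http.Redirect(w, r, streamURL.String(), http.StatusFound)
        return
    }
    // Default: stay in the path and proxy the connection through to the
    // streaming server listening on localhost.
    httputil.NewSingleHostReverseProxy(streamURL).ServeHTTP(w, r)
}

Note that in redirect mode the API server connects straight to the streaming server, which is why the missing authentication between the two matters.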

To recap how the entire process works, let’s walk through a kubectl exec flow:

  1. The user sends a request to the API server using kubectl exec $pod_name $command.
  2. The API server sends an “exec request” to the corresponding kubelet.
  3. The kubelet sends the “exec request” to the CRI shim.
  4. The CRI shim (docker CRI shim in this case) responds with the location of the streaming server (e.g. localhost:35751).
  5. The kubelet serves as a proxy between the API server and the localhost streaming server (see the sketch after this list).
  6. The streaming server talks to the container runtime to execute the command in the container.
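
For the in-process docker shim, steps 3 through 5 collapse into a method call followed by a proxy hop. A hypothetical sketch, reusing the streaming package from above (simplified; again, the real kubelet uses an upgrade-aware proxy):

package shim

import (
    "net/http"
    "net/http/httputil"
    "net/url"

    runtimeapi "k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1alpha2"
    "k8s.io/kubernetes/pkg/kubelet/server/streaming"
)

// handleExec condenses steps 3-5: no gRPC hop is needed because the shim
// lives in the kubelet's process.
func handleExec(w http.ResponseWriter, r *http.Request, ss streaming.Server, containerID string) {
    // Steps 3-4: ask the shim's streaming server for a one-time URL.
    resp, err := ss.GetExec(&runtimeapi.ExecRequest{
        ContainerId: containerID,
        Cmd:         []string{"date"},
        Stdout:      true,
    })
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    u, err := url.Parse(resp.Url) // e.g. http://127.0.0.1:35751/exec/<token>
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    // Step 5: proxy the API server's connection through to that URL.
    httputil.NewSingleHostReverseProxy(u).ServeHTTP(w, r)
}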

I’ve been meaning to learn more about CRI, and this was a great way to dig into how it works and how it handles kubectl exec requests. The next thing I want to do is look into how other CRI-compatible runtimes, such as containerd, implement this.

Did you find this post useful? Did I get something wrong? I would love to hear from you! Please reach out via @alexbrand.