In the latest installment of our Kubernetes series we continue our journey by containerising some Windows ASP.NET applications and deploying them onto Windows K8S worker nodes. Is it going to be viable though?

Windows K8S orchestra

The problem

We still have some services running on old Windows virtual machines. These are .NET apps written in C#. We use blue-green deployments to release new versions, which temporarily doubles the resources reserved by these applications. The pattern is described well by Martin Fowler if you would like to read more about it.

Wouldn’t it be great to move them off those VMs and run them in Kubernetes? This way we would free up resources and only use what actual demand requires. We could also utilize autoscaling, and routing to these apps within K8S would be much more straightforward than going out to our load balancer again. Add easier chaos testing, fault injection and so on, and there are numerous benefits to this!

Dockerizing ASP.NET Applications

We will be building a Docker image from an ASP.NET application, using the FMP project as an example.

All steps are done on a Windows machine.

Installing Docker

First we install Docker Desktop for Windows from https://hub.docker.com/editions/community/docker-ce-desktop-windows/. Once installed, click on the Docker Desktop icon in the taskbar and select Switch to Windows containers. This might need a restart.
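
Once switched, a quick sanity check is to ask the Docker daemon which OS it runs containers for; it should print windows rather than linux:

docker info --format '{{.OSType}}'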

The Dockerfile

First we’ll need to build our .NET app, as usual, either from Visual Studio or the CLI. We can use the output folder to build our Docker image.
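
If you go the CLI route, a minimal sketch of a file-system publish looks like this; the solution name and output folder here are assumptions, chosen to line up with the COPY ./build step in the Dockerfile below:

msbuild FMP.sln /p:Configuration=Release /p:DeployOnBuild=true /p:WebPublishMethod=FileSystem /p:PublishUrl=.\build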

The comments in the Dockerfile explain why we did each step. For reference, we are running this on Windows 10 Pro Version 1909 (Build 18363.836).

# It's an ASP.NET application so we are using Microsoft's .NET Framework ASP.NET image
FROM mcr.microsoft.com/dotnet/framework/aspnet:4.8-20200512-windowsservercore-ltsc2019

# Downloading url rewrite as it's required by FMP
ADD http://download.microsoft.com/download/D/D/E/DDE57C26-C62C-4C59-A1BB-31D58B36ADA2/rewrite_amd64_en-US.msi c:/inetpub/rewrite_amd64_en-US.msi

# Uninstalling Web-Stat-Compression as it's not on the list of IIS features for FMP
RUN powershell "Uninstall-WindowsFeature Web-Stat-Compression"

# Enabling Windows Update (wuauserv) to support .Net 3.5 installation and then installing .Net 3.5 (NET-Framework-Features)
RUN powershell "Set-Service -Name wuauserv -StartupType Manual; Install-WindowsFeature -Name NET-Framework-Features -Verbose"

# Installing url rewrite
RUN powershell -Command Start-Process c:/inetpub/rewrite_amd64_en-US.msi -ArgumentList "/qn" -Wait

# Setting powershell as the shell to avoid parsing errors on the website names and bindings
SHELL ["powershell"]

# Removing the Default Web Site from IIS as it's not required and it's bound on port 80 which we will need for our website
RUN Remove-Website -Name 'Default Web Site'

# Copying our website's build output (the ./build folder) to C:/websites/FMP
COPY ./build /websites/FMP

# Adding our website to IIS, setting its Physical Path and binding it to port 80
RUN New-IISSite -Name "FMP" -BindingInformation "*:80:" -PhysicalPath c:\websites\FMP

As you can see, there are some interesting steps in there, and we had to get each of them right to get our application up and running.

Once the Dockerfile was right, building and pushing the Docker image was fairly straightforward on Windows.
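
For completeness, these are the standard commands; the tag matches what our Deployment manifest references later in this post:

docker build -t <our_private_docker_repository>/findmypast/fmp:v1 .
docker push <our_private_docker_repository>/findmypast/fmp:v1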

The image is really large though, at 3.69GB, which wouldn’t be practical to run with since we keep a few tags around.

We found a blog post in which Microsoft describes a 40% image size reduction: https://devblogs.microsoft.com/dotnet/we-made-windows-server-core-container-images-40-smaller/. Unfortunately, it’s only available in Insider builds, which are not compatible with our Kubernetes Windows node versions.

Adding Windows worker nodes to Linux based Kubernetes Clusters

Now that we have a Docker image we can use, it’s time to add some Windows worker nodes to our Kubernetes cluster.

Prerequisites

  • An existing Kubernetes cluster with a Linux master node the worker node can communicate with. We were working with Kubernetes version 1.19.1 at the time.
  • A VM running Windows Server 2019 with the latest cumulative updates: to enable overlay networking we need KB4489899 and KB4497934. Our Windows Server 2019 version is 1809.
  • Enough fixed space on the Windows worker node. 50GB at least, ideally 100GB. Make sure it’s a fixed disk and not dynamically allocated space.
  • The WinOverlay Kubernetes feature gate enabled on all of the master nodes, by editing /etc/kubernetes/manifests/kube-apiserver.yaml and appending - --feature-gates=WinOverlay=true to the argument list (a sketch of the edited manifest follows this list).
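
A trimmed sketch of what the edited kube-apiserver manifest looks like; all other flags are elided here and will differ per cluster:

spec:
  containers:
  - command:
    - kube-apiserver
    # ... existing flags elided ...
    - --feature-gates=WinOverlay=true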

Preparing the master node for Windows workers

This guide is based on the Kubernetes Windows Guide recommendations.

To add the first Windows worker, we’ll have to prepare our existing Kubernetes cluster for the connection. The steps are as follows:

  1. Edit the kube-flannel ConfigMap with:

     kubectl edit cm -n kube-system kube-flannel-cfg
    

    Search for net-conf.json and add two new fields, VNI and Port, to the Backend section so that it looks like:

     "Backend": {
         "Type": "vxlan",
         "VNI" : 4096,
         "Port": 4789
       }
    
  2. Don’t save the ConfigMap just yet: find the cni-conf.json section and change its name field from cbr0 to vxlan0 (a combined sketch of both edited sections follows this list)

  3. Get the raw version of the node-selector patch with wget from https://raw.githubusercontent.com/microsoft/SDN/1d5c055bb195fecba07ad094d2d7c18c188f9d2d/Kubernetes/flannel/l2bridge/manifests/node-selector-patch.yml and execute:

     kubectl patch ds/kube-flannel-ds-amd64 --patch "$(cat node-selector-patch.yml)" -n=kube-system
    
  4. Add Windows Flannel and kube-proxy DaemonSets

    1. Get the Kubernetes version you are running with the following command
           kubectl get nodes
      
     2. Check that your network adapter is named Ethernet: on the Windows node, open the Control Panel, go to Network and Sharing Center, then Change adapter settings. If it is not Ethernet, please refer to the notes in the Add Windows Flannel and kube-proxy DaemonSets section of the Kubernetes Windows Guide.

     3. Replace <KUBERNETES_VERSION> in the commands below with your Kubernetes version (for us, 1.19.1) and run them:
         curl -L https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/kube-proxy.yml | sed 's/VERSION/v<KUBERNETES_VERSION>/g' | kubectl apply -f -
         kubectl apply -f https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/flannel-overlay.yml
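
Putting steps 1 and 2 together, the edited sections of the kube-flannel-cfg ConfigMap end up looking roughly like the sketch below. The Network CIDR is an assumption based on flannel’s default of 10.244.0.0/16 (it matches the podCIDRs shown in the troubleshooting section later), and the remaining cni-conf.json fields stay unchanged:

data:
  cni-conf.json: |
    {
      "name": "vxlan0",
      ...
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan",
        "VNI": 4096,
        "Port": 4789
      }
    }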
    

Adding a new Windows worker node to the cluster

Again, we’ll have to prepare this Windows machine to join the cluster.

  1. Enable Containers on Windows

     Get-WindowsOptionalFeature -Online -FeatureName Containers            
     Enable-WindowsOptionalFeature -Online -FeatureName Containers
     Restart-Computer -Force
    
  2. Now install a specific version of Docker (18.09.11). When we installed a newer version (19.*.*), kubelet kept pausing, and describing the node with kubectl and checking the kubelet status showed PLEG errors like PLEG is not healthy: pleg was last seen active 3m7.629592089s ago. Pinning Docker to this specific version resolved the issue.

     Install-Module -Name DockerMsftProvider -Repository PSGallery -Force
     Install-Package -Name Docker -ProviderName DockerMsftProvider -RequiredVersion 18.09.11
    
  3. Restart the Windows VM

  4. Install wins, kubelet, and kubeadm by pulling and running the script from the C:\ directory, providing the target Kubernetes version to the script:
     curl.exe -LO https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/PrepareNode.ps1
     .\PrepareNode.ps1 -KubernetesVersion v<KUBERNETES_VERSION>
    
  5. Join the cluster

    1. Go to the control plane host and get the command to join the cluster
       ssh root@control-plane kubeadm token create --print-join-command
      
    2. Execute the command from the previous step on the Windows node you are adding

    Note: It may be necessary to clear down the C:\etc\kubernetes directory before running the above command on the Windows node to enable the script to complete.

  6. Check the cluster status
    1. See if the node is ready
       ssh root@control-plane kubectl get nodes -o wide
      
    2. If it’s not ready, check whether flannel is running; bear in mind it might take a while to download the image.
       kubectl -n kube-system get pods -l app=flannel
      
    3. If after downloading the image it is still not ready, please refer to the Kubernetes Troubleshooting guide.
  7. Since this is a Windows worker node, we don’t want certain pods scheduled on it, including the Prometheus, metrics and monitoring pods. To stop Linux Docker containers being scheduled on the worker node, we’ll need to add a taint:
    kubectl taint nodes <node> OS=Windows:NoExecute
    

    Only pods carrying the matching “OS=Windows” toleration will now be allowed to run on this node.
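
To confirm the taint is in place, you can query the node spec directly:

kubectl get node <node> -o jsonpath='{.spec.taints}'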

For the flannel networking to work correctly, we’ll need to edit the kube-flannel-ds-windows-amd64 DaemonSet in the kube-system namespace and add a new toleration (there will be a tolerations block already):

- key: "OS"
  operator: "Equal"
  value: "Windows"
  effect: "NoExecute"

There are other ways to stop deploying to specific nodes, but we’re using this approach for now.

Should you notice any issues with the networking, you can use the Windows Kubernetes networking troubleshooting guide to debug them.

Deploying our Docker image to the newly added Windows worker node

Our last step is to deploy the .NET application into Kubernetes!

Here is our Deployment config:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fmp
    environment: playground
  name: fmp
  namespace: playground
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fmp
      environment: playground
      release: fmp-playground
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: fmp
        environment: playground
        release: fmp-playground
    spec:
      containers:
      - image: <our_private_docker_repository>/findmypast/fmp:v1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: fmp
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources:
          limits:
            memory: 1000Mi
          requests:
            cpu: 50m
            memory: 200Mi
      dnsPolicy: ClusterFirst
      hostname: kube-fmp-01
      nodeSelector:
        kubernetes.io/os: windows
      restartPolicy: Always
      tolerations:
      - effect: NoExecute
        key: OS
        operator: Equal
        value: Windows

And the Service config:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: fmp
    environment: playground
    monitoring: prometheus
  name: fmp
  namespace: playground
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: fmp
    environment: playground
  type: ClusterIP

kubectl apply -f deployment.yaml -f service.yaml and we can watch our image getting pulled; the pods should hopefully end up in a Running state.
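
To watch this happen, the namespace and label from the manifests above come in handy:

kubectl -n playground get pods -l app=fmp -w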

Troubleshooting

Flannel Pod Enters ‘CrashLoopBackOff’ State

If the kube-flannel pod enters CrashLoopBackOff, it might be due to a missing podCIDR configuration. Check the logs to see if an error similar to the following is present:

E0302 11:39:19.978950       1 main.go:289] Error registering network: failed to acquire lease: node "<node>" pod cidr not assigned

If you see an error similar to this, check whether the podCIDR has been set using the following:

$ kubectl get node <node> -o jsonpath='{.spec.podCIDR}'

If the podCIDR has not been set, you need to get a list of podCIDRs currently configured in the target cluster by running the following (where the second line is example output):

$ kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
10.244.0.0/24 10.244.3.0/24 10.244.2.0/24 10.244.1.0/24

Finally, you should patch the affected node to assign a podCIDR not in the outputted list, for example (where the second line is the output from the command):

$ kubectl patch node <node> -p '{"spec":{"podCIDR":"10.244.4.0/24"}}'
node/<node> patched
You’ll then need to verify that the Windows pods in the kube-system namespace are running. If they are not, give them a restart, as shown below.
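
Since these pods are managed by DaemonSets, deleting an unhealthy pod is enough to restart it, for example:

$ kubectl -n kube-system delete pod <pod>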

Can’t route to services / kube-proxy or flannel pods are restarting/not running

You will need to apply the podCIDR patch described above, then restart the Windows node for the networking to come up correctly.

Summary

So, the initial question remains… Is it going to be viable? We have seen how involved it is to set up Windows worker nodes and deploy an ASP.NET app into Kubernetes. It is possible, but one would not call it easy.

After getting the application running in K8S, the pods were really unstable: we were seeing odd container restarts and health-check timeouts, and metrics were lacking as well. We decided not to go with this setup in our production systems for now.

Don’t give up hope just yet! There are interesting articles out there that talk about setting up monitoring for Windows nodes and pods: https://www.inovex.de/blog/kubernetes-on-windows-2-tools/.
