In the latest installment of our Kubernetes series we continue our journey by containerising some Windows ASP.NET applications and deploying them onto Windows K8S worker nodes. Is it going to be viable though?
The problem
We still have some services running on old Windows virtual machines. These are .NET apps written in C#. We use blue-green deployments to release new versions, which doubles the resources reserved by these applications during a release. It’s described well by Martin Fowler if you would like to read more about it.
Wouldn’t it be great to move them off those VMs and run them in Kubernetes? This way we could free up resources and only use the amount required by actual demand. We could also utilize autoscaling, and routing to these apps within K8S would be much more straightforward than going back out to our load balancer. Easier chaos testing, fault injection, etc. There are numerous benefits to this!
Dockerizing ASP.NET Applications
We will be building a docker image from an ASP.NET application. We will be using the FMP project as an example.
All steps are done on a Windows machine.
Installing Docker
First we install Docker for Windows from https://hub.docker.com/editions/community/docker-ce-desktop-windows/ .
Once installed, click on the Docker Desktop icon on the taskbar and select Switch to Windows Containers. This might need a restart.
The Dockerfile
First we’ll need to build our .NET app, as usual, either from Visual Studio or the CLI. We can use the output folder to build our Docker image.
Comments explain why we did each step. For reference, we are running this on Windows 10 Pro Version 1909 (Build 18363.836).
```dockerfile
# It's an ASP.NET application so we are using the microsoft/aspnet image
FROM mcr.microsoft.com/dotnet/framework/aspnet:4.8-20200512-windowsservercore-ltsc2019
# Downloading url rewrite as it's required by FMP
ADD http://download.microsoft.com/download/D/D/E/DDE57C26-C62C-4C59-A1BB-31D58B36ADA2/rewrite_amd64_en-US.msi c:/inetpub/rewrite_amd64_en-US.msi
# Uninstalling Web-Stat-Compression as it's not on the list of IIS features for FMP
RUN powershell "Uninstall-WindowsFeature Web-Stat-Compression"
# Enabling Windows Update (wuauserv) to support .Net 3.5 installation and then installing .Net 3.5 (NET-Framework-Features)
RUN powershell "Set-Service -Name wuauserv -StartupType Manual; Install-WindowsFeature -Name NET-Framework-Features -Verbose"
# Installing url rewrite
RUN powershell -Command Start-Process c:/inetpub/rewrite_amd64_en-US.msi -ArgumentList "/qn" -Wait
# Setting powershell as the shell to avoid parsing errors on the website names and bindings
SHELL ["powershell"]
# Removing the Default Web Site from IIS as it's not required and it's bound on port 80 which we will need for our website
RUN Remove-Website -Name 'Default Web Site'
# Copying our website to C:/websites/FMP, the folder location was copied from the build output
COPY ./build /websites/FMP
# Adding our website to IIS, setting its Physical Path and binding it to port 80
RUN New-IISSite -Name "FMP" -BindingInformation "*:80:" -PhysicalPath c:\websites\FMP
```
As you can see there are some interesting steps there. We had to get these right in order to get our application up and running.
Building and pushing the docker image after getting the Dockerfile right was fairly straightforward on Windows.
The image size is really large though: 3.69GB. This isn’t practical, since we keep a few tags around.
We found a blog post where Microsoft talks about 40% image size reductions: https://devblogs.microsoft.com/dotnet/we-made-windows-server-core-container-images-40-smaller/ . Unfortunately, the smaller images are only available in Insider builds, which are not compatible with our Kubernetes Windows node versions.
Adding Windows worker nodes to Linux based Kubernetes Clusters
Now that we have a docker image we can use, it’s time to add some Windows worker nodes to our Kubernetes cluster.
Prerequisites
- An existing Kubernetes cluster with a Linux master node that the worker node can communicate with. We were working with Kubernetes version 1.19.1 at the time.
- A VM with Windows Server 2019 with the latest cumulative updates; in order to have network overlay enabled we need KB4489899 and KB4497934. Our Windows Server 2019 build number is 1809.
- Enough fixed disk space on the Windows worker node: 50GB at least, ideally 100GB. Make sure it’s a fixed disk and not dynamically allocated space.
- The `WinOverlay` Kubernetes feature gate enabled on all of the master nodes, by editing `/etc/kubernetes/manifests/kube-apiserver.yaml` and appending `- --feature-gates=WinOverlay=true` to the argument list.
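After that edit, the static pod manifest’s command section looks roughly like this (a sketch only; every flag besides the feature gate is a placeholder and will differ per cluster):

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt, illustrative)
spec:
  containers:
  - command:
    - kube-apiserver
    - --allow-privileged=true          # existing flags vary per cluster
    - --feature-gates=WinOverlay=true  # appended to enable Windows overlay networking
```

Since this is a static pod manifest, the kubelet notices the change and restarts the kube-apiserver automatically.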
Preparing the master node for Windows workers
This guide is based on the Kubernetes Windows Guide recommendations.
To add the first Windows worker to the cluster, we’ll have to prepare our existing Kubernetes cluster for the connection. The steps are as follows:
1. Edit the `kube-flannel` ConfigMap with `kubectl edit cm -n kube-system kube-flannel-cfg`. Search for `net-conf.json` and add 2 new fields to the `Backend` section. This will look like:

   ```json
   "Backend": {
     "Type": "vxlan",
     "VNI": 4096,
     "Port": 4789
   }
   ```

2. Don’t save the ConfigMap just yet; find the `cni-conf.json` section instead and change its name from `cbr0` to `vxlan0`.

3. Get the raw version of a node-selector patch with `wget` from https://raw.githubusercontent.com/microsoft/SDN/1d5c055bb195fecba07ad094d2d7c18c188f9d2d/Kubernetes/flannel/l2bridge/manifests/node-selector-patch.yml and execute:

   ```shell
   kubectl patch ds/kube-flannel-ds-amd64 --patch "$(cat node-selector-patch.yml)" -n=kube-system
   ```

4. Add Windows Flannel and kube-proxy DaemonSets:
   - Get the Kubernetes version you are running with `kubectl get nodes`.
   - Check if your Ethernet adapter name is `Ethernet` by going to the Windows node, opening the Control Panel, then Network and Sharing Center, then `Change adapter settings`. In case it’s not `Ethernet`, please refer to the notes of the section “Add Windows Flannel and kube-proxy DaemonSets” in the Kubernetes Windows Guide.
   - Replace `KUBERNETES_VERSION` below with your Kubernetes version and run:

     ```shell
     curl -L https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/kube-proxy.yml | sed 's/VERSION/v<KUBERNETES_VERSION>/g' | kubectl apply -f -
     kubectl apply -f https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/flannel-overlay.yml
     ```
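The `sed` step in that kube-proxy pipeline simply stamps your cluster version into the manifest before it is applied. A stand-alone illustration (the image name here is made up for the demo; the real manifest references the sig-windows-tools kube-proxy image):

```shell
# Substitute the VERSION placeholder the same way the pipeline above does.
# 'v1.19.1' is the version we were running; yours comes from `kubectl get nodes`.
echo 'image: example/kube-proxy:VERSION-nanoserver' | sed 's/VERSION/v1.19.1/g'
```

This prints `image: example/kube-proxy:v1.19.1-nanoserver`, which is what gets piped into `kubectl apply -f -`.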
Adding a new Windows worker node to the cluster
Again, we’ll have to prepare this Windows machine to join the cluster.
1. Enable Containers on Windows:

   ```powershell
   Get-WindowsOptionalFeature -Online -FeatureName Containers
   Enable-WindowsOptionalFeature -Online -FeatureName Containers
   Restart-Computer -Force
   ```

2. Now install a specific version of Docker (`18.09.11`). When we installed a newer version of Docker (`19.*.*`) we had problems with kubelet pausing frequently, and saw PLEG errors like `PLEG is not healthy: pleg was last seen active 3m7.629592089s ago` when describing the node with kubectl and checking the kubelet status there. Using this specific version of Docker resolved it.

   ```powershell
   Install-Module -Name DockerMsftProvider -Repository PSGallery -Force
   Install-Package -Name Docker -ProviderName DockerMsftProvider -RequiredVersion 18.09.11
   ```

3. Restart the Windows VM.

4. Install wins, kubelet, and kubeadm by pulling and running the script from the `C:\` directory, providing the target Kubernetes version to the script:

   ```powershell
   curl.exe -LO https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/PrepareNode.ps1
   .\PrepareNode.ps1 -KubernetesVersion v<KUBERNETES_VERSION>
   ```

5. Join the cluster:
   - Go to the control plane host and get the command to join the cluster: `kubeadm token create --print-join-command`
   - Execute that command on the Windows node you are adding. Note: it may be necessary to clear down the `C:\etc\kubernetes` directory before running it on the Windows node, to enable the script to complete.

6. Check the cluster status:
   - See if the node is ready: `ssh root@control-plane kubectl get nodes -o wide`
   - In case it’s not ready, check if flannel is running (mind you, it might take a while to download the image): `kubectl -n kube-system get pods -l app=flannel`
   - If after downloading the image it is still not ready, please refer to the Kubernetes Troubleshooting guide.
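When the join has worked, the Windows node appears alongside the Linux nodes. An illustrative, trimmed `kubectl get nodes -o wide` result (the node names are made up; your versions and OS images will differ):

```
NAME            STATUS   ROLES    VERSION   OS-IMAGE
control-plane   Ready    master   v1.19.1   Ubuntu 18.04.5 LTS
win-worker-01   Ready    <none>   v1.19.1   Windows Server 2019 Datacenter
```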
- Since this is a Windows worker node, we don’t want to schedule certain pods on it. This includes the Prometheus, metrics and monitoring pods. To disable scheduling Linux docker containers on the worker nodes, we’ll need to add a taint:
```shell
kubectl taint nodes <node> OS=Windows:NoExecute
```
This will only allow deployments with the “OS=Windows” toleration on them.
For the flannel networking to work correctly, we’ll need to edit the `kube-flannel-ds-windows-amd64` DaemonSet in the `kube-system` namespace and add a new toleration (there will be a `tolerations` block already):
```yaml
- key: "OS"
  operator: "Equal"
  value: "Windows"
  effect: "NoExecute"
```
There are other ways to stop deploying to specific nodes, but we’re using this approach for now.
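One such alternative (a sketch of the idea, not the configuration we run) is to skip the taint and instead pin every Linux workload away from Windows nodes with a `nodeSelector` on the well-known `kubernetes.io/os` label:

```yaml
# Hypothetical Linux Deployment excerpt: schedule only onto Linux nodes,
# so nothing lands on the Windows workers by accident.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
```

The taint approach has the advantage of being opt-in per Windows workload, rather than requiring a selector on every Linux manifest.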
Should you notice any issues with the networking, you can use the Windows Kubernetes networking troubleshooting guide to debug them.
Deploying our docker image to the newly added Windows worker node
Our last step is to deploy the .NET application into Kubernetes!
Here is our Deployment config object:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fmp
    environment: playground
  name: fmp
  namespace: playground
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fmp
      environment: playground
      release: fmp-playground
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: fmp
        environment: playground
        release: fmp-playground
    spec:
      containers:
      - image: <our_private_docker_repository>/findmypast/fmp:v1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: fmp
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources:
          limits:
            memory: 1000Mi
          requests:
            cpu: 50m
            memory: 200Mi
      dnsPolicy: ClusterFirst
      hostname: kube-fmp-01
      nodeSelector:
        kubernetes.io/os: windows
      restartPolicy: Always
      tolerations:
      - effect: NoExecute
        key: OS
        operator: Equal
        value: Windows
```
And the Service config object:
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: fmp
    environment: playground
    monitoring: prometheus
  name: fmp
  namespace: playground
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: fmp
    environment: playground
  type: ClusterIP
```
We apply both with:

```shell
kubectl apply -f deployment.yaml -f service.yaml
```

and we can watch our image getting pulled; the pods should hopefully end up in a `Running` state.
Troubleshooting
Flannel Pod Enters ‘CrashLoopBackOff’ State
If the `kube-flannel` pod enters `CrashLoopBackOff`, it might be due to a missing `podCIDR` configuration. Check the logs to see if an error similar to the following is present:
```
E0302 11:39:19.978950 1 main.go:289] Error registering network: failed to acquire lease: node "<node>" pod cidr not assigned
```
If you see an error similar to this, check whether the `podCIDR` has been set using the following:

```shell
$ kubectl get node <node> -o jsonpath='{.spec.podCIDR}'
```
If the `podCIDR` has not been set, you need to get a list of the `podCIDR`s currently configured in the target cluster by running the following (where the second line is example output):

```shell
$ kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
10.244.0.0/24 10.244.3.0/24 10.244.2.0/24 10.244.1.0/24
```
Finally, you should patch the affected node to assign a `podCIDR` that is not in the outputted list, for example (where the second line is the output from the command):

```shell
$ kubectl patch node <node> -p '{"spec":{"podCIDR":"10.244.4.0/24"}}'
node/<node> patched
```
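Picking the replacement subnet is just a matter of finding the first /24 under the flannel range that no node is using yet. A small, locally runnable sketch (the `used` list is the example output from above; the 10.244.0.0/16 range is flannel’s default and may differ in your cluster):

```shell
# Find the first unused 10.244.x.0/24 subnet given the podCIDRs already assigned.
used="10.244.0.0/24 10.244.3.0/24 10.244.2.0/24 10.244.1.0/24"
for i in $(seq 0 255); do
  case " $used " in
    *" 10.244.$i.0/24 "*) ;;            # subnet already assigned, keep looking
    *) echo "10.244.$i.0/24"; break ;;  # first free subnet, print it and stop
  esac
done
```

With the example list this prints `10.244.4.0/24`, the value used in the patch above.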
- You’ll then need to verify that the windows pods are running in the
kube-system
namespace. If not yet, give them a restart.
Can’t route to services / kube-proxy or flannel pods are restarting/not running
You will need to apply the patch described above, then restart the node for the networking to come up correctly.
Summary
So, the initial question remains… is this going to be viable? We can see how involved it is to set up Windows worker nodes and deploy an ASP.NET app into Kubernetes. It is possible, but one would not say it is easy.
After getting the application running in K8S, the pods were really unstable. We were getting odd container restarts and health check timeouts, and metrics were lacking as well. We decided not to go with this setup in our production systems for now.
Don’t give up hope just yet! There are interesting articles out there that talk about setting up monitoring for Windows nodes and pods: https://www.inovex.de/blog/kubernetes-on-windows-2-tools/ .