
Upgrading CRDs in Kubernetes

Today, I was performing a routine update in my Kubernetes cluster and encountered an unfamiliar error in ArgoCD. I thought the update to Cert Manager from 1.6.1 to 1.7.1 would be a simple one, SemVer and all, but a CRD issue stopped me from upgrading.

That issue, while easily fixable had I just read the release notes, led me on an interesting journey through CRD deprecations.

The Error

When I tried to sync, ArgoCD spat out a scary-looking error message about all the CRDs:

Failed to sync CRDs

I’d missed the note on the ArtifactHub page saying I should check the Cert Manager release notes. Turns out they removed a bunch of old CRD versions in 1.7.0. That’s cool; they also mention using their tool cmctl to make the migration easy.
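For reference, the cmctl route looks roughly like this (a sketch based on the 1.7 release notes; check your cmctl version's help for the exact subcommand):

# Re-persist cert-manager resources at the new storage version and update
# storedVersions; subcommand name per the 1.7 release notes.
cmctl upgrade migrate-api-version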

While I could just grab cmctl and run it blindly, I thought it’d be fun to investigate the issue from a pure Kubernetes perspective. What if I needed to do this with one of my own CRDs, or someone else had a CRD that we needed to deprecate and they didn’t provide a THINGctl?

I started looking for documentation related to status.storedVersions, since that’s what my error mentioned, and found there’s a section on exactly this topic named upgrading existing objects to a new stored version.

What is storedVersions?

The API documentation says this about storedVersions:

The API server records each version which has ever been marked as the storage version in the status field storedVersions. Objects may have been persisted at any version that has ever been designated as a storage version. No objects can exist in storage at a version that has never been a storage version.

Let’s have a look at status.storedVersions in my ClusterIssuers CRD, since I know I have one, and at one point it was at v1alpha2.

kubectl get crd clusterissuers.cert-manager.io -o yaml | \
  yq e '.status.storedVersions' -
- v1alpha2
- v1

This means that at some point both v1alpha2 and v1 have been the storage version for this CRD. It does not mean I’ve had instances of each, just that the CRD has, at different times, designated each of these API versions as its storage version. For my cluster, this means I was not very good at updating Cert Manager 😞.
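For contrast, the spec side records the single version that is the storage version right now. A quick way to check it (same kubectl | yq pattern as above, assuming yq v4); on my cluster this prints v1:

kubectl get crd clusterissuers.cert-manager.io -o yaml | \
  yq e '.spec.versions[] | select(.storage == true) | .name' -
v1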

Fixing our CRD

The reason this is “broken” is that the new chart’s CRD no longer includes v1alpha2, so Kubernetes is trying to protect us by saying “hey man, at some point you had v1alpha2 in your cluster, but you’re trying to remove it from the CRD! This might mean some of your stuff won’t work!” Thanks, Captain Kube!

The process is effectively:

  • verify that we don’t have any more of the old Custom Resources
    • I performed this validation at the repo level
    • kubent is a great tool for this, and you can use it with CRDs! (there’s a sketch of the repo-level check after this list)
  • patch the status to remove the cluster’s memory of them
  • upgrade the CRD spec to a version that removes the deprecated versions
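Here’s a hedged sketch of that first verification step, assuming a GitOps-style repo checkout (the grep path and the belt-and-braces rewrite are illustrative, not prescriptive):

# Repo-level check: are any manifests still declaring the deprecated version?
grep -rn "cert-manager.io/v1alpha2" .

# Belt and braces: rewriting the live objects re-persists them at the
# current storage version (a no-op if nothing is still stored at v1alpha2)
kubectl get clusterissuers.cert-manager.io -o yaml | kubectl replace -f -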

We can use the patch operation combined with kubectl proxy, as shown in the docs, to remove the v1alpha2 entry from storedVersions:

# kubectl proxy exposes the API server on localhost:8001 using your kubeconfig credentials
kubectl proxy &
# JSON-patch the CRD's status subresource, dropping v1alpha2 from storedVersions
curl --header "Content-Type: application/json-patch+json" \
  --request PATCH http://localhost:8001/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusterissuers.cert-manager.io/status \
  --data '[{"op": "replace", "path": "/status/storedVersions", "value":["v1"]}]'

The PATCH returns the updated CRD, and you should be able to verify it no longer has the old storedVersion:

kubectl get crd clusterissuers.cert-manager.io -o yaml | \
  yq e '.status.storedVersions' -
- v1
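
As an aside, newer kubectl (v1.24+) can patch subresources directly, so the same fix should work without the proxy. A sketch, assuming your kubectl is recent enough to support --subresource:

# Same JSON patch, sent straight at the status subresource
kubectl patch crd clusterissuers.cert-manager.io \
  --subresource=status \
  --type=json \
  -p '[{"op": "replace", "path": "/status/storedVersions", "value": ["v1"]}]'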

Awesome, we’re ready to rock now and can go sync ArgoCD!