Kubernetes v1.36: SELinux Mount Optimization Reaches General Availability

From Xtcworld, the free encyclopedia of technology

Introduction

Kubernetes continues to refine its integration with Security-Enhanced Linux (SELinux). As of v1.36, the SELinuxMountReadWriteOncePod feature gate has graduated to General Availability (GA). This change makes volume initialization faster for many workloads by avoiding expensive recursive relabeling. However, it also introduces subtle behavioral shifts that could impact applications sharing volumes between privileged and unprivileged Pods. Cluster administrators should prepare for this transition, as the broader SELinuxMount feature is expected to become enabled by default in the upcoming v1.37 release.


If your nodes do not use SELinux (e.g., SELinux is disabled or unavailable in the kernel), the kubelet skips all SELinux logic, so no changes affect you. For those running SELinux in enforcing mode, this article explains the problem, the solution introduced by Kubernetes, and the steps you should take now to ensure a smooth upgrade.

The Problem: Slow Recursive Relabeling

Linux systems with SELinux enforce access controls using security labels attached to objects such as files and network sockets. Historically, when a Pod starts, the container runtime applies an SELinux label to the Pod's processes and to all of its volumes. Kubernetes passes the label from the Pod's securityContext.seLinuxOptions to the runtime, which then recursively relabels every file within each volume.

This recursive operation is time-consuming, especially for volumes with many files or those residing on remote filesystems (like NFS or cloud block storage). The overhead can delay Pod startup and increase resource consumption on nodes.

Note: If a container uses a subPath of a volume, only that subdirectory is relabeled. This allows two Pods with different SELinux labels to share the same volume, as long as they use distinct subpaths.

Additionally, when a Pod has no SELinux label assigned in the API, the container runtime assigns a unique random label to isolate the Pod's processes. Even then, the runtime still recursively relabels all volumes, which can be just as expensive.
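To get a feel for why the recursive relabel scales poorly, the following minimal Python sketch counts how many filesystem objects a recursive walk would have to touch on a volume. It does not change any SELinux labels; the helper names (count_relabel_targets, make_sample_volume) are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def count_relabel_targets(root: str) -> int:
    """Count the objects a recursive relabel would visit under root.

    This only walks the tree; it does not read or write SELinux labels.
    """
    total = 1  # the volume root itself
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def make_sample_volume(root: str, dirs: int, files_per_dir: int) -> None:
    """Create a small directory tree that stands in for a volume."""
    for d in range(dirs):
        path = os.path.join(root, f"dir{d}")
        os.mkdir(path)
        for f in range(files_per_dir):
            open(os.path.join(path, f"file{f}"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as volume:
        make_sample_volume(volume, dirs=20, files_per_dir=50)
        # 1 root + 20 directories + 1000 files
        print(count_relabel_targets(volume))  # prints 1021
```

Each counted object corresponds to at least one label update in the recursive model, so the work grows linearly with the number of files, and every update on a remote filesystem may add network latency.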

How Kubernetes Improves Volume Mounting

To eliminate recursive relabeling, the kubelet can now mount volumes using the -o context=<label> option. This tells the kernel to apply the correct SELinux label to every inode on that mount point without traversing the filesystem tree. The result is a near-instant label assignment, drastically reducing Pod startup time.
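As a rough illustration of what such a mount option looks like, the sketch below assembles a context= string from the fields of seLinuxOptions. The default user, role, and type values are assumptions chosen for the example (they resemble a typical container policy); the real kubelet derives missing components from the node's SELinux configuration, and the names SELinuxOptions and context_mount_option are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed defaults for illustration only; real values depend on the node's policy.
DEFAULT_USER = "system_u"
DEFAULT_ROLE = "object_r"
DEFAULT_TYPE = "container_file_t"


@dataclass
class SELinuxOptions:
    """Mirrors the user/role/type/level fields of seLinuxOptions."""
    user: Optional[str] = None
    role: Optional[str] = None
    type: Optional[str] = None
    level: Optional[str] = None


def context_mount_option(opts: SELinuxOptions) -> str:
    """Build a context= mount option from partially specified options."""
    if not opts.level:
        raise ValueError("a level is required to build a mount context")
    label = ":".join([
        opts.user or DEFAULT_USER,
        opts.role or DEFAULT_ROLE,
        opts.type or DEFAULT_TYPE,
        opts.level,
    ])
    # The label is quoted because the level may contain a comma, which
    # mount would otherwise treat as an option separator.
    return f'context="{label}"'


if __name__ == "__main__":
    print(context_mount_option(SELinuxOptions(level="s0:c10,c20")))
    # prints context="system_u:object_r:container_file_t:s0:c10,c20"
```

The resulting string is what would follow -o on a mount command line; one mount call with this option replaces the per-file label updates of the recursive approach.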

This optimized path is controlled by feature gates and requires both of the following:

  • The Pod specifies enough of an SELinux label for the kubelet to build a mount context (at minimum spec.securityContext.seLinuxOptions.level).
  • The volume driver opts in to SELinux mounts. For CSI drivers, this means setting spec.seLinuxMount: true in the CSIDriver object.

Kubernetes rolled out the feature in two phases:

  1. ReadWriteOncePod volumes were handled under the SELinuxMountReadWriteOncePod feature gate, which became enabled by default in v1.28 and reached GA in v1.36.
  2. Broader coverage (all volumes) is managed under the SELinuxMount feature gate, paired with the new spec.securityContext.seLinuxChangePolicy field on Pods (introduced in v1.32 as alpha).

The GA of the first phase confirms the approach is stable. The broader coverage is not yet enabled by default, but is expected to become the default in a future release.
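The requirements and the two rollout phases can be summarized as a small decision function. This is a simplified model, not the kubelet's actual logic: the VolumeRequest type and uses_context_mount function are hypothetical, and the handling of seLinuxChangePolicy assumes its two documented values, MountOption and Recursive, with Recursive forcing the old behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class VolumeRequest:
    """Hypothetical summary of one Pod-to-volume mount."""
    pod_level: Optional[str]           # spec.securityContext.seLinuxOptions.level
    driver_opted_in: bool              # CSIDriver spec.seLinuxMount
    access_modes: Tuple[str, ...]      # access modes of the volume
    change_policy: str = "MountOption" # spec.securityContext.seLinuxChangePolicy


def uses_context_mount(req: VolumeRequest, gates: Dict[str, bool]) -> bool:
    """Return True if this simplified model would mount with -o context=."""
    if not req.pod_level or not req.driver_opted_in:
        return False  # a requirement from the list above is not met
    if req.change_policy == "Recursive":
        return False  # the Pod asked for recursive relabeling
    if req.access_modes == ("ReadWriteOncePod",):
        # Phase 1: assumed on by default in this model.
        return gates.get("SELinuxMountReadWriteOncePod", True)
    # Phase 2: all other volumes, assumed off by default in this model.
    return gates.get("SELinuxMount", False)


if __name__ == "__main__":
    gates = {"SELinuxMountReadWriteOncePod": True, "SELinuxMount": False}
    rwop = VolumeRequest("s0:c1,c2", True, ("ReadWriteOncePod",))
    shared = VolumeRequest("s0:c1,c2", True, ("ReadWriteMany",))
    print(uses_context_mount(rwop, gates))    # prints True
    print(uses_context_mount(shared, gates))  # prints False
```

Flipping SELinuxMount to True in the gates dictionary makes the second call return True as well, which is the behavior change the next section discusses.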

What Changes in v1.36 and v1.37

With v1.36, clusters using ReadWriteOncePod volumes will automatically benefit from the optimized mount, provided the volume driver supports it. This is a transparent performance improvement for most workloads.

Looking ahead to v1.37, the community plans to enable the SELinuxMount feature gate by default. This will apply the optimized mount to all eligible volumes, not just ReadWriteOncePod volumes. However, this change can break applications that rely on the old recursive relabeling behavior.

Why It Might Break Workloads

In the old model, the container runtime relabeled the files in the volume, or only the subPath a container used. With the new mount, a single label is fixed for the entire mount at mount time, and every file on it appears with that label. This means:

  • If two Pods share a volume (e.g., a persistent volume claim mounted by a privileged and an unprivileged Pod on the same node), the kernel may use a single label for the entire mount, breaking the expectation that each Pod sees its own label.
  • Applications that previously relied on the runtime setting different labels for different subpaths (via subPath) might behave unexpectedly if the volume is now mounted with a global context.

In short, the optimization is safe for single-Pod mounts but can be problematic for shared volumes, especially those mixing privilege levels.
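The difference between the two models can be shown with a toy simulation. It is a deliberate simplification: access is granted only when a file's label equals the Pod's label exactly, whereas real SELinux policy evaluates types and categories. All function names and sample labels are hypothetical.

```python
from typing import Dict, List, Optional


def labels_after_recursive_relabel(files: List[str], pods: List[dict]) -> Dict[str, Optional[str]]:
    """Old model: each Pod's subPath is relabeled with that Pod's label."""
    labels: Dict[str, Optional[str]] = {path: None for path in files}
    for pod in pods:
        prefix = pod["subPath"].rstrip("/") + "/"
        for path in files:
            if path.startswith(prefix):
                labels[path] = pod["label"]
    return labels


def labels_with_mount_context(files: List[str], mount_label: str) -> Dict[str, Optional[str]]:
    """New model: every file on the mount appears with one label."""
    return {path: mount_label for path in files}


def can_access_subpath(labels: Dict[str, Optional[str]], pod: dict) -> bool:
    """Toy access check: all files under the Pod's subPath carry its label."""
    prefix = pod["subPath"].rstrip("/") + "/"
    return all(label == pod["label"]
               for path, label in labels.items() if path.startswith(prefix))


if __name__ == "__main__":
    files = ["a/data.db", "a/log.txt", "b/cache.bin"]
    pod_a = {"name": "pod-a", "subPath": "a", "label": "s0:c1,c2"}
    pod_b = {"name": "pod-b", "subPath": "b", "label": "s0:c3,c4"}

    old = labels_after_recursive_relabel(files, [pod_a, pod_b])
    print(can_access_subpath(old, pod_a), can_access_subpath(old, pod_b))
    # prints True True

    new = labels_with_mount_context(files, pod_a["label"])
    print(can_access_subpath(new, pod_a), can_access_subpath(new, pod_b))
    # prints True False
```

In the simulation, the second Pod loses access only because the whole mount carries the first Pod's label, which mirrors the shared-volume concern described above.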

Recommended Actions for Cluster Administrators

v1.36 is the ideal release to audit your cluster and prepare:

  • Identify nodes running SELinux in enforcing mode. Run getenforce on each node, or check any node labels your cluster uses to record SELinux status.
  • List all workloads using ReadWriteOncePod volumes. Ensure they are compatible with the new mount behavior. In most cases, they will work fine.
  • Check for volumes shared between multiple Pods on the same node. Pay special attention to cases where one Pod is privileged and another is unprivileged.
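  • If you encounter issues, you may be able to opt out in v1.36 by disabling the SELinuxMountReadWriteOncePod feature gate. Feature gates are often locked to their default once a feature reaches GA, so verify that your release still accepts this setting. An affected Pod can also request the old behavior by setting spec.securityContext.seLinuxChangePolicy to Recursive. For v1.37, the SELinuxMount gate will be on by default, but you can disable it manually by setting the feature gate to false in the kubelet configuration.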
  • Test the upgrade in a non-production environment before rolling out to production clusters.
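Part of this audit can be scripted. The sketch below is one possible approach, assuming you have saved the output of kubectl get pods --all-namespaces -o json to a file: it groups Pods by namespace, node, and PersistentVolumeClaim, and flags claims used by more than one Pod on the same node, noting whether privileged and unprivileged Pods are mixed. The function names and the report format are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def is_privileged(pod: dict) -> bool:
    """True if any container or init container sets privileged: true."""
    spec = pod.get("spec", {})
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    return any((c.get("securityContext") or {}).get("privileged", False)
               for c in containers)


def shared_claims(pod_list: dict) -> List[Dict]:
    """Find PVCs mounted by two or more Pods scheduled to the same node."""
    groups = defaultdict(list)
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        node = spec.get("nodeName")
        if not node:
            continue  # not scheduled yet
        namespace = pod["metadata"].get("namespace", "default")
        claims = {(v.get("persistentVolumeClaim") or {}).get("claimName")
                  for v in spec.get("volumes") or []}
        for claim in sorted(c for c in claims if c):
            groups[(namespace, node, claim)].append(pod)

    findings = []
    for (namespace, node, claim), pods in sorted(groups.items()):
        if len(pods) < 2:
            continue
        findings.append({
            "namespace": namespace,
            "node": node,
            "claim": claim,
            "pods": sorted(p["metadata"]["name"] for p in pods),
            "mixed_privilege": len({is_privileged(p) for p in pods}) == 2,
        })
    return findings


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fh:
            for finding in shared_claims(json.load(fh)):
                print(finding)
```

Entries with mixed_privilege set to True are the cases called out in the list above and deserve the closest review; the script only reports candidates and does not inspect SELinux labels on the nodes.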

For more details on the feature, refer to the section "How Kubernetes Improves Volume Mounting" above and the original Kubernetes Enhancement Proposal.

Conclusion

The SELinux mount optimization is a welcome performance improvement for Kubernetes clusters with SELinux. By eliminating recursive relabeling, Pod startup times can be significantly reduced. However, the change in behavior can impact workloads that depend on the old model, especially those sharing volumes. Cluster operators should take advantage of the v1.36 release to audit their configurations, test the new behavior, and plan for the default enablement in v1.37.

If you encounter any unexpected behavior after upgrading, consider temporarily disabling the feature gate while you adapt your workloads. As always, the Kubernetes community encourages feedback and contributions to further refine this functionality.