How to Debug OOMKilled Pods in Kubernetes: A Step-by-Step Guide

Stop Kubernetes CrashLoopBackOffs. Learn how to diagnose OOMKilled pods, analyze memory working sets, and perform heap profiling to fix memory leaks.

What Does OOMKilled Actually Mean?

An OOMKilled status occurs when the Linux kernel’s Out-of-Memory (OOM) killer terminates a process to prevent a system-wide crash. In Kubernetes, this surfaces as the termination reason OOMKilled with Exit Code 137, which is 128 plus 9, the signal number of SIGKILL. The primary goal of the OOM killer is to reclaim memory to keep the underlying node stable.

There are two distinct triggers for this event. First, a container may exceed its configured memory limit. The limit is enforced through the container’s cgroup, so it is the kernel, not the kubelet, that kills a process in the container once usage cannot be reclaimed below the limit. Second, the entire node may run out of memory, forcing the kernel to select a “victim” process, with pods ranked largely by their Quality of Service (QoS) class. I’ve seen this happen frequently in clusters where “Burstable” pods are overcommitted, causing the kernel to kill pods that were technically under their own limits just to save the node. Detailed resource management specifications are available in the official Kubernetes documentation.

Why is Your Pod Getting OOMKilled?

Root causes typically fall into three categories: application leaks, misconfigured limits, or node-level pressure.

Application Memory Leaks

This is the most common cause for pods that operate normally for several hours before crashing. A leak occurs when an application allocates memory but never releases it. In Java, this often happens when objects are stored in static collections that are never cleared. In Go, common culprits include goroutines that never exit and slices or maps that grow indefinitely. In these cases, memory usage climbs steadily until it hits the hard limit.
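As a minimal sketch of the Go case, the program below strands goroutines on an unbuffered channel. The function names (leakyFetch, safeFetch, strandedAfter) are hypothetical and exist only for illustration; a real leak is usually buried in request handlers or background workers.

```go
package main

import (
	"fmt"
	"runtime"
)

// leakyFetch starts a worker that sends its result on an unbuffered channel.
// If the caller never receives, the worker blocks forever, and its stack and
// everything it references stay reachable.
func leakyFetch() <-chan int {
	ch := make(chan int) // unbuffered: the send blocks until someone receives
	go func() { ch <- 42 }()
	return ch
}

// safeFetch gives the channel a buffer of one, so the worker can always
// complete its send and exit, even when the result is discarded.
func safeFetch() <-chan int {
	ch := make(chan int, 1)
	go func() { ch <- 42 }()
	return ch
}

// strandedAfter calls fetch n times without reading the results and returns
// how many additional goroutines exist immediately afterwards.
func strandedAfter(n int, fetch func() <-chan int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		fetch() // result channel discarded
	}
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines stranded by leakyFetch:", strandedAfter(1000, leakyFetch))
}
```

Each stranded goroutine holds only a few kilobytes, but under steady traffic the count never goes down, which produces the slow climb described above. With safeFetch the workers exit as soon as they are scheduled.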

Improper Resource Limits

Engineers often set limits too close to steady-state usage. Many applications have a “bursty” startup phase. For example, a Spring Boot application loading numerous beans into memory may spike to 600Mi during initialization. If the limit is set to 512Mi, the pod will be killed before it ever reaches a healthy state.

Node-Level Memory Pressure

When multiple pods are configured with requests significantly lower than their limits, they can all attempt to burst simultaneously. If the aggregate usage exceeds the node’s physical RAM, the kernel triggers a node-level OOM event. The kernel targets pods with the lowest priority or those consuming the most memory relative to their requested amount.
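The ranking can be made concrete with the oom_score_adj value the kubelet assigns to each container. The sketch below follows the formula given in the Kubernetes documentation on node out-of-memory behavior; treat the exact clamping bounds as an assumption, since they have varied slightly between versions, and note that the function name and signature are hypothetical.

```go
package main

import "fmt"

// oomScoreAdj sketches the oom_score_adj the kubelet assigns per QoS class.
// A higher value makes the kernel more likely to pick the process as a victim.
func oomScoreAdj(qos string, memRequestBytes, nodeCapacityBytes int64) int64 {
	switch qos {
	case "Guaranteed":
		return -997
	case "BestEffort":
		return 1000
	}
	// Burstable: the larger the memory request relative to node capacity,
	// the lower (safer) the score.
	if nodeCapacityBytes <= 0 {
		return 999 // guard added for this sketch only
	}
	adj := 1000 - (1000*memRequestBytes)/nodeCapacityBytes
	if adj < 2 {
		adj = 2
	}
	if adj > 999 {
		adj = 999
	}
	return adj
}

func main() {
	const gi = int64(1) << 30
	fmt.Println("Burstable, 4Gi request on a 16Gi node:", oomScoreAdj("Burstable", 4*gi, 16*gi))
	fmt.Println("Burstable, 256Mi request on a 16Gi node:", oomScoreAdj("Burstable", gi/4, 16*gi))
	fmt.Println("Guaranteed:", oomScoreAdj("Guaranteed", 4*gi, 16*gi))
}
```

A Burstable pod with a tiny request ends up close to the BestEffort score, which is why wide gaps between requests and limits make a pod an early victim.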

How to Diagnose and Fix OOMKilled Pods

Follow this systematic workflow to move from a crashing pod to a verified root cause.

Step 1: Confirm the Termination Reason

Identify the failing pod and verify the exit code using kubectl.

$ kubectl get pods
NAME                               READY   STATUS             RESTARTS   AGE
api-gateway-7f8db6d9-abc12          0/1     CrashLoopBackOff   4          12m

$ kubectl describe pod api-gateway-7f8db6d9-abc12

Locate the Last State section in the output:

Containers:
  api-container:
    State:          Running
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Finished:     Thu, 24 Oct 2024 14:22:01 +0000

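A Reason of OOMKilled together with Exit Code: 137 confirms an OOM event. Exit code 137 on its own only means the process received SIGKILL, which can have other causes, so always check the Reason field. If the pod cycles through crashes without this reason, refer to our guide on Fix CrashLoopBackOff in Kubernetes Pods.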

Step 2: Analyze the “Working Set” Metric

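Avoid relying on container_memory_usage_bytes in Prometheus, as it includes page cache that the kernel can reclaim under pressure. container_memory_working_set_bytes excludes inactive file-backed cache, which makes it the value the kubelet uses for eviction decisions and the closest available proxy for what triggers the OOM killer.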
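To make the relationship between the two metrics concrete, here is a small sketch of how the working set is derived from cgroup statistics, mirroring the calculation cAdvisor performs. The function name and the sample numbers are illustrative assumptions.

```go
package main

import "fmt"

// workingSet derives the working set from total cgroup memory usage by
// subtracting inactive file-backed pages, floored at zero.
func workingSet(usageBytes, inactiveFileBytes uint64) uint64 {
	if inactiveFileBytes >= usageBytes {
		return 0
	}
	return usageBytes - inactiveFileBytes
}

func main() {
	const mi = uint64(1) << 20
	usage := 900 * mi        // hypothetical container_memory_usage_bytes
	inactiveFile := 300 * mi // hypothetical inactive_file from the cgroup memory.stat
	fmt.Printf("working set: %d Mi\n", workingSet(usage, inactiveFile)/mi)
}
```

In this example a container that appears to use 900Mi is only 600Mi away from zero in terms of memory the kernel cannot cheaply reclaim, so a dashboard built on usage bytes would overstate the risk.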

Execute this PromQL query in your Grafana dashboard:

sum(container_memory_working_set_bytes{pod="api-gateway-7f8db6d9-abc12"}) by (pod)

A “sawtooth” pattern (steady growth followed by a sharp drop to zero) indicates a memory leak. A flat line with a sudden, vertical spike suggests a resource limit issue or a specific heavy request.
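If the graph shows steady growth, a rough estimate of how long the pod has before it hits its limit can help with prioritization. The sketch below fits a least-squares line through working-set samples and extrapolates; the sample type, function names, and data points are hypothetical, and a linear fit is only a first approximation.

```go
package main

import "fmt"

// sample is one working-set observation.
type sample struct {
	seconds float64 // time since the first sample
	bytes   float64 // working set at that time
}

// growthRate returns the least-squares slope in bytes per second.
func growthRate(s []sample) float64 {
	n := float64(len(s))
	var sumT, sumB, sumTB, sumTT float64
	for _, p := range s {
		sumT += p.seconds
		sumB += p.bytes
		sumTB += p.seconds * p.bytes
		sumTT += p.seconds * p.seconds
	}
	denom := n*sumTT - sumT*sumT
	if denom == 0 {
		return 0 // fewer than two distinct timestamps
	}
	return (n*sumTB - sumT*sumB) / denom
}

// secondsUntilLimit extrapolates from the last sample to the limit.
// It returns -1 when memory is not growing.
func secondsUntilLimit(s []sample, limitBytes float64) float64 {
	rate := growthRate(s)
	if len(s) == 0 || rate <= 0 {
		return -1
	}
	last := s[len(s)-1]
	return (limitBytes - last.bytes) / rate
}

func main() {
	const mi = 1 << 20
	// Hypothetical samples taken one minute apart.
	samples := []sample{{0, 100 * mi}, {60, 160 * mi}, {120, 220 * mi}}
	fmt.Printf("growth: %.0f bytes/s, limit reached in about %.0f s\n",
		growthRate(samples), secondsUntilLimit(samples, 512*mi))
}
```

A result of -1 on a pod that still gets OOMKilled points away from a leak and toward the spike scenario.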

Step 3: Application Profiling

Increasing limits without profiling only delays the crash. If a leak is suspected, you must analyze the heap.

For Go applications, integrate pprof. Add this to your main.go:

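package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
    go func() {
        // Serve pprof on a separate port to avoid interfering with app traffic.
        // Binding to localhost keeps the endpoints off the pod network; they
        // stay reachable through kubectl exec or kubectl port-forward.
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // your app logic
}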

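Capture a heap profile while the pod is under load. This assumes curl is available in the container image; if it is not, use kubectl port-forward to reach port 6060 from your workstation instead: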

kubectl exec api-gateway-7f8db6d9-abc12 -- curl -s http://localhost:6060/debug/pprof/heap > heap.pprof
go tool pprof -http=:8080 heap.pprof

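For Java applications, trigger a heap dump using jcmd. The command below assumes the JVM runs as PID 1 in the container and that the image includes the JDK tools: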

kubectl exec api-gateway-7f8db6d9-abc12 -- jcmd 1 GC.heap_dump /tmp/heapdump.hprof
kubectl cp api-gateway-7f8db6d9-abc12:/tmp/heapdump.hprof ./heapdump.hprof

Analyze the .hprof file in VisualVM or Eclipse MAT to identify the leaking class.

Step 4: Right-Sizing the Limits

Calculate the new limit based on observed peaks. A common rule of thumb is to set the limit 20% to 30% above the peak container_memory_working_set_bytes observed during a full load test.
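The arithmetic is simple, but writing it down avoids rounding mistakes. The following sketch applies a headroom percentage to an observed peak and rounds up to a whole Mi; the function name is hypothetical and the 600Mi peak is borrowed from the Spring Boot example earlier in this guide.

```go
package main

import "fmt"

const mi = 1 << 20

// recommendLimitMi returns the peak plus the given headroom percentage,
// rounded up to a whole Mi.
func recommendLimitMi(peakBytes, headroomPercent int64) int64 {
	target := peakBytes * (100 + headroomPercent) / 100
	return (target + mi - 1) / mi
}

func main() {
	peak := int64(600 * mi) // peak working set observed during a load test
	for _, headroom := range []int64{20, 25, 30} {
		fmt.Printf("%d%% headroom: %dMi\n", headroom, recommendLimitMi(peak, headroom))
	}
}
```

For a 600Mi peak this yields limits between 720Mi and 780Mi, so rounding up to a convenient value such as 1Gi leaves additional margin.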

Update your deployment manifest:

resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
    cpu: "500m"

Apply the change:

kubectl apply -f deployment.yaml

How to Prevent OOMKills in the Future

To stop reacting to memory crashes, implement these three architectural strategies.

  1. Deploy Vertical Pod Autoscaler (VPA): Run VPA in recommendation-only mode (updateMode set to "Off"). Its recommender analyzes historical usage and suggests requests and limits without restarting pods, reducing the guesswork that leads to OOM events.

  2. Enforce Guaranteed QoS: For critical workloads, set requests exactly equal to limits for both CPU and memory in every container. This assigns the pod to the “Guaranteed” QoS class, making it the last candidate to be killed or evicted when the node runs out of RAM.

  3. Coordinate with HPA: Ensure your Horizontal Pod Autoscaler (HPA) is tuned. If HPA triggers new pods based on CPU but your pods are OOMKilled due to memory, you’ll experience cascading failures. Read Kubernetes HPA Deep Dive: Autoscaling Explained for coordination tips.
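To audit manifests against the second strategy, the QoS rules can be expressed as a short function. This is a simplified sketch: the resources type and qosClass function are hypothetical, quantities are compared as literal strings (so "1" and "1000m" would not match), and init containers are ignored.

```go
package main

import "fmt"

// resources holds one container's settings; an empty string means unset.
type resources struct {
	cpuRequest, cpuLimit string // e.g. "250m"
	memRequest, memLimit string // e.g. "512Mi"
}

// qosClass applies the Kubernetes QoS rules to a pod's containers.
func qosClass(containers []resources) string {
	anySet := false
	guaranteed := true
	for _, c := range containers {
		if c.cpuRequest != "" || c.cpuLimit != "" || c.memRequest != "" || c.memLimit != "" {
			anySet = true
		}
		// A missing request defaults to the limit, so only the limit must be
		// present; a request that is set has to match it.
		cpuOK := c.cpuLimit != "" && (c.cpuRequest == "" || c.cpuRequest == c.cpuLimit)
		memOK := c.memLimit != "" && (c.memRequest == "" || c.memRequest == c.memLimit)
		if !cpuOK || !memOK {
			guaranteed = false
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	// The manifest from Step 4: requests are lower than limits.
	step4 := []resources{{cpuRequest: "250m", cpuLimit: "500m", memRequest: "512Mi", memLimit: "1Gi"}}
	fmt.Println("Step 4 manifest:", qosClass(step4))

	pinned := []resources{{cpuRequest: "500m", cpuLimit: "500m", memRequest: "1Gi", memLimit: "1Gi"}}
	fmt.Println("requests equal to limits:", qosClass(pinned))
}
```

Note that the manifest from Step 4 is Burstable, so it remains a candidate for node-level kills until its requests are raised to match its limits.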

FAQ

Q: Why did my pod get OOMKilled even though it was below its limit? A: This is a node-level OOM event. When the physical RAM of the node is exhausted, the kernel kills pods based on QoS class. “BestEffort” pods die first, then “Burstable” pods. If your pod is “Burstable” and the node is under pressure, it can be killed regardless of its individual limit.

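Q: What is the difference between RSS and Working Set? A: Resident Set Size (RSS) is the anonymous memory, such as heap and stack, that a container’s processes hold in RAM. Working Set is total memory usage minus inactive file-backed cache, so it covers RSS plus the cache that is still in active use. Kubernetes uses Working Set for eviction decisions, and it is the best predictor of OOM kills because it approximates the memory the container actually needs to function.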

Q: Should I enable swap in Kubernetes to prevent OOMKills? A: No. While Kubernetes 1.28+ has improved swap support, relying on swap usually masks memory leaks and introduces severe latency spikes. It is better to fix the leak or increase the node size.

Conclusion and Next Steps

Debugging OOMKilled pods requires moving beyond kubectl get pods and into memory metrics and heap profiling. By distinguishing between container-level and node-level OOM events, you can apply the correct fix, whether that is tuning JVM heap sizes or adjusting your QoS class.

Your next steps:

  1. Audit your current deployments for “Burstable” pods with wide gaps between requests and limits.
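  2. Make sure Prometheus scrapes the kubelet’s cAdvisor endpoint, which exports container_memory_working_set_bytes; kube-state-metrics complements it by exposing the configured requests and limits.
  3. Add pprof endpoints to your Go services and ship jcmd (part of the JDK) in your Java images so profiling is immediately available during incidents.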