Diagnosing CPU Throttling in Kubernetes Pods for Go Services

TL;DR – 5 quick takeaways
– CPU throttling silently inflates tail (p99) latency in Go services.
– The Linux cgroup quota, not the Go runtime, decides when a pod is throttled.
– cAdvisor exposes container_cpu_cfs_throttled_seconds_total and Prometheus turns it into pod‑level visibility; kubectl top shows usage only.
– A tiny Go helper that reads /sys/fs/cgroup/.../cpu.stat lets you correlate throttling with GC pauses.
– Fixes range from raising limits or switching to Guaranteed QoS to auto‑scaling with VPA or KEDA.


Before you start, you need

  • A Kubernetes cluster (1.24+ recommended) with metrics-server installed.
  • Access to kubectl (v1.27) and helm (v3.12).
  • Prometheus 2.44 deployed with kube-state-metrics 2.9.
  • A Go 1.22 (or newer) binary compiled for Linux amd64.
  • Basic familiarity with cgroup v2 hierarchy and Go runtime/pprof.

Diagnosing CPU throttling in Kubernetes pods running Go services

Why CPU throttling matters for Go microservices

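When a container exhausts its CPU quota, the kernel pauses every thread in it, including the threads that run the Go scheduler and the garbage collector.
If that happens period after period, runnable goroutines queue up, GC cycles take longer to finish, and latency spikes.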
A CNCF 2023 survey reported that 42 % of production Go services blame intermittent throttling for latency outliers.
The same study showed that a 5 % throttling rate often pushes p99 latency past SLA thresholds.

⚠️ Warning: Ignoring throttling can mask memory leaks or inefficient loops, because the Go scheduler hides the real CPU pressure behind its own metrics.


Kubernetes CPU scheduling basics (requests, limits, QoS)

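Kubernetes assigns every pod to one of three QoS classes:

| QoS class  | Request = Limit | Request < Limit | No request/limit |
|------------|-----------------|-----------------|------------------|
| Guaranteed | yes             |                 |                  |
| Burstable  |                 | yes             |                  |
| BestEffort |                 |                 | yes              |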

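A Guaranteed pod has a CPU limit equal to its request, so the CFS quota written to its cgroup matches what the scheduler reserved for it and the kernel has a clear ceiling.
Burstable pods request less than their limit; the scheduler places them by request, so a node can be overcommitted on limits. Under contention they receive CPU in proportion to their requests, and the quota derived from the limit still throttles them whenever they exceed it within a period.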
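
The classification above can be sketched in a few lines of Go. This is a simplified, CPU-only illustration: the real rule also looks at memory and at every container in the pod, and the `resources` type and `qosClass` function are hypothetical names, not a Kubernetes API.

```go
package main

import "fmt"

// resources holds CPU values in millicores; 0 means "not set".
// Hypothetical type used only for this illustration.
type resources struct {
	requestMilli int64
	limitMilli   int64
}

// qosClass is a simplified, CPU-only approximation of the QoS rules.
// Kubernetes itself also requires matching memory requests and limits
// on every container before it calls a pod Guaranteed.
func qosClass(r resources) string {
	switch {
	case r.requestMilli == 0 && r.limitMilli == 0:
		return "BestEffort"
	case r.limitMilli != 0 && (r.requestMilli == 0 || r.requestMilli == r.limitMilli):
		// When only a limit is set, the request defaults to the limit.
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	fmt.Println(qosClass(resources{requestMilli: 500, limitMilli: 1000}))
	fmt.Println(qosClass(resources{requestMilli: 1000, limitMilli: 1000}))
	fmt.Println(qosClass(resources{}))
}
```

The first call mirrors the 500m request / 1 CPU limit used in the pod spec later in this guide, which lands in the Burstable class.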


How the Linux cgroup enforces throttling

The kernel attaches every container to a cgroup.
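When you set a CPU limit such as cpu: "500m" in the pod spec, the kubelet (through the container runtime) writes the matching quota to cpu.max (cgroup v2) or cpu.cfs_quota_us (cgroup v1).
If the container's threads use up the quota before the period ends, the kernel stalls them until the next period and adds the wait to the throttling counters in cpu.stat.

💡 Pro Tip: On modern clusters, the unified cgroup v2 hierarchy stores counters in /sys/fs/cgroup/<cgroup>/cpu.stat. The field throttled_usec reports microseconds spent throttled; cgroup v1 reports throttled_time in nanoseconds instead. Both versions also expose nr_periods and nr_throttled.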
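
To make the quota arithmetic concrete, here is a minimal sketch that converts a CPU limit in millicores into the value written to the cgroup. It assumes the default CFS period of 100 ms (100000 µs); `quotaFromMilliCPU` is a hypothetical helper, not kubelet code.

```go
package main

import "fmt"

// cfsPeriodUs is the default CFS period (100 ms), assumed here.
const cfsPeriodUs = 100_000

// quotaFromMilliCPU converts a CPU limit in millicores into the CFS quota:
// microseconds of CPU time the container may use per period.
func quotaFromMilliCPU(milliCPU int64) int64 {
	return milliCPU * cfsPeriodUs / 1000
}

func main() {
	for _, m := range []int64{250, 500, 1000, 2000} {
		// cgroup v2 stores "<quota> <period>" in cpu.max;
		// cgroup v1 stores the quota alone in cpu.cfs_quota_us.
		fmt.Printf("%4dm -> cpu.max = \"%d %d\"\n", m, quotaFromMilliCPU(m), cfsPeriodUs)
	}
}
```

A 500m limit therefore allows 50 ms of CPU time in every 100 ms period. A Go service running several threads in parallel can spend that budget in well under 50 ms of wall-clock time and then sit idle until the next period starts.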


Toolchain overview: kubectl top, cAdvisor, Prometheus, kube‑state‑metrics, and Go runtime/pprof

| Tool | What it shows | Typical command |
|------|---------------|-----------------|
| kubectl top pod | Live CPU/memory usage (compare with the limits yourself) | kubectl top pod my-go-pod -n prod |
| cAdvisor (via kubelet) | Per‑container container_cpu_cfs_throttled_seconds_total | Exposed on :10250 |
| Prometheus | Time‑series, rate calculations | Query container_cpu_cfs_throttled_seconds_total |
| kube‑state‑metrics | Pod QoS class, requests/limits | kube_pod_status_qos_class |
| Go runtime/pprof and runtime/metrics | CPU profile, GC pause histogram | go tool pprof |

The combination gives you both the symptom (high CPU usage) and the root cause (throttling counter spikes).


Collecting throttling metrics – pod vs node perspective

From a node viewpoint, node_cpu_seconds_total aggregates all containers, masking individual throttling.
For precise diagnosis, slice the data at the pod level.

Prometheus query for pod‑level throttling rate (5 min window):

sum(rate(container_cpu_cfs_throttled_seconds_total[5m])) by (namespace, pod)

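The result is throttled seconds per second of wall‑clock time: 0.1 means the pod's containers spent about 100 ms of every second stalled. It is not a share of the quota, and it may exceed 1 when several threads are throttled at once. For the fraction of CFS periods that hit the quota, divide rate(container_cpu_cfs_throttled_periods_total[5m]) by rate(container_cpu_cfs_periods_total[5m]).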

To see node‑wide pressure, add by (instance) and compare against node_cpu_seconds_total.
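
The rate calculation is simple enough to check by hand. The sketch below reproduces it for two readings of the cumulative counter; the `sample` type and the numbers are made up for illustration, and Prometheus' own rate() additionally extrapolates across the window.

```go
package main

import "fmt"

// sample is one reading of a cumulative counter such as
// container_cpu_cfs_throttled_seconds_total (hypothetical type).
type sample struct {
	unix      int64   // scrape time, seconds since the epoch
	throttled float64 // cumulative throttled seconds
}

// throttledPerSecond returns the average increase per second between two
// readings, which is the core of what rate() computes.
func throttledPerSecond(prev, cur sample) float64 {
	elapsed := float64(cur.unix - prev.unix)
	if elapsed <= 0 || cur.throttled < prev.throttled {
		return 0 // no time elapsed, or the counter reset after a restart
	}
	return (cur.throttled - prev.throttled) / elapsed
}

func main() {
	prev := sample{unix: 0, throttled: 120}
	cur := sample{unix: 300, throttled: 150} // five minutes later
	// 30 throttled seconds over 300 seconds of wall-clock time.
	fmt.Printf("%.2f throttled s/s\n", throttledPerSecond(prev, cur))
}
```

With these numbers the pod accumulated 30 throttled seconds in a five-minute window, which is the 0.1 level the alert rule later in this guide uses as its threshold.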


Step‑by‑step detection workflow

  1. Validate limits – Run kubectl get pod my-go-pod -o yaml | grep -A3 cpu.
  2. Spot obvious over‑usage – Run kubectl top pod my-go-pod.
  3. Pull cAdvisor stats – Run curl -k https://<node>:10250/stats/summary | jq '.pods[] | select(.podRef.name=="my-go-pod") | .cpu'.
  4. Query Prometheus – Use the rate query above in Grafana.
  5. Correlate inside the container – Deploy the Go helper (see next section) that reads cpu.stat and logs throttled_time.
  6. Overlay with Go runtime metrics – Capture a 30‑second CPU profile during load, then compare timestamps of spikes with throttling logs.

Following this pipeline eliminates guesswork and lets you pinpoint whether the bottleneck lives in the platform or the code.


Analyzing Go runtime metrics after throttling

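When throttling occurs, the Go scheduler receives fewer CPU cycles. In Go 1.22, GOMAXPROCS defaults to the number of CPUs the process can see on the node rather than the container limit, so a pod with many runnable threads can burn through its quota early in each period while GC work stretches out.
Inspect the GC pause histogram (for example /gc/pauses:seconds from runtime/metrics) for a sudden right‑shift.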

go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30

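In the pprof header, compare Duration (wall‑clock time) with Total samples (CPU time the process actually received); if Total samples sits close to limit × Duration while requests are queuing, the service is pinned at its quota and throttling is the likely culprit.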
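
If you want a number to log next to the throttling counter, a minimal sketch is to snapshot the runtime's cumulative GC statistics before and after a window and compare them. It uses runtime.ReadMemStats, which briefly stops the world, so call it sparingly; `gcSnapshot`, `takeGCSnapshot`, and `avgPause` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcSnapshot captures the cumulative GC counters at one point in time.
type gcSnapshot struct {
	numGC      uint32
	pauseTotal time.Duration
}

// takeGCSnapshot reads the counters from the runtime.
func takeGCSnapshot() gcSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return gcSnapshot{numGC: m.NumGC, pauseTotal: time.Duration(m.PauseTotalNs)}
}

// avgPause returns the mean stop-the-world pause per GC cycle between
// two snapshots, or 0 when no cycle completed in between.
func avgPause(before, after gcSnapshot) time.Duration {
	cycles := after.numGC - before.numGC
	if cycles == 0 {
		return 0
	}
	return (after.pauseTotal - before.pauseTotal) / time.Duration(cycles)
}

func main() {
	before := takeGCSnapshot()
	runtime.GC() // stand-in for a window of real traffic
	after := takeGCSnapshot()
	fmt.Printf("GC cycles: %d, average pause: %s\n",
		after.numGC-before.numGC, avgPause(before, after))
}
```

Logging this value on the same interval as the throttling counter makes it easier to see whether longer pauses line up with throttled periods.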


Common architectural trade‑offs (Burstable vs Guaranteed QoS, VPA, node‑level vs pod‑level limits)

| Decision | Pros | Cons |
|----------|------|------|
| Burstable (lower request) | Saves money on under‑utilized nodes | Higher jitter during spikes; throttling more frequent |
| Guaranteed | Predictable latency; throttling rare | Higher reservation cost; a limit set too tight still causes throttling |
| Vertical Pod Autoscaler | Dynamically bumps request/limit based on usage | Adds control‑plane load; may cause churn if policy is aggressive |
| KEDA (event‑driven scaling) | Scales out before throttling hits | Requires accurate metrics; complexity in Kafka/Redis triggers |
| Node‑level limit (cgroup v2 parent) | Guarantees a slice of node CPU for a tenant | Over‑provisioning can waste capacity; difficult to tune per application |

Balancing cost against latency stability often means mixing Guaranteed pods for latency‑critical paths and Burstable pods for best‑effort background workers.


Real‑world case studies & statistics

  • Shopify (2022): A Go checkout service suffered 25 % higher median latency. After raising the pod’s CPU limit from 700m to 1 CPU and moving it to Guaranteed QoS, latency fell back within SLA.
  • Uber benchmark: A 5 % throttling rate added 12 % to p99 latency for a real‑time matching engine built with Go. The team mitigated the issue by enabling VPA, which raised limits during peak traffic.
  • Google Borg analysis: Throttling beyond 10 % of allotted CPU time caused a 1.8× increase in GC pause durations across typical Go workloads.

These numbers underline why even modest throttling percentages matter for latency‑sensitive microservices.


Remediation strategies: tuning limits, autoscaling, code optimizations

  1. Raise the limit – Increase limits.cpu in the pod spec (for example to cpu: "1"), then redeploy.
  2. Switch QoS – Align request and limit to achieve Guaranteed status.
  3. Enable VPA – Deploy the vertical-pod-autoscaler chart (helm install vpa vpa-chart --version 0.13.0).
  4. Add KEDA – Use a ScaledObject that watches container_cpu_cfs_throttled_seconds_total and adds replicas when the rate exceeds 0.05.
  5. Optimize Go code – Reduce allocation churn, avoid busy‑wait loops that burn quota without doing useful work, and tune GOGC to lower GC frequency.
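
For the GOGC part of step 5, the setting can also be changed at run time with debug.SetGCPercent, which is equivalent to the GOGC environment variable. The sketch below is only an illustration: the value 200 is an arbitrary assumption, and `relaxGC` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// relaxGC sets a new GC target percentage and returns the previous value
// together with a function that restores it.
func relaxGC(percent int) (previous int, restore func()) {
	previous = debug.SetGCPercent(percent)
	return previous, func() { debug.SetGCPercent(previous) }
}

func main() {
	// 200 is an illustrative value, not a recommendation.
	prev, restore := relaxGC(200)
	defer restore()
	fmt.Printf("GC percent changed from %d to 200\n", prev)
}
```

A higher percentage means fewer GC cycles and less CPU spent collecting, at the cost of a larger heap, so check the container's memory limit before raising it.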

Automation & alerting (Prometheus rules, Grafana dashboards, KEDA)

Prometheus alert rule (high throttling):

groups:
- name: cpu-throttling.rules
  rules:
  - alert: CpuThrottlingHigh
    expr: sum(rate(container_cpu_cfs_throttled_seconds_total[5m])) by (namespace, pod) > 0.1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.pod }} throttling >10%"
      description: "CPU throttling has exceeded 10 % of allocated time for the last 5 minutes."

Grafana panel template (single‑stat):

  • Query: sum(rate(container_cpu_cfs_throttled_seconds_total[1m])) by (pod)
  • Visualization: Gauge, thresholds 0–0.05 (green), 0.05–0.1 (orange), >0.1 (red)

💡 Pro Tip: Set the dashboard variable pod to {{ $pod }} so the same panel can be reused across namespaces.

KEDA ScaledObject example (throttling‑driven scaling):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: go-service-scaler
spec:
  scaleTargetRef:
    name: go-service-deployment
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090
      # The query must return a single number; KEDA compares it with the thresholds below.
      # The pod selector is an assumption based on the deployment name above.
      query: sum(rate(container_cpu_cfs_throttled_seconds_total{pod=~"go-service-deployment-.*"}[2m]))
      threshold: "0.05"
      activationThreshold: "0.03"

When throttling climbs, KEDA adds replicas, diluting the contention and restoring smooth CPU delivery.
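
As a rough mental model, KEDA passes the trigger value to the Horizontal Pod Autoscaler, which for an average-value target sizes the workload at about ceil(metric / target) and clamps the result to the replica bounds. The sketch below reproduces that arithmetic with the bounds from the example; `desiredReplicas` is a hypothetical function, and the real controller also applies tolerances and stabilization windows.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies ceil(metricSum / targetPerReplica) and clamps
// the result to [minReplicas, maxReplicas].
func desiredReplicas(metricSum, targetPerReplica float64, minReplicas, maxReplicas int) int {
	if targetPerReplica <= 0 {
		return minReplicas
	}
	n := int(math.Ceil(metricSum / targetPerReplica))
	if n < minReplicas {
		return minReplicas
	}
	if n > maxReplicas {
		return maxReplicas
	}
	return n
}

func main() {
	// 0.42 throttled s/s summed over all pods, 0.05 allowed per replica:
	// 0.42 / 0.05 = 8.4, rounded up to 9 replicas.
	fmt.Println(desiredReplicas(0.42, 0.05, 2, 20))
}
```

Scaling out only helps if the added replicas actually take load off the throttled ones; a single hot pod behind uneven load balancing stays throttled no matter how many replicas exist.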


End‑to‑end Go helper: reading cgroup throttling stats

Below is a self‑contained Go program (compatible with Go 1.22) that:

  1. Detects the active cgroup path (handles both v1 and v2).
  2. Reads the cumulative throttled time from cpu.stat.
  3. Exposes a /metrics endpoint for Prometheus with go_throttling_nanoseconds_total.

// go-cgroup-throttle/main.go
// Requires: go 1.22+, github.com/prometheus/client_golang v1.18.0
package main

import (
    "bufio"
    "fmt"
    "net/http"
    "os"
    "path/filepath"
    "strconv"
    "strings"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    throttledNs = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "go_throttling_nanoseconds_total",
            Help: "Cumulative nanoseconds the container spent throttled by the kernel.",
        },
    )
)

func init() {
    prometheus.MustRegister(throttledNs)
}

// locateCgroupPath returns the absolute path of the cpu.stat file for this container.
// It works for both unified (v2) and legacy (v1) hierarchies.
func locateCgroupPath() (string, error) {
    // Each mountinfo line has the form
    // "<id> <parent> <dev> <root> <mount point> <opts> [optional...] - <fstype> <source> <super opts>".
    f, err := os.Open("/proc/self/mountinfo")
    if err != nil {
        return "", fmt.Errorf("open mountinfo: %w", err)
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        // The number of optional fields varies, so split on the " - " separator first.
        parts := strings.SplitN(scanner.Text(), " - ", 2)
        if len(parts) != 2 {
            continue
        }
        pre, post := strings.Fields(parts[0]), strings.Fields(parts[1])
        if len(pre) < 5 || len(post) < 1 {
            continue
        }
        mountPoint, fsType := pre[4], post[0]
        switch fsType {
        case "cgroup2":
            // Unified hierarchy: cpu.stat sits at the root of the container's cgroup mount.
            return filepath.Join(mountPoint, "cpu.stat"), nil
        case "cgroup":
            // Legacy hierarchy: pick the mount whose super options list the cpu controller.
            if len(post) >= 3 {
                for _, opt := range strings.Split(post[2], ",") {
                    if opt == "cpu" {
                        return filepath.Join(mountPoint, "cpu.stat"), nil
                    }
                }
            }
        }
    }
    if err := scanner.Err(); err != nil {
        return "", fmt.Errorf("scan mountinfo: %w", err)
    }
    return "", fmt.Errorf("cgroup cpu.stat not found")
}

// readThrottledNs parses cpu.stat and returns the cumulative throttled time in nanoseconds.
// cgroup v1 reports throttled_time (nanoseconds); cgroup v2 reports throttled_usec (microseconds).
func readThrottledNs(path string) (uint64, error) {
    f, err := os.Open(path)
    if err != nil {
        return 0, fmt.Errorf("open %s: %w", path, err)
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        parts := strings.Fields(scanner.Text())
        if len(parts) != 2 {
            continue
        }
        var scale uint64
        switch parts[0] {
        case "throttled_time": // cgroup v1, already in nanoseconds
            scale = 1
        case "throttled_usec": // cgroup v2, microseconds
            scale = 1000
        default:
            continue
        }
        val, err := strconv.ParseUint(parts[1], 10, 64)
        if err != nil {
            return 0, fmt.Errorf("parse %s: %w", parts[0], err)
        }
        return val * scale, nil
    }
    if err := scanner.Err(); err != nil {
        return 0, fmt.Errorf("scan %s: %w", path, err)
    }
    return 0, fmt.Errorf("throttled_time or throttled_usec not found in %s", path)
}

// collector runs every 15 seconds, updates the Prometheus counter.
func collector(path string) {
    var last uint64
    for {
        cur, err := readThrottledNs(path)
        if err != nil {
            fmt.Fprintf(os.Stderr, "error reading throttling stats: %v\n", err)
            time.Sleep(15 * time.Second)
            continue
        }
        if cur > last {
            delta := cur - last
            throttledNs.Add(float64(delta))
            last = cur
        }
        time.Sleep(15 * time.Second)
    }
}

func main() {
    cgroupPath, err := locateCgroupPath()
    if err != nil {
        fmt.Fprintf(os.Stderr, "failed to locate cgroup path: %v\n", err)
        os.Exit(1)
    }
    go collector(cgroupPath)

    http.Handle("/metrics", promhttp.Handler())
    fmt.Println("Serving metrics on :9090/metrics")
    if err := http.ListenAndServe(":9090", nil); err != nil {
        fmt.Fprintf(os.Stderr, "listen failed: %v\n", err)
    }
}

How to use in a pod

apiVersion: v1
kind: Pod
metadata:
  name: go-service
spec:
  containers:
  - name: app
    image: ghcr.io/nileshblog/checkout-service:latest
    resources:
      requests:
        cpu: "500m"
      limits:
        cpu: "1"
    ports:
    - containerPort: 8080
  - name: cgroup-exporter
    image: ghcr.io/nileshblog/go-cgroup-throttle:1.0.0
    ports:
    - containerPort: 9090
    securityContext:
      readOnlyRootFilesystem: false

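The sidecar exposes go_throttling_nanoseconds_total on :9090/metrics for the same Prometheus instance that already scrapes your application metrics, enabling a single graph that correlates GC pauses with throttling spikes. Keep in mind that every container has its own cgroup: run as a sidecar, the helper reports the sidecar's own throttling. To measure the app container, start the collector inside the service process itself.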

💡 Pro Tip: Deploy this exporter only in staging or high‑traffic environments; the additional HTTP endpoint adds negligible overhead (≈ 0.1 ms per scrape).


Common Errors & Fixes

  • permission denied when opening /sys/fs/cgroup/.../cpu.stat – Reading cpu.stat needs no extra privileges, so keep privileged: false and allowPrivilegeEscalation: false. Check instead that /sys/fs/cgroup is mounted in the container (it is normally mounted read‑only) and that no security profile such as SELinux or AppArmor blocks the read.
  • Zero values from container_cpu_cfs_throttled_seconds_total – The counter stays at zero when the container has no CPU limit (no quota means nothing to throttle) or when the kubelet runs with --cpu-cfs-quota=false. Confirm that limits.cpu is set on the container.
  • Prometheus duplicate time series – The Go exporter may be scraped more than once if several scrape jobs match the pod or the same port is exposed on both containers; keep a single scrape target per exporter.
  • Unexpected high throttling after raising limits – Check for node‑level CPU pressure (non‑idle share of rate(node_cpu_seconds_total[5m]) near 100 %). The pod may be fine, but the node is overloaded; consider adding more nodes or enabling cluster autoscaler.

My take

I’ve seen teams spend days chasing “slow Go functions” only to discover that a noisy neighbor on the same node kept the kernel from delivering the requested CPU share. Adding a tiny exporter that reads cpu.stat turned the mystery into a quantifiable metric, and the subsequent switch to Guaranteed QoS eliminated the latency jitter. The extra line of code paid for itself in minutes of engineering time saved.


If this guide helped you untangle CPU throttling in your Go services, drop a comment below, share the article on X, or subscribe to the newsletter at nileshblog.tech for more deep‑dives into Kubernetes performance engineering.


Frequently Asked Questions

What is CPU throttling and how does Kubernetes enforce it?

CPU throttling occurs when a container uses up the CPU quota defined by its cgroup. Kubernetes enforces the quota via the Linux cgroup subsystem: if a container’s CPU limit is 500m, the kernel caps its CPU time at 50 ms per 100 ms period; once the quota is spent, the container’s threads are paused until the next period, resulting in delayed execution.

How can I tell if a Go pod is being throttled?

Start with kubectl top pod <pod> and compare the CPU usage with the configured limit. Then query cAdvisor or Prometheus for container_cpu_cfs_throttled_seconds_total. Inside the container you can read /sys/fs/cgroup/cpu/cpu.stat (cgroup v1, field throttled_time) or the unified path for cgroup v2 (typically /sys/fs/cgroup/cpu.stat, field throttled_usec) and correlate the counter with Go’s runtime metrics (e.g., a runtime/pprof CPU profile) to confirm the impact.

Why do Burstable pods experience throttling more often than Guaranteed pods?

Burstable pods have a request lower than their limit, so the scheduler reserves only the request and the node can be overcommitted on limits. A Burstable pod that bursts toward its limit hits its CFS quota and is throttled, and when neighbours are busy it receives only its request‑weighted share of CPU. A Guaranteed pod is sized so that request and limit match, which makes its CPU budget predictable.

Can Prometheus alert on CPU throttling?

Yes. A typical alert rule is:

alert: CpuThrottlingHigh
expr: sum(rate(container_cpu_cfs_throttled_seconds_total[5m])) by (pod, namespace) > 0.1

This fires when a pod accumulates more than 0.1 throttled seconds per second (roughly 10 %) over a five‑minute window.

What remediation steps should I take after detecting throttling?

1️⃣ Increase the pod’s CPU limit or request.
2️⃣ Move the pod to a Guaranteed QoS class.
3️⃣ Enable Vertical Pod Autoscaling to let the control plane adjust limits automatically.
4️⃣ Optimize Go code (reduce GC pressure, use worker pools).
5️⃣ Verify with load‑testing that throttling metrics drop below the threshold.


Architecture diagram

flowchart LR
    subgraph Cluster
        node1["Node (cgroup v2)"]
        node2["Node (cgroup v2)"]
    end
    subgraph PodA
        appA[Go Service] --> exporterA[Throttle Exporter]
    end
    subgraph PodB
        appB[Go Service] --> exporterB[Throttle Exporter]
    end
    node1 --> PodA
    node2 --> PodB
    exporterA -->|scrape| prometheus[Prometheus]
    exporterB -->|scrape| prometheus
    prometheus -->|alert| alertMgr[Alertmanager]
    alertMgr -->|notify| slack[Slack]
    prometheus -->|dash| grafana[Grafana]
    style exporterA fill:#f9f,stroke:#333,stroke-width:2px
    style exporterB fill:#f9f,stroke:#333,stroke-width:2px

The diagram illustrates how each Go pod runs a sidecar exporter that exposes throttling counters for a central Prometheus instance to scrape, which then fuels alerts and dashboards.


Figure: CPU throttling timeline overlaid with Go GC pauses (CPU throttling seconds over time alongside the Go GC pause histogram for a microservice).


Author Bio:
I’m Nilesh Raut, a Software Development Engineer with 2+ years of experience, specializing in Go, JavaScript, Python, Docker, Kubernetes, Git, Jenkins, microservices, and system design (LLD/HLD), backed by a strong foundation in data structures and algorithms. Alongside my engineering journey, I bring 4+ years of hands‑on experience in SEO, where I’ve worked extensively on content strategy, keyword research, technical SEO, and organic growth, helping products and businesses scale efficiently by aligning solid technology with search‑driven performance.
