At Pomerium, we help organizations run secure and resilient systems on Kubernetes. Health checks are a critical part of that work, yet they’re often difficult to configure effectively. As we expand our deployment and observability practices, we’ve been refining how we think about health checks, why they’re challenging, and the patterns that lead to reliability across different customer environments.
Recently, Nick Taylor (Developer Advocate), Alex Lamarre (Software Engineer), and Bobby DeSimone (Founder, Pomerium) shared how our team is building Kubernetes health checks into the Pomerium platform.
Speaker: Nick (Host, Pomerium)
"Hey everyone, welcome back to the Pomerium live stream. Today, we're kicking off a new series where members of the Pomerium crew tackle real-world challenges. We're starting with a topic that's been top of mind: Kubernetes health checks.
Before we dive too deep, let's have everyone introduce themselves. I'm Nick, your usual host. Alex?"
Speaker: Alex (Software Engineer, Pomerium)
"Hi, I'm Alex. I joined Pomerium last month as a software engineer."
Speaker: Bobby (Founder, Pomerium)
"I'm Bobby, the founder of Pomerium. I wrote the original version of the project and still get hands-on with a lot of our infrastructure."
Nick:
"For folks who might be new to Kubernetes, could you give us a TL;DR?"
Alex:
"Absolutely. The story really starts with containers: they solve the problem of shipping software across different hardware platforms. Kubernetes is an orchestration engine for containers that manages scaling, configuration, and deployment. It's an API-driven, configuration-based system that enables companies to run hundreds of containers across multiple machines with consistency and efficiency."
Nick:
"Is there a rule of thumb on when a project should move from Docker Compose to Kubernetes?"
Alex:
"I prefer Kubernetes for most things, even smaller projects. Docker Compose is great for local development, but it starts to break down when you need multi-node support, self-healing, or advanced health checks. Kubernetes gives you a standardized way to manage all those requirements."
Bobby:
"A significant portion of Pomerium's larger customers use Kubernetes, especially for new workloads. Docker Compose is awesome for local container work, but as soon as you need to abstract hardware or use multiple machines, Kubernetes is a better fit."
Nick:
"We've all heard about health checks, but why are they tricky in Kubernetes?"
Alex:
"Kubernetes uses eventually consistent patterns. To manage state, it relies on three types of health checks or 'probes':
Startup: For initialization
Readiness: Determines if the application can receive traffic
Liveness: Monitors if the application needs to be restarted
Getting these right isn't just code—it's also about understanding your app's internal states and dependencies. The complexity ramps up when you try to map complex, stateful systems to Kubernetes' stateless health probes."
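To make the three probes concrete, here's a minimal Go sketch of an application exposing one endpoint per probe. The endpoint paths and the single `ready` flag are illustrative conventions for this example, not Pomerium's actual implementation:

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// ready flips to true once startup work (config, dependencies) is done.
var ready atomic.Bool

func main() {
	// Startup: succeeds once initialization is complete. Kubernetes holds
	// off on the other probes until this one passes.
	http.HandleFunc("/startupz", func(w http.ResponseWriter, r *http.Request) {
		if !ready.Load() {
			http.Error(w, "initializing", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: can this instance receive traffic right now? Failing here
	// removes the pod from Service endpoints without restarting it.
	http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if !ready.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	// Liveness: is the process healthy enough to keep running? Failing
	// past the threshold causes Kubernetes to restart the container.
	http.HandleFunc("/livez", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	go initialize()
	http.ListenAndServe(":8080", nil)
}

func initialize() {
	// Load config, connect to dependencies, warm caches, etc.
	ready.Store(true)
}
```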
Nick:
"Does Kubernetes' eventual consistency create challenges?"
Alex:
"Absolutely. These checks are continuously polled and, after a set threshold of failures, can cause your pods to restart or stop receiving traffic. This model works well for stateless apps but requires careful reasoning for stateful services like Pomerium’s proxy."
Bobby:
"Every application needs to reliably report health. Alex, could you explain how you approached implementing health checks for Pomerium?"
Alex:
"When designing Pomerium’s health checks, I drew on the 'service lifecycle' concept—abstracting our services as moving through states: created, starting, running, terminating. For each major subsystem (authentication, authorization, envoy proxying), we devised specific checks. Difficult cases include cache invalidation and synchronizing across different deployment types, especially in split-mode setups. For advanced scenarios, we even considered readiness gates and external signals that can block readiness until the system is truly ready."
Nick:
"How do you balance security, sensible defaults, and customizability?"
Alex:
"Our default is 'check everything’: if a service is running, it gets checked. Some customers might want to relax certain checks due to operational needs, so we've made the system configuration-driven. Kubernetes’ interfaces let teams adjust thresholds and select which internal services factor into readiness probes."
Bobby:
"The goal is to empower users, but our priority is always safe, secure defaults. Observability and clear error messages are crucial—we not only want to know that something is failing, but also what and why."
Alex:
"If you need richer signals than health checks provide, look to observability: metrics, logs, traces. With Pomerium, we’ve added robust tracing (including OpenTelemetry in recent releases) to help operators diagnose issues beyond just 'healthy/unhealthy.'"
Bobby:
"Our role as a proxy often means we surface failures that originate upstream. We aim to make it clear when something is our issue versus something external—just like Cloudflare’s status messages for upstream errors."
Nick:
"Where should people go to learn more or troubleshoot their own health checks?"
Alex:
"Start with your company’s subject-matter experts, but the Kubernetes documentation is excellent. After that, study open-source projects with similar patterns and experiment in your own environment."
Bobby:
"We love working in the open and encourage users to check out our open core and enterprise solutions. If you have unique requirements or want to extend health checks, Kubernetes lets you add custom probes through CRDs or readiness gates."
Nick:
"Thanks to Bobby and Alex for sharing their expertise. If you're tackling health checks in Kubernetes or want to see how Pomerium approaches it, check out our documentation and join our open-core community. See you on the next live stream!"