
Metrics Settings

This reference covers all of Pomerium's Metrics Settings:

Metrics Address

Metrics Address exposes a Prometheus endpoint on the specified port.

warning

Use with caution: the endpoint can expose frontend and backend server names or addresses. Do not externally expose the metrics if this is sensitive information.

How to configure

Config file keys | Environment variables | Type | Usage | Default
metrics_address | METRICS_ADDRESS | string | optional | disabled

Examples

metrics_address: :9090
METRICS_ADDRESS: 127.0.0.1:9090
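
For reference, a minimal Prometheus scrape_config sketch that scrapes the address above (the job name and target are illustrative):

scrape_configs:
  - job_name: 'pomerium'
    static_configs:
      - targets: ['127.0.0.1:9090']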

Pomerium Metrics Tracked

Each metric exposed by Pomerium has a pomerium prefix, which is omitted in the table below for brevity.

Name | Type | Description
build_info | Gauge | Pomerium build metadata by git revision, service, version, and Go version
config_checksum_int64 | Gauge | Currently loaded configuration checksum by service
config_last_reload_success | Gauge | Whether the last configuration reload succeeded by service
config_last_reload_success_timestamp | Gauge | The timestamp of the last successful configuration reload by service
grpc_client_request_duration_ms | Histogram | gRPC client request duration by service
grpc_client_request_size_bytes | Histogram | gRPC client request size by service
grpc_client_requests_total | Counter | Total gRPC client requests made by service
grpc_client_response_size_bytes | Histogram | gRPC client response size by service
grpc_server_request_duration_ms | Histogram | gRPC server request duration by service
grpc_server_request_size_bytes | Histogram | gRPC server request size by service
grpc_server_requests_total | Counter | Total gRPC server requests handled by service
grpc_server_response_size_bytes | Histogram | gRPC server response size by service
http_client_request_duration_ms | Histogram | HTTP client request duration by service
http_client_request_size_bytes | Histogram | HTTP client request size by service
http_client_requests_total | Counter | Total HTTP client requests made by service
http_client_response_size_bytes | Histogram | HTTP client response size by service
http_server_request_duration_ms | Histogram | HTTP server request duration by service
http_server_request_size_bytes | Histogram | HTTP server request size by service
http_server_requests_total | Counter | Total HTTP server requests handled by service
http_server_response_size_bytes | Histogram | HTTP server response size by service
storage_operation_duration_ms | Histogram | Storage operation duration by operation, result, backend, and service
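
As a hedged sketch (the label matchers and windows are illustrative, and the _bucket suffix assumes the standard Prometheus histogram exposition), these PromQL queries alert on a failed configuration reload and derive p95 HTTP server latency:

# Fires while the last configuration reload has failed.
pomerium_config_last_reload_success == 0

# p95 HTTP server request duration in milliseconds, per service.
histogram_quantile(0.95, sum by (service, le) (rate(pomerium_http_server_request_duration_ms_bucket[5m])))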

Identity Manager

Identity manager metrics have a pomerium_identity_manager prefix.

Name | Type | Description
last_refresh_timestamp | Gauge | Timestamp of the last directory refresh operation.
session_refresh_error_timestamp | Gauge | Timestamp of the last session refresh that ended in an error.
session_refresh_errors | Counter | Session refresh error counter.
session_refresh_success | Counter | Session refresh success counter.
session_refresh_success_timestamp | Gauge | Timestamp of the last successful session refresh.
user_group_refresh_error_timestamp | Gauge | Timestamp of the last user group refresh that ended in an error.
user_group_refresh_errors | Counter | User group refresh error counter.
user_group_refresh_success | Counter | User group refresh success counter.
user_group_refresh_success_timestamp | Gauge | Timestamp of the last successful user group refresh.
user_refresh_error_timestamp | Gauge | Timestamp of the last user refresh that ended in an error.
user_refresh_errors | Counter | User refresh error counter.
user_refresh_success | Counter | User refresh success counter.
user_refresh_success_timestamp | Gauge | Timestamp of the last successful user refresh.
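
A hedged alerting sketch, assuming the timestamps are exposed as Unix seconds: flag installations where no session refresh has succeeded within the last hour.

# Last successful session refresh is more than an hour old.
time() - pomerium_identity_manager_session_refresh_success_timestamp > 3600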

Envoy Proxy Metrics

As of v0.9, Pomerium uses Envoy for the data plane. As such, proxy-related metrics are sourced from Envoy and use Envoy's internal stats data model. See Envoy's documentation for information about specific metrics.

All metrics coming from Envoy are labeled with service="pomerium" or service="pomerium-proxy" (depending on whether you run in all-in-one or distributed service mode) and have the pomerium prefix added to the standard Envoy metric name.

See Configuration & Settings for more information about configuration environments.

Metrics Basic Authentication

Metrics Basic Authentication requires HTTP Basic Authentication credentials to access the metrics endpoint.

To support this in Prometheus, consult the basic_auth option in the scrape_config documentation.

How to configure

Config file keys | Environment variables | Type | Usage
metrics_basic_authentication | METRICS_BASIC_AUTHENTICATION | string (base64 encoded) | optional

Examples

# for username: x and password: y
metrics_basic_authentication: eDp5
# for username: x and password: y
METRICS_BASIC_AUTHENTICATION=eDp5
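
A sketch of the matching Prometheus scrape_config, reusing the illustrative username x and password y from above:

scrape_configs:
  - job_name: 'pomerium'
    basic_auth:
      username: x
      password: y
    static_configs:
      - targets: ['127.0.0.1:9090']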

Metrics Certificate

Metrics Certificate uses a certificate to secure the metrics endpoint.

A Certificate is an X.509 public-key and private-key pair.

Note:

All certificates supplied to Pomerium must be in PEM format.

How to configure

Config file keys | Environment variables | Type | Usage
metrics_certificate and metrics_certificate_key | METRICS_CERTIFICATE and METRICS_CERTIFICATE_KEY | string | optional
metrics_certificate_file and metrics_certificate_key_file | METRICS_CERTIFICATE_FILE and METRICS_CERTIFICATE_KEY_FILE | string | optional

Examples

metrics_certificate: base64-encoded-string
metrics_certificate_key: base64-encoded-string
METRICS_CERTIFICATE_FILE=/relative/file/location
METRICS_CERTIFICATE_KEY_FILE=/relative/file/location

Metrics Client Certificate Authority

Metrics Client Certificate Authority is the X.509 public-key used to validate mTLS client certificates for the metrics endpoint. If not set, no client certificate will be required.

How to configure

Config file keys | Environment variables | Type | Usage
metrics_client_ca and metrics_client_ca_file | METRICS_CLIENT_CA and METRICS_CLIENT_CA_FILE | string | optional

Examples

metrics_client_ca: base64-encoded-string
METRICS_CLIENT_CA_FILE=/relative/file/location
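
When the metrics endpoint is served over TLS and validates client certificates, the Prometheus scrape_config needs a matching tls_config; a sketch with illustrative file paths and target:

scrape_configs:
  - job_name: 'pomerium'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/metrics-ca.pem    # CA that signed the metrics certificate
      cert_file: /etc/prometheus/client.pem      # client cert, validated against metrics_client_ca
      key_file: /etc/prometheus/client-key.pem
    static_configs:
      - targets: ['pomerium.example.com:9090']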

Envoy Metrics

Pomerium leverages Envoy as its underlying proxy. Monitoring Envoy is crucial for understanding traffic load, upstream health, resource usage, and security filter performance.

Standard Labels

Envoy metrics are labeled with standard labels that provide context about the metric source and its environment. Here are some examples of Envoy metrics:

# TYPE envoy_cluster_ext_authz_error counter
envoy_cluster_ext_authz_error{service="pomerium-proxy",envoy_cluster_name="telemetry-team1-allowed-00008",installation_id="aecd6525-9eaa-448d-93d9-6363c04b1ccb",hostname="pomerium-proxy-55589cc5f-fjhsb"} 34

Each Envoy metric is prefixed with envoy_. See the Envoy documentation for details on individual metrics.

The envoy_cluster_name label identifies the upstream cluster of a Pomerium Route. It is set to:

  • the name property of the Route configured via the config file in Pomerium Core
  • the id of the Route configured via the Zero Console
  • the Metric Name of the Route configured via the Pomerium Enterprise Console / API / Terraform

The installation_id label identifies the unique Pomerium installation:

  • In vanilla Pomerium Core, this label is not set.
  • In Pomerium Enterprise, this label is set to the unique installation ID of the Pomerium instance, which can be adjusted in the Console under Global Settings.
  • In Pomerium Zero, this label is set to the Cluster ID of the Pomerium Zero cluster, which is generated automatically and can be found in the Zero Console under Cluster Settings.

The hostname label identifies the instance and is set to the Unix hostname.
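
These labels make per-Route and per-instance aggregation straightforward; for example, a hedged PromQL sketch of request rate per upstream cluster:

# Requests per second sent to each upstream cluster (Pomerium Route).
sum by (envoy_cluster_name) (rate(envoy_cluster_upstream_rq_total[5m]))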

Downstream (Ingress) Metrics

These metrics provide insight into the traffic Envoy receives from clients. The HTTP connection manager that serves end-user requests carries the envoy_http_conn_manager_prefix="ingress" label, and the listener that accepts connections carries the envoy_listener_address="0.0.0.0_443" label.

Refer to Envoy's HTTP connection manager statistics documentation and listener statistics documentation for more details. Example queries are sketched after the list below.

  • Request Rate – Total incoming requests. In Envoy's metrics this is captured by counters like envoy_http_downstream_rq_total, often exposed per HTTP connection manager (listener). Use rate(envoy_http_downstream_rq_total[1m]) to compute request rate. Also track breakdown by response code: downstream_rq_2xx, 4xx, 5xx counters represent total responses in each class. For example, a rising downstream_rq_5xx indicates more server errors being returned.
  • Active HTTP Requests – Gauge for in-flight requests currently being processed. This is downstream_rq_active (per listener/connection manager). A high number here could indicate request backlog or slow processing.
  • Request Duration – Histogram of end-to-end request latency as observed by Envoy. downstream_rq_time measures the total time from request start to response completion in milliseconds. This can be used to calculate p95/p99 latency.
  • Active Connections – Gauge for open client connections to Envoy. Each listener exposes downstream_cx_active, representing currently active connections on that listener. This is crucial for capacity monitoring (e.g., if it approaches OS or Envoy limits).
  • Connection Counts - downstream_cx_total (counter) tracks total connections accepted, and downstream_cx_destroy tracks total connections closed. These can be used to compute connection churn or connections per second (CPS). For example, a spike in connection churn might indicate clients not reusing connections.
  • Connection Errors/Overflows – downstream_cx_overflow (counter) counts connections rejected because the listener's connection limit was reached. downstream_cx_overload_reject counts connections rejected due to Envoy's overload manager actions. A non-zero rate of either indicates the proxy is at or beyond capacity and dropping connections.
  • Listener Draining – envoy_listener_manager_total_filter_chains_draining (gauge) indicates whether Envoy is currently draining any listeners. This happens when new certificates are loaded, which requires existing listeners to be gracefully shut down, potentially waiting for in-flight requests to complete. A non-zero value indicates that Envoy is in the process of shutting down listeners.
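
A hedged sketch of the ingress queries referenced above (Envoy's Prometheus exposition folds the per-class response counters into envoy_http_downstream_rq_xx with an envoy_response_code_class label):

# Incoming request rate over the last minute.
sum(rate(envoy_http_downstream_rq_total{envoy_http_conn_manager_prefix="ingress"}[1m]))

# Share of responses that were 5xx over five minutes.
sum(rate(envoy_http_downstream_rq_xx{envoy_response_code_class="5"}[5m]))
  / sum(rate(envoy_http_downstream_rq_total[5m]))

# p95 end-to-end request latency in milliseconds.
histogram_quantile(0.95, sum by (le) (rate(envoy_http_downstream_rq_time_bucket[5m])))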

Upstream (Pomerium Route) Metrics

These metrics describe Envoy's interactions with backend services (clusters), represented by Pomerium Routes. For more details, see Envoy's cluster manager statistics documentation. Example queries are sketched after the list below.

  • Cluster Health – Each upstream cluster (backend service) has gauges for healthy vs total endpoints. membership_healthy is the number of healthy endpoints in the cluster and membership_total is the total endpoints. The ratio indicates cluster health (e.g., if healthy drops below total, some endpoints are unhealthy). Endpoint-level health is rolled up in these metrics (Envoy doesn't expose per-endpoint health by default, just counts).
  • Upstream Request Success/Failure – envoy_cluster_upstream_rq_total counts requests Envoy has sent to each cluster.
    • envoy_cluster_upstream_rq_2xx counts successful requests (2xx responses).
    • envoy_cluster_upstream_rq_4xx counts client errors (4xx responses).
    • envoy_cluster_upstream_rq_5xx counts server errors (5xx responses).
    • envoy_cluster_upstream_rq_error counts all other errors (e.g., connection failures, timeouts). A high rate of 5xx or error responses indicates upstream issues.
  • Upstream Request Latency
    • envoy_cluster_upstream_rq_time is a histogram of time spent on an upstream request (including waiting and response) in milliseconds. High quantiles here mean the backend is responding slowly.
  • Active/Pending Upstream Requests – envoy_cluster_upstream_rq_active is a gauge of current in-flight requests to a given cluster.
    • envoy_cluster_upstream_rq_pending_active is a gauge of requests queued waiting for a connection (if connection pooling is exhausted). Spikes in pending requests suggest the upstream or Envoy's connection pool is saturated and could indicate bottlenecks.
  • Upstream Errors – Key counters for failure modes include envoy_cluster_upstream_rq_timeout (requests to upstream that timed out with no response)
    • envoy_cluster_upstream_cx_none_healthy (connection attempts failed due to no healthy hosts)
    • envoy_cluster_upstream_rq_per_try_timeout (per-retry timeouts). These should normally be zero; any significant values indicate upstream problems (e.g., envoy_cluster_upstream_cx_none_healthy means Envoy couldn't find a healthy endpoint to connect to).
  • Retry/Circuit-breaker Metrics – If Envoy is hitting circuit breaker limits, the envoy_cluster_circuit_breakers_default_cx_open gauge is set to 1.
    • By default, Envoy is configured to allow a maximum of 1024 concurrent connections to each upstream cluster.
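
A hedged sketch of per-Route upstream health queries built from these metrics:

# Upstream 5xx responses per second, per Pomerium Route.
sum by (envoy_cluster_name) (rate(envoy_cluster_upstream_rq_xx{envoy_response_code_class="5"}[5m]))

# Fraction of healthy endpoints in each cluster (1 means fully healthy).
envoy_cluster_membership_healthy / envoy_cluster_membership_total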

Server Metrics

These metrics relate to the Envoy process itself. Consult Envoy's server statistics documentation for further information. Example queries are sketched after the list below.

  • Memory Usage – Envoy provides gauges for memory in use. envoy_server_memory_allocated is the current heap allocation in bytes, and envoy_server_memory_physical_size is the physical memory used. envoy_server_memory_heap_size is the heap reserved (which may be larger than allocated). Tracking these over time helps catch memory leaks or spikes; for example, a steadily rising envoy_server_memory_allocated gauge could indicate a leak or increasing load.
  • CPU Utilization – Envoy does not directly export CPU usage as a stat; monitor it via the host or container runtime instead.
  • Uptime and Threads – envoy_server_uptime (gauge) shows how long the Envoy process has been up, in seconds. envoy_server_concurrency is the number of worker threads Envoy is running (typically equal to the number of CPU cores or threads allocated). These are mostly informational (uptime resets on restarts; concurrency is fixed at start).
  • Connection Counts (Overall) – envoy_server_total_connections is a gauge for total open connections (including any inherited from a hot restart, if applicable). For a single instance this should roughly match the sum of active connections across all listeners. A very high number relative to expectations can warn of load or connection leaks.
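
A hedged sketch of the server-level queries referenced above (the service label value is illustrative):

# Envoy heap currently allocated, in bytes.
envoy_server_memory_allocated{service="pomerium"}

# Detect recent restarts: uptime under five minutes.
envoy_server_uptime < 300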

Overload Manager

Pomerium implements a progressive overload protection system that takes specific actions at different memory usage thresholds:

  • 85-95%: Gradually reduces HTTP connection idle timeouts by up to 50%
  • 90%: Forces Envoy to shrink its heap every 10 seconds
  • 90-98%: Starts resetting streams using the most memory; as usage increases, more streams become eligible
  • 95%: Stops accepting new connections while maintaining existing ones
  • 98%: Disables HTTP keepalive, prevents new HTTP/2 streams, and terminates existing HTTP/2 streams
  • 99%: Drops all new requests

These actions are implemented through Envoy's overload manager and are designed to degrade service gracefully under high memory pressure while protecting system stability. Note that resource monitoring is only available in cgroup environments where limits are set (e.g., Kubernetes, Docker, systemd).
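
Since these thresholds are percentages of the cgroup memory limit, it helps to alert before the first action triggers. A hedged sketch, assuming cAdvisor/Kubernetes container metrics are also scraped and the container is named pomerium (an assumption):

# Warn at 80% of the memory limit, before the 85% overload threshold
# begins shrinking idle timeouts.
container_memory_working_set_bytes{container="pomerium"}
  / container_spec_memory_limit_bytes{container="pomerium"} > 0.80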

Authorization Metrics

Envoy calls Pomerium's ext_authz endpoint to perform authorization checks for incoming requests. Monitoring these metrics is crucial for understanding how many requests are authorized or denied, and how the Pomerium authorization service is performing.

  • Monitoring Authorization Handler – pomerium-authorize is the cluster name used for the authorization service; it exposes the standard cluster metrics such as upstream_rq_time and upstream_rq_total. These metrics show how long it takes to authorize requests and how many requests are being processed.

  • Overall Authorization Success/Failure/Errors – statistics across all clusters are reported at the HTTP connection manager level and carry the {envoy_http_conn_manager_prefix="ingress"} label.

    • envoy_http_ext_authz_ok counts requests that were successfully authorized.
    • envoy_http_ext_authz_denied counts requests that were denied by the authorization service.
    • envoy_http_ext_authz_error counts errors encountered while contacting the authorization service (e.g., timeouts, connection failures).

    These metrics are useful for understanding the overall health of the authorization service and how many requests are being processed successfully or denied.

  • Per Pomerium Route Authorization Metrics – For every Pomerium Route that uses the ext_authz filter, Envoy will emit metrics prefixed with envoy_cluster_ext_authz_. These metrics are labeled with the cluster name of the Route, and include:

    • envoy_cluster_ext_authz_ok – Number of requests successfully authorized.
    • envoy_cluster_ext_authz_denied – Number of requests denied by the authorization service.
    • envoy_cluster_ext_authz_error – Number of errors encountered while contacting the authorization service.

    Example:

    envoy_cluster_ext_authz_ok{service="pomerium-proxy",envoy_cluster_name="telemetry-team1-allowed-00008",installation_id="aecd6525-9eaa-448d-93d9-6363c04b1ccb",hostname="pomerium-proxy-55589cc5f-fjhsb"} 19767

    These metrics allow you to monitor authorization performance and success rates on a per-Route basis; an example query sketch follows.
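
A hedged sketch of such per-Route queries:

# Authorization denials per second, per Pomerium Route.
sum by (envoy_cluster_name) (rate(envoy_cluster_ext_authz_denied[5m]))

# Share of authorization checks that errored, across all ingress traffic.
sum(rate(envoy_http_ext_authz_error[5m]))
  / (sum(rate(envoy_http_ext_authz_ok[5m])) + sum(rate(envoy_http_ext_authz_denied[5m])) + sum(rate(envoy_http_ext_authz_error[5m])))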

Cache Metrics

Pomerium caches lookups to the databroker. The following metrics are reported for the cache (a hit-ratio example follows the list):

  • pomerium_storage_global_cache_hits_total: number of cache hits
  • pomerium_storage_global_cache_misses_total: number of cache misses
  • pomerium_storage_global_cache_invalidations_total: number of cache invalidations
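
A hedged sketch of the cache hit ratio over a five-minute window:

# Fraction of databroker lookups served from the global cache.
sum(rate(pomerium_storage_global_cache_hits_total[5m]))
  / (sum(rate(pomerium_storage_global_cache_hits_total[5m])) + sum(rate(pomerium_storage_global_cache_misses_total[5m])))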

Database Metrics

Pomerium exposes metrics for its PostgreSQL connection pool, prefixed with pomerium_pgxpool_. These metrics help monitor database connection usage, pool health, and performance.

Name | Type | Description
pgxpool_acquire_duration_nanoseconds_total | Counter | Total duration of all successful acquires from the pool, in nanoseconds.
pgxpool_acquired_connections | Gauge | Number of currently acquired (in-use) connections in the pool.
pgxpool_acquires_total | Counter | Cumulative count of successful acquires from the pool.
pgxpool_canceled_acquires_total | Counter | Cumulative count of acquires from the pool that were canceled by a context.
pgxpool_constructing_connections_milliseconds | Gauge | Number of connections with construction in progress in the pool.
pgxpool_empty_acquire_total | Counter | Cumulative count of successful acquires that had to wait for a resource to be released or constructed because the pool was empty.
pgxpool_idle_connections | Gauge | Number of currently idle connections in the pool.
pgxpool_max_connections | Gauge | Maximum size of the pool.
pgxpool_max_idle_destroys_total | Counter | Cumulative count of connections destroyed because they exceeded MaxConnectionsIdleTime.
pgxpool_max_lifetime_destroys_total | Counter | Cumulative count of connections destroyed because they exceeded MaxConnectionsLifetime.
pgxpool_new_connections_total | Counter | Cumulative count of new connections opened.
pgxpool_total_connections | Gauge | Total number of resources currently in the pool (sum of constructing, acquired, and idle connections).

All metrics are labeled with details such as db_client_connection_pool_name, db_system, and hostname.

Example Prometheus output:

# HELP pomerium_pgxpool_acquire_duration_nanoseconds_total Total duration of all successful acquires from the pool in nanoseconds.
# TYPE pomerium_pgxpool_acquire_duration_nanoseconds_total counter
pomerium_pgxpool_acquire_duration_nanoseconds_total{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="my-host"} 5.1058702e+07
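
A hedged sketch deriving pool latency and saturation from these metrics:

# Average connection-acquire latency in milliseconds.
rate(pomerium_pgxpool_acquire_duration_nanoseconds_total[5m])
  / rate(pomerium_pgxpool_acquires_total[5m]) / 1e6

# Pool saturation: in-use connections as a share of the maximum.
pomerium_pgxpool_acquired_connections / pomerium_pgxpool_max_connections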

Key Envoy Documentation Sources:

  • HTTP connection manager statistics
  • Listener statistics
  • Cluster manager statistics
  • Server statistics
