Metrics Settings
This reference covers all of Pomerium's Metrics Settings:
- Metrics Address
- Metrics Basic Authentication
- Metrics Certificate
- Metrics Client Certificate Authority
Metrics Address
Metrics Address exposes a Prometheus endpoint on the specified port.
Use with caution: the endpoint can expose frontend and backend server names or addresses. Do not expose the metrics endpoint externally if this information is sensitive.
How to configure
- Core
- Enterprise
- Kubernetes
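For example, in Core the endpoint can be enabled with the metrics_address config file key or the METRICS_ADDRESS environment variable (a minimal sketch; the host and port shown are illustrative):
# config.yaml: serve the Prometheus endpoint on localhost only
metrics_address: localhost:9090
# equivalent environment variable
METRICS_ADDRESS=localhost:9090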
Pomerium Metrics Tracked
Each metric exposed by Pomerium has a pomerium prefix, which is omitted in the table below for brevity.
Name | Type | Description |
---|---|---|
build_info | Gauge | Pomerium build metadata by git revision, service, version and go version |
config_checksum_int64 | Gauge | Currently loaded configuration checksum by service |
config_last_reload_success | Gauge | Whether the last configuration reload succeeded by service |
config_last_reload_success_timestamp | Gauge | The timestamp of the last successful configuration reload by service |
grpc_client_request_duration_ms | Histogram | GRPC client request duration by service |
grpc_client_request_size_bytes | Histogram | GRPC client request size by service |
grpc_client_requests_total | Counter | Total GRPC client requests made by service |
grpc_client_response_size_bytes | Histogram | GRPC client response size by service |
grpc_server_request_duration_ms | Histogram | GRPC server request duration by service |
grpc_server_request_size_bytes | Histogram | GRPC server request size by service |
grpc_server_requests_total | Counter | Total GRPC server requests made by service |
grpc_server_response_size_bytes | Histogram | GRPC server response size by service |
http_client_request_duration_ms | Histogram | HTTP client request duration by service |
http_client_request_size_bytes | Histogram | HTTP client request size by service |
http_client_requests_total | Counter | Total HTTP client requests made by service |
http_client_response_size_bytes | Histogram | HTTP client response size by service |
http_server_request_duration_ms | Histogram | HTTP server request duration by service |
http_server_request_size_bytes | Histogram | HTTP server request size by service |
http_server_requests_total | Counter | Total HTTP server requests handled by service |
http_server_response_size_bytes | Histogram | HTTP server response size by service |
storage_operation_duration_ms | Histogram | Storage operation duration by operation, result, backend and service |
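As an illustration, config_last_reload_success can drive a simple Prometheus alert that fires when an instance fails to apply its latest configuration (a sketch; the rule name, duration, and labels are illustrative):
# prometheus alerting rule (illustrative)
groups:
  - name: pomerium-config
    rules:
      - alert: PomeriumConfigReloadFailed
        # the most recent configuration reload did not succeed
        expr: pomerium_config_last_reload_success == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pomerium configuration reload failed on {{ $labels.service }}"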
Identity Manager
Identity manager metrics have a pomerium_identity_manager prefix.
Name | Type | Description |
---|---|---|
last_refresh_timestamp | Gauge | Timestamp of the last directory refresh operation. |
session_refresh_error_timestamp | Gauge | Timestamp of the last session refresh that ended in an error. |
session_refresh_errors | Counter | Session refresh error counter. |
session_refresh_success | Counter | Session refresh success counter. |
session_refresh_success_timestamp | Gauge | Timestamp of the last successful session refresh. |
user_group_refresh_error_timestamp | Gauge | Timestamp of the last user group refresh that ended in an error. |
user_group_refresh_errors | Counter | User group refresh error counter. |
user_group_refresh_success | Counter | User group refresh success counter. |
user_group_refresh_success_timestamp | Gauge | Timestamp of the last successful user group refresh. |
user_refresh_error_timestamp | Gauge | Timestamp of the last user refresh that ended in an error. |
user_refresh_errors | Counter | User refresh error counter. |
user_refresh_success | Counter | User refresh success counter. |
user_refresh_success_timestamp | Gauge | Timestamp of the last successful user refresh. |
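For example, the refresh error counters lend themselves to a simple alert on sustained identity-provider problems (a sketch; the rule name and window are illustrative):
groups:
  - name: pomerium-identity-manager
    rules:
      - alert: PomeriumSessionRefreshErrors
        # session refreshes have been failing over the last 10 minutes
        expr: rate(pomerium_identity_manager_session_refresh_errors[10m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pomerium is failing to refresh user sessions"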
Envoy Proxy Metrics
As of v0.9, Pomerium uses Envoy for the data plane. As such, proxy-related metrics are sourced from Envoy and use Envoy's internal stats data model. Please see Envoy's documentation for information about specific metrics.
All metrics coming from Envoy are labeled with service="pomerium" or service="pomerium-proxy", depending on whether you're running in all-in-one or distributed service mode, and have the pomerium prefix added to the standard Envoy metric name.
See Configuration & Settings for more information about configuration environments.
Metrics Basic Authentication
Metrics Basic Authentication requires Basic HTTP Authentication to access the metrics endpoint.
To support this in Prometheus, consult the basic_auth option in the scrape_config documentation.
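A minimal Prometheus scrape configuration for a basic-auth-protected metrics endpoint might look like this (a sketch; the job name, target address, and credentials are placeholders):
# prometheus.yml (excerpt)
scrape_configs:
  - job_name: pomerium
    static_configs:
      - targets: ['pomerium:9090']   # host:port from metrics_address
    basic_auth:
      username: x
      password: y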
How to configure
- Core
- Enterprise
- Kubernetes
Config file keys | Environment variables | Type | Usage |
---|---|---|---|
metrics_basic_authentication | METRICS_BASIC_AUTHENTICATION | string (base64 encoded) | optional |
Examples
# for username: x and password: y
metrics_basic_authentication: eDp5
# for username: x and password: y
METRICS_BASIC_AUTHENTICATION=eDp5
metrics_basic_authentication is a bootstrap configuration setting and is not configurable in the Console.
Kubernetes does not support metrics_basic_authentication.
Metrics Certificate
Metrics Certificate uses a certificate to secure the metrics endpoint.
A Certificate is an X.509 public-key and private-key pair.
All certificates supplied to Pomerium must be in PEM format.
How to configure
- Core
- Enterprise
- Kubernetes
Config file keys | Environment variables | Type | Usage |
---|---|---|---|
metrics_certificate and metrics_certificate_key | METRICS_CERTIFICATE and METRICS_CERTIFICATE_KEY | string | optional |
metrics_certificate_file and metrics_certificate_key_file | METRICS_CERTIFICATE_FILE and METRICS_CERTIFICATE_KEY_FILE | string | optional |
Examples
metrics_certificate: base64-encoded-string
metrics_certificate_key: base64-encoded-string
METRICS_CERTIFICATE_FILE=/relative/file/location
METRICS_CERTIFICATE_KEY_FILE=/relative/file/location
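When the metrics endpoint is served with a certificate, Prometheus must scrape it over HTTPS and trust the issuing CA. A sketch of the corresponding scrape configuration (job name, target, and file paths are placeholders):
# prometheus.yml (excerpt)
scrape_configs:
  - job_name: pomerium
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/pomerium-metrics-ca.pem   # CA that signed the metrics certificate
    static_configs:
      - targets: ['pomerium:9090']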
Metrics Certificate settings are bootstrap configuration settings and are not configurable in the Console.
Kubernetes does not support Metrics Certificate.
Metrics Client Certificate Authority
Metrics Client Certificate Authority is the X.509 public-key used to validate mTLS client certificates for the metrics endpoint. If not set, no client certificate will be required.
How to configure
- Core
- Enterprise
- Kubernetes
Config file keys | Environment variables | Type | Usage |
---|---|---|---|
metrics_client_ca and metrics_client_ca_file | METRICS_CLIENT_CA and METRICS_CLIENT_CA_FILE | string | optional |
Examples
metrics_client_ca: base64-encoded-string
METRICS_CLIENT_CA_FILE=/relative/file/location
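With a client CA configured, any scraper must present a client certificate signed by that CA. A sketch of a Prometheus tls_config for mTLS scraping (file paths are placeholders):
# prometheus.yml (excerpt)
scrape_configs:
  - job_name: pomerium
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/pomerium-metrics-ca.pem
      cert_file: /etc/prometheus/prometheus-client.pem     # client certificate signed by metrics_client_ca
      key_file: /etc/prometheus/prometheus-client-key.pem
    static_configs:
      - targets: ['pomerium:9090']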
metrics_client_ca and metrics_client_ca_file are bootstrap configuration settings and are not configurable in the Console.
Kubernetes does not support Metrics Client Certificate Authority.
Envoy Metrics
Pomerium leverages Envoy as its underlying proxy. Monitoring Envoy is crucial for understanding traffic load, upstream health, resource usage, and security filter performance.
Standard Labels
Envoy metrics are labeled with standard labels that provide context about the metric source and its environment. Here is an example of an Envoy metric:
# TYPE envoy_cluster_ext_authz_error counter
envoy_cluster_ext_authz_error{service="pomerium-proxy",envoy_cluster_name="telemetry-team1-allowed-00008",installation_id="aecd6525-9eaa-448d-93d9-6363c04b1ccb",hostname="pomerium-proxy-55589cc5f-fjhsb"} 34
Each Envoy metric is prefixed with envoy_. You can look up additional metric documentation in the Envoy documentation.
The envoy_cluster_name label identifies the upstream cluster of a Pomerium Route. It is set to:
- the name property of the Route configured via the config file in Core
- the id of the Route configured via the Zero Console
- the Metric Name of the Route configured via the Pomerium Enterprise Console / API / Terraform
The installation_id label identifies the unique Pomerium installation:
- In vanilla Pomerium Core, this label is not set.
- In Pomerium Enterprise, this label is set to the unique installation ID of the Pomerium instance, which can be adjusted in the Console under Global Settings.
- In Pomerium Zero, this label is set to the Cluster ID of the Pomerium Zero cluster, which is automatically generated and can be found in the Zero Console under Cluster Settings.
The hostname label identifies the hostname of the instance, which is set to the unix hostname.
Downstream (Ingress) Metrics
These metrics provide insights into the traffic received by Envoy from clients. The HTTP connection manager that serves end-user requests carries the envoy_http_conn_manager_prefix="ingress" label, and the listener that accepts connections carries the envoy_listener_address="0.0.0.0_443" label.
Refer to Envoy's HTTP connection manager statistics documentation and listener statistics documentation for more details.
- Request Rate – Total incoming requests. In Envoy's metrics this is captured by counters like envoy_http_downstream_rq_total, often exposed per HTTP connection manager (listener). Use rate(envoy_http_downstream_rq_total[1m]) to compute the request rate (see the example rules after this list). Also track the breakdown by response code: the downstream_rq_2xx, 4xx, and 5xx counters represent total responses in each class. For example, a rising downstream_rq_5xx indicates more server errors being returned.
- Active HTTP Requests – Gauge for in-flight requests currently being processed. This is downstream_rq_active (per listener/connection manager). A high number here could indicate request backlog or slow processing.
- Request Duration – Histogram of end-to-end request latency as observed by Envoy. downstream_rq_time measures the total time from request start to response completion in milliseconds. This can be used to calculate p95/p99 latency.
- Active Connections – Gauge for open client connections to Envoy. Each listener exposes downstream_cx_active, representing currently active connections on that listener. This is crucial for capacity monitoring (e.g., if it approaches OS or Envoy limits).
- Connection Counts – downstream_cx_total (counter) tracks total connections accepted, and downstream_cx_destroy tracks total connections closed. These can be used to compute connection churn or connections per second (CPS). For example, a spike in connection churn might indicate clients not reusing connections.
- Connection Errors/Overflows – downstream_cx_overflow (counter) counts connections rejected because the listener's connection limit was reached. downstream_cx_overload_reject counts connections rejected due to Envoy's overload manager actions. Any non-zero rates of these indicate the proxy is at or beyond capacity and dropping connections.
- Listener Draining – envoy_listener_manager_total_filter_chains_draining (gauge) indicates whether Envoy is currently draining any listeners. This happens when new certificates are loaded, which requires existing listeners to be gracefully shut down, potentially waiting for in-flight requests to complete. A non-zero value here indicates that Envoy is in the process of shutting down listeners.
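The downstream counters and histograms above translate directly into Prometheus recording rules. A sketch (the rule names are illustrative, and the _bucket series assumes Envoy histograms are exported in standard Prometheus histogram form):
groups:
  - name: pomerium-ingress
    rules:
      # per-second request rate on the ingress HTTP connection manager
      - record: pomerium:ingress_request_rate
        expr: sum(rate(envoy_http_downstream_rq_total{envoy_http_conn_manager_prefix="ingress"}[1m]))
      # p95 end-to-end request latency, in milliseconds
      - record: pomerium:ingress_rq_time_p95_ms
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(envoy_http_downstream_rq_time_bucket{envoy_http_conn_manager_prefix="ingress"}[5m])))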
Upstream (Pomerium Route) Metrics
These metrics describe Envoy's interactions with backend services (clusters), represented by Pomerium Routes. For more details, see Envoy's cluster manager statistics documentation.
- Cluster Health – Each upstream cluster (backend service) has gauges for healthy vs. total endpoints. membership_healthy is the number of healthy endpoints in the cluster and membership_total is the total number of endpoints. The ratio indicates cluster health (e.g., if healthy drops below total, some endpoints are unhealthy). Endpoint-level health is rolled up in these metrics (Envoy doesn't expose per-endpoint health by default, just counts).
- Upstream Request Success/Failure – envoy_cluster_upstream_rq_total counts requests Envoy has sent to each cluster. envoy_cluster_upstream_rq_2xx counts successful requests (2xx responses). envoy_cluster_upstream_rq_4xx counts client errors (4xx responses). envoy_cluster_upstream_rq_5xx counts server errors (5xx responses). envoy_cluster_upstream_rq_error counts all other errors (e.g., connection failures, timeouts). A high rate of 5xx or error responses indicates upstream issues.
- Upstream Request Latency – envoy_cluster_upstream_rq_time is a histogram of time spent on an upstream request (including waiting and response) in milliseconds. High quantiles here mean the backend is responding slowly (see the example rules after this list).
- Active/Pending Upstream Requests – envoy_cluster_upstream_rq_active is a gauge of current in-flight requests to a given cluster. envoy_cluster_upstream_rq_pending_active is a gauge of requests queued waiting for a connection (if connection pooling is exhausted). Spikes in pending requests suggest the upstream or Envoy's connection pool is saturated and could indicate bottlenecks.
- Upstream Errors – Key counters for failure modes include envoy_cluster_upstream_rq_timeout (requests to the upstream that timed out with no response), envoy_cluster_upstream_cx_none_healthy (connection attempts that failed because no healthy hosts were available), and envoy_cluster_upstream_rq_per_try_timeout (per-retry timeouts). These should normally be zero; any significant values indicate upstream problems (e.g., envoy_cluster_upstream_cx_none_healthy means Envoy couldn't even find a healthy endpoint to connect to).
- Retry/Circuit-breaker Metrics – If Envoy is hitting circuit breaker limits, the envoy_cluster_circuit_breakers_default_cx_open gauge is set to 1. By default, Envoy is configured to allow a maximum of 1024 concurrent connections to each upstream cluster.
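As an example, the upstream histogram and failure counters above can be turned into per-Route latency quantiles and a no-healthy-endpoint alert (a sketch; rule names and thresholds are illustrative):
groups:
  - name: pomerium-upstream
    rules:
      # p99 upstream request latency per Route/cluster, in milliseconds
      - record: pomerium:upstream_rq_time_p99_ms
        expr: |
          histogram_quantile(0.99,
            sum by (le, envoy_cluster_name) (rate(envoy_cluster_upstream_rq_time_bucket[5m])))
      - alert: UpstreamNoHealthyEndpoints
        # connection attempts are failing because a Route has no healthy endpoints
        expr: sum by (envoy_cluster_name) (rate(envoy_cluster_upstream_cx_none_healthy[5m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Route {{ $labels.envoy_cluster_name }} has no healthy upstream endpoints"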
Server Metrics
These metrics relate to the Envoy process itself. Consult Envoy's server statistics documentation for further information.
- Memory Usage – Envoy provides gauges for memory in use. envoy_server_memory_allocated is the current heap allocated (bytes), and envoy_server_memory_physical_size is the physical memory used. Tracking these over time helps catch memory leaks or spikes. envoy_server_memory_heap_size is the heap reserved (which may be larger than allocated). For example, a steadily rising memory_allocated gauge could indicate a leak or increasing load (see the example alert after this list).
- CPU Utilization – Envoy does not directly export CPU usage as a stat.
- Uptime and Threads – envoy_server_uptime (gauge) shows how long the Envoy process has been up (seconds). envoy_server_concurrency is the number of worker threads Envoy is running (typically equal to the number of CPU cores or threads allocated). These are mostly informational (uptime resets on restarts; concurrency is fixed at start).
- Connection Counts (Overall) – envoy_server_total_connections is a gauge for total open connections (including any inherited from a hot restart, if applicable). This should usually match the sum of active connections on all listeners for a single instance. A very high number relative to expectations can warn of load or connection leaks.
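For instance, a slow but steady rise in heap allocation can be caught with an alert like the following (a sketch; the 20% growth threshold and time windows are illustrative):
groups:
  - name: pomerium-envoy-server
    rules:
      - alert: EnvoyHeapGrowth
        # heap allocation is more than 20% higher than it was an hour ago
        expr: envoy_server_memory_allocated > 1.2 * (envoy_server_memory_allocated offset 1h)
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Envoy heap allocation on {{ $labels.hostname }} is growing steadily"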
Overload Manager
Pomerium implements a progressive overload protection system that takes specific actions at different memory usage thresholds:
- 85-95%: Gradually reduces HTTP connection idle timeouts by up to 50%
- 90%: Forces Envoy to shrink its heap every 10 seconds
- 90-98%: Starts resetting streams using the most memory; as usage increases, more streams become eligible
- 95%: Stops accepting new connections while maintaining existing ones
- 98%: Disables HTTP keepalive, prevents new HTTP/2 streams, and terminates existing HTTP/2 streams
- 99%: Drops all new requests
These actions are implemented through Envoy's overload manager and are designed to gracefully degrade service during high memory pressure situations while protecting system stability. Note that resource monitoring is only available in the cgroup environment where the limits are set (i.e. Kubernetes, Docker, systemd, etc.).
Authorization Metrics
Envoy calls Pomerium's ext_authz endpoint to perform authorization checks for incoming requests. Monitoring these metrics is crucial for understanding how many requests are being authorized or denied, and for gauging the performance of the Pomerium authorization service.
- Monitoring the Authorization Handler – pomerium-authorize is the cluster name used for the authorization service; it exposes standard cluster metrics like upstream_rq_time, upstream_rq_total, etc. You can monitor the performance of the authorization service by looking at these metrics, which give insight into how long it takes to authorize requests and how many requests are being processed.
- Overall Authorization Success/Failure/Errors – Statistics across all clusters are reported at the HTTP connection manager level and carry the {envoy_http_conn_manager_prefix="ingress"} label. envoy_http_ext_authz_ok counts requests that were successfully authorized. envoy_http_ext_authz_denied counts requests that were denied by the authorization service. envoy_http_ext_authz_error counts errors encountered while contacting the authorization service (e.g., timeouts, connection failures). These metrics are useful for understanding the overall health of the authorization service and how many requests are being processed successfully or denied.
- Per Pomerium Route Authorization Metrics – For every Pomerium Route that uses the ext_authz filter, Envoy emits metrics prefixed with envoy_cluster_ext_authz_. These metrics are labeled with the cluster name of the Route, and include:
  - envoy_cluster_ext_authz_ok – Number of requests successfully authorized.
  - envoy_cluster_ext_authz_denied – Number of requests denied by the authorization service.
  - envoy_cluster_ext_authz_error – Number of errors encountered while contacting the authorization service.
Example:
envoy_cluster_ext_authz_ok{service="pomerium-proxy",envoy_cluster_name="telemetry-team1-allowed-00008",installation_id="aecd6525-9eaa-448d-93d9-6363c04b1ccb",hostname="pomerium-proxy-55589cc5f-fjhsb"} 19767
These metrics allow you to monitor authorization performance and success rates on a per-Route basis.
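A sketch of how these counters can feed per-Route dashboards and an error alert (rule names and the alert window are illustrative):
groups:
  - name: pomerium-authorization
    rules:
      # per-Route authorization denial rate
      - record: pomerium:route_authz_denied_rate
        expr: sum by (envoy_cluster_name) (rate(envoy_cluster_ext_authz_denied[5m]))
      - alert: PomeriumAuthorizeErrors
        # Envoy is reporting errors while contacting the authorization service
        expr: sum(rate(envoy_cluster_ext_authz_error[5m])) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Envoy is reporting ext_authz errors against the Pomerium authorization service"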
Cache Metrics
Pomerium caches lookups to the databroker. The following metrics are reported for the cache:
- pomerium_storage_global_cache_hits_total: number of cache hits
- pomerium_storage_global_cache_misses_total: number of cache misses
- pomerium_storage_global_cache_invalidations_total: number of cache invalidations
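These counters combine into a cache hit ratio; a sketch of a Prometheus recording rule (the rule name is illustrative):
groups:
  - name: pomerium-cache
    rules:
      # fraction of databroker lookups served from the global cache
      - record: pomerium:global_cache_hit_ratio
        expr: |
          rate(pomerium_storage_global_cache_hits_total[5m])
            / (rate(pomerium_storage_global_cache_hits_total[5m])
               + rate(pomerium_storage_global_cache_misses_total[5m]))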
Database Metrics
Pomerium exposes metrics for its PostgreSQL connection pool, prefixed with pomerium_pgxpool_. These metrics help monitor database connection usage, pool health, and performance.
Name | Type | Description |
---|---|---|
pgxpool_acquire_duration_nanoseconds_total | Counter | Total duration of all successful acquires from the pool in nanoseconds. |
pgxpool_acquired_connections | Gauge | Number of currently acquired (in-use) connections in the pool. |
pgxpool_acquires_total | Counter | Cumulative count of successful acquires from the pool. |
pgxpool_canceled_acquires_total | Counter | Cumulative count of acquires from the pool that were canceled by a context. |
pgxpool_constructing_connections_milliseconds | Gauge | Number of connections with construction in progress in the pool. |
pgxpool_empty_acquire_total | Counter | Cumulative count of successful acquires from the pool that waited for a resource to be released or constructed because the pool was empty. |
pgxpool_idle_connections | Gauge | Number of currently idle connections in the pool. |
pgxpool_max_connections | Gauge | Maximum size of the pool. |
pgxpool_max_idle_destroys_total | Counter | Cumulative count of connections destroyed because they exceeded MaxConnectionsIdleTime. |
pgxpool_max_lifetime_destroys_total | Counter | Cumulative count of connections destroyed because they exceeded MaxConnectionsLifetime. |
pgxpool_new_connections_total | Counter | Cumulative count of new connections opened. |
pgxpool_total_connections | Gauge | Total number of resources currently in the pool (sum of constructing, acquired, and idle connections). |
All metrics are labeled with details such as db_client_connection_pool_name, db_system, and hostname.
Example Prometheus output:
# HELP pomerium_pgxpool_acquire_duration_nanoseconds_total Total duration of all successful acquires from the pool in nanoseconds.
# TYPE pomerium_pgxpool_acquire_duration_nanoseconds_total counter
pomerium_pgxpool_acquire_duration_nanoseconds_total{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="my-host"} 5.1058702e+07
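These gauges make it straightforward to watch pool utilization; a sketch of a recording rule and saturation alert (the 90% threshold and names are illustrative):
groups:
  - name: pomerium-postgres
    rules:
      # fraction of the connection pool currently in use
      - record: pomerium:pgxpool_utilization
        expr: pomerium_pgxpool_acquired_connections / pomerium_pgxpool_max_connections
      - alert: PomeriumPgxPoolSaturated
        # the pool has been more than 90% utilized for 10 minutes
        expr: pomerium_pgxpool_acquired_connections / pomerium_pgxpool_max_connections > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pomerium PostgreSQL connection pool is near capacity"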