Health check

  • Health checking architecture overview.

  • If health checking is configured for a cluster, additional statistics are emitted. They are documented here.

core.HealthCheck

[core.HealthCheck proto]

{
  "timeout": "{...}",
  "interval": "{...}",
  "initial_jitter": "{...}",
  "interval_jitter": "{...}",
  "interval_jitter_percent": "...",
  "unhealthy_threshold": "{...}",
  "healthy_threshold": "{...}",
  "reuse_connection": "{...}",
  "http_health_check": "{...}",
  "tcp_health_check": "{...}",
  "grpc_health_check": "{...}",
  "custom_health_check": "{...}",
  "no_traffic_interval": "{...}",
  "unhealthy_interval": "{...}",
  "unhealthy_edge_interval": "{...}",
  "healthy_edge_interval": "{...}",
  "event_log_path": "...",
  "always_log_health_check_failures": "..."
}
timeout

(Duration) The time to wait for a health check response. If the timeout is reached the health check attempt will be considered a failure.

interval

(Duration) The interval between health checks.

initial_jitter

(Duration) An optional jitter amount in milliseconds. If specified, Envoy will wait for a random time in ms between 0 and initial_jitter before starting health checking. This only applies to the first health check.

interval_jitter

(Duration) An optional jitter amount in milliseconds. If specified, during every interval Envoy will add interval_jitter to the wait time.

interval_jitter_percent

(uint32) An optional jitter amount as a percentage of interval. If specified, during every interval Envoy will add interval * interval_jitter_percent / 100 to the wait time.

If interval_jitter and interval_jitter_percent are both set, both of them will be used to increase the wait time.
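As a rough illustration of the arithmetic described above, the following sketch adds both jitter terms to the base interval, reading the field descriptions literally. The function name wait_time_ms and the use of plain millisecond integers are assumptions made for this example. The actual implementation may randomize the jitter within these bounds.

```python
def wait_time_ms(interval_ms, interval_jitter_ms=0, interval_jitter_percent=0):
    """Wait time between checks, reading the field descriptions literally.

    Hypothetical helper: all durations are plain integers in milliseconds.
    """
    # Percentage jitter is expressed relative to the base interval.
    percent_jitter_ms = interval_ms * interval_jitter_percent // 100
    # Both jitter terms increase the wait time when both are set.
    return interval_ms + interval_jitter_ms + percent_jitter_ms
```

With a 5000 ms interval, 1000 ms of interval_jitter and an interval_jitter_percent of 10, this gives 5000 + 1000 + 500 = 6500 ms.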

unhealthy_threshold

(UInt32Value) The number of unhealthy health checks required before a host is marked unhealthy. Note that for HTTP health checking, if a host responds with 503, this threshold is ignored and the host is considered unhealthy immediately.

healthy_threshold

(UInt32Value) The number of healthy health checks required before a host is marked healthy. Note that during startup, only a single successful health check is required to mark a host healthy.
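The interaction of the two thresholds can be sketched as a small state machine. This is only an illustration under stated assumptions: the class name HostHealthTracker is hypothetical, hosts are assumed to start out unhealthy, "startup" is taken to mean the very first check, and the thresholds are assumed to count consecutive results.

```python
class HostHealthTracker:
    """Hypothetical tracker for one host; not Envoy's actual implementation."""

    def __init__(self, unhealthy_threshold, healthy_threshold):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = False     # assumption: unhealthy until the first success
        self.first_check = True  # assumption: "startup" means the first check
        self.streak = 0          # consecutive results contradicting the current state

    def record(self, success, immediate_failure=False):
        """Record one check result and return the resulting health state.

        immediate_failure models an HTTP 503 response, which bypasses
        unhealthy_threshold.
        """
        if success:
            if self.healthy:
                self.streak = 0
            else:
                self.streak += 1
                # During startup a single success is enough.
                if self.first_check or self.streak >= self.healthy_threshold:
                    self.healthy = True
                    self.streak = 0
        else:
            if not self.healthy:
                self.streak = 0
            else:
                self.streak += 1
                if immediate_failure or self.streak >= self.unhealthy_threshold:
                    self.healthy = False
                    self.streak = 0
        self.first_check = False
        return self.healthy
```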

reuse_connection

(BoolValue) Reuse health check connection between health checks. Default is true.

http_health_check

(core.HealthCheck.HttpHealthCheck) HTTP health check.

Precisely one of http_health_check, tcp_health_check, grpc_health_check, custom_health_check must be set.

tcp_health_check

(core.HealthCheck.TcpHealthCheck) TCP health check.

Precisely one of http_health_check, tcp_health_check, grpc_health_check, custom_health_check must be set.

grpc_health_check

(core.HealthCheck.GrpcHealthCheck) gRPC health check.

Precisely one of http_health_check, tcp_health_check, grpc_health_check, custom_health_check must be set.

custom_health_check

(core.HealthCheck.CustomHealthCheck) Custom health check.

Precisely one of http_health_check, tcp_health_check, grpc_health_check, custom_health_check must be set.
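The "precisely one of" constraint on the four health checker fields can be checked before a configuration is submitted. The sketch below assumes the configuration is available as a plain dictionary keyed by field name. The helper name selected_checker is hypothetical.

```python
CHECKER_FIELDS = (
    "http_health_check",
    "tcp_health_check",
    "grpc_health_check",
    "custom_health_check",
)


def selected_checker(health_check):
    """Return the single checker field that is set, or raise ValueError.

    Hypothetical pre-flight validation on a dict form of core.HealthCheck.
    """
    present = [name for name in CHECKER_FIELDS if name in health_check]
    if len(present) != 1:
        raise ValueError(
            "precisely one of %s must be set, found %d"
            % (", ".join(CHECKER_FIELDS), len(present))
        )
    return present[0]
```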

no_traffic_interval

(Duration) The “no traffic interval” is a special health check interval that is used when a cluster has never had traffic routed to it. This less frequent checking allows cluster information to be kept up to date without sending a potentially large amount of active health checking traffic for no reason. Once a cluster has been used for traffic routing, Envoy will shift back to using the standard health check interval that is defined. Note that this interval takes precedence over any other.

The default value for “no traffic interval” is 60 seconds.

unhealthy_interval

(Duration) The “unhealthy interval” is a health check interval that is used for hosts that are marked as unhealthy. As soon as the host is marked as healthy, Envoy will shift back to using the standard health check interval that is defined.

The default value for “unhealthy interval” is the same as “interval”.

unhealthy_edge_interval

(Duration) The “unhealthy edge interval” is a special health check interval that is used for the first health check right after a host is marked as unhealthy. For subsequent health checks Envoy will shift back to using either “unhealthy interval” if present or the standard health check interval that is defined.

The default value for “unhealthy edge interval” is the same as “unhealthy interval”.

healthy_edge_interval

(Duration) The “healthy edge interval” is a special health check interval that is used for the first health check right after a host is marked as healthy. For subsequent health checks Envoy will shift back to using the standard health check interval that is defined.

The default value for “healthy edge interval” is the same as the default interval.
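The precedence and defaulting rules for the interval fields above can be summarized in one function. This is a sketch of the documented rules only. The function name next_interval, the dictionary form of the configuration, the use of seconds as floats, and the three boolean inputs are assumptions made for illustration.

```python
def next_interval(cfg, *, cluster_had_traffic, host_healthy, state_just_changed):
    """Pick the interval (in seconds) before the next health check.

    cfg is a hypothetical dict holding durations in seconds; only
    "interval" is required.
    """
    interval = cfg["interval"]
    # The no traffic interval takes precedence over any other; default 60 s.
    if not cluster_had_traffic:
        return cfg.get("no_traffic_interval", 60.0)
    # unhealthy_interval defaults to interval.
    unhealthy = cfg.get("unhealthy_interval", interval)
    if host_healthy:
        if state_just_changed:
            # First check right after the host is marked healthy.
            return cfg.get("healthy_edge_interval", interval)
        return interval
    if state_just_changed:
        # First check right after the host is marked unhealthy;
        # defaults to unhealthy_interval.
        return cfg.get("unhealthy_edge_interval", unhealthy)
    return unhealthy
```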

event_log_path

(string) Specifies the path to the health check event log. If empty, no event log will be written.

always_log_health_check_failures

(bool) If set to true, health check failure events will always be logged. If set to false, only the initial health check failure event will be logged. The default value is false.

core.HealthCheck.Payload

[core.HealthCheck.Payload proto]

Describes the encoding of the payload bytes in the payload.

{
  "text": "..."
}
text

(string, REQUIRED) Hex encoded payload. E.g., “000000FF”.
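For reference, a hex encoded payload such as the one above maps to raw bytes as follows. The helper name decode_payload is an assumption for this example.

```python
def decode_payload(text):
    """Decode a hex encoded payload string into raw bytes (hypothetical helper)."""
    return bytes.fromhex(text)


payload = decode_payload("000000FF")  # four bytes: 0x00 0x00 0x00 0xFF
```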

core.HealthCheck.HttpHealthCheck

[core.HealthCheck.HttpHealthCheck proto]

{
  "host": "...",
  "path": "...",
  "service_name": "...",
  "request_headers_to_add": [],
  "request_headers_to_remove": [],
  "use_http2": "...",
  "expected_statuses": [],
  "codec_client_type": "..."
}
host

(string) The value of the host header in the HTTP health check request. If left empty (default value), the name of the cluster this health check is associated with will be used.

path

(string, REQUIRED) Specifies the HTTP path that will be requested during health checking. For example /healthcheck.

service_name

(string) An optional service name parameter which is used to validate the identity of the health checked cluster. See the architecture overview for more information.

request_headers_to_add

(core.HeaderValueOption) Specifies a list of HTTP headers that should be added to each request that is sent to the health checked cluster. For more information, including details on header value syntax, see the documentation on custom request headers.

request_headers_to_remove

(string) Specifies a list of HTTP headers that should be removed from each request that is sent to the health checked cluster.

use_http2

(bool) If set, health checks will be made using HTTP/2. Deprecated, use codec_client_type instead.

expected_statuses

(type.Int64Range) Specifies a list of HTTP response statuses considered healthy. If provided, this list replaces the default 200-only policy, so 200 must be included explicitly if it should still count as healthy. Ranges follow the half-open semantics of Int64Range.
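The half-open range semantics mean that start is inclusive and end is exclusive. A minimal sketch of the resulting check, assuming ranges are given as dictionaries with start and end keys (the helper name is_expected_status is hypothetical):

```python
def is_expected_status(status, expected_statuses=()):
    """Return True if an HTTP status counts as healthy.

    With no ranges configured, only 200 is healthy. Otherwise the
    configured half-open ranges [start, end) replace that default.
    """
    if not expected_statuses:
        return status == 200
    return any(r["start"] <= status < r["end"] for r in expected_statuses)
```

For example, a single range with start 200 and end 300 accepts 200 through 299 but not 300, and a range with start 201 and end 300 no longer accepts 200.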

codec_client_type

(type.CodecClientType) Use the specified application protocol for health checks. This replaces use_http2 in light of HTTP/3.

core.HealthCheck.TcpHealthCheck

[core.HealthCheck.TcpHealthCheck proto]

{
  "send": "{...}",
  "receive": []
}
send

(core.HealthCheck.Payload) Empty payloads imply a connect-only health check.

receive

(core.HealthCheck.Payload) When checking the response, “fuzzy” matching is performed such that each binary block must be found, and in the order specified, but not necessarily contiguous.
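The "fuzzy" matching rule can be sketched as an ordered, non-overlapping search through the response. This is an illustration of the rule as described, not Envoy's actual implementation. The function name fuzzy_match and the choice to search greedily from the end of the previous match are assumptions.

```python
def fuzzy_match(response, expected_hex_blocks):
    """Check that every block occurs in the response, in the given order.

    response is raw bytes; expected_hex_blocks is a list of hex encoded
    payload strings. Blocks need not be contiguous.
    """
    position = 0
    for block_hex in expected_hex_blocks:
        block = bytes.fromhex(block_hex)
        index = response.find(block, position)
        if index < 0:
            return False
        # Continue searching after the block that was just matched.
        position = index + len(block)
    return True
```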

core.HealthCheck.RedisHealthCheck

[core.HealthCheck.RedisHealthCheck proto]

{
  "key": "..."
}
key

(string) If set, optionally perform EXISTS <key> instead of PING. A return value from Redis of 0 (does not exist) is considered a passing healthcheck. A return value other than 0 is considered a failure. This allows the user to mark a Redis instance for maintenance by setting the specified key to any value and waiting for traffic to drain.
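The pass and fail rule for the key-based check can be sketched as follows. The function names are hypothetical, replies are assumed to be already decoded into Python values, and the PING case relies on Redis replying with PONG.

```python
def redis_health_command(key=""):
    """Return the Redis command to send: EXISTS <key> if a key is set, else PING."""
    return ["EXISTS", key] if key else ["PING"]


def redis_check_passes(key, reply):
    """Interpret the decoded reply to the command chosen above."""
    if key:
        # 0 means the maintenance key does not exist: the check passes.
        return reply == 0
    return reply == "PONG"
```

An operator who runs SET on the configured key makes redis_check_passes return False for that instance, which matches the maintenance workflow described above.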

core.HealthCheck.GrpcHealthCheck

[core.HealthCheck.GrpcHealthCheck proto]

grpc.health.v1.Health-based healthcheck. See gRPC doc for details.

{
  "service_name": "...",
  "authority": "..."
}
service_name

(string) An optional service name parameter which will be sent to the gRPC service in the grpc.health.v1.HealthCheckRequest message. See gRPC health-checking overview for more information.

authority

(string) The value of the :authority header in the gRPC health check request. If left empty (default value), the name of the cluster this health check is associated with will be used.

core.HealthCheck.CustomHealthCheck

[core.HealthCheck.CustomHealthCheck proto]

Custom health check.

{
  "name": "...",
  "config": "{...}",
  "typed_config": "{...}"
}
name

(string, REQUIRED) The registered name of the custom health checker.

config

(Struct) A custom health checker specific configuration which depends on the custom health checker being instantiated. See envoy/config/health_checker for reference.

Only one of config, typed_config may be set.

typed_config

(Any) A custom health checker specific configuration which depends on the custom health checker being instantiated. See envoy/config/health_checker for reference.

Only one of config, typed_config may be set.

Enum core.HealthStatus

[core.HealthStatus proto]

Endpoint health status.

UNKNOWN

(DEFAULT) The health status is not known. This is interpreted by Envoy as HEALTHY.

HEALTHY

Healthy.

UNHEALTHY

Unhealthy.

DRAINING

Connection draining in progress. E.g., https://aws.amazon.com/blogs/aws/elb-connection-draining-remove-instances-from-service-with-care/ or https://cloud.google.com/compute/docs/load-balancing/enabling-connection-draining. This is interpreted by Envoy as UNHEALTHY.

TIMEOUT

Health check timed out. This is part of HDS and is interpreted by Envoy as UNHEALTHY.

DEGRADED

Degraded.
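The interpretations stated for the enum values above can be collected into a small lookup. The names INTERPRETED_AS and interpreted_status are assumptions for this sketch. Values without a stated interpretation are returned unchanged.

```python
# Interpretations stated in the enum descriptions above.
INTERPRETED_AS = {
    "UNKNOWN": "HEALTHY",
    "DRAINING": "UNHEALTHY",
    "TIMEOUT": "UNHEALTHY",
}


def interpreted_status(status):
    """Map a core.HealthStatus name to how Envoy is documented to interpret it."""
    return INTERPRETED_AS.get(status, status)
```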