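Chapter 6: Health Checks
Health Check Types
HTTP Checks
# HTTP health check
check {
  id       = "http-check"
  name     = "HTTP Health Check"
  http     = "http://localhost:8080/health"
  interval = "10s"
  timeout  = "1s"
  # Only takes effect for checks attached to a service.
  # Consul enforces a 1m minimum; smaller values are raised to 1m.
  deregister_critical_service_after = "1m"
}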
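For an HTTP check, Consul derives the check state from the response code: any 2xx response counts as passing, 429 Too Many Requests counts as warning, and everything else (including a timeout or a refused connection) counts as critical. The mapping can be sketched as follows; the function name `classify_http_status` is made up for illustration and is not part of any Consul API.

```python
def classify_http_status(status_code: int) -> str:
    """Map an HTTP response code to the state Consul assigns to an HTTP check."""
    if 200 <= status_code <= 299:
        return "passing"   # any 2xx response
    if status_code == 429:
        return "warning"   # Too Many Requests
    return "critical"      # every other response code
```

This is why the `/health` endpoint of a service should return a non-2xx code, such as 503, whenever a dependency it needs is unavailable.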
TCP Checks
# TCP health check
check {
  id       = "tcp-check"
  name     = "TCP Health Check"
  tcp      = "localhost:8080"
  interval = "10s"
  timeout  = "1s"
}
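A TCP check only verifies that a connection to the given host and port can be established within the timeout. A minimal sketch of an equivalent probe, using only the Python standard library, might look like this; `tcp_check` is a hypothetical helper name, and the return values mirror Consul's state names for readability.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> str:
    """Return "passing" if a TCP connection succeeds within `timeout`, else "critical"."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return "passing"
    except OSError:
        return "critical"
```

Because nothing is sent or read after the handshake, a TCP check cannot tell whether the application behind the port is actually working; an HTTP or gRPC check is a better fit when the service exposes one.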
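TTL Checks
# TTL health check: the service must report its own status before the TTL expires
check {
  id   = "ttl-check"
  name = "TTL Health Check"
  ttl  = "30s"
  # Consul enforces a 1m minimum; smaller values are raised to 1m.
  deregister_critical_service_after = "1m"
}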
gRPC Checks
# gRPC health check (the target must implement the standard gRPC health checking protocol)
check {
  id           = "grpc-check"
  name         = "gRPC Health Check"
  grpc         = "localhost:50051"
  grpc_use_tls = false
  interval     = "10s"
}
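Docker Checks
# Docker health check (script checks must be enabled on the agent)
check {
  id                  = "docker-check"
  name                = "Docker Health Check"
  docker_container_id = "container-id"
  shell               = "/bin/bash"
  # Current Consul versions take the command as an "args" list;
  # the older "script" field was deprecated and later removed.
  args                = ["/health-check.sh"]
  interval            = "10s"
}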
Service Health Checks
Configuration File
# service.hcl
service {
  name    = "user-service"
  id      = "user-service-1"
  address = "192.168.1.10"
  port    = 8080
  # Use the "checks" list for several checks; a single check can use a "check" block.
  checks = [
    {
      id       = "user-service-http"
      name     = "HTTP Health Check"
      http     = "http://192.168.1.10:8080/health"
      interval = "10s"
      timeout  = "1s"
    },
    {
      id   = "user-service-ttl"
      name = "TTL Health Check"
      ttl  = "30s"
    }
  ]
}
Registering via the HTTP API
# Register a service together with a health check
curl -X PUT http://localhost:8500/v1/agent/service/register -d '{
  "Name": "user-service",
  "ID": "user-service-1",
  "Address": "192.168.1.10",
  "Port": 8080,
  "Check": {
    "HTTP": "http://192.168.1.10:8080/health",
    "Interval": "10s",
    "Timeout": "1s"
  }
}'
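The same registration can be issued from application code, for example at service startup. The following is a minimal sketch using only the Python standard library; the helper names `build_registration` and `register_service` are illustrative assumptions, and the agent address `http://localhost:8500` is taken from the curl example.

```python
import json
import urllib.request


def build_registration(name, service_id, address, port, health_url,
                       interval="10s", timeout="1s"):
    """Build the JSON body for PUT /v1/agent/service/register."""
    return {
        "Name": name,
        "ID": service_id,
        "Address": address,
        "Port": port,
        "Check": {"HTTP": health_url, "Interval": interval, "Timeout": timeout},
    }


def register_service(payload, agent="http://localhost:8500"):
    """Send the registration to the local agent and return the HTTP status code."""
    request = urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Usage (requires a running Consul agent):
# register_service(build_registration(
#     "user-service", "user-service-1", "192.168.1.10", 8080,
#     "http://192.168.1.10:8080/health"))
```

Keeping the payload construction separate from the network call makes the body easy to inspect or unit-test without an agent.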
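Using TTL Checks
# Register a service with a TTL check
curl -X PUT http://localhost:8500/v1/agent/service/register -d '{
  "Name": "user-service",
  "ID": "user-service-1",
  "Check": {
    "TTL": "30s"
  }
}'
# Send a heartbeat periodically, more often than the TTL.
# A check registered without an explicit ID gets the ID "service:<service ID>".
curl -X PUT http://localhost:8500/v1/agent/check/pass/service:user-service-1
# Mark the check as failed
curl -X PUT http://localhost:8500/v1/agent/check/fail/service:user-service-1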
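With a TTL check the service itself is responsible for reporting in before the TTL runs out, otherwise the check turns critical. A heartbeat loop could be sketched as below. It assumes the check ID `service:user-service-1`, which is the ID Consul assigns by default when a service's single check is registered without an explicit `ID`; the helper names and the choice to beat at half the TTL are illustrative, not prescribed by Consul.

```python
import time
import urllib.request
from urllib.parse import quote


def ttl_update_url(check_id, status="pass", agent="http://localhost:8500"):
    """Build the agent URL that sets a TTL check to pass, warn, or fail."""
    if status not in ("pass", "warn", "fail"):
        raise ValueError(f"unsupported status: {status}")
    return f"{agent}/v1/agent/check/{status}/{quote(check_id, safe=':')}"


def send_ttl_update(check_id, status="pass", agent="http://localhost:8500"):
    """PUT the status update to the agent and return the HTTP status code."""
    request = urllib.request.Request(ttl_update_url(check_id, status, agent), method="PUT")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def heartbeat_loop(check_id, ttl_seconds=30.0, beats=None):
    """Report "pass" every ttl_seconds / 2; run forever when beats is None."""
    sent = 0
    while beats is None or sent < beats:
        send_ttl_update(check_id)
        sent += 1
        time.sleep(ttl_seconds / 2)


# Usage (requires a running Consul agent):
# heartbeat_loop("service:user-service-1", ttl_seconds=30.0)
```

Beating at half the TTL leaves room for one delayed or lost update before the check expires.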
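Querying Health Status
HTTP API
# List all checks registered with the local agent
curl http://localhost:8500/v1/agent/checks
# Query the health of a specific service
curl http://localhost:8500/v1/health/service/user-service
# Return only instances whose checks are all passing (quote the URL so the shell leaves "?" alone)
curl 'http://localhost:8500/v1/health/service/user-service?passing'
# Query the health of a node
curl http://localhost:8500/v1/health/node/node1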
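The `/v1/health/service/<name>` endpoint returns a JSON array with one entry per service instance, each holding `Node`, `Service`, and `Checks` objects. The sketch below shows how a client might pick the healthy instances from such a response on its own, which is roughly what the `?passing` filter does on the server side. The function name `passing_instances` is hypothetical.

```python
def passing_instances(entries):
    """Return (address, port) for every instance whose checks are all passing."""
    healthy = []
    for entry in entries:
        checks = entry.get("Checks", [])
        if all(check.get("Status") == "passing" for check in checks):
            service = entry["Service"]
            # An empty service address means "use the node's address"
            address = service.get("Address") or entry["Node"]["Address"]
            healthy.append((address, service["Port"]))
    return healthy
```

In practice, letting Consul filter with `?passing` is simpler and transfers less data; client-side filtering is mainly useful when warning-state instances should also be inspected.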
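CLI Queries
# List all registered services
consul catalog services
# List services together with their tags
consul catalog services -tags
# Check service health through the HTTP API and pretty-print the result with jq
curl -s 'http://localhost:8500/v1/health/service/user-service?passing' | jq .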
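Health Check Configuration
Check Parameters
check {
  # Interval between checks
  interval = "10s"
  # Timeout for each check attempt
  timeout = "1s"
  # Initial status of the check before its first run (Consul defaults to "critical")
  # status = "critical"
  # How long a service may stay critical before it is deregistered (minimum 1m)
  deregister_critical_service_after = "1m"
  # Consecutive successes required before the check becomes passing
  success_before_passing = 1
  # Consecutive failures required before the check becomes critical
  failures_before_critical = 3
}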
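These parameters together determine how quickly a dead instance is noticed. As a rough back-of-the-envelope estimate, a service that dies right after a successful probe needs `failures_before_critical` further probes, one per interval, and the last probe may take up to the full timeout to fail. The sketch below computes that figure; it is an approximation that ignores scheduling jitter and propagation delays, and both helper names are hypothetical. The duration parser handles only single-unit values such as "10s" or "1m".

```python
def parse_duration(text: str) -> float:
    """Convert a single-unit duration such as "500ms", "10s", "1m", or "2h" to seconds."""
    units = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
    for suffix in ("ms", "s", "m", "h"):  # test "ms" before "s" and "m"
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * units[suffix]
    raise ValueError(f"unsupported duration: {text!r}")


def worst_case_detection(interval: str, timeout: str, failures_before_critical: int) -> float:
    """Rough upper bound, in seconds, for a failure to turn the check critical."""
    attempts = max(failures_before_critical, 1)  # a threshold of 0 still needs one failed probe
    return attempts * parse_duration(interval) + parse_duration(timeout)
```

With the values from the block above (`interval = "10s"`, `timeout = "1s"`, `failures_before_critical = 3`) the estimate is 3 × 10 + 1 = 31 seconds.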
Service Definition
service {
  name = "user-service"
  # Multiple health checks go in the "checks" list
  checks = [
    {
      id       = "http-check"
      http     = "http://localhost:8080/health"
      interval = "10s"
    },
    {
      id       = "tcp-check"
      tcp      = "localhost:8080"
      interval = "5s"
    }
  ]
}
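Health Check Best Practices
1. Designing the Check Endpoint
from flask import Flask

app = Flask(__name__)

# check_database(), check_redis(), and is_ready() are application-specific;
# these stubs are placeholders to replace with real probes.
def check_database():
    return True

def check_redis():
    return True

def is_ready():
    return True

# /health endpoint
@app.route('/health')
def health():
    # Verify the database connection
    if not check_database():
        return 'Database unavailable', 503
    # Verify the cache connection
    if not check_redis():
        return 'Redis unavailable', 503
    return 'OK', 200

# /ready endpoint (Kubernetes-style readiness probe)
@app.route('/ready')
def ready():
    # Report whether the service is ready to receive traffic
    if not is_ready():
        return 'Not ready', 503
    return 'OK', 200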
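2. Check Interval Settings
3. Failure Handling
# Configure failure thresholds
check {
  http     = "http://localhost:8080/health"
  interval = "10s"
  # Mark the check critical only after 3 consecutive failures
  failures_before_critical = 3
  # Mark the check passing again after 1 success
  success_before_passing = 1
  # Deregister the service after it has been critical for 1 minute (Consul's minimum)
  deregister_critical_service_after = "1m"
}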
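The effect of the two thresholds can be illustrated with a small state tracker. This is a simplified model of the behavior described in the comments above (status changes only after enough consecutive results of the same kind), not Consul's actual implementation, whose exact counting rules may differ between versions. The class name `ThresholdTracker` is hypothetical.

```python
class ThresholdTracker:
    """Simplified model of failures_before_critical / success_before_passing."""

    def __init__(self, failures_before_critical=3, success_before_passing=1,
                 status="passing"):
        self.failures_before_critical = failures_before_critical
        self.success_before_passing = success_before_passing
        self.status = status
        self._failures = 0   # consecutive failed probes
        self._successes = 0  # consecutive successful probes

    def observe(self, ok: bool) -> str:
        """Record one probe result and return the resulting check status."""
        if ok:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.success_before_passing:
                self.status = "passing"
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.failures_before_critical:
                self.status = "critical"
        return self.status
```

In this model a single successful probe resets the failure streak, so two failures followed by a success and another failure leave the check passing. That is the point of the threshold: brief glitches do not remove an instance from service discovery.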
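Summary
Key points about health checks:
- Check types: HTTP, TCP, TTL, gRPC, Docker
- Service checks: configuration file, HTTP API
- Querying status: HTTP API, CLI
- Configuration parameters: interval, timeout, thresholds
- Best practices: endpoint design, interval settings, failure handling
The next chapter covers cluster deployment.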