Chapter 8: Production Practices
This chapter summarizes best practices for running Istio in production and how to handle common problems.
Resource Planning
Control Plane Resources
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
          limits:
            cpu: 4000m
            memory: 8Gi
        hpaSpec:
          minReplicas: 2
          maxReplicas: 5
          metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 80
Data Plane Resources
# Sidecar resource configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-sidecar-injector
  namespace: istio-system
data:
  values: |
    global:
      proxy:
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 2000m
            memory: 1Gi
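The mesh-wide defaults above can also be overridden for a single workload through pod annotations. The following is a minimal sketch, assuming a hypothetical Deployment named `my-service` in a namespace `my-ns` with a placeholder image; the `sidecar.istio.io/proxy*` annotations are read by the sidecar injector when the pod is created, and the values shown are illustrative rather than recommendations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service        # hypothetical workload name
  namespace: my-ns        # hypothetical namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
      annotations:
        # Per-pod overrides for the injected istio-proxy container
        sidecar.istio.io/proxyCPU: "200m"
        sidecar.istio.io/proxyMemory: "256Mi"
        sidecar.istio.io/proxyCPULimit: "1000m"
        sidecar.istio.io/proxyMemoryLimit: "512Mi"
    spec:
      containers:
      - name: app
        image: example.com/my-service:1.0   # placeholder image
```

Because the annotations are applied at injection time, existing pods need to be restarted before new values take effect.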
Ingress Gateway Resources
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      k8s:
        replicas: 3
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 2000m
            memory: 2Gi
        hpaSpec:
          minReplicas: 3
          maxReplicas: 10
          metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 60
High Availability Configuration
Istiod High Availability
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        replicas: 3
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: istiod
              topologyKey: kubernetes.io/hostname
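Replicas and anti-affinity protect against node failure, but not against voluntary disruptions such as node drains. A PodDisruptionBudget can cover that case. This is a sketch under assumptions: the name `istiod-pdb` is hypothetical, and `minAvailable: 2` is chosen to match the three replicas above.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: istiod-pdb          # hypothetical name
  namespace: istio-system
spec:
  # With 3 replicas, allow at most one istiod pod to be evicted at a time
  minAvailable: 2
  selector:
    matchLabels:
      app: istiod
```

Some installation profiles already create a PodDisruptionBudget for istiod. Check with `kubectl get pdb -n istio-system` first, because evictions fail when more than one budget selects the same pod.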
Gateway High Availability
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      k8s:
        replicas: 3
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: istio-ingressgateway
                topologyKey: topology.kubernetes.io/zone
Performance Tuning
Connection Pool Optimization
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1000
        connectTimeout: 5s
        tcpKeepalive:
          time: 7200s
          interval: 75s
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 1000
        http2MaxRequests: 10000
        maxRequestsPerConnection: 100
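Proxy Resource Limits
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
  namespace: istio-system
data:
  mesh: |-
    defaultConfig:
      proxyStatsMatcher:
        # ".*" exposes every Envoy stat and raises proxy memory use; narrow it where possible
        inclusionRegexps:
        - ".*"
      concurrency: 2 # number of proxy worker threads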
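Disabling Unneeded Features
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  meshConfig:
    defaultConfig:
      proxyMetadata:
        OUTPUT_CERTS: "false"
    # Disable access logs (high-performance scenarios)
    accessLogFile: ""
    # Disable merging of application metrics into the proxy's Prometheus endpoint
    enablePrometheusMerge: false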
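Another commonly used way to cut proxy memory and configuration push size is to limit which services each sidecar learns about, using a `Sidecar` resource. The sketch below is an assumption about how this could look for a hypothetical namespace `my-ns` whose workloads only call services in their own namespace and in `istio-system`; any other dependency would have to be added to `hosts`.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default             # applies to all workloads in the namespace
  namespace: my-ns          # hypothetical namespace
spec:
  egress:
  - hosts:
    - "./*"                 # services in the same namespace
    - "istio-system/*"      # shared mesh services such as gateways
```

Services that are not listed become unreachable by name through the sidecar, so it is safer to roll this out one namespace at a time and watch for routing errors.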
Security Hardening
Least Privilege
apiVersion: v1
kind: ServiceAccount
metadata:
  name: istiod
  namespace: istio-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: istiod-clusterrole
rules:
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
# Grant only the permissions that are required
Enabling mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
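A mesh-wide STRICT policy rejects plaintext traffic, which can break clients that do not yet have a sidecar. During migration, a workload-level policy can relax a single port. This is a minimal sketch, assuming a hypothetical workload labeled `app: my-service` in `my-ns` that still receives plaintext traffic on port 8080.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: my-service-exception   # hypothetical name
  namespace: my-ns             # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: my-service
  mtls:
    mode: STRICT               # all other ports still require mTLS
  portLevelMtls:
    8080:
      mode: PERMISSIVE         # accept both mTLS and plaintext on this port only
```

Port-level settings only take effect when a `selector` is present, and the exception should be removed once every client of that port is in the mesh.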
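Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: istiod-policy
  namespace: istio-system
spec:
  podSelector:
    matchLabels:
      app: istiod
  ingress:
  # Sidecars in application namespaces also connect to istiod; add their namespaces here
  - from:
    - namespaceSelector:
        matchLabels:
          # Label set automatically by Kubernetes on every namespace
          kubernetes.io/metadata.name: istio-system
    - podSelector:
        matchLabels:
          istio: ingressgateway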
Monitoring and Alerting
Prometheus Alerting Rules
groups:
- name: istio-alerts
  rules:
  - alert: IstiodHighMemory
    # PromQL has no "Gi" suffix, so 4 GiB is written out in bytes
    expr: container_memory_working_set_bytes{container="discovery"} > 4 * 1024 * 1024 * 1024
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Istiod memory usage high"
  - alert: PilotPushErrors
    expr: rate(pilot_xds_push_errors[5m]) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Pilot push errors detected"
  - alert: EnvoyClusterUpstreamCxOverflow
    expr: rate(envoy_cluster_upstream_cx_overflow[5m]) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Connection pool overflow"
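Grafana Dashboards
Key metric panels:
- Control plane
  - Istiod memory usage
  - xDS push latency
  - Number of connected proxies
- Data plane
  - Request rate
  - P50/P99 latency
  - Error rate
  - Connection pool utilization
- Gateway
  - Inbound traffic
  - TLS handshakes
  - Connection count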
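The data plane panels listed above can be backed by Prometheus recording rules so that dashboards query precomputed series. The following is a sketch, assuming Istio's standard metrics `istio_requests_total` and `istio_request_duration_milliseconds_bucket` are being scraped; the group name and the `istio:*` record names are hypothetical.

```yaml
groups:
- name: istio-dashboard-rules     # hypothetical group name
  rules:
  # Request rate per destination service
  - record: istio:requests:rate5m
    expr: sum by (destination_service) (rate(istio_requests_total{reporter="destination"}[5m]))
  # Fraction of requests answered with a 5xx response code
  - record: istio:requests:error_ratio5m
    expr: |
      sum by (destination_service) (rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m]))
      /
      sum by (destination_service) (rate(istio_requests_total{reporter="destination"}[5m]))
  # P99 request latency in milliseconds
  - record: istio:request_duration_ms:p99_5m
    expr: |
      histogram_quantile(0.99,
        sum by (destination_service, le) (rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m])))
```

Changing `0.99` to `0.5` in the last rule gives the P50 series for the same panel.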
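Troubleshooting
Common Commands
# Validate the configuration
istioctl analyze
# Check proxy sync status
istioctl proxy-status
# Inspect proxy configuration
istioctl proxy-config cluster <pod>
istioctl proxy-config route <pod>
istioctl proxy-config listener <pod>
# Check authentication (authn tls-check was removed from recent istioctl releases)
istioctl x describe pod <pod>
# View proxy logs
kubectl logs <pod> -c istio-proxy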
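Common Problems
1. Sidecar injection fails
# Check the namespace labels
kubectl get namespace -L istio-injection,istio.io/rev
# Check the injector logs (injection is served by istiod in current releases)
kubectl logs -n istio-system -l app=istiod
2. mTLS issues
3. Routing issues
# Check the route configuration
istioctl proxy-config route <pod> -o json
# Check the cluster configuration
istioctl proxy-config cluster <pod> -o json
4. Performance issues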
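Upgrade Strategy
Canary Upgrade
# Install the new version as a separate revision
istioctl install --set revision=1-21-0
# Migrate namespaces gradually (drop istio-injection, which takes precedence over istio.io/rev)
kubectl label namespace my-ns istio-injection- istio.io/rev=1-21-0 --overwrite
# Restart workloads so they are re-injected with the new proxy
kubectl rollout restart deployment -n my-ns
# After verification, remove the old version
istioctl uninstall --revision=1-20-0
Rollback
# Switch back to the old revision
kubectl label namespace my-ns istio.io/rev=1-20-0 --overwrite
# Restart the Pods so they pick up the old proxy
kubectl rollout restart deployment -n my-ns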
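Best Practices Summary
1. Resource Configuration
- Run at least 2 Istiod replicas
- Run at least 3 Gateway replicas
- Set resource requests and limits based on observed usage
2. Security Configuration
- Enable strict mTLS
- Configure authorization policies
- Follow the principle of least privilege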
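As an illustration of the authorization policy item, the sketch below allows only one caller identity to reach a workload. All names are hypothetical: a workload labeled `app: my-service` in `my-ns`, a caller running under the `frontend` service account, and an `/api/*` path prefix. It also assumes the default trust domain `cluster.local`.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: my-service-allow-frontend   # hypothetical name
  namespace: my-ns                  # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: my-service
  action: ALLOW
  rules:
  - from:
    - source:
        # mTLS identity of the caller: trust domain / namespace / service account
        principals: ["cluster.local/ns/my-ns/sa/frontend"]
    to:
    - operation:
        methods: ["GET"]
        paths: ["/api/*"]
```

Once an ALLOW policy selects a workload, requests that match none of its rules are denied, and matching on `principals` depends on mTLS being in effect between the two workloads.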
3. Observability
- Collect metrics, logs, and traces
- Configure alerting rules
- Review the configuration regularly
4. Traffic Management
- Set sensible timeouts
- Configure retry policies
- Enable circuit breakers
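The three traffic management items can be sketched together for the `my-service` host used earlier in this chapter. The numbers are illustrative assumptions, not tuned values, and the DestinationRule name is hypothetical; in practice the `outlierDetection` block would usually be added to the service's existing DestinationRule rather than to a second one for the same host.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
    timeout: 10s                 # overall budget, including all retries
    retries:
      attempts: 3
      perTryTimeout: 3s          # 3 attempts x 3s stays within the 10s budget
      retryOn: 5xx,connect-failure,reset
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker   # hypothetical name
spec:
  host: my-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # eject an endpoint after 5 consecutive 5xx responses
      interval: 30s              # how often endpoints are evaluated
      baseEjectionTime: 30s      # minimum time an ejected endpoint stays out
      maxEjectionPercent: 50     # never eject more than half of the endpoints
```

Retries multiply load on a struggling backend, so they are best limited to idempotent requests and paired with outlier detection as shown.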
5. Operations
- Use canary upgrades
- Back up the configuration regularly
- Establish a failure recovery process
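Summary
A production Istio deployment needs attention in the following areas:
- Resource planning: size the control plane and data plane appropriately
- High availability: multiple replicas and anti-affinity
- Performance tuning: connection pools and proxy optimization
- Security hardening: mTLS and authorization policies
- Monitoring and alerting: complete observability
- Troubleshooting: common commands and problem resolution
After completing this tutorial, you should be able to deploy and manage an Istio service mesh in production.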