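Chapter 8: Production Practices

This chapter summarizes best practices for running Istio in production and how to handle common problems.

Resource Planning

Control Plane Resources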

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
          limits:
            cpu: 4000m
            memory: 8Gi
        hpaSpec:
          minReplicas: 2
          maxReplicas: 5
          metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 80

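Data Plane Resources

# Sidecar resource settings.
# Note: direct edits to this ConfigMap are overwritten when Istio is reinstalled
# or upgraded; the same values can be set under spec.values.global.proxy.resources
# in an IstioOperator.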
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-sidecar-injector
  namespace: istio-system
data:
  values: |
    global:
      proxy:
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 2000m
            memory: 1Gi
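
The mesh-wide defaults above can be overridden for an individual workload through pod annotations. The following is a minimal sketch: the Deployment name `my-app`, its labels, and its image are hypothetical, and the resource values are placeholders rather than recommendations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                       # hypothetical workload name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Per-pod overrides for the injected istio-proxy container
        sidecar.istio.io/proxyCPU: "200m"
        sidecar.istio.io/proxyMemory: "256Mi"
        sidecar.istio.io/proxyCPULimit: "1000m"
        sidecar.istio.io/proxyMemoryLimit: "512Mi"
    spec:
      containers:
      - name: app
        image: example.com/my-app:latest   # placeholder image
```

Annotation values must be strings, hence the quotes.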

Ingress Gateway Resources

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      k8s:
        replicas: 3
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 2000m
            memory: 2Gi
        hpaSpec:
          minReplicas: 3
          maxReplicas: 10
          metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 60

High Availability Configuration

Istiod High Availability

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        replicas: 3
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: istiod
              topologyKey: kubernetes.io/hostname

Gateway High Availability

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      k8s:
        replicas: 3
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: istio-ingressgateway
                topologyKey: topology.kubernetes.io/zone
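
Anti-affinity spreads replicas across nodes and zones, but voluntary disruptions such as node drains can still evict several gateway pods at once. A PodDisruptionBudget is one way to bound that. The sketch below assumes the gateway pods carry the `app: istio-ingressgateway` label used above; the name `ingressgateway-pdb` and the `minAvailable` value are illustrative. Depending on the installation profile, Istio may already create a similar budget, so check for an existing one before adding another.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ingressgateway-pdb           # hypothetical name
  namespace: istio-system
spec:
  minAvailable: 2                    # keep at least 2 of the 3 replicas during drains
  selector:
    matchLabels:
      app: istio-ingressgateway
```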

Performance Tuning

Connection Pool Tuning

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1000
        connectTimeout: 5s
        tcpKeepalive:
          time: 7200s
          interval: 75s
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 1000
        http2MaxRequests: 10000
        maxRequestsPerConnection: 100
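
Connection pool limits are usually paired with outlier detection (circuit breaking), timeouts, and retries. The sketch below shows one possible combination for the same `my-service` host. All thresholds are assumptions to tune per service, and in practice the `outlierDetection` block would sit next to `connectionPool` in the same DestinationRule rather than in a second resource.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5        # eject a host after 5 consecutive 5xx responses
      interval: 10s                  # how often hosts are evaluated
      baseEjectionTime: 30s          # minimum time an ejected host stays out
      maxEjectionPercent: 50         # never eject more than half of the hosts
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
    timeout: 3s                      # overall deadline for the request, retries included
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: 5xx,reset,connect-failure
```

Keeping `attempts × perTryTimeout` within the overall `timeout` avoids retries that can never complete.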

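Proxy Resource Limits

apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
  namespace: istio-system
data:
  mesh: |-
    defaultConfig:
      proxyStatsMatcher:
        # ".*" exposes every Envoy stat, which raises proxy memory use and
        # metrics volume; narrow the pattern when tuning for performance.
        inclusionRegexps:
        - ".*"
      concurrency: 2  # number of proxy worker threads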

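Disabling Unneeded Features

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  meshConfig:
    # OUTPUT_CERTS (under defaultConfig.proxyMetadata) takes a directory path,
    # not a boolean; leaving it unset keeps the proxy from writing its
    # certificates to disk.
    # Disable access logs (for high-throughput scenarios)
    accessLogFile: ""
    # Do not merge application metrics with sidecar metrics on one scrape endpoint
    enablePrometheusMerge: false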
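
Another common way to reduce proxy overhead is to limit the configuration each sidecar receives. By default a proxy is sent configuration for every service in the mesh, and a Sidecar resource can narrow that scope. A sketch, assuming workloads in a hypothetical `my-ns` namespace only call services in their own namespace and in `istio-system`:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default                      # applies to all workloads in the namespace
  namespace: my-ns                   # hypothetical namespace
spec:
  egress:
  - hosts:
    - "./*"                          # services in the same namespace
    - "istio-system/*"               # control plane and gateways
```

Traffic to hosts outside this list is then handled by the mesh's outbound traffic policy, so verify dependencies before applying it broadly.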

Security Hardening

Least Privilege

apiVersion: v1
kind: ServiceAccount
metadata:
  name: istiod
  namespace: istio-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: istiod-clusterrole
rules:
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
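# Grant only the permissions that are needed. This rule set is illustrative;
# a working istiod needs access to more resources than the three listed here.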

Enabling mTLS

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
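
Strict mTLS authenticates workloads to each other; authorization policies then decide which authenticated callers may do what. A common starting point is to deny by default within a namespace and allow specific callers explicitly. The sketch below assumes a hypothetical `my-ns` namespace, a workload labeled `app: my-service`, and a caller running as the `frontend` service account; all three names are illustrative.

```yaml
# An ALLOW policy with no rules matches no request, so every request
# to workloads in my-ns is denied unless another policy allows it.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: my-ns
spec: {}
---
# Allow GET requests to my-service from the frontend service account only.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: my-ns
spec:
  selector:
    matchLabels:
      app: my-service
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/my-ns/sa/frontend"]
    to:
    - operation:
        methods: ["GET"]
```

The `principals` match relies on the peer identity established by mTLS, which is why it pairs naturally with STRICT mode.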

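Network Policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: istiod-policy
  namespace: istio-system
spec:
  podSelector:
    matchLabels:
      app: istiod
  ingress:
  # Sidecars in workload namespaces (xDS, port 15012) and the API server
  # (webhooks, port 15017) also need to reach istiod; extend `from` to cover them.
  - from:
    - namespaceSelector:
        matchLabels:
          # Label set automatically on every namespace (Kubernetes 1.21+)
          kubernetes.io/metadata.name: istio-system
    - podSelector:
        matchLabels:
          istio: ingressgateway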

Monitoring and Alerting

Prometheus Alerting Rules

groups:
- name: istio-alerts
  rules:
  - alert: IstiodHighMemory
    expr: container_memory_working_set_bytes{container="discovery"} > 4 * 1024 * 1024 * 1024  # 4 GiB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Istiod memory usage high"

  - alert: PilotPushErrors
    expr: rate(pilot_xds_push_errors[5m]) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Pilot push errors detected"

  - alert: EnvoyClusterUpstreamCxOverflow
    expr: rate(envoy_cluster_upstream_cx_overflow[5m]) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Connection pool overflow"
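
An error-rate alert on request traffic complements the rules above. This sketch uses Istio's standard `istio_requests_total` metric; the group name, the alert name, the 5% threshold, and the `for` duration are assumptions to adjust per service.

```yaml
groups:
- name: istio-traffic-alerts         # hypothetical group name
  rules:
  - alert: IstioHigh5xxRate          # hypothetical alert name
    # Share of requests per destination service that returned a 5xx code
    expr: |
      sum(rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m])) by (destination_service)
        /
      sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service)
        > 0.05
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "More than 5% of requests to {{ $labels.destination_service }} return 5xx"
```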

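Grafana Dashboards

Key metric panels:

  1. Control plane
    • Istiod memory usage
    • xDS push latency
    • Number of connected proxies

  2. Data plane
    • Request rate
    • P50/P99 latency
    • Error rate
    • Connection pool utilization

  3. Gateway
    • Inbound traffic
    • TLS handshakes
    • Connection count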
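
The data-plane panels can be backed by queries over Istio's standard metrics. The recording rules below are one possible sketch: the group and rule names are hypothetical, and the latency query assumes the default `istio_request_duration_milliseconds` histogram is being scraped.

```yaml
groups:
- name: istio-dashboard-rules        # hypothetical group name
  rules:
  # Request rate per destination service
  - record: service:istio_requests:rate5m
    expr: sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service)
  # P99 latency in milliseconds per destination service
  - record: service:istio_request_duration_ms:p99
    expr: |
      histogram_quantile(0.99,
        sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m])) by (le, destination_service))
```

Changing `0.99` to `0.5` in a second rule gives the P50 series for the same panel.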

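Troubleshooting

Common Commands

# Validate configuration
istioctl analyze

# Check proxy sync status
istioctl proxy-status

# Inspect proxy configuration
istioctl proxy-config cluster <pod>
istioctl proxy-config route <pod>
istioctl proxy-config listener <pod>

# Check the policies applied to a pod, including mTLS
# (istioctl authn tls-check was removed in newer Istio releases)
istioctl x describe pod <pod>

# View proxy logs
kubectl logs <pod> -c istio-proxy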

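Common Problems

1. Sidecar injection fails

# Check namespace injection labels
kubectl get namespace -L istio-injection,istio.io/rev

# Check injector logs (the injection webhook runs inside istiod)
kubectl logs -n istio-system -l app=istiod

2. mTLS problems

# Check the effective mTLS mode for the pod
istioctl x describe pod <pod>

# Inspect certificates
istioctl proxy-config secret <pod>

3. Routing problems

# Inspect route configuration
istioctl proxy-config route <pod> -o json

# Inspect cluster configuration
istioctl proxy-config cluster <pod> -o json

4. Performance problems

# View proxy statistics
kubectl exec <pod> -c istio-proxy -- curl localhost:15000/stats

# View memory usage
kubectl top pods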
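Upgrade Strategy

Canary Upgrade

# Install the new version as a separate revision
istioctl install --set revision=1-21-0

# Migrate namespaces gradually. The istio-injection label takes precedence
# over istio.io/rev, so remove it, then restart workloads to pick up the new sidecar
kubectl label namespace my-ns istio-injection- istio.io/rev=1-21-0 --overwrite
kubectl rollout restart deployment -n my-ns

# After verification, remove the old version
istioctl uninstall --revision=1-20-0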
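
If namespaces are managed declaratively (for example through GitOps), the revision label can live in the Namespace manifest instead of being applied with `kubectl label`. A minimal sketch for the `my-ns` namespace used above; it assumes the `istio-injection` label is absent, since that label would take precedence over `istio.io/rev`.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-ns
  labels:
    istio.io/rev: 1-21-0             # revision whose sidecar injector handles this namespace
```

Existing pods keep their old sidecar until they are restarted, so a rollout restart is still needed after the label changes.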

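Rollback

# Switch back to the old version (--overwrite is required because the label already exists)
kubectl label namespace my-ns istio.io/rev=1-20-0 --overwrite

# Restart Pods so they are re-injected with the old sidecar
kubectl rollout restart deployment -n my-ns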

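Best Practices Summary

1. Resource configuration

  • Run at least 2 Istiod replicas
  • Run at least 3 Gateway replicas
  • Set resource requests and limits based on observed load

2. Security configuration

  • Enable strict mTLS
  • Configure authorization policies
  • Follow the principle of least privilege

3. Observability

  • Collect metrics, logs, and traces
  • Configure alerting rules
  • Review configuration regularly

4. Traffic management

  • Set sensible timeouts
  • Configure retry policies
  • Enable circuit breakers

5. Operations

  • Use canary upgrades
  • Back up configuration regularly
  • Establish a failure recovery process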

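Summary

A production Istio deployment needs attention to:

  • Resource planning: size control plane and data plane resources appropriately
  • High availability: multiple replicas and anti-affinity
  • Performance tuning: connection pools and proxy optimization
  • Security hardening: mTLS and authorization policies
  • Monitoring and alerting: complete observability
  • Troubleshooting: common commands and fixes for frequent problems

After completing this tutorial, you should be able to deploy and manage an Istio service mesh in production.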