Spring Boot + Startup Health Checks + Readiness Probes: Automatic Detection During K8s Deployments to Keep Traffic Away from Unready Instances

Preface

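When deploying applications on Kubernetes (K8s), a common problem is that traffic gets routed to instances that are not yet fully ready, so user requests fail or time out. This hurts the user experience and can also trigger cascading failures with serious business impact.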

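Picture this: your application is in the middle of a rolling update on K8s. A new Pod has just started, but the application is still establishing database connections, loading cache data, and warming up its connection pool. Meanwhile, the K8s Service is already routing traffic to the new Pod, and because the application is not fully ready, every request fails. Worse, if several new Pods are in the same state at once, the whole service can become unavailable.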

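Startup health checks and readiness probes are the key techniques for solving this problem. They help K8s determine accurately whether an application is truly ready to receive traffic, so that requests are not sent to instances that are not ready yet.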

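This article explains in detail how to implement startup health checks and readiness probes in a Spring Boot project, so that the application runs stably and reliably on K8s.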

1. Core Concepts of Health Checks and Readiness Probes

1.1 What Is a Health Check

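A health check is a mechanism for detecting an application's runtime state: it periodically verifies that the application is working properly. Health checks usually come in two kinds:

  • Liveness check: verifies that the application is still alive; if it is not, the container is restarted
  • Readiness check: verifies that the application is ready to receive traffic; if it is not, the Pod is removed from the Service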

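1.2 K8s Probe Types

Kubernetes provides three probe types. The startup probe complements the other two for applications that take a long time to start:

| Probe type | Purpose | On failure | Typical scenarios |
| --- | --- | --- | --- |
| Liveness Probe | Checks whether the container is alive | The container is restarted | Deadlocks, memory leaks |
| Readiness Probe | Checks whether the application is ready for traffic | The Pod is removed from the Service | Application initializing, dependencies not yet ready |
| Startup Probe | Checks whether the application has finished starting; liveness and readiness probes only begin once it succeeds | The container is restarted | Slow-starting applications |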

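1.3 Probe Mechanisms

Kubernetes supports four probe mechanisms:

  • HTTP GET: sends an HTTP request to the container to check the application's state
  • TCP Socket: opens a TCP connection to check that the port accepts connections
  • Exec Command: runs a command inside the container; exit code 0 means success
  • gRPC: calls the container's standard gRPC health checking service (stable since Kubernetes v1.27)

HTTP GET is the most common choice because the endpoint can report application-level state, not merely whether a port is open.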
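
To make the two most common mechanisms concrete, the following plain-JDK sketch judges a probe roughly the way the kubelet does: an HTTP GET probe succeeds when the response status is at least 200 and below 400, and a TCP socket probe succeeds as soon as a connection can be opened. The ProbeChecks class is a hypothetical illustration, not kubelet code (for instance, it does not follow redirects).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper that mimics how HTTP GET and TCP socket probes are judged
final class ProbeChecks {

    private ProbeChecks() {
    }

    // An HTTP probe counts as successful for any status code in [200, 400)
    static boolean isHttpSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    // HTTP GET probe: succeeds only if a 2xx/3xx response arrives within the timeout
    static boolean httpGetProbe(URI uri, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
        try {
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            return isHttpSuccess(response.statusCode());
        } catch (IOException e) {
            return false; // connection refused, request timeout, etc. all count as failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // TCP socket probe: succeeds if a connection can be opened, whatever the application does next
    static boolean tcpSocketProbe(String host, int port, Duration timeout) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), (int) timeout.toMillis());
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For example, isHttpSuccess(503) returns false, which is why the readiness endpoints in this article answer 503 while the application is still initializing.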

2. Health Checks in Spring Boot

2.1 Spring Boot Actuator

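Spring Boot Actuator provides out-of-the-box health checks and exposes the application's health status through the /actuator/health endpoint.

Built-in health checks

  • Database connection status (db)
  • Redis connection status (redis)
  • Disk space (diskSpace)
  • A trivial ping check (ping) that is always UP

The db and redis indicators are auto-configured only when the corresponding dependency is on the classpath.

Custom health checks

  • Business logic state
  • External service connectivity
  • Cache warm-up status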

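2.2 Health Status Definitions

Spring Boot defines four built-in health statuses. The HTTP status codes are the defaults the health endpoint returns for each:

| Status | Meaning | HTTP status code | Description |
| --- | --- | --- | --- |
| UP | Healthy | 200 | The application is running normally |
| DOWN | Unhealthy | 503 | The application has a problem |
| OUT_OF_SERVICE | Out of service | 503 | The application has been taken out of service, e.g. for maintenance |
| UNKNOWN | Unknown | 200 | The state could not be determined |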
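
When several indicators contribute, Actuator first folds their statuses into one overall status and then maps that to an HTTP code. As we understand the defaults, the aggregation order is DOWN, OUT_OF_SERVICE, UP, UNKNOWN (Spring Boot's fourth built-in status), and the earliest status in that order found among the components wins. The plain-Java sketch below reproduces that rule so it can run without Spring; HealthState and StatusRollup are hypothetical names, not Actuator classes.

```java
import java.util.Collection;
import java.util.Comparator;

// Hypothetical stand-in for Actuator's built-in statuses, declared in default aggregation order
enum HealthState {
    DOWN, OUT_OF_SERVICE, UP, UNKNOWN
}

final class StatusRollup {

    private StatusRollup() {
    }

    // Overall status: the component status that comes first in the order above; UNKNOWN if there are none
    static HealthState aggregate(Collection<HealthState> components) {
        return components.stream()
                .min(Comparator.naturalOrder()) // enums compare by declaration order
                .orElse(HealthState.UNKNOWN);
    }

    // Default HTTP mapping of the health endpoint: DOWN and OUT_OF_SERVICE -> 503, otherwise 200
    static int httpStatus(HealthState state) {
        return switch (state) {
            case DOWN, OUT_OF_SERVICE -> 503;
            case UP, UNKNOWN -> 200;
        };
    }
}
```

So a single component reporting DOWN (for example db) turns the top-level status of the responses in section 2.3 into DOWN and the HTTP response into 503, even if every other component is UP.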

2.3 Health Check Response Format

Simple response (the default)

{
  "status": "UP"
}

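Detailed response (only returned when details are enabled, for example with management.endpoint.health.show-details=always)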

{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP",
      "details": {
        "database": "MySQL",
        "validationQuery": "isValid()"
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 499963174912,
        "free": 384846878720,
        "threshold": 10485760
      }
    }
  }
}

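3. How Readiness Probes Work

3.1 The Role of Readiness Probes

A readiness probe determines whether the application is ready to receive traffic. Its main roles are:

  • Traffic control: only ready Pods receive traffic
  • Rolling updates: a new Pod only counts as available once it is ready, so old Pods are retired as new ones become ready (within the Deployment's maxUnavailable limit)
  • Failure recovery: once the application recovers, it automatically rejoins the Service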
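
Conceptually, the Service only routes to Pods whose readiness probe currently passes; unready Pods are simply left out of its endpoint list. The plain-Java sketch below models that filtering step with a hypothetical PodEndpoint record; the round-robin choice is a stand-in, since kube-proxy's real balancing depends on its mode (iptables, IPVS, and so on).

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical view of a Pod as the Service sees it: a name plus its current readiness
record PodEndpoint(String name, boolean ready) {
}

final class ReadyEndpointPicker {

    private final AtomicInteger counter = new AtomicInteger();

    // Only Pods that pass their readiness probe are candidates; unready Pods never get a request
    Optional<PodEndpoint> pick(List<PodEndpoint> pods) {
        List<PodEndpoint> ready = pods.stream().filter(PodEndpoint::ready).toList();
        if (ready.isEmpty()) {
            return Optional.empty(); // no ready Pod: the request has nowhere to go
        }
        // Round-robin stands in for kube-proxy's actual load-balancing behavior
        int index = Math.floorMod(counter.getAndIncrement(), ready.size());
        return Optional.of(ready.get(index));
    }
}
```

During a rolling update, a freshly started Pod such as new-1 in the test below is never picked until its readiness flips to true.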

3.2 Implementing a Readiness Probe

Option 1: a dedicated readiness endpoint

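import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ReadinessController {

    // Set by application code once initialization has finished
    private volatile boolean applicationReady = false;

    @GetMapping("/ready")
    public ResponseEntity<String> ready() {
        if (isApplicationReady()) {
            return ResponseEntity.ok("OK");
        } else {
            // 503 fails the probe, keeping the Pod out of the Service endpoints
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body("Not Ready");
        }
    }

    public void setApplicationReady(boolean ready) { this.applicationReady = ready; }
    private boolean isApplicationReady() { return applicationReady; }
}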

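Option 2: a custom HealthIndicator in Spring Boot Actuator. Spring Boot 2.3+ additionally provides built-in availability states exposed at /actuator/health/liveness and /actuator/health/readiness; these are enabled automatically when running on Kubernetes, or explicitly with management.endpoint.health.probes.enabled=true.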

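import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class ReadinessHealthIndicator implements HealthIndicator {

    // Set by application code once initialization has finished
    private volatile boolean applicationReady = false;

    @Override
    public Health health() {
        if (isApplicationReady()) {
            return Health.up().build();
        } else {
            return Health.down().withDetail("reason", "Application not ready").build();
        }
    }

    public void setApplicationReady(boolean ready) { this.applicationReady = ready; }
    private boolean isApplicationReady() { return applicationReady; }
}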

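3.3 Readiness Conditions

Typical readiness conditions

  • Database connection established
  • Redis connection established
  • Cache warmed up
  • Connection pool initialized
  • External services reachable
  • Business initialization completed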
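
One way to turn the list above into a single yes/no answer is a gate that reports ready only when every registered condition holds, and that also lists the conditions still pending, which is useful for the details of a 503 response. This is a plain-Java sketch; ReadinessGate and the condition names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical readiness gate: ready only when every named condition currently holds
final class ReadinessGate {

    // LinkedHashMap keeps registration order, so pending conditions are reported predictably
    private final Map<String, BooleanSupplier> conditions = new LinkedHashMap<>();

    synchronized ReadinessGate register(String name, BooleanSupplier condition) {
        conditions.put(name, condition);
        return this;
    }

    // Names of conditions not yet satisfied; a condition that throws counts as not satisfied
    synchronized List<String> pending() {
        List<String> notReady = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : conditions.entrySet()) {
            boolean ok;
            try {
                ok = entry.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                ok = false;
            }
            if (!ok) {
                notReady.add(entry.getKey());
            }
        }
        return notReady;
    }

    boolean isReady() {
        return pending().isEmpty();
    }
}
```

For example, a gate with a "database" condition that passes and a "cacheWarmUp" condition that does not yet pass reports isReady() == false and pending() == [cacheWarmUp] until the cache is warm.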

4. Implementing Health Checks and Readiness Probes in Spring Boot

4.1 Project Dependencies

<dependencies>
    <!-- Spring Boot Starter -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- Spring Boot Actuator -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>

    <!-- Spring Data JPA -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>

    <!-- Spring Data Redis -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>

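    <!-- MySQL Driver (com.mysql:mysql-connector-j is the current coordinate managed by Spring Boot 3.x; it replaces mysql:mysql-connector-java) -->
    <dependency>
        <groupId>com.mysql</groupId>
        <artifactId>mysql-connector-j</artifactId>
        <scope>runtime</scope>
    </dependency>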

    <!-- Lombok -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>

    <!-- Test -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

4.2 Core Configuration Class

HealthConfig: health check configuration

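import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Configuration;

// Binds properties such as app.health.check-redis=false (the app.health prefix is an illustrative choice);
// without @ConfigurationProperties the defaults below could not be overridden from configuration
@Configuration
@ConfigurationProperties(prefix = "app.health")
@Data
public class HealthConfig {

    // Toggles for the individual dependency checks
    private boolean checkDatabase = true;

    private boolean checkRedis = true;

    private boolean checkExternalService = true;

    // Readiness time budget in seconds, used by ApplicationStartupListener
    private int readinessTimeoutSeconds = 60;

    // Not read by any code in this article
    private int livenessTimeoutSeconds = 30;

}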

4.3 Health Indicators

DatabaseHealthIndicator: database health check

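import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Actuator already auto-configures a "db" indicator for the DataSource; this custom bean is reported as "database"
@Component
@Slf4j
public class DatabaseHealthIndicator implements HealthIndicator {

    @Autowired
    private DataSource dataSource;

    @Autowired
    private HealthConfig healthConfig;

    @Override
    public Health health() {
        // Checking disabled by configuration: report UP so this indicator never blocks readiness
        if (!healthConfig.isCheckDatabase()) {
            return Health.up().build();
        }

        try (Connection connection = dataSource.getConnection()) {
            // isValid(timeout) waits at most 5 seconds for the driver to confirm the connection
            if (connection.isValid(5)) {
                return Health.up()
                        .withDetail("database", connection.getMetaData().getDatabaseProductName())
                        .withDetail("validationQuery", "isValid()")
                        .build();
            } else {
                return Health.down()
                        .withDetail("reason", "Database connection is not valid")
                        .build();
            }
        } catch (SQLException e) {
            log.error("Database health check failed", e);
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }

}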

RedisHealthIndicator: Redis health check

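import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.data.redis.core.RedisCallback;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

// The default bean name "redisHealthIndicator" matches a name Actuator's Redis health
// auto-configuration backs off for, so this class most likely replaces the built-in check
@Component
@Slf4j
public class RedisHealthIndicator implements HealthIndicator {

    @Autowired
    private StringRedisTemplate redisTemplate;

    @Autowired
    private HealthConfig healthConfig;

    @Override
    public Health health() {
        if (!healthConfig.isCheckRedis()) {
            return Health.up().build();
        }

        try {
            // Send a real PING instead of writing a key, so the check also works on read-only replicas
            String pong = redisTemplate.execute((RedisCallback<String>) connection -> connection.ping());

            if ("PONG".equalsIgnoreCase(pong)) {
                return Health.up()
                        .withDetail("redis", "Redis")
                        .withDetail("ping", pong)
                        .build();
            } else {
                return Health.down()
                        .withDetail("reason", "Redis ping failed")
                        .build();
            }
        } catch (Exception e) {
            log.error("Redis health check failed", e);
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }

}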
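
Both indicators call their dependency synchronously, and connection.isValid(5) alone may block for up to 5 seconds, which is longer than the 3-second readiness timeoutSeconds in the Deployment in Section 5. If such a check is ever wired into a probe endpoint, a common remedy is to give it its own deadline and report DOWN when it is exceeded. The sketch below shows the idea with plain JDK concurrency; TimeBoundedCheck is a hypothetical name, and in an indicator the Callable would wrap the JDBC or Redis call.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper that gives a blocking health check a hard deadline
final class TimeBoundedCheck {

    // Daemon threads, so a stuck check never keeps the JVM alive
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable, "health-check");
        thread.setDaemon(true);
        return thread;
    });

    private TimeBoundedCheck() {
    }

    // Returns true only if the check finishes within the timeout and reports success
    static boolean run(Callable<Boolean> check, Duration timeout) {
        Future<Boolean> future = EXECUTOR.submit(check);
        try {
            return Boolean.TRUE.equals(future.get(timeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow check and treat it as DOWN
            return false;
        } catch (ExecutionException e) {
            return false; // the check itself threw, e.g. an SQLException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a deadline slightly below the probe's timeoutSeconds, the endpoint always answers in time, and a hung dependency shows up as a clean failure rather than a probe timeout.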

ReadinessHealthIndicator: readiness health check

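import jakarta.annotation.PostConstruct; // javax.annotation.PostConstruct on Spring Boot 2.x
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Being a HealthIndicator, this bean also feeds the aggregate /actuator/health status,
// which therefore reports DOWN until setReady(true) has been called
@Component
@Slf4j
public class ReadinessHealthIndicator implements HealthIndicator {

    @Autowired
    private HealthConfig healthConfig;

    // volatile: written by startup/shutdown code, read concurrently by probe requests
    private volatile boolean ready = false;

    @PostConstruct
    public void init() {
        log.info("Application is starting, readiness timeout is {} seconds",
                healthConfig.getReadinessTimeoutSeconds());
    }

    public void setReady(boolean ready) {
        this.ready = ready;
        log.info("Application readiness changed to: {}", ready);
    }

    @Override
    public Health health() {
        if (ready) {
            return Health.up()
                    .withDetail("ready", true)
                    .withDetail("message", "Application is ready to accept traffic")
                    .build();
        } else {
            return Health.down()
                    .withDetail("ready", false)
                    .withDetail("message", "Application is initializing")
                    .build();
        }
    }

}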

4.4 Application Startup Listener

ApplicationStartupListener: application startup listener

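import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.boot.actuate.health.Status;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class ApplicationStartupListener implements ApplicationListener<ApplicationReadyEvent> {

    @Autowired
    private ReadinessHealthIndicator readinessHealthIndicator;
    @Autowired
    private DatabaseHealthIndicator databaseHealthIndicator;
    @Autowired
    private RedisHealthIndicator redisHealthIndicator;
    @Autowired
    private HealthConfig healthConfig;

    @Override
    public void onApplicationEvent(ApplicationReadyEvent event) {
        log.info("Application started, waiting for dependencies before accepting traffic");

        // Dedicated thread: this task blocks, which would tie up the shared common pool under runAsync
        Thread checker = new Thread(() -> {
            long deadline = System.currentTimeMillis() + healthConfig.getReadinessTimeoutSeconds() * 1000L;
            try {
                // Become ready as soon as the real dependencies report UP, instead of sleeping a fixed time
                while (System.currentTimeMillis() < deadline) {
                    if (isUp(databaseHealthIndicator) && isUp(redisHealthIndicator)) {
                        readinessHealthIndicator.setReady(true);
                        log.info("Application is ready to accept traffic");
                        return;
                    }
                    Thread.sleep(1000L);
                }
                // Still not ready: stay out of the Service; liveness keeps passing, so no restart loop
                log.error("Dependencies not ready after {} seconds", healthConfig.getReadinessTimeoutSeconds());
            } catch (InterruptedException e) {
                log.error("Application initialization interrupted", e);
                Thread.currentThread().interrupt();
            }
        }, "readiness-checker");
        checker.setDaemon(true);
        checker.start();
    }

    private boolean isUp(HealthIndicator indicator) {
        return Status.UP.equals(indicator.health().getStatus());
    }
}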
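
If initialization consists of several independent steps, such as cache warm-up and connection-pool pre-heating, they can run in parallel, with readiness switched on only after all of them have succeeded, and never if one fails. A plain-JDK sketch using CompletableFuture.allOf; StartupInitializer is a hypothetical name, and in the listener above its completion would be the point to call readinessHealthIndicator.setReady(true).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical initializer: readiness flips to true only after every warm-up task succeeds
final class StartupInitializer {

    private final AtomicBoolean ready = new AtomicBoolean(false);

    boolean isReady() {
        return ready.get();
    }

    // Runs all tasks on a dedicated pool (not the common ForkJoinPool, since warm-up usually blocks)
    CompletableFuture<Void> start(List<Runnable> warmUpTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, warmUpTasks.size()));
        CompletableFuture<?>[] futures = warmUpTasks.stream()
                .map(task -> CompletableFuture.runAsync(task, pool))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(futures)
                .thenRun(() -> ready.set(true)) // skipped if any task failed, so readiness stays false
                .whenComplete((ignored, error) -> pool.shutdown());
    }
}
```

Because a failed task leaves readiness false, a Pod whose warm-up breaks simply never joins the Service instead of serving requests with a cold or half-loaded cache.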

4.5 Controller

HealthController: health check controller

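import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.Status;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/health")
@Slf4j
public class HealthController {

    @Autowired
    private ReadinessHealthIndicator readinessHealthIndicator;

    // Liveness only shows that the process can still serve HTTP; it deliberately checks no dependencies
    @GetMapping("/liveness")
    public ResponseEntity<Map<String, String>> liveness() {
        Map<String, String> response = new HashMap<>();
        response.put("status", "alive");
        response.put("timestamp", Instant.now().toString());
        return ResponseEntity.ok(response);
    }

    @GetMapping("/readiness")
    public ResponseEntity<Map<String, Object>> readiness() {
        Health health = readinessHealthIndicator.health();
        Map<String, Object> response = new HashMap<>();
        response.put("timestamp", Instant.now().toString());

        // Status is a class rather than an enum, so compare with equals()
        if (Status.UP.equals(health.getStatus())) {
            response.put("status", "ready");
            return ResponseEntity.ok(response);
        } else {
            response.put("status", "not_ready");
            response.put("details", health.getDetails());
            // 503 fails the readiness probe, so Kubernetes stops routing traffic to this Pod
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body(response);
        }
    }

}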

5. Kubernetes Deployment Configuration

5.1 Deployment Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: health-check-demo
  labels:
    app: health-check-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: health-check-demo
  template:
    metadata:
      labels:
        app: health-check-demo
    spec:
      containers:
      - name: health-check-demo
        image: health-check-demo:1.0.0
        ports:
        - containerPort: 8080
          name: http
        livenessProbe:
          httpGet:
            path: /health/liveness
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
          successThreshold: 1
        readinessProbe:
          httpGet:
            path: /health/readiness
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
          successThreshold: 1
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"

5.2 Service Configuration

apiVersion: v1
kind: Service
metadata:
  name: health-check-demo-service
spec:
  selector:
    app: health-check-demo
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer

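5.3 Probe Parameters

| Parameter | Meaning | Recommended value | Description |
| --- | --- | --- | --- |
| initialDelaySeconds | Delay before the first check | Liveness: 30s; Readiness: 10s | How long after the container starts before probing begins |
| periodSeconds | Check interval | Liveness: 10s; Readiness: 5s | How often the probe runs |
| timeoutSeconds | Timeout | Liveness: 5s; Readiness: 3s | How long a single check may take before it counts as failed |
| failureThreshold | Failure threshold | 3 | Number of consecutive failures before the probe is considered failed |
| successThreshold | Success threshold | 1 | Number of consecutive successes before the probe is considered successful (must be 1 for liveness and startup probes) |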
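
failureThreshold and successThreshold make Kubernetes react to consecutive results rather than to a single probe. The plain-Java sketch below models that counting for a readiness probe, which starts out not ready; ProbeStateTracker is a hypothetical name, and the model ignores timing and initialDelaySeconds. With the readiness values above (failureThreshold 3, periodSeconds 5), a Pod that stops responding is taken out of the Service roughly 3 × 5 = 15 seconds after its last successful probe.

```java
// Hypothetical model of how consecutive probe results change a readiness verdict
final class ProbeStateTracker {

    private final int failureThreshold;
    private final int successThreshold;
    private boolean ready = false; // readiness counts as failing until enough successes arrive
    private int consecutiveFailures = 0;
    private int consecutiveSuccesses = 0;

    ProbeStateTracker(int failureThreshold, int successThreshold) {
        if (failureThreshold < 1 || successThreshold < 1) {
            throw new IllegalArgumentException("thresholds must be >= 1");
        }
        this.failureThreshold = failureThreshold;
        this.successThreshold = successThreshold;
    }

    // Feed one probe result; returns the readiness verdict after this result
    boolean record(boolean success) {
        if (success) {
            consecutiveSuccesses++;
            consecutiveFailures = 0;
            if (!ready && consecutiveSuccesses >= successThreshold) {
                ready = true;
            }
        } else {
            consecutiveFailures++;
            consecutiveSuccesses = 0;
            if (ready && consecutiveFailures >= failureThreshold) {
                ready = false;
            }
        }
        return ready;
    }
}
```

With (3, 1), one or two isolated failures such as a GC pause do not remove the Pod; only the third consecutive failure does.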

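6. Best Practices

6.1 Set Probe Parameters Sensibly

Principles

  • Set initialDelaySeconds according to how long the application takes to start
  • Set timeoutSeconds according to how long the health endpoint takes to respond
  • Set periodSeconds according to how critical the service is: shorter periods detect failures sooner
  • Avoid probing too frequently
  • For slow-starting applications, prefer a startupProbe over a very large liveness initialDelaySeconds

Recommendations

  • Liveness Probe: initialDelaySeconds=30s, periodSeconds=10s
  • Readiness Probe: initialDelaySeconds=10s, periodSeconds=5s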
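
A quick sanity check for these numbers is to estimate how long a container may take to start before the liveness probe gives up on it. Assuming probes fire exactly every periodSeconds starting at initialDelaySeconds, the last of failureThreshold consecutive failures starts at initialDelaySeconds + (failureThreshold - 1) × periodSeconds and may take up to timeoutSeconds. For the recommended liveness values (30s delay, 10s period, 5s timeout, threshold 3) that is 30 + 2 × 10 + 5 = 55 seconds; an application whose HTTP endpoint is not up by then will likely be restarted in a loop, which is what a startupProbe is meant to prevent. ProbeBudget is a hypothetical helper.

```java
// Hypothetical helper for rough probe timing arithmetic (all values in seconds)
final class ProbeBudget {

    private ProbeBudget() {
    }

    // Approximate latest moment the failureThreshold-th consecutive failure is observed,
    // assuming probes start at initialDelaySeconds and repeat exactly every periodSeconds
    static int secondsUntilGivingUp(int initialDelaySeconds, int periodSeconds,
                                    int timeoutSeconds, int failureThreshold) {
        return initialDelaySeconds + (failureThreshold - 1) * periodSeconds + timeoutSeconds;
    }
}
```

If the measured startup time of the application comes close to this figure, raise the liveness budget or, better, add a startupProbe.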

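6.2 Separate Liveness and Readiness Checks

Principles

  • The liveness probe checks only whether the application is alive
  • The readiness probe checks whether the application is ready for traffic
  • Do not reuse the readiness check as the liveness check: if a dependency such as the database goes down, every Pod would be restarted instead of merely being taken out of rotation

Implementation

  • Use different endpoints
  • Use different check logic
  • Use different timeouts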

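6.3 Graceful Shutdown

Principles

  • Stop accepting new requests before shutting down
  • Wait for in-flight requests to complete
  • Release all resources

Implementation

On Spring Boot 2.3+, server.shutdown=graceful, with spring.lifecycle.timeout-per-shutdown-phase as the upper bound on waiting, covers the first two principles for the embedded web server. The snippet below additionally marks the instance as not ready: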

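// Inside a bean with readinessHealthIndicator injected and a logger (e.g. Lombok's @Slf4j)
@PreDestroy
public void shutdown() {
    log.info("Application is shutting down");
    // Kubernetes already removes a terminating Pod from Service endpoints on its own,
    // so failing the readiness check here is a secondary safeguard
    readinessHealthIndicator.setReady(false);
}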
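
The three principles of this section can also be sketched in plain Java with a request counter, which makes the order of operations explicit. Spring Boot's built-in graceful shutdown normally does this for the web server, so GracefulDrain and its methods are a hypothetical illustration of the sequence.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical drain helper: refuse new work, then wait for in-flight work to finish
final class GracefulDrain {

    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final AtomicInteger inFlight = new AtomicInteger();

    // Call at the start of a request; returns false once shutdown has begun (answer 503)
    boolean tryEnter() {
        inFlight.incrementAndGet();
        if (!accepting.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    // Call when a request admitted by tryEnter() completes
    void exit() {
        inFlight.decrementAndGet();
    }

    boolean isAccepting() {
        return accepting.get(); // a readiness endpoint could report this value
    }

    // Step 1: stop accepting; step 2: wait (bounded) for in-flight requests; true if fully drained
    boolean shutdown(Duration maxWait) {
        accepting.set(false);
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up; remaining requests will be cut off
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true; // step 3 (releasing resources) can now run safely
    }
}
```

Incrementing the counter before checking the flag in tryEnter() closes the race where a request slips in just as shutdown() starts waiting.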

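6.4 Monitoring and Alerting

Metrics

  • Number of probe failures
  • Probe response time
  • Number of Pod restarts
  • Time until the application becomes ready

Alert rules

  • Alert when a probe fails 3 times in a row
  • Alert when Pod restarts exceed a threshold
  • Alert when the time until ready exceeds a threshold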
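
For the time-until-ready metric and its alert, the application can measure the span from JVM start to the moment it becomes ready. A plain-JDK sketch using RuntimeMXBean.getStartTime(), which returns the JVM start time in milliseconds since the epoch; StartupTimer is a hypothetical name, and in practice the value would usually be exported through a metrics library rather than checked in code.

```java
import java.lang.management.ManagementFactory;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for the "time until ready" metric and its alert threshold
final class StartupTimer {

    private StartupTimer() {
    }

    // Time from JVM start (as reported by the runtime MXBean) until the given moment
    static Duration sinceJvmStart(Instant readyAt) {
        Instant jvmStart = Instant.ofEpochMilli(ManagementFactory.getRuntimeMXBean().getStartTime());
        return Duration.between(jvmStart, readyAt);
    }

    // True when startup took longer than the alert threshold
    static boolean exceeds(Duration timeToReady, Duration threshold) {
        return timeToReady.compareTo(threshold) > 0;
    }
}
```

One natural place to call sinceJvmStart(Instant.now()) is wherever readiness is switched to true.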

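7. Summary

Startup health checks and readiness probes are key techniques for deploying applications on K8s. They let K8s determine accurately whether an application is really ready to receive traffic, so requests are not sent to instances that are not ready yet.

In real projects, we should set probe parameters sensibly, keep liveness and readiness checks separate, implement graceful shutdown, and build solid monitoring and alerting, so that the application runs stably and reliably on K8s.

Discussion

  1. How does your project implement application health checks?
  2. What do you think is the biggest challenge with readiness probes?
  3. Have you ever run into problems caused by misconfigured probes?

Feel free to share your thoughts in the comments! For more technical articles, follow the WeChat official account 服务端技术精选.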


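Title: Spring Boot + Startup Health Checks + Readiness Probes: Automatic Detection During K8s Deployments to Keep Traffic Away from Unready Instances
Author: jiangyi
URL: http://jiangyi.space/articles/2026/04/02/1774778385213.html
WeChat official account: 服务端技术精选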