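Spring Boot + Alerting on Sudden Rule-Execution Slowdowns: A Rule Suddenly Gets Slow? Notify the Owner Within 5 Seconds
Background: The Challenge of Rule Execution Performance
Rule engines are widely used in modern software systems for business logic and decision making. How fast rules execute directly affects response time and user experience. Yet sudden spikes in rule execution time do happen, and they can degrade performance, cause timeouts, or even bring a service down.
Traditional monitoring is usually based on system-level metrics such as CPU and memory usage, which cannot be attributed to the execution time of an individual rule. When one rule suddenly slows down, the problem is often neither detected nor located in time, and its impact spreads.
This article shows how to build such an alerting mechanism with Spring Boot: when a rule's execution time exceeds a threshold, the owner is notified within about 5 seconds so that the problem can be investigated before it threatens system stability.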
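Core Concepts
1. Rule execution time monitoring
Rule execution time monitoring means measuring and analyzing, in real time, how long rule execution takes.
| Dimension | Description | Example |
|---|---|---|
| Single-rule time | Execution time of each rule | Rule A takes 100 ms |
| Rule-group time | Total execution time of a rule group | Rule group A takes 500 ms |
| Execution frequency | Number and rate of executions | Rule A runs 100 times per minute |
| Time trend | How execution time changes over time | Rule A grows from 10 ms to 100 ms |
| Abnormal time | Executions that take abnormally long | Rule A takes more than 1000 ms |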
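As a minimal sketch of the single-rule timing dimension, the helper below measures one unit of work with the monotonic `System.nanoTime()` clock. The class name `RuleTiming` and its `timed` method are hypothetical and are not part of Easy Rules or Spring; the "rule" is represented by a plain `Runnable`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/** Hypothetical helper (not from any library): times one unit of work. */
final class RuleTiming {
    private RuleTiming() {}

    /** Runs the task and reports the elapsed milliseconds to the sink, even if the task throws. */
    static void timed(Runnable task, LongConsumer elapsedMillisSink) {
        // nanoTime is monotonic, so wall-clock adjustments do not distort the measurement
        long startNanos = System.nanoTime();
        try {
            task.run();
        } finally {
            long elapsedNanos = System.nanoTime() - startNanos;
            elapsedMillisSink.accept(TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
        }
    }
}
```

A caller could pass a sink that updates per-rule statistics, for example `RuleTiming.timed(() -> fireRule(), ms -> stats.update(ms))`, where `fireRule` and `stats` stand for whatever the application already has.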
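2. Alerting mechanism
The alerting mechanism notifies the responsible people promptly when a rule's execution time exceeds the threshold.
| Channel | Description | Pros | Cons |
|---|---|---|---|
| E-mail | Sends the alert by e-mail | Formal, can carry detailed information | May be delayed, depends on the mail system |
| SMS | Sends the alert by text message | Immediate and direct | Length limit, higher cost |
| DingTalk / WeCom (企业微信) | Sends the alert through enterprise instant messaging | Immediate, can reach a group chat | Depends on the network, may be overlooked |
| Phone call | Delivers the alert as a voice call | Most direct, forces attention | Expensive, can be disruptive |
| In-system alert | Shows the alert inside the system itself | Tightly integrated | May be overlooked |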
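Because several channels are usually combined, it can help to hide them behind one small interface. The sketch below is only an illustration: `AlertChannel` and `AlertFanOut` are hypothetical names, and `List.copyOf` assumes Java 10 or later. It sends one message to every channel and reports which channels failed, so that one broken channel does not stop the others.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction: one implementation per channel (e-mail, DingTalk, WeCom, ...). */
interface AlertChannel {
    String name();

    void send(String message) throws Exception;
}

/** Sends a message to every configured channel and collects the failures. */
final class AlertFanOut {
    private final List<AlertChannel> channels;

    AlertFanOut(List<AlertChannel> channels) {
        this.channels = List.copyOf(channels);
    }

    /** Returns the names of the channels that threw; the remaining channels are still attempted. */
    List<String> sendToAll(String message) {
        List<String> failed = new ArrayList<>();
        for (AlertChannel channel : channels) {
            try {
                channel.send(message);
            } catch (Exception e) {
                failed.add(channel.name());
            }
        }
        return failed;
    }
}
```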
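3. Rule engines
A rule engine is a component for managing and executing business rules.
| Rule engine | Description | Pros | Cons |
|---|---|---|---|
| Drools | Java-based, full-featured rule engine | Rich features, supports complex rules | Steep learning curve |
| Easy Rules | Lightweight rule engine with a concise API | Easy to use and integrate | Relatively limited features |
| Aviator | Lightweight expression engine with good performance | Fast, concise syntax | Relatively limited features |
| Custom rule engine | Built for the specific business requirements | Fits the business exactly | High development cost |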
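4. Performance monitoring
Performance monitoring means measuring and analyzing system performance metrics in real time.
| Metric | Description | Importance |
|---|---|---|
| Response time | Time the system takes to handle a request | High |
| Throughput | Requests handled per unit of time | High |
| Error rate | Share of requests that fail | High |
| Resource usage | CPU, memory, disk and similar resources | Medium |
| Concurrency | Requests being handled at the same time | Medium |
Implementation
1. Core dependencies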
<!-- Spring Boot Web -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Spring Boot Actuator -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Spring Boot DevTools (development only) -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-devtools</artifactId>
<scope>runtime</scope>
<optional>true</optional>
</dependency>
<!-- Easy Rules -->
<dependency>
<groupId>org.jeasy</groupId>
<artifactId>easy-rules-core</artifactId>
<version>4.1.0</version>
</dependency>
<dependency>
<groupId>org.jeasy</groupId>
<artifactId>easy-rules-mvel</artifactId>
<version>4.1.0</version>
</dependency>
<!-- SnakeYAML -->
<dependency>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
</dependency>
<!-- Lombok -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<!-- Spring Boot Test -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<!-- Prometheus (optional) -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- Spring Boot Mail (optional, used for e-mail alerts) -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-mail</artifactId>
</dependency>
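2. Configuration properties class
package com.example.rulealert.config;
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;
import java.util.List;
/**
 * Rule alert settings, bound from properties with the "rule.alert" prefix.
 */
@Data
@Component
@ConfigurationProperties(prefix = "rule.alert")
public class RuleAlertProperties {
    // Defaults mirror the sample configuration file. Without them an unset threshold would be 0
    // (every rule alerts) and an unset interval would be 0 (scheduleAtFixedRate rejects it).
    private long threshold = 1000; // alert threshold in milliseconds
    private long checkInterval = 5000; // interval between periodic checks in milliseconds
    private List<String> recipients; // e-mail recipients
    private String emailSubject; // e-mail subject
    private String emailTemplate; // e-mail template
    private String dingtalkWebhook; // DingTalk webhook URL
    private String wechatWebhook; // WeCom (企业微信) webhook URL
}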
3. Rule execution monitoring service
package com.example.rulealert.service;
import com.example.rulealert.config.RuleAlertProperties;
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.jeasy.rules.api.Facts;
import org.jeasy.rules.api.Rules;
import org.jeasy.rules.api.RulesEngine;
import org.jeasy.rules.core.DefaultRulesEngine;
import org.jeasy.rules.mvel.MVELRule;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import java.util.Map;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;
/**
 * Rule execution monitoring service.
 */
@Slf4j
@Service
public class RuleExecutionMonitorService {
    @Autowired
    private RuleAlertProperties ruleAlertProperties;
    @Autowired
    private AlertService alertService;
    private final RulesEngine rulesEngine = new DefaultRulesEngine();
    private final Map<String, RuleExecutionStats> ruleStatsMap = new ConcurrentHashMap<>();
    private ScheduledExecutorService executorService;
    /**
     * Starts the scheduled task that periodically checks rule execution times.
     */
    @PostConstruct
    public void init() {
        long interval = ruleAlertProperties.getCheckInterval();
        executorService = Executors.newSingleThreadScheduledExecutor();
        executorService.scheduleAtFixedRate(this::checkRuleExecutionTime, interval, interval, TimeUnit.MILLISECONDS);
        log.info("Rule execution monitor started with check interval: {}ms", interval);
    }
    /**
     * Fires each rule on its own and records how long it took.
     */
    public void executeRulesWithMonitoring(Rules rules, Facts facts) {
        for (Object ruleObj : rules) {
            if (ruleObj instanceof MVELRule) {
                MVELRule rule = (MVELRule) ruleObj;
                String ruleName = rule.getName();
                // System.nanoTime() is monotonic, unlike System.currentTimeMillis(),
                // so clock adjustments cannot produce negative or inflated durations
                long startNanos = System.nanoTime();
                try {
                    // Fire this single rule
                    rulesEngine.fire(new Rules(rule), facts);
                } catch (Exception e) {
                    log.error("Error executing rule: {}", ruleName, e);
                } finally {
                    long executionTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
                    // Update the per-rule statistics
                    updateRuleStats(ruleName, executionTime);
                    // Alert immediately if this execution exceeded the threshold
                    if (executionTime > ruleAlertProperties.getThreshold()) {
                        alertService.sendAlert(ruleName, executionTime);
                    }
                }
            }
        }
    }
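    /**
     * Updates the statistics of one rule.
     */
    private void updateRuleStats(String ruleName, long executionTime) {
        RuleExecutionStats stats = ruleStatsMap.computeIfAbsent(ruleName, k -> new RuleExecutionStats());
        stats.update(executionTime);
    }
    /**
     * Periodic check of average execution times.
     * The average is cumulative since startup, so it reacts slowly to a recent slowdown;
     * the per-execution check in executeRulesWithMonitoring is what catches sudden spikes.
     */
    private void checkRuleExecutionTime() {
        for (Map.Entry<String, RuleExecutionStats> entry : ruleStatsMap.entrySet()) {
            String ruleName = entry.getKey();
            try {
                long avgExecutionTime = entry.getValue().getAverageExecutionTime();
                if (avgExecutionTime > ruleAlertProperties.getThreshold()) {
                    alertService.sendAlert(ruleName, avgExecutionTime);
                }
            } catch (Exception e) {
                // An uncaught exception would silently cancel all future runs of this scheduled task
                log.error("Error checking rule: {}", ruleName, e);
            }
        }
    }
    /**
     * Returns a read-only view of the per-rule statistics.
     */
    public Map<String, RuleExecutionStats> getRuleStats() {
        return java.util.Collections.unmodifiableMap(ruleStatsMap);
    }
    /**
     * Releases resources on shutdown.
     */
    @PreDestroy
    public void destroy() {
        if (executorService != null) {
            executorService.shutdown();
        }
    }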
    /**
     * Execution statistics of a single rule.
     */
    @Data
    public static class RuleExecutionStats {
        private final AtomicLong totalExecutionTime = new AtomicLong(0);
        private final AtomicLong executionCount = new AtomicLong(0);
        private final AtomicLong maxExecutionTime = new AtomicLong(0);
        private final AtomicLong lastExecutionTime = new AtomicLong(0);
        /**
         * Records one execution.
         */
        public void update(long executionTime) {
            totalExecutionTime.addAndGet(executionTime);
            executionCount.incrementAndGet();
            maxExecutionTime.accumulateAndGet(executionTime, Math::max);
            lastExecutionTime.set(executionTime);
        }
        /**
         * Returns the average execution time in milliseconds, or 0 if the rule has not run yet.
         */
        public long getAverageExecutionTime() {
            long count = executionCount.get();
            return count > 0 ? totalExecutionTime.get() / count : 0;
        }
    }
}
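The cumulative average above changes slowly once a rule has run many times, so it is not well suited to detecting a sudden increase. One possible alternative, shown here only as a sketch and not as the article's implementation, is to compare each sample with an exponentially weighted moving average of the recent samples. The class `SpikeDetector` and its parameters `alpha`, `spikeFactor` and `minMillis` are assumptions made for this illustration.

```java
/** Hypothetical spike detector: compares each sample with an exponentially weighted moving average. */
final class SpikeDetector {
    private final double alpha;        // smoothing factor in (0, 1]; larger values react faster
    private final double spikeFactor;  // a sample is a spike when it exceeds baseline * spikeFactor
    private final long minMillis;      // samples below this absolute duration are never spikes
    private double baseline = -1;      // negative means that no sample has been seen yet

    SpikeDetector(double alpha, double spikeFactor, long minMillis) {
        this.alpha = alpha;
        this.spikeFactor = spikeFactor;
        this.minMillis = minMillis;
    }

    /** Records a sample and returns true when it is a spike relative to the current baseline. */
    synchronized boolean record(long elapsedMillis) {
        if (baseline < 0) {
            baseline = elapsedMillis;
            return false;
        }
        boolean spike = elapsedMillis >= minMillis && elapsedMillis > baseline * spikeFactor;
        // Spikes are kept out of the baseline so that one slow call does not mask the next one
        if (!spike) {
            baseline = alpha * elapsedMillis + (1 - alpha) * baseline;
        }
        return spike;
    }

    synchronized double baseline() {
        return baseline;
    }
}
```

With this design a rule that stays slow keeps being reported until its baseline is reset, which is a deliberate trade-off in the sketch; repeated notifications would still need to be rate-limited by the alert service.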
4. Alert service
package com.example.rulealert.service;
import com.example.rulealert.config.RuleAlertProperties;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.mail.SimpleMailMessage;
import org.springframework.mail.javamail.JavaMailSender;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;
import javax.annotation.PreDestroy;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
 * Alert service.
 */
@Slf4j
@Service
public class AlertService {
    @Autowired
    private RuleAlertProperties ruleAlertProperties;
    @Autowired(required = false)
    private JavaMailSender mailSender;
    // Note: a default RestTemplate has no timeouts, so a hung webhook delays every later alert
    private final RestTemplate restTemplate = new RestTemplate();
    // One worker thread: alerts leave the caller's thread and are sent one at a time
    private final ExecutorService executorService = Executors.newSingleThreadExecutor();
    /**
     * Logs the breach and sends the alert asynchronously.
     */
    public void sendAlert(String ruleName, long executionTime) {
        log.warn("Rule execution time exceeded threshold: {} took {}ms", ruleName, executionTime);
        executorService.execute(() -> {
            try {
                // E-mail alert
                sendEmailAlert(ruleName, executionTime);
                // DingTalk alert
                sendDingTalkAlert(ruleName, executionTime);
                // WeCom alert
                sendWeChatAlert(ruleName, executionTime);
            } catch (Exception e) {
                log.error("Error sending alert", e);
            }
        });
    }
    /**
     * Stops the worker thread when the application shuts down.
     */
    @PreDestroy
    public void shutdown() {
        executorService.shutdown();
    }
/**
* Sends the e-mail alert.
*/
private void sendEmailAlert(String ruleName, long executionTime) {
if (mailSender != null && ruleAlertProperties.getRecipients() != null && !ruleAlertProperties.getRecipients().isEmpty()) {
try {
SimpleMailMessage message = new SimpleMailMessage();
message.setSubject(ruleAlertProperties.getEmailSubject() != null ? ruleAlertProperties.getEmailSubject() : "Rule Execution Time Alert");
message.setText(String.format("Rule %s execution time exceeded threshold: %dms", ruleName, executionTime));
message.setTo(ruleAlertProperties.getRecipients().toArray(new String[0]));
mailSender.send(message);
log.info("Email alert sent for rule: {}", ruleName);
} catch (Exception e) {
log.error("Error sending email alert", e);
}
}
}
/**
* Sends the DingTalk alert.
*/
private void sendDingTalkAlert(String ruleName, long executionTime) {
if (ruleAlertProperties.getDingtalkWebhook() != null) {
try {
Map<String, Object> payload = new java.util.HashMap<>();
payload.put("msgtype", "text");
Map<String, String> text = new java.util.HashMap<>();
text.put("content", String.format("Rule %s execution time exceeded threshold: %dms", ruleName, executionTime));
payload.put("text", text);
restTemplate.postForObject(ruleAlertProperties.getDingtalkWebhook(), payload, String.class);
log.info("DingTalk alert sent for rule: {}", ruleName);
} catch (Exception e) {
log.error("Error sending DingTalk alert", e);
}
}
}
/**
* Sends the WeCom (企业微信) alert.
*/
private void sendWeChatAlert(String ruleName, long executionTime) {
if (ruleAlertProperties.getWechatWebhook() != null) {
try {
Map<String, Object> payload = new java.util.HashMap<>();
payload.put("msgtype", "text");
Map<String, String> text = new java.util.HashMap<>();
text.put("content", String.format("Rule %s execution time exceeded threshold: %dms", ruleName, executionTime));
payload.put("text", text);
restTemplate.postForObject(ruleAlertProperties.getWechatWebhook(), payload, String.class);
log.info("WeChat alert sent for rule: {}", ruleName);
} catch (Exception e) {
log.error("Error sending WeChat alert", e);
}
}
}
}
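Sending the channels one after another on a single thread means that a slow mail server or webhook can push the later channels past the 5-second target. As a sketch under stated assumptions, the dispatcher below runs every sender in parallel and gives each one a fixed time budget. `DeadlineDispatcher` is a hypothetical class, each sender is modeled as a plain `Runnable`, and `CompletableFuture.orTimeout` requires Java 9 or later.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/** Hypothetical dispatcher: runs all senders in parallel, each with the same time budget. */
final class DeadlineDispatcher {
    private final ExecutorService pool;
    private final long budgetMillis;

    DeadlineDispatcher(ExecutorService pool, long budgetMillis) {
        this.pool = pool;
        this.budgetMillis = budgetMillis;
    }

    /** Blocks until every sender has finished or timed out; returns how many succeeded in time. */
    int dispatch(List<Runnable> senders) {
        List<CompletableFuture<Boolean>> futures = senders.stream()
                .map(sender -> CompletableFuture.runAsync(sender, pool)
                        .thenApply(ignored -> true)
                        .orTimeout(budgetMillis, TimeUnit.MILLISECONDS)
                        // A failure or a timeout counts as "not delivered" instead of propagating
                        .exceptionally(error -> false))
                .collect(Collectors.toList());
        int delivered = 0;
        for (CompletableFuture<Boolean> future : futures) {
            if (future.join()) {
                delivered++;
            }
        }
        return delivered;
    }
}
```

Note that `orTimeout` only completes the future; it does not interrupt the underlying task, so HTTP and SMTP clients should still be configured with their own connect and read timeouts. Because `dispatch` blocks, it is meant to be called from an alert worker thread rather than from a request thread.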
5. Rule management service
package com.example.rulealert.service;
import lombok.extern.slf4j.Slf4j;
import org.jeasy.rules.api.Facts;
import org.jeasy.rules.api.Rules;
import org.jeasy.rules.mvel.MVELRule;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.io.File;
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
/**
 * Rule management service.
 */
@Slf4j
@Service
public class RuleManagementService {
    @Autowired
    private RuleExecutionMonitorService ruleExecutionMonitorService;
    // volatile: the rule set is replaced by reload threads and read by request threads
    private volatile Rules currentRules = new Rules();
    /**
     * Initializes the rule management service.
     */
    public void init() {
        // Load the initial rules
        loadRules();
        // Start the periodic reload
        startRuleHotReload();
        // Watch the rules directory for changes
        startFileWatcher();
    }
    /**
     * Loads every YAML rule file from the "rules" directory and swaps in the new rule set.
     */
    public void loadRules() {
        Rules newRules = new Rules();
        try {
            // Load rules from files
            File ruleDir = new File("rules");
            if (ruleDir.exists() && ruleDir.isDirectory()) {
                File[] ruleFiles = ruleDir.listFiles((dir, name) -> name.endsWith(".yaml") || name.endsWith(".yml"));
                if (ruleFiles != null) {
                    for (File ruleFile : ruleFiles) {
                        try {
                            MVELRule rule = MVELRuleFactory.createRuleFromFile(ruleFile);
                            newRules.register(rule);
                            log.info("Loaded rule: {}", rule.getName());
                        } catch (Exception e) {
                            // A broken file is skipped; the other rules are still loaded
                            log.error("Failed to load rule file: {}", ruleFile.getName(), e);
                        }
                    }
                }
            }
            // Replace the active rule set in one step
            currentRules = newRules;
            log.info("Rules loaded successfully");
        } catch (Exception e) {
            log.error("Failed to load rules", e);
        }
    }
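    /**
     * Reloads all rules every 30 seconds, as a fallback in case a file event is missed.
     */
    private void startRuleHotReload() {
        ScheduledExecutorService executorService = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "rule-hot-reload");
            thread.setDaemon(true); // do not keep the JVM alive on shutdown
            return thread;
        });
        executorService.scheduleAtFixedRate(this::loadRules, 30000, 30000, TimeUnit.MILLISECONDS);
        log.info("Rule hot reload started");
    }
    /**
     * Watches the "rules" directory and reloads on create, modify and delete events.
     */
    private void startFileWatcher() {
        try {
            Path rulePath = Paths.get("rules");
            if (!Files.exists(rulePath)) {
                Files.createDirectories(rulePath);
            }
            WatchService watchService = FileSystems.getDefault().newWatchService();
            rulePath.register(watchService, StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_DELETE);
            Thread watcher = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        WatchKey key = watchService.take();
                        for (WatchEvent<?> event : key.pollEvents()) {
                            log.info("File {}: {}", event.kind().name(), event.context());
                        }
                        // Reload once per batch of events rather than once per event
                        loadRules();
                        if (!key.reset()) {
                            log.warn("Rules directory is no longer accessible, stopping file watcher");
                            break;
                        }
                    } catch (InterruptedException e) {
                        // Restore the flag so that the loop condition ends the thread
                        Thread.currentThread().interrupt();
                    } catch (Exception e) {
                        log.error("File watcher error", e);
                    }
                }
            }, "rule-file-watcher");
            watcher.setDaemon(true); // do not keep the JVM alive on shutdown
            watcher.start();
            log.info("File watcher started");
        } catch (IOException e) {
            log.error("Failed to start file watcher", e);
        }
    }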
/**
* Executes the current rules.
*/
public void executeRules(Facts facts) {
ruleExecutionMonitorService.executeRulesWithMonitoring(currentRules, facts);
}
/**
* Returns the number of currently loaded rules.
*/
public int getRuleCount() {
return currentRules.size();
}
}
6. Rule factory class
package com.example.rulealert.service;
import org.jeasy.rules.mvel.MVELRule;
import org.yaml.snakeyaml.Yaml;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Map;
/**
 * Builds MVELRule instances from YAML files.
 */
public class MVELRuleFactory {
    /**
     * Creates a rule from a YAML file.
     */
    public static MVELRule createRuleFromFile(File file) throws IOException {
        try (InputStream inputStream = new FileInputStream(file)) {
            // Yaml instances are not thread-safe and reloads run on several threads,
            // so a new instance is created per call instead of sharing a static one
            Map<String, Object> ruleMap = new Yaml().load(inputStream);
            if (ruleMap == null) {
                throw new IOException("Empty rule file: " + file.getName());
            }
            return createRuleFromMap(ruleMap);
        }
    }
    /**
     * Creates a rule from the parsed YAML map.
     * MVELRule is configured through its fluent methods (name, description, priority, when, then).
     */
    private static MVELRule createRuleFromMap(Map<String, Object> ruleMap) {
        MVELRule rule = new MVELRule()
                .name((String) ruleMap.getOrDefault("name", "unnamed-rule"))
                .description((String) ruleMap.getOrDefault("description", ""))
                .when((String) ruleMap.getOrDefault("condition", "true"));
        Object actions = ruleMap.get("actions");
        if (actions instanceof String) {
            rule.then((String) actions);
        } else if (actions instanceof List) {
            // Each list entry becomes one action, executed in order
            for (Object action : (List<?>) actions) {
                rule.then(String.valueOf(action));
            }
        }
        Object priority = ruleMap.get("priority");
        if (priority instanceof Number) {
            rule.priority(((Number) priority).intValue());
        }
        return rule;
    }
}
7. Business service
package com.example.rulealert.service;
import org.jeasy.rules.api.Facts;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
/**
* Business service.
*/
@Service
public class BusinessService {
@Autowired
private RuleManagementService ruleManagementService;
@Autowired
private RuleExecutionMonitorService ruleExecutionMonitorService;
/**
* Runs the business logic by executing the rules against the input.
*/
public String processBusinessLogic(String input) {
Facts facts = new Facts();
facts.put("input", input);
facts.put("result", "");
// Execute the rules
ruleManagementService.executeRules(facts);
// Read the result
String result = (String) facts.get("result");
return result.isEmpty() ? "No rule matched" : result;
}
/**
* Returns the number of rules.
*/
public int getRuleCount() {
return ruleManagementService.getRuleCount();
}
/**
* Returns the rule execution statistics.
*/
public java.util.Map<String, RuleExecutionMonitorService.RuleExecutionStats> getRuleStats() {
return ruleExecutionMonitorService.getRuleStats();
}
/**
* Triggers a rule reload manually.
*/
public void reloadRules() {
ruleManagementService.loadRules();
}
}
8. Controller
package com.example.rulealert.controller;
import com.example.rulealert.service.BusinessService;
import com.example.rulealert.service.RuleExecutionMonitorService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;
import java.util.Map;
/**
* REST controller.
*/
@RestController
@RequestMapping("/api")
public class RuleController {
@Autowired
private BusinessService businessService;
/**
* Runs the business logic.
*/
@PostMapping("/process")
public String process(@RequestBody String input) {
return businessService.processBusinessLogic(input);
}
/**
* Returns the number of rules.
*/
@GetMapping("/rules/count")
public int getRuleCount() {
return businessService.getRuleCount();
}
/**
* Returns the rule execution statistics.
*/
@GetMapping("/rules/stats")
public Map<String, RuleExecutionMonitorService.RuleExecutionStats> getRuleStats() {
return businessService.getRuleStats();
}
/**
* Triggers a rule reload manually.
*/
@PostMapping("/rules/reload")
public String reloadRules() {
businessService.reloadRules();
return "Rules reloaded successfully";
}
}
9. Application main class
package com.example.rulealert;
import com.example.rulealert.service.RuleManagementService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
/**
* Main class of the rule execution time alerting application.
*/
@SpringBootApplication
public class RuleAlertApplication implements CommandLineRunner {
@Autowired
private RuleManagementService ruleManagementService;
public static void main(String[] args) {
SpringApplication.run(RuleAlertApplication.class, args);
}
@Override
public void run(String... args) throws Exception {
// Initialize the rule management service
ruleManagementService.init();
}
}
10. Configuration file
# Application settings
spring.application.name=rule-alert-demo
server.port=8080
# Rule alert settings (threshold and check-interval are in milliseconds)
rule.alert.threshold=1000
rule.alert.check-interval=5000
rule.alert.recipients=admin@example.com,tech@example.com
rule.alert.email-subject=Rule Execution Time Alert
rule.alert.dingtalk-webhook=https://oapi.dingtalk.com/robot/send?access_token=your-token
rule.alert.wechat-webhook=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=your-key
# Mail settings
spring.mail.host=smtp.example.com
spring.mail.port=587
spring.mail.username=alert@example.com
spring.mail.password=your-password
spring.mail.properties.mail.smtp.auth=true
spring.mail.properties.mail.smtp.starttls.enable=true
# Actuator settings
management.endpoints.web.exposure.include=health,info,metrics
management.endpoint.health.show-details=always
# Logging settings
logging.level.com.example.rulealert=DEBUG
11. Front-end page
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Rule Execution Time Monitor</title>
<!-- ECharts -->
<script src="https://cdn.jsdelivr.net/npm/echarts@5.4.3/dist/echarts.min.js"></script>
<!-- Ant Design reset styles -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/antd@5.12.8/dist/reset.css">
<style>
body {
font-family: Arial, sans-serif;
margin: 0;
padding: 20px;
background-color: #f0f2f5;
}
.container {
max-width: 1200px;
margin: 0 auto;
}
.header {
margin-bottom: 20px;
}
.header h1 {
color: #1890ff;
}
.controls {
margin-bottom: 20px;
}
.button {
padding: 8px 16px;
background-color: #1890ff;
color: white;
border: none;
border-radius: 4px;
cursor: pointer;
margin-right: 10px;
}
.button:hover {
background-color: #40a9ff;
}
.button-danger {
background-color: #f5222d;
}
.button-danger:hover {
background-color: #ff4d4f;
}
.button-success {
background-color: #52c41a;
}
.button-success:hover {
background-color: #73d13d;
}
.panel {
margin-bottom: 20px;
padding: 16px;
background-color: white;
border-radius: 4px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.input-group {
margin-bottom: 10px;
}
.input-group label {
display: inline-block;
width: 100px;
}
.input-group input {
padding: 8px;
border: 1px solid #d9d9d9;
border-radius: 4px;
width: 300px;
}
.result {
margin-top: 10px;
padding: 10px;
background-color: #f0f2f5;
border-radius: 4px;
}
.chart-container {
width: 100%;
height: 400px;
margin-bottom: 20px;
background-color: white;
border-radius: 4px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.stats-panel {
margin-bottom: 20px;
padding: 16px;
background-color: white;
border-radius: 4px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.stats-table {
width: 100%;
border-collapse: collapse;
}
.stats-table th, .stats-table td {
padding: 8px;
border-bottom: 1px solid #f0f0f0;
text-align: left;
}
.stats-table th {
background-color: #f5f5f5;
}
.log-panel {
margin-top: 20px;
padding: 16px;
background-color: white;
border-radius: 4px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
max-height: 300px;
overflow-y: auto;
}
.log-entry {
margin-bottom: 8px;
padding: 4px;
border-bottom: 1px solid #f0f0f0;
}
.log-error {
color: #f5222d;
}
.log-info {
color: #1890ff;
}
.log-success {
color: #52c41a;
}
.log-warning {
color: #faad14;
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1>Rule Execution Time Monitor</h1>
</div>
<div class="controls">
<button class="button" onclick="reloadRules()">Reload rules</button>
<button class="button" onclick="getRuleCount()">Get rule count</button>
<button class="button" onclick="getRuleStats()">Get rule statistics</button>
</div>
<div class="panel">
<h3>Rule test</h3>
<div class="input-group">
<label for="testInput">Input:</label>
<input type="text" id="testInput" value="hello world">
</div>
<button class="button" onclick="testRule()">Test rules</button>
<div class="result">
<p>Result: <span id="testResult">N/A</span></p>
</div>
</div>
<div class="chart-container" id="executionTimeChart"></div>
<div class="stats-panel">
<h3>Rule execution statistics</h3>
<table class="stats-table" id="statsTable">
<thead>
<tr>
<th>Rule name</th>
<th>Executions</th>
<th>Average time (ms)</th>
<th>Max time (ms)</th>
<th>Last time (ms)</th>
</tr>
</thead>
<tbody>
<!-- Filled in dynamically -->
</tbody>
</table>
</div>
<div class="log-panel">
<h3>Activity log</h3>
<div id="logContainer"></div>
</div>
</div>
<script>
// Create the ECharts instance
var executionTimeChart = echarts.init(document.getElementById('executionTimeChart'));
// Rule execution time data
var ruleNames = [];
var avgExecutionTimes = [];
var maxExecutionTimes = [];
// Render the chart from the current data
function initExecutionTimeChart() {
var option = {
title: {
text: 'Rule Execution Time',
left: 'center'
},
tooltip: {
trigger: 'axis',
axisPointer: {
type: 'shadow'
}
},
legend: {
data: ['Average time', 'Max time'],
top: 30
},
xAxis: {
type: 'category',
data: ruleNames
},
yAxis: {
type: 'value',
name: 'Time (ms)'
},
series: [
{
name: 'Average time',
type: 'bar',
data: avgExecutionTimes,
itemStyle: {
color: '#1890ff'
}
},
{
name: 'Max time',
type: 'bar',
data: maxExecutionTimes,
itemStyle: {
color: '#f5222d'
}
}
]
};
executionTimeChart.setOption(option);
}
// Reload the rules
function reloadRules() {
fetch('/api/rules/reload', {
method: 'POST'
})
.then(response => response.text())
.then(data => {
addLog('Reload rules: ' + data, 'success');
getRuleCount();
})
.catch(error => {
addLog('Failed to reload rules: ' + error, 'error');
});
}
// Get the rule count
function getRuleCount() {
fetch('/api/rules/count')
.then(response => response.text())
.then(data => {
addLog('Rule count: ' + data, 'info');
})
.catch(error => {
addLog('Failed to get rule count: ' + error, 'error');
});
}
// Get the rule statistics
function getRuleStats() {
fetch('/api/rules/stats')
.then(response => response.json())
.then(data => {
updateStatsTable(data);
updateExecutionTimeChart(data);
addLog('Rule statistics loaded', 'info');
})
.catch(error => {
addLog('Failed to get rule statistics: ' + error, 'error');
});
}
// Test the rules with the given input
function testRule() {
var input = document.getElementById('testInput').value;
fetch('/api/process', {
method: 'POST',
headers: {
'Content-Type': 'text/plain'
},
body: input
})
.then(response => response.text())
.then(data => {
document.getElementById('testResult').textContent = data;
addLog('Test rules: input=' + input + ', result=' + data, 'info');
// Refresh the statistics after the test
getRuleStats();
})
.catch(error => {
addLog('Failed to test rules: ' + error, 'error');
});
}
// Update the statistics table
function updateStatsTable(data) {
var tbody = document.querySelector('#statsTable tbody');
tbody.innerHTML = '';
for (var ruleName in data) {
var stats = data[ruleName];
var row = document.createElement('tr');
var values = [ruleName, stats.executionCount, stats.averageExecutionTime, stats.maxExecutionTime, stats.lastExecutionTime];
// textContent keeps rule names from being interpreted as HTML
values.forEach(function (value) {
var cell = document.createElement('td');
cell.textContent = value;
row.appendChild(cell);
});
tbody.appendChild(row);
}
}
// Update the execution time chart
function updateExecutionTimeChart(data) {
ruleNames = [];
avgExecutionTimes = [];
maxExecutionTimes = [];
for (var ruleName in data) {
var stats = data[ruleName];
ruleNames.push(ruleName);
avgExecutionTimes.push(stats.averageExecutionTime);
maxExecutionTimes.push(stats.maxExecutionTime);
}
initExecutionTimeChart();
}
// Append an entry to the activity log
function addLog(message, type) {
var logContainer = document.getElementById('logContainer');
var logEntry = document.createElement('div');
logEntry.className = 'log-entry log-' + type;
logEntry.textContent = '[' + new Date().toLocaleString() + '] ' + message;
logContainer.appendChild(logEntry);
logContainer.scrollTop = logContainer.scrollHeight;
}
// Load the initial data when the page has loaded
window.onload = function() {
getRuleCount();
getRuleStats();
};
// Resize the chart together with the window
window.onresize = function() {
executionTimeChart.resize();
};
// Refresh the statistics every 5 seconds
setInterval(getRuleStats, 5000);
</script>
</body>
</html>
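Core Workflows
1. Rule execution monitoring flow
- Rule execution: the business service asks the rule management service to execute the rules
- Timing: the monitoring service measures how long each rule takes
- Statistics update: the per-rule statistics are updated (total time, execution count, maximum time, last time)
- Threshold check: each measured time is compared with the configured threshold
- Alert trigger: an alert is raised when the threshold is exceeded
- Periodic check: a scheduled task re-checks each rule's average execution time at the configured interval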
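2. Alert flow
- Alert trigger: a rule's execution time exceeds the threshold
- Alert delivery: the alert is sent through e-mail, DingTalk, WeCom (企业微信) and other channels
- Alert handling: the owner receives the alert and starts investigating
- Resolution: the cause of the slow rule is located and fixed
- Recovery: once the problem is fixed, the alert clears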
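The sample services only ever send "exceeded" notifications, so the recovery step of this flow is not visible in the code. As a rough sketch of how trigger and recovery could be told apart, the hypothetical `AlertStateTracker` below remembers whether each rule is currently firing and reports only the transitions; the class and its `Transition` enum are assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker that turns raw samples into FIRING and RESOLVED transitions. */
final class AlertStateTracker {
    enum Transition { NONE, FIRING, RESOLVED }

    private final long thresholdMillis;
    private final Map<String, Boolean> firing = new ConcurrentHashMap<>();

    AlertStateTracker(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Returns FIRING on the first breach and RESOLVED on the first healthy sample after a breach. */
    Transition observe(String ruleName, long elapsedMillis) {
        boolean breached = elapsedMillis > thresholdMillis;
        // put() atomically stores the new state and hands back the previous one
        Boolean previous = firing.put(ruleName, breached);
        boolean wasFiring = previous != null && previous;
        if (breached && !wasFiring) {
            return Transition.FIRING;
        }
        if (!breached && wasFiring) {
            return Transition.RESOLVED;
        }
        return Transition.NONE;
    }
}
```

A caller would notify the owner on `FIRING`, send a short "recovered" message on `RESOLVED`, and stay silent on `NONE`.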
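3. Rule management flow
- Rule loading: rules are loaded from the file system
- Hot reload: changes to the rule files are detected and the rules are reloaded automatically
- Rule execution: rules are executed and their execution time is measured
- Statistics analysis: the collected execution statistics are analyzed
- Alert trigger: alerts are raised based on the statistics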
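Key Technical Points
1. Rule execution monitoring
- Real-time monitoring: measure the execution time of every rule as it runs
- Statistics: track the average time, maximum time and related metrics per rule
- Threshold check: compare each execution time with the threshold
- Periodic check: re-check average execution times on a schedule
2. Alerting mechanism
- Multiple channels: e-mail, DingTalk, WeCom (企业微信) and other channels are supported
- Asynchronous sending: alerts are sent on a separate thread so that business requests are not blocked
- Timely notification: a breach is dispatched immediately and the periodic check runs every 5 seconds by default, which targets notification within 5 seconds; the actual delivery time still depends on the channel
- Alert content: the alert includes the rule name, the measured execution time and other details
3. Rule engine integration
- Easy Rules: the lightweight Easy Rules engine executes the rules
- MVEL expressions: rule conditions and actions are written as MVEL expressions
- Rule management: loading, updating and executing rules is handled in one place
4. Performance optimization
- Asynchronous processing: alerting and monitoring work is done asynchronously
- In-memory statistics: per-rule statistics are kept in memory so that recording a sample stays cheap
- Batching: statistics are processed in batches to reduce network overhead
- Resource management: thread pools and other resources are managed carefully
5. Monitoring and alerting
- Real-time monitoring: watch rule execution as it happens
- Trend analysis: analyze how rule execution time changes over time
- Alert threshold: choose a reasonable threshold
- Alert policy: apply different alert policies depending on how important a rule is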
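Best Practices
1. Rule design
- Split rules: break complex rules into several simple ones, which are easier to monitor and optimize
- Priorities: give rules sensible priorities so that they run in the right order
- Conditions: keep rule conditions short and clear, and avoid complex expressions
- Actions: keep rule actions focused on business logic, and avoid slow operations
- Testing: test a rule's correctness and performance before it is loaded
2. Monitoring configuration
- Sensible thresholds: set the execution time threshold according to the rule's complexity and the business requirements
- Multiple dimensions: monitor several dimensions, such as average time, maximum time and execution frequency
- Trend analysis: analyze how execution time changes so that anomalies are noticed early
- Alert policy: apply different alert policies depending on how important a rule is
- Alert channels: pick channels that get the alert to the owner in time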
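The sample configuration uses one global `rule.alert.threshold`. A per-rule threshold with a global fallback is one way to follow the advice above; the sketch below assumes a hypothetical `RuleThresholds` class and rule names that exist only for illustration (`Map.copyOf` requires Java 10 or later).

```java
import java.util.Map;

/** Hypothetical lookup of per-rule thresholds with a global default. */
final class RuleThresholds {
    private final long defaultMillis;
    private final Map<String, Long> overrides;

    RuleThresholds(long defaultMillis, Map<String, Long> overrides) {
        this.defaultMillis = defaultMillis;
        this.overrides = Map.copyOf(overrides);
    }

    /** Returns the rule's own threshold, or the global default when none is configured. */
    long thresholdFor(String ruleName) {
        return overrides.getOrDefault(ruleName, defaultMillis);
    }

    boolean isBreached(String ruleName, long elapsedMillis) {
        return elapsedMillis > thresholdFor(ruleName);
    }
}
```

For example, a rule that calls an external service might be given 3000 ms while a simple in-memory check keeps the 1000 ms default.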
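3. Performance optimization
- Rule optimization: simplify rule conditions and actions to reduce execution time
- Caching: use caches where they avoid repeated computation
- Parallel execution: consider running independent rules in parallel
- Resource management: keep the rule engine's resource usage under control
- Tuning from monitoring: keep optimizing rules and the system based on what monitoring shows
4. Alert management
- Alert levels: use different alert levels depending on how severe the slowdown is
- Alert aggregation: aggregate alerts for the same rule to avoid alert storms
- Alert suppression: temporarily suppress alerts in specific situations
- Alert escalation: escalate automatically when an alert stays unhandled for a long time
- Alert history: record past alerts for later analysis and tracing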
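Alert levels and escalation can be expressed as two small pure functions. This is a sketch only: the `AlertSeverity` enum, the `criticalFactor` parameter and the escalation rule (a warning that has been firing long enough becomes critical) are assumptions chosen for the example, not behavior of the sample services.

```java
/** Hypothetical severity levels, graded by how far a duration exceeds its threshold. */
enum AlertSeverity {
    NONE, WARNING, CRITICAL;

    /** Grades one measurement: above the threshold is WARNING, above threshold * criticalFactor is CRITICAL. */
    static AlertSeverity grade(long elapsedMillis, long thresholdMillis, double criticalFactor) {
        if (elapsedMillis <= thresholdMillis) {
            return NONE;
        }
        return elapsedMillis >= thresholdMillis * criticalFactor ? CRITICAL : WARNING;
    }

    /** Escalates a WARNING that has been firing for at least escalateAfterMillis. */
    AlertSeverity escalated(long firingForMillis, long escalateAfterMillis) {
        return this == WARNING && firingForMillis >= escalateAfterMillis ? CRITICAL : this;
    }
}
```

The severity can then select the channel, for example a group message for `WARNING` and a phone call for `CRITICAL`.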
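5. Operations
- Regular checks: review rule execution regularly so that problems are found early
- Performance tests: run rule performance tests regularly to confirm that rules stay efficient
- Backups: back up rule files regularly so that rules cannot be lost
- Change management: review and approve rule changes before they go live
- Documentation: keep rule documentation up to date so that rules remain understandable and maintainable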
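Common Problems
1. Rule execution time suddenly increases
Problem: the execution time of one rule suddenly increases and system performance drops
Solutions:
- Check whether the rule condition is too complex
- Check whether the rule action contains slow operations
- Check whether an external service the rule depends on is responding slowly
- Check whether system resources are sufficient
- Optimize the rule, for example by splitting a complex rule or adding a cache
2. Alerts fire too often
Problem: alerts fire so often that they cause alert fatigue
Solutions:
- Adjust the threshold to avoid false alerts
- Aggregate alerts for the same rule
- Add suppression rules that mute alerts temporarily in specific situations
- Improve rule execution performance, which fixes the root cause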
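One common way to reduce noise from one-off slow executions is to alert only after several consecutive breaches. The sketch below is an illustration under that assumption; `ConsecutiveBreachFilter` and `requiredBreaches` are hypothetical names, and the filter fires once per streak, at the moment the streak reaches the required length.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical filter: alert only when a rule breaches the threshold N times in a row. */
final class ConsecutiveBreachFilter {
    private final long thresholdMillis;
    private final int requiredBreaches;
    private final Map<String, Integer> streaks = new ConcurrentHashMap<>();

    ConsecutiveBreachFilter(long thresholdMillis, int requiredBreaches) {
        this.thresholdMillis = thresholdMillis;
        this.requiredBreaches = requiredBreaches;
    }

    /** Returns true exactly once per streak, when the streak reaches requiredBreaches. */
    boolean shouldAlert(String ruleName, long elapsedMillis) {
        if (elapsedMillis <= thresholdMillis) {
            // A healthy sample ends the streak
            streaks.remove(ruleName);
            return false;
        }
        int streak = streaks.merge(ruleName, 1, Integer::sum);
        return streak == requiredBreaches;
    }
}
```

The trade-off is latency: with three required breaches, the alert is delayed until the third slow execution, so very rarely executed rules may be better served by the immediate check.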
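3. Alerts are not delivered in time
Problem: an alert arrives late and handling of the problem is delayed
Solutions:
- Make sure every alert channel is working
- Configure several alert channels to raise the delivery rate
- Send important alerts through more than one channel
- Test the alerting system regularly to confirm that it works
4. Rule execution statistics are inaccurate
Problem: the execution statistics are inaccurate, which affects alert decisions
Solutions:
- Make sure the statistics logic is correct
- Avoid concurrency problems while statistics are being updated
- Recalibrate the statistics regularly
- Verify and test the statistics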
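One concrete concurrency pitfall: when the total and the count live in separate atomic counters, an average computed from them can mix a new total with an old count. A minimal sketch that avoids this, assuming a hypothetical `ConsistentRuleStats` class, updates and reads all fields under one lock and hands out an immutable snapshot.

```java
/** Hypothetical statistics holder whose readers always see a consistent set of values. */
final class ConsistentRuleStats {
    /** Immutable copy of the counters taken at one point in time. */
    static final class Snapshot {
        final long count;
        final long totalMillis;
        final long maxMillis;

        Snapshot(long count, long totalMillis, long maxMillis) {
            this.count = count;
            this.totalMillis = totalMillis;
            this.maxMillis = maxMillis;
        }

        double averageMillis() {
            return count == 0 ? 0.0 : (double) totalMillis / count;
        }
    }

    private long count;
    private long totalMillis;
    private long maxMillis;

    /** Updates all counters in one step so that they never disagree with each other. */
    synchronized void record(long elapsedMillis) {
        count++;
        totalMillis += elapsedMillis;
        maxMillis = Math.max(maxMillis, elapsedMillis);
    }

    synchronized Snapshot snapshot() {
        return new Snapshot(count, totalMillis, maxMillis);
    }
}
```

A single lock is usually acceptable here because the critical section is a few arithmetic operations; under very high contention, sampling or per-thread accumulation would be alternatives.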
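5. Impact on system performance
Problem: monitoring rule execution time itself affects system performance
Solutions:
- Do monitoring and statistics work asynchronously
- Reduce the frequency and granularity of monitoring
- Optimize the monitoring code to lower its overhead
- Sample the monitoring data to reduce the data volume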
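Sampling can be as simple as recording detailed data for every Nth execution. The sketch below assumes a hypothetical `OneInNSampler`; it is one possible approach, not the article's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sampler: lets every Nth call through for detailed monitoring. */
final class OneInNSampler {
    private final int n;
    private final AtomicLong calls = new AtomicLong();

    OneInNSampler(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.n = n;
    }

    /** Returns true for the 1st, (N+1)th, (2N+1)th ... call; safe to use from several threads. */
    boolean shouldSample() {
        return calls.getAndIncrement() % n == 0;
    }
}
```

Sampling lowers overhead at the cost of coverage: an isolated slow execution that is not sampled goes unnoticed, so the threshold check itself is better kept on every execution and only the more expensive bookkeeping sampled.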
Code Optimization Suggestions
1. Optimized rule execution monitoring service
/**
 * Optimized rule execution monitoring service.
 */
@Slf4j
@Service
public class OptimizedRuleExecutionMonitorService {
    @Autowired
    private RuleAlertProperties ruleAlertProperties;
    @Autowired
    private AlertService alertService;
    private final RulesEngine rulesEngine = new DefaultRulesEngine();
    private final Map<String, RuleExecutionStats> ruleStatsMap = new ConcurrentHashMap<>();
    private ScheduledExecutorService executorService;
    private final AtomicLong lastCheckTime = new AtomicLong(0);
    /**
     * Starts the scheduled task that periodically checks rule execution times.
     */
    @PostConstruct
    public void init() {
        long interval = ruleAlertProperties.getCheckInterval();
        executorService = Executors.newSingleThreadScheduledExecutor();
        executorService.scheduleAtFixedRate(this::checkRuleExecutionTime, interval, interval, TimeUnit.MILLISECONDS);
        log.info("Rule execution monitor started with check interval: {}ms", interval);
    }
    /**
     * Fires each rule on its own and records how long it took.
     */
    public void executeRulesWithMonitoring(Rules rules, Facts facts) {
        for (Object ruleObj : rules) {
            if (ruleObj instanceof MVELRule) {
                MVELRule rule = (MVELRule) ruleObj;
                String ruleName = rule.getName();
                // Monotonic clock: not affected by wall-clock adjustments
                long startNanos = System.nanoTime();
                try {
                    // Fire this single rule
                    rulesEngine.fire(new Rules(rule), facts);
                } catch (Exception e) {
                    log.error("Error executing rule: {}", ruleName, e);
                } finally {
                    long executionTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
                    // Update the per-rule statistics
                    updateRuleStats(ruleName, executionTime);
                    // Alert immediately if this execution exceeded the threshold
                    if (executionTime > ruleAlertProperties.getThreshold()) {
                        alertService.sendAlert(ruleName, executionTime);
                    }
                }
            }
        }
    }
    /**
     * Updates the statistics of one rule.
     */
    private void updateRuleStats(String ruleName, long executionTime) {
        RuleExecutionStats stats = ruleStatsMap.computeIfAbsent(ruleName, k -> new RuleExecutionStats());
        stats.update(executionTime);
    }
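    /**
     * Periodic check of average execution times.
     */
    private void checkRuleExecutionTime() {
        long currentTime = System.currentTimeMillis();
        // Guard against back-to-back checks. The minimum gap is half the interval: the scheduler
        // already fires once per interval, and comparing against the full interval would skip a
        // run whenever a tick arrives a few milliseconds early.
        if (currentTime - lastCheckTime.get() < ruleAlertProperties.getCheckInterval() / 2) {
            return;
        }
        lastCheckTime.set(currentTime);
        for (Map.Entry<String, RuleExecutionStats> entry : ruleStatsMap.entrySet()) {
            String ruleName = entry.getKey();
            RuleExecutionStats stats = entry.getValue();
            long avgExecutionTime = stats.getAverageExecutionTime();
            if (avgExecutionTime > ruleAlertProperties.getThreshold()) {
                alertService.sendAlert(ruleName, avgExecutionTime);
            }
        }
    }
    // The other methods are the same as in the original implementation...
}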
2. Optimized alert service
/**
 * Optimized alert service: suppresses repeated alerts for the same rule.
 */
@Slf4j
@Service
public class OptimizedAlertService {
    @Autowired
    private RuleAlertProperties ruleAlertProperties;
    @Autowired(required = false)
    private JavaMailSender mailSender;
    private final RestTemplate restTemplate = new RestTemplate();
    private final ExecutorService executorService = Executors.newSingleThreadExecutor();
    private final Map<String, Long> lastAlertTimeMap = new ConcurrentHashMap<>();
    private static final long ALERT_INTERVAL = 60000; // do not repeat the same alert within 1 minute
    /**
     * Sends an alert unless one was already sent for this rule within ALERT_INTERVAL.
     */
    public void sendAlert(String ruleName, long executionTime) {
        // Skip if the previous alert for this rule is still within the interval
        long currentTime = System.currentTimeMillis();
        Long lastAlertTime = lastAlertTimeMap.get(ruleName);
        if (lastAlertTime != null && currentTime - lastAlertTime < ALERT_INTERVAL) {
            log.info("Alert for rule {} skipped, within interval", ruleName);
            return;
        }
        // Claim the slot atomically so that two concurrent callers cannot both send
        boolean claimed = lastAlertTime == null
                ? lastAlertTimeMap.putIfAbsent(ruleName, currentTime) == null
                : lastAlertTimeMap.replace(ruleName, lastAlertTime, currentTime);
        if (!claimed) {
            return;
        }
        log.warn("Rule execution time exceeded threshold: {} took {}ms", ruleName, executionTime);
        // Send the alert asynchronously
        executorService.execute(() -> {
            try {
                // E-mail alert
                sendEmailAlert(ruleName, executionTime);
                // DingTalk alert
                sendDingTalkAlert(ruleName, executionTime);
                // WeCom alert
                sendWeChatAlert(ruleName, executionTime);
            } catch (Exception e) {
                log.error("Error sending alert", e);
            }
        });
    }
    // The other methods are the same as in the original implementation...
}
3. Optimized business service
/**
 * Optimized business service.
 */
@Slf4j
@Service
public class OptimizedBusinessService {
    @Autowired
    private RuleManagementService ruleManagementService;
    @Autowired
    private RuleExecutionMonitorService ruleExecutionMonitorService;
    /**
     * Runs the business logic and logs the total processing time.
     */
    public String processBusinessLogic(String input) {
        Facts facts = new Facts();
        facts.put("input", input);
        facts.put("result", "");
        // The start time stays in a local variable so that it is not exposed to the rules as a fact
        long startNanos = System.nanoTime();
        try {
            // Execute the rules
            ruleManagementService.executeRules(facts);
        } catch (Exception e) {
            log.error("Error processing business logic", e);
            return "Error: " + e.getMessage();
        } finally {
            long totalTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            log.info("Total processing time: {}ms", totalTime);
        }
        // Read the result
        String result = (String) facts.get("result");
        return result == null || result.isEmpty() ? "No rule matched" : result;
    }
    // The other methods are the same as in the original implementation...
}
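Discussion Topics
- What performance problems have you run into with rule engines, and how did you solve them?
- How important do you think rule execution time monitoring is in real projects?
- How do you set alert thresholds in your own projects?
- What is your experience with multi-channel alerting?
- Where do you think performance monitoring for rule engines is heading?
You are welcome to discuss in the comments!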
WeChat official account: 服务端技术精选, covering the latest technical developments and practical tips.
Author: jiangyi
Source: http://jiangyi.space/articles/2026/04/19/1775988032675.html