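Crawler Fingerprinting and Dynamic Blocking: Rate Limits Bypassed? Pin Down Bots with Device Fingerprints and Behavior Analysis
On today's internet, data is an asset. Crawlers are a major way to collect it: some are legitimate, such as search engine crawlers, while others are malicious, run by competitors or data thieves. Malicious crawlers typically:
- Pose as normal users to slip past rate limits
- Rotate through large pools of proxy IPs to evade IP bans
- Simulate browser behavior to get past basic detection
- Hit the site at high frequency in the early morning hours, competing for data and capacity
- Work around anti-crawling measures to keep pulling data
This article looks at how to build a crawler fingerprinting and dynamic blocking system that combines device fingerprints with behavior analysis to identify and block malicious crawlers.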
Background
Common traits of malicious crawlers
Why malicious crawlers are hard to identify:
1. They pose as normal users:
- The User-Agent mimics a real browser
- They drive real browsers through tools such as Selenium and Playwright
- Cookies and sessions look normal
2. They evade rate limits:
- An IP proxy pool supplies a new IP for every request
- Distributed crawlers coordinate across many machines
- Slow crawling imitates a human browsing rhythm
3. They get past basic detection:
- They render JavaScript, which defeats static checks
- They mimic TLS fingerprints to look like an ordinary client
- They solve CAPTCHAs automatically
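Limits of traditional anti-crawling measures
Traditional measures and where they fall short:
1. IP blacklists:
- Problem: there are too many proxy IPs to ban them all
- Collateral damage: mobile networks share NAT egress IPs, so one ban can hit many users
2. Rate limiting:
- Problem: distributed crawlers can pace each node's requests to stay under the limit
- Collateral damage: legitimate users who are active on several devices at once
3. User-Agent checks:
- Problem: the header can be forged at will
- Collateral damage: genuine browsers can be misclassified
4. CAPTCHAs:
- Problem: they hurt the user experience and are cheap to defeat
- Collateral damage: elderly users and people who have difficulty completing them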
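To make the rate-limiting weakness concrete, here is a minimal sketch of a sliding-window limiter that can be keyed by any string. The class name `SlidingWindowLimiter` and its parameters are illustrative assumptions, not part of the code later in this article. Keyed by IP, a crawler that takes a new proxy IP for every request never reaches the limit; keyed by a device fingerprint, the same traffic is caught after a few requests.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sliding-window request counter keyed by an arbitrary string (an IP or a fingerprint). */
class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Deque<Long>> hits = new HashMap<>();

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Records one request at nowMillis; returns false once the key is over the limit. */
    synchronized boolean allow(String key, long nowMillis) {
        Deque<Long> window = hits.computeIfAbsent(key, k -> new ArrayDeque<>());
        // Drop timestamps that have fallen out of the window.
        while (!window.isEmpty() && nowMillis - window.peekFirst() >= windowMillis) {
            window.pollFirst();
        }
        if (window.size() >= maxRequests) {
            return false;
        }
        window.addLast(nowMillis);
        return true;
    }
}
```

With a limit of 3 requests per minute, ten requests from ten different IPs all pass an IP-keyed limiter, while the same ten requests sharing one fingerprint are rejected from the fourth onward.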
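Overall architecture
Core idea
The fingerprinting and dynamic blocking architecture:
1. Device fingerprint: collect browser/client features and derive a near-unique identifier
2. Behavior analysis: analyze how the client navigates and flag anomalous patterns
3. Risk assessment: combine signals from multiple dimensions into a risk score
4. Dynamic blocking: apply a different response at each risk level
Key design points:
- Passive collection: no impact on the normal user experience
- Multi-dimensional fingerprint: harder to forge
- Real-time detection: millisecond-level response
- Graduated response: fewer false positives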
Architecture flow
Request arrives
↓
Extract request features (IP, UA, headers, cookies)
↓
Generate the device fingerprint
↓
Look up the fingerprint's risk profile
↓
Risk assessment:
- Is the device fingerprint on the blacklist?
- Is the behavior anomalous?
- Are expected request features missing?
- Is the access frequency abnormal?
↓
Map the risk score to a risk level
↓
- Low risk: allow
- Medium risk: CAPTCHA challenge
- High risk: block and log
- Blacklisted: reject outright
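The last step of the flow can be sketched as a small pure function. The thresholds 30, 50 and 80 are the ones this article uses in its risk assessor and FAQ; the `Disposition` class and `Action` enum are hypothetical names introduced only for this illustration.

```java
/** Maps a risk score to an action, following the tiers described in this article. */
class Disposition {
    enum Action { ALLOW, CAPTCHA, TEMP_BLOCK, REJECT }

    static Action decide(int riskScore, boolean blacklisted) {
        // A blacklist hit short-circuits the score entirely.
        if (blacklisted || riskScore >= 80) {
            return Action.REJECT;
        }
        if (riskScore >= 50) {
            return Action.TEMP_BLOCK;   // high risk: block and log
        }
        if (riskScore >= 30) {
            return Action.CAPTCHA;      // medium risk: challenge first
        }
        return Action.ALLOW;            // low risk
    }
}
```

Keeping this mapping free of I/O makes the tier boundaries easy to unit test and to tune without touching the filter.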
Core implementation
1. Device fingerprint entity
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
@Entity
@Table(name = "device_fingerprint")
public class DeviceFingerprint {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String fingerprint;
private String ip;
private String userAgent;
private String acceptLanguage;
private String acceptEncoding;
private String accept;
private String referer;
private String cookie;
private String secChUa;
private String secChUaMobile;
private String secChUaPlatform;
private String secFetchDest;
private String secFetchMode;
private String secFetchSite;
private String secFetchUser;
private String upgradeInsecureRequests;
private Integer screenWidth;
private Integer screenHeight;
private Integer colorDepth;
private String timezone;
private String language;
private String platform;
private String hardwareConcurrency;
private String deviceMemory;
private String touchPoints;
private String canvasHash;
private String webglHash;
private String audioHash;
private String fontsHash;
private Integer riskScore;
private String riskLevel;
private String status;
private LocalDateTime firstSeenTime;
private LocalDateTime lastSeenTime;
private Integer visitCount;
private Integer blockCount;
}
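public enum RiskLevel {
LOW("Low risk"),
MEDIUM("Medium risk"),
HIGH("High risk"),
BLOCKED("Blocked");
private final String description;
// An enum with a final field needs an explicit constructor to compile.
RiskLevel(String description) {
this.description = description;
}
public String getDescription() {
return description;
}
}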
2. Device fingerprint service
@Service
@Slf4j
public class DeviceFingerprintService {
@Autowired
private DeviceFingerprintRepository repository;
@Autowired
private FingerprintRiskAssessor riskAssessor;
private final Map<String, String> ipBlacklist = new ConcurrentHashMap<>();
public String generateFingerprint(HttpServletRequest request) {
// The client IP is deliberately left out of the hash: a crawler that rotates
// proxy IPs would otherwise get a fresh fingerprint on every request.
String userAgent = request.getHeader("User-Agent");
String secChUa = request.getHeader("Sec-CH-UA");
// Note: Sec-Fetch-* values differ between navigations and subresource requests,
// so one browser can yield several fingerprints across request types.
String secFetchDest = request.getHeader("Sec-Fetch-Dest");
String secFetchMode = request.getHeader("Sec-Fetch-Mode");
String secFetchSite = request.getHeader("Sec-Fetch-Site");
// The X-*-Hash headers are expected to be set by a client-side collection script.
String canvasHash = request.getHeader("X-Canvas-Hash");
String webglHash = request.getHeader("X-WebGL-Hash");
String audioHash = request.getHeader("X-Audio-Hash");
String fontsHash = request.getHeader("X-Fonts-Hash");
String rawFingerprint = String.join("|",
userAgent != null ? userAgent : "",
secChUa != null ? secChUa : "",
secFetchDest != null ? secFetchDest : "",
secFetchMode != null ? secFetchMode : "",
secFetchSite != null ? secFetchSite : "",
canvasHash != null ? canvasHash : "",
webglHash != null ? webglHash : "",
audioHash != null ? audioHash : "",
fontsHash != null ? fontsHash : ""
);
return hashFingerprint(rawFingerprint);
}
public DeviceFingerprint getOrCreateFingerprint(HttpServletRequest request) {
String fingerprint = generateFingerprint(request);
Optional<DeviceFingerprint> existing = repository.findByFingerprint(fingerprint);
if (existing.isPresent()) {
DeviceFingerprint fp = existing.get();
fp.setLastSeenTime(LocalDateTime.now());
fp.setVisitCount(fp.getVisitCount() + 1);
return repository.save(fp);
}
DeviceFingerprint fp = DeviceFingerprint.builder()
.fingerprint(fingerprint)
.ip(getClientIp(request))
.userAgent(request.getHeader("User-Agent"))
.acceptLanguage(request.getHeader("Accept-Language"))
.acceptEncoding(request.getHeader("Accept-Encoding"))
.accept(request.getHeader("Accept"))
.referer(request.getHeader("Referer"))
.secChUa(request.getHeader("Sec-CH-UA"))
.secChUaMobile(request.getHeader("Sec-CH-UA-Mobile"))
.secChUaPlatform(request.getHeader("Sec-CH-UA-Platform"))
.secFetchDest(request.getHeader("Sec-Fetch-Dest"))
.secFetchMode(request.getHeader("Sec-Fetch-Mode"))
.secFetchSite(request.getHeader("Sec-Fetch-Site"))
.secFetchUser(request.getHeader("Sec-Fetch-User"))
.upgradeInsecureRequests(request.getHeader("Upgrade-Insecure-Requests"))
.riskScore(0)
.riskLevel(RiskLevel.LOW.name())
.status("ACTIVE")
.firstSeenTime(LocalDateTime.now())
.lastSeenTime(LocalDateTime.now())
.visitCount(1)
.blockCount(0)
.build();
return repository.save(fp);
}
public void updateRiskScore(String fingerprint, int score) {
repository.findByFingerprint(fingerprint).ifPresent(fp -> {
fp.setRiskScore(score);
// riskLevel is stored as a String, so persist the enum name.
fp.setRiskLevel(calculateRiskLevel(score).name());
fp.setLastSeenTime(LocalDateTime.now());
repository.save(fp);
});
}
// Thresholds kept in line with FingerprintRiskAssessor: 30 / 50 / 80.
private RiskLevel calculateRiskLevel(int score) {
if (score >= 80) {
return RiskLevel.BLOCKED;
} else if (score >= 50) {
return RiskLevel.HIGH;
} else if (score >= 30) {
return RiskLevel.MEDIUM;
} else {
return RiskLevel.LOW;
}
}
private String hashFingerprint(String raw) {
try {
MessageDigest md = MessageDigest.getInstance("SHA-256");
byte[] hash = md.digest(raw.getBytes(StandardCharsets.UTF_8));
// 32 digest bytes encode to 44 Base64 characters; keep the first 32 as the ID.
return Base64.getEncoder().encodeToString(hash).substring(0, 32);
} catch (NoSuchAlgorithmException e) {
// A random fallback would hand the same device a new ID on every call.
throw new IllegalStateException("SHA-256 is not available", e);
}
}
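private String getClientIp(HttpServletRequest request) {
// X-Forwarded-For is client-controlled unless a trusted reverse proxy overwrites it;
// rely on it only when the application sits behind such a proxy.
String ip = request.getHeader("X-Forwarded-For");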
if (ip == null || ip.isEmpty() || "unknown".equalsIgnoreCase(ip)) {
ip = request.getHeader("Proxy-Client-IP");
}
if (ip == null || ip.isEmpty() || "unknown".equalsIgnoreCase(ip)) {
ip = request.getHeader("WL-Proxy-Client-IP");
}
if (ip == null || ip.isEmpty() || "unknown".equalsIgnoreCase(ip)) {
ip = request.getRemoteAddr();
}
if (ip != null && ip.contains(",")) {
ip = ip.split(",")[0].trim();
}
return ip;
}
}
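Because the threat model here is a crawler that changes IP on every request, a fingerprint is most useful when it is built only from attributes that stay the same across those requests. The sketch below is one way to do that; the `StableFingerprint` class and its list of "stable" headers are assumptions chosen for illustration, and it expects header names to be passed in lowercase.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;

/** Hashes a fixed, ordered set of request attributes into a hex fingerprint. */
class StableFingerprint {
    // Hypothetical selection: attributes that do not change with IP or request type.
    private static final List<String> STABLE_HEADERS = List.of(
            "user-agent", "accept-language", "sec-ch-ua", "sec-ch-ua-platform",
            "x-canvas-hash", "x-webgl-hash");

    /** headers: lowercase header name -> value; anything not listed above is ignored. */
    static String of(Map<String, String> headers) {
        StringBuilder raw = new StringBuilder();
        for (String name : STABLE_HEADERS) {
            // Missing headers contribute an empty value, which is itself a signal.
            raw.append(name).append('=').append(headers.getOrDefault(name, "")).append('\n');
        }
        return sha256Hex(raw.toString());
    }

    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

Two requests with identical browser attributes map to the same fingerprint even if their `X-Forwarded-For` values differ, because that header is not in the hashed set.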
3. Behavior analysis service
@Service
@Slf4j
public class BehaviorAnalysisService {
@Autowired
private VisitRecordRepository visitRecordRepository;
@Autowired
private BehaviorPatternRepository patternRepository;
private final Map<String, Deque<Long>> requestTimestamps = new ConcurrentHashMap<>();
private final Map<String, Deque<Integer>> requestIntervals = new ConcurrentHashMap<>();
private static final int WINDOW_SIZE = 20;
private static final int MIN_HUMAN_INTERVAL_MS = 3000;
public BehaviorAnalysis analyze(String fingerprint, String ip) {
List<VisitRecord> recentVisits = visitRecordRepository
.findRecentByFingerprint(fingerprint, LocalDateTime.now().minusMinutes(10));
BehaviorAnalysis analysis = BehaviorAnalysis.builder()
.fingerprint(fingerprint)
.ip(ip)
.visitCount(recentVisits.size())
.analyzedAt(LocalDateTime.now())
.build();
if (recentVisits.isEmpty()) {
analysis.setAnomalyScore(0);
return analysis;
}
int score = 0;
score += analyzeFrequency(recentVisits);
score += analyzeIntervalPattern(recentVisits);
score += analyzePageSequence(recentVisits);
score += analyzeTimePattern(recentVisits);
analysis.setAnomalyScore(Math.min(100, score));
analysis.setDetails(buildDetails(recentVisits));
log.debug("Behavior analysis result: fingerprint={}, score={}", fingerprint, score);
return analysis;
}
private int analyzeFrequency(List<VisitRecord> visits) {
if (visits.size() < 5) {
return 0;
}
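// Assumes the repository returns visits sorted by visitTime ascending.
LocalDateTime first = visits.get(0).getVisitTime();
LocalDateTime last = visits.get(visits.size() - 1).getVisitTime();
// Use fractional minutes (toMinutes() truncates), with a one-minute floor
// so that a short burst is not extrapolated to a full minute.
double durationMinutes = Math.max(1.0,
java.time.Duration.between(first, last).toMillis() / 60_000.0);
double requestsPerMinute = visits.size() / durationMinutes;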
if (requestsPerMinute > 60) {
return 40;
} else if (requestsPerMinute > 30) {
return 25;
} else if (requestsPerMinute > 10) {
return 10;
}
return 0;
}
private int analyzeIntervalPattern(List<VisitRecord> visits) {
if (visits.size() < 3) {
return 0;
}
List<Integer> intervals = new ArrayList<>();
for (int i = 1; i < visits.size(); i++) {
long intervalMs = java.time.Duration
.between(visits.get(i - 1).getVisitTime(), visits.get(i).getVisitTime())
.toMillis();
intervals.add((int) intervalMs);
}
double avgInterval = intervals.stream().mapToInt(Integer::intValue).average().orElse(0);
double variance = intervals.stream()
.mapToDouble(i -> Math.pow(i - avgInterval, 2))
.average().orElse(0);
double stdDev = Math.sqrt(variance);
if (avgInterval < MIN_HUMAN_INTERVAL_MS && stdDev < 100) {
return 30;
}
if (stdDev < 50) {
return 20;
}
return 0;
}
private int analyzePageSequence(List<VisitRecord> visits) {
Set<String> uniquePages = visits.stream()
.map(VisitRecord::getUrl)
.collect(Collectors.toSet());
double pageRepetition = 1.0 - (uniquePages.size() / (double) visits.size());
if (pageRepetition > 0.9) {
return 15;
} else if (pageRepetition > 0.7) {
return 10;
}
return 0;
}
private int analyzeTimePattern(List<VisitRecord> visits) {
Set<HourMinute> timePoints = visits.stream()
.map(v -> new HourMinute(v.getVisitTime().getHour(), v.getVisitTime().getMinute()))
.collect(Collectors.toSet());
if (timePoints.size() < 3) {
return 0;
}
List<HourMinute> sorted = timePoints.stream().sorted().collect(Collectors.toList());
boolean hasRegularPattern = false;
for (int i = 1; i < sorted.size(); i++) {
int diff1 = sorted.get(i).toMinutes() - sorted.get(i - 1).toMinutes();
if (i > 1) {
int diff2 = sorted.get(i - 1).toMinutes() - sorted.get(i - 2).toMinutes();
if (Math.abs(diff1 - diff2) < 2) {
hasRegularPattern = true;
break;
}
}
}
return hasRegularPattern ? 15 : 0;
}
private Map<String, Object> buildDetails(List<VisitRecord> visits) {
Map<String, Object> details = new HashMap<>();
details.put("recentUrls", visits.stream()
.map(VisitRecord::getUrl)
.limit(10)
.collect(Collectors.toList()));
details.put("firstVisit", visits.get(0).getVisitTime());
details.put("lastVisit", visits.get(visits.size() - 1).getVisitTime());
return details;
}
public void recordVisit(String fingerprint, String ip, String url) {
VisitRecord record = VisitRecord.builder()
.fingerprint(fingerprint)
.ip(ip)
.url(url)
.visitTime(LocalDateTime.now())
.build();
visitRecordRepository.save(record);
List<VisitRecord> recentVisits = visitRecordRepository
.findRecentByFingerprint(fingerprint, LocalDateTime.now().minusMinutes(10));
if (recentVisits.size() > 100) {
log.warn("High-frequency access alert: fingerprint={}, count={}", fingerprint, recentVisits.size());
}
}
@Data
@AllArgsConstructor
private static class HourMinute implements Comparable<HourMinute> {
private int hour;
private int minute;
public int toMinutes() {
return hour * 60 + minute;
}
@Override
public int compareTo(HourMinute o) {
return Integer.compare(this.toMinutes(), o.toMinutes());
}
}
}
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class BehaviorAnalysis {
private String fingerprint;
private String ip;
private int visitCount;
private int anomalyScore;
private LocalDateTime analyzedAt;
private Map<String, Object> details;
}
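The interval check above compares the standard deviation against fixed millisecond thresholds. A scale-free alternative is the coefficient of variation (standard deviation divided by mean), which treats a crawler sleeping exactly 10 seconds the same as one sleeping exactly 1 second. The sketch below is an illustration under that assumption; the `IntervalRegularity` class and the threshold passed to `looksScripted` are hypothetical, and timestamps must be sorted ascending.

```java
import java.util.List;

/** Measures how regular the gaps between consecutive requests are. */
class IntervalRegularity {

    /** Returns stdDev/mean of the gaps in ms, or NaN when there are fewer than 3 timestamps. */
    static double coefficientOfVariation(List<Long> timestampsMillis) {
        int n = timestampsMillis.size();
        if (n < 3) {
            return Double.NaN; // fewer than two gaps: nothing to compare
        }
        double[] gaps = new double[n - 1];
        double sum = 0;
        for (int i = 1; i < n; i++) {
            gaps[i - 1] = timestampsMillis.get(i) - timestampsMillis.get(i - 1);
            sum += gaps[i - 1];
        }
        double mean = sum / gaps.length;
        if (mean <= 0) {
            return 0.0; // all requests at the same instant: treat as perfectly regular
        }
        double squared = 0;
        for (double g : gaps) {
            squared += (g - mean) * (g - mean);
        }
        return Math.sqrt(squared / gaps.length) / mean;
    }

    /** True when the gaps vary less than cvThreshold relative to their mean. */
    static boolean looksScripted(List<Long> timestampsMillis, double cvThreshold) {
        double cv = coefficientOfVariation(timestampsMillis);
        return !Double.isNaN(cv) && cv < cvThreshold;
    }
}
```

Requests at 0, 1000, 2000 and 3000 ms have a coefficient of variation of 0, while an irregular human-like sequence such as 0, 500, 4000, 4300, 12000 ms lands close to 1.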
4. Risk assessor
@Component
@Slf4j
public class FingerprintRiskAssessor {
@Autowired
private DeviceFingerprintRepository fingerprintRepository;
@Autowired
private BlacklistRepository blacklistRepository;
@Autowired
private BehaviorAnalysisService behaviorAnalysisService;
private final Pattern botUserAgents = Pattern.compile(
"(curl|wget|python|scrapy|java|go-http|okhttp|apache-httpclient|libwww|perl|node-fetch|axios|got|reqwest|httpx)"
);
public RiskAssessment assess(HttpServletRequest request, DeviceFingerprint fingerprint) {
int totalScore = 0;
List<String> riskFactors = new ArrayList<>();
if (isBlacklisted(fingerprint.getFingerprint())) {
totalScore += 100;
riskFactors.add("Device fingerprint is blacklisted");
}
if (isIpBlacklisted(fingerprint.getIp())) {
totalScore += 60;
riskFactors.add("IP address is blacklisted");
}
if (isSuspiciousUserAgent(fingerprint.getUserAgent())) {
totalScore += 30;
riskFactors.add("Suspicious User-Agent: " + fingerprint.getUserAgent());
}
if (isMissingSecurityHeaders(request)) {
totalScore += 20;
riskFactors.add("Missing security headers");
}
if (isAbnormalHeaderCombination(fingerprint)) {
totalScore += 25;
riskFactors.add("Abnormal header combination");
}
BehaviorAnalysis behavior = behaviorAnalysisService.analyze(
fingerprint.getFingerprint(), fingerprint.getIp());
totalScore += behavior.getAnomalyScore();
if (behavior.getAnomalyScore() > 0) {
riskFactors.add("Anomalous behavior: score=" + behavior.getAnomalyScore());
}
RiskLevel level = calculateRiskLevel(totalScore);
RiskAssessment assessment = RiskAssessment.builder()
.fingerprint(fingerprint.getFingerprint())
.ip(fingerprint.getIp())
.riskScore(totalScore)
.riskLevel(level)
.riskFactors(riskFactors)
.behaviorAnalysis(behavior)
.assessmentTime(LocalDateTime.now())
.build();
fingerprint.setRiskScore(totalScore);
fingerprint.setRiskLevel(level.name());
fingerprintRepository.save(fingerprint);
if (level == RiskLevel.HIGH || level == RiskLevel.BLOCKED) {
log.warn("High-risk request detected: fingerprint={}, ip={}, score={}, level={}",
fingerprint.getFingerprint(), fingerprint.getIp(), totalScore, level);
}
return assessment;
}
private boolean isBlacklisted(String fingerprint) {
return blacklistRepository.findByFingerprintAndStatus(fingerprint, "BLACKLISTED").isPresent();
}
private boolean isIpBlacklisted(String ip) {
return blacklistRepository.findByIpAndStatus(ip, "BLACKLISTED").isPresent();
}
private boolean isSuspiciousUserAgent(String userAgent) {
if (userAgent == null || userAgent.isEmpty()) {
return true;
}
return botUserAgents.matcher(userAgent.toLowerCase()).find();
}
private boolean isMissingSecurityHeaders(HttpServletRequest request) {
int missing = 0;
if (request.getHeader("Sec-CH-UA") == null) missing++;
if (request.getHeader("Sec-Fetch-Dest") == null) missing++;
if (request.getHeader("Sec-Fetch-Mode") == null) missing++;
if (request.getHeader("Sec-Fetch-Site") == null) missing++;
if (request.getHeader("Sec-Fetch-User") == null) missing++;
return missing >= 4;
}
private boolean isAbnormalHeaderCombination(DeviceFingerprint fp) {
boolean hasSecChUa = fp.getSecChUa() != null;
boolean hasSecFetch = fp.getSecFetchDest() != null && fp.getSecFetchMode() != null;
if (hasSecChUa && !hasSecFetch) {
return true;
}
if (fp.getUserAgent() != null && fp.getUserAgent().contains("Chrome")
&& (fp.getSecChUa() == null || fp.getSecChUa().isEmpty())) {
return true;
}
return false;
}
private RiskLevel calculateRiskLevel(int score) {
if (score >= 80) {
return RiskLevel.BLOCKED;
} else if (score >= 50) {
return RiskLevel.HIGH;
} else if (score >= 30) {
return RiskLevel.MEDIUM;
} else {
return RiskLevel.LOW;
}
}
}
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class RiskAssessment {
private String fingerprint;
private String ip;
private int riskScore;
private RiskLevel riskLevel;
private List<String> riskFactors;
private BehaviorAnalysis behaviorAnalysis;
private LocalDateTime assessmentTime;
}
5. Dynamic blocking filter
@Component
@Slf4j
public class BotDetectionFilter extends OncePerRequestFilter {
@Autowired
private DeviceFingerprintService fingerprintService;
@Autowired
private FingerprintRiskAssessor riskAssessor;
@Autowired
private BehaviorAnalysisService behaviorAnalysisService;
@Autowired
private BlockManager blockManager;
@Autowired
private CaptchaService captchaService;
@Value("${bot.detection.enabled:true}")
private boolean detectionEnabled;
@Value("${bot.detection.risk-threshold:50}")
private int riskThreshold;
@Override
protected void doFilterInternal(HttpServletRequest request,
HttpServletResponse response,
FilterChain filterChain)
throws ServletException, IOException {
if (!detectionEnabled) {
filterChain.doFilter(request, response);
return;
}
String path = request.getRequestURI();
if (isExcludedPath(path)) {
filterChain.doFilter(request, response);
return;
}
DeviceFingerprint fingerprint;
RiskAssessment assessment = null;
boolean blocked;
try {
fingerprint = fingerprintService.getOrCreateFingerprint(request);
blocked = blockManager.isBlocked(fingerprint.getFingerprint(), fingerprint.getIp());
if (!blocked) {
assessment = riskAssessor.assess(request, fingerprint);
behaviorAnalysisService.recordVisit(
fingerprint.getFingerprint(),
fingerprint.getIp(),
path
);
}
} catch (Exception e) {
// Fail open if detection itself breaks. The chain is otherwise invoked only
// outside this try block, so a downstream exception never runs it twice.
log.error("Bot detection failed", e);
filterChain.doFilter(request, response);
return;
}
if (blocked) {
log.warn("Request blocked: fingerprint={}, ip={}",
fingerprint.getFingerprint(), fingerprint.getIp());
writeBlockedResponse(response, "Request blocked");
return;
}
switch (assessment.getRiskLevel()) {
case LOW:
filterChain.doFilter(request, response);
break;
case MEDIUM:
handleMediumRisk(request, response, filterChain, assessment);
break;
case HIGH:
handleHighRisk(request, response, filterChain, assessment);
break;
case BLOCKED:
handleBlocked(request, response, filterChain, assessment);
break;
}
}
private void handleMediumRisk(HttpServletRequest request,
HttpServletResponse response,
FilterChain filterChain,
RiskAssessment assessment) throws IOException, ServletException {
log.info("Medium-risk request: fingerprint={}, score={}",
assessment.getFingerprint(), assessment.getRiskScore());
response.setHeader("X-Risk-Level", "MEDIUM");
response.setHeader("X-Captcha-Required", "true");
response.setHeader("X-Risk-Score", String.valueOf(assessment.getRiskScore()));
// The header carries the token itself, so test for presence rather than for "true".
String captchaToken = request.getHeader("X-Captcha-Token");
if (captchaToken != null && !captchaToken.isEmpty()) {
if (captchaService.verify(captchaToken)) {
filterChain.doFilter(request, response);
} else {
writeBlockedResponse(response, "CAPTCHA verification failed");
}
} else {
writeCaptchaChallenge(response, assessment);
}
}
private void handleHighRisk(HttpServletRequest request,
HttpServletResponse response,
FilterChain filterChain,
RiskAssessment assessment) throws IOException {
log.warn("High-risk request: fingerprint={}, ip={}, score={}",
assessment.getFingerprint(), assessment.getIp(), assessment.getRiskScore());
blockManager.tempBlock(assessment.getFingerprint(), assessment.getIp(), 5);
writeBlockedResponse(response, "Unusual activity detected, please try again later");
}
private void handleBlocked(HttpServletRequest request,
HttpServletResponse response,
FilterChain filterChain,
RiskAssessment assessment) throws IOException {
log.error("Blocked request: fingerprint={}, ip={}, score={}",
assessment.getFingerprint(), assessment.getIp(), assessment.getRiskScore());
blockManager.permanentBlock(assessment.getFingerprint(), assessment.getIp());
writeBlockedResponse(response, "Request denied");
}
private void writeBlockedResponse(HttpServletResponse response, String message) throws IOException {
response.setStatus(HttpServletResponse.SC_FORBIDDEN);
response.setContentType("application/json");
response.getWriter().write(String.format(
"{\"success\":false,\"message\":\"%s\",\"code\":\"BLOCKED\"}", message));
}
private void writeCaptchaChallenge(HttpServletResponse response, RiskAssessment assessment) throws IOException {
response.setStatus(HttpServletResponse.SC_UNAUTHORIZED);
response.setContentType("application/json");
response.getWriter().write(String.format(
"{\"success\":false,\"message\":\"Please complete the verification\",\"code\":\"CAPTCHA_REQUIRED\",\"riskScore\":%d}",
assessment.getRiskScore()));
}
private boolean isExcludedPath(String path) {
return path.startsWith("/static") ||
path.startsWith("/health") ||
path.startsWith("/actuator") ||
path.equals("/favicon.ico");
}
}
6. Block manager
@Component
@Slf4j
public class BlockManager {
@Autowired
private BlacklistRepository blacklistRepository;
private final Map<String, Long> tempBlockCache = new ConcurrentHashMap<>();
private final Map<String, Long> permanentBlockCache = new ConcurrentHashMap<>();
private static final long TEMP_BLOCK_DURATION_MS = 5 * 60 * 1000;
public boolean isBlocked(String fingerprint, String ip) {
if (permanentBlockCache.containsKey(fingerprint) ||
permanentBlockCache.containsKey(ip)) {
return true;
}
Long blockTime = tempBlockCache.get(fingerprint);
if (blockTime != null && System.currentTimeMillis() < blockTime) {
return true;
}
blockTime = tempBlockCache.get(ip);
if (blockTime != null && System.currentTimeMillis() < blockTime) {
return true;
}
return blacklistRepository.findByFingerprintAndStatus(fingerprint, "BLACKLISTED").isPresent() ||
blacklistRepository.findByIpAndStatus(ip, "BLACKLISTED").isPresent();
}
public void tempBlock(String fingerprint, String ip, int durationMinutes) {
long expireTime = System.currentTimeMillis() + durationMinutes * 60 * 1000L;
tempBlockCache.put(fingerprint, expireTime);
tempBlockCache.put(ip, expireTime);
log.info("Temporary block: fingerprint={}, ip={}, duration={}min",
fingerprint, ip, durationMinutes);
}
public void permanentBlock(String fingerprint, String ip) {
permanentBlockCache.put(fingerprint, System.currentTimeMillis());
permanentBlockCache.put(ip, System.currentTimeMillis());
saveToBlacklist(fingerprint, ip, "PERMANENT");
log.warn("Permanent block: fingerprint={}, ip={}", fingerprint, ip);
}
private void saveToBlacklist(String fingerprint, String ip, String type) {
blacklistRepository.findByFingerprint(fingerprint).ifPresentOrElse(
record -> {
record.setStatus("BLACKLISTED");
record.setBlockType(type);
record.setBlockTime(LocalDateTime.now());
blacklistRepository.save(record);
},
() -> {
Blacklist blacklist = Blacklist.builder()
.fingerprint(fingerprint)
.ip(ip)
.status("BLACKLISTED")
.blockType(type)
.blockTime(LocalDateTime.now())
.reason("Automatically blocked: high risk")
.build();
blacklistRepository.save(blacklist);
}
);
}
public void unblock(String fingerprint, String ip) {
tempBlockCache.remove(fingerprint);
tempBlockCache.remove(ip);
permanentBlockCache.remove(fingerprint);
permanentBlockCache.remove(ip);
blacklistRepository.findByFingerprintAndStatus(fingerprint, "BLACKLISTED")
.ifPresent(record -> {
record.setStatus("REMOVED");
blacklistRepository.save(record);
});
log.info("Block removed: fingerprint={}, ip={}", fingerprint, ip);
}
}
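An in-memory map of temporary blocks grows without bound if expired entries are never removed. The sketch below shows one way to evict an entry lazily at lookup time, with the clock injected so the expiry logic can be tested without sleeping. `ExpiringBlockList` is a hypothetical helper for illustration, not a drop-in replacement for the `BlockManager` above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Temporary block list whose entries are evicted lazily once they expire. */
class ExpiringBlockList {
    private final Map<String, Long> expiryByKey = new ConcurrentHashMap<>();
    private final LongSupplier clock; // returns the current time in milliseconds

    ExpiringBlockList(LongSupplier clock) {
        this.clock = clock;
    }

    void block(String key, long durationMillis) {
        long expiry = clock.getAsLong() + durationMillis;
        // If the key is already blocked, keep whichever expiry is later.
        expiryByKey.merge(key, expiry, Long::max);
    }

    boolean isBlocked(String key) {
        Long expiry = expiryByKey.get(key);
        if (expiry == null) {
            return false;
        }
        if (clock.getAsLong() < expiry) {
            return true;
        }
        // Expired: remove only if no newer block replaced it in the meantime.
        expiryByKey.remove(key, expiry);
        return false;
    }

    int size() {
        return expiryByKey.size();
    }
}
```

In production the clock would typically be `System::currentTimeMillis`; keys that are never looked up again still need a periodic sweep or a cache with built-in expiry.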
7. CAPTCHA service
@Service
@Slf4j
public class CaptchaService {
private final Map<String, CaptchaToken> tokenStore = new ConcurrentHashMap<>();
public String generateToken(String fingerprint, String ip) {
String token = UUID.randomUUID().toString();
CaptchaToken captchaToken = CaptchaToken.builder()
.token(token)
.fingerprint(fingerprint)
.ip(ip)
.createTime(LocalDateTime.now())
.expireTime(LocalDateTime.now().plusMinutes(5))
.verified(false)
.build();
tokenStore.put(token, captchaToken);
log.info("CAPTCHA token issued: token={}, fingerprint={}", token, fingerprint);
return token;
}
public boolean verify(String token) {
if (token == null || token.isEmpty()) {
return false;
}
CaptchaToken captchaToken = tokenStore.get(token);
if (captchaToken == null) {
log.warn("CAPTCHA token not found: token={}", token);
return false;
}
if (LocalDateTime.now().isAfter(captchaToken.getExpireTime())) {
log.warn("CAPTCHA token expired: token={}", token);
tokenStore.remove(token);
return false;
}
if (captchaToken.isVerified()) {
log.warn("CAPTCHA token already used: token={}", token);
return false;
}
// Note: this check-then-set is not atomic; two concurrent requests can both pass.
captchaToken.setVerified(true);
log.info("CAPTCHA token verified: token={}", token);
return true;
}
public boolean requiresCaptcha(RiskAssessment assessment) {
return assessment.getRiskLevel() == RiskLevel.MEDIUM ||
assessment.getRiskScore() >= 30;
}
}
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
class CaptchaToken {
private String token;
private String fingerprint;
private String ip;
private LocalDateTime createTime;
private LocalDateTime expireTime;
private boolean verified;
}
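A CAPTCHA token is more robust when it can be redeemed exactly once and only by the client it was issued to. One way to get both properties is to remove the token from the store as the very first step of verification, since `ConcurrentHashMap.remove` hands the entry to a single caller. The `OneTimeTokens` class below is a hypothetical sketch of that idea, with the current time passed in explicitly to keep it testable.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Single-use tokens bound to the fingerprint they were issued for. */
class OneTimeTokens {
    private static final class Entry {
        final String fingerprint;
        final long expiresAtMillis;

        Entry(String fingerprint, long expiresAtMillis) {
            this.fingerprint = fingerprint;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();

    String issue(String fingerprint, long nowMillis, long ttlMillis) {
        String token = UUID.randomUUID().toString();
        store.put(token, new Entry(fingerprint, nowMillis + ttlMillis));
        return token;
    }

    /** Consumes the token; true only on the first call, before expiry, for the same fingerprint. */
    boolean consume(String token, String fingerprint, long nowMillis) {
        if (token == null) {
            return false;
        }
        // remove() is atomic, so concurrent callers cannot both obtain the entry.
        // Side effect: a failed attempt (wrong fingerprint, expired) also burns the token.
        Entry entry = store.remove(token);
        return entry != null
                && nowMillis < entry.expiresAtMillis
                && entry.fingerprint.equals(fingerprint);
    }
}
```

Removing on every attempt also keeps the store from accumulating used tokens, although tokens that are never presented still need a periodic cleanup.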
Configuration
server:
port: 8080
spring:
application:
name: bot-detection-demo
bot:
detection:
enabled: true
risk-threshold: 50
captcha-required-threshold: 30
temp-block-duration-minutes: 5
permanent-block-threshold: 80
fingerprint:
hash-algorithm: SHA-256
min-header-count: 5
behavior:
analysis-window-minutes: 10
max-requests-per-minute: 60
min-human-interval-ms: 3000
blacklist:
auto-add-high-risk: true
retention-days: 90
logging:
level:
com.example.bot: DEBUG
| Property | Description | Default |
|---|---|---|
| bot.detection.enabled | Whether crawler detection is enabled | true |
| bot.detection.risk-threshold | Risk score threshold | 50 |
| bot.detection.temp-block-duration-minutes | Temporary block duration (minutes) | 5 |
| bot.detection.permanent-block-threshold | Score at which a block becomes permanent | 80 |
| bot.behavior.max-requests-per-minute | Maximum requests per minute | 60 |
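Risk assessment dimensions
Fingerprint dimension
| Risk factor | Score | Notes |
|---|---|---|
| Device fingerprint is blacklisted | +100 | Blocked outright |
| IP address is blacklisted | +60 | High risk |
| Suspicious User-Agent | +30 | Tool-based crawlers |
| Missing security headers | +20 | Possibly a simulated request |
| Abnormal header combination | +25 | Forged features |
Behavior dimension
| Risk factor | Score | Notes |
|---|---|---|
| High-frequency access | +40 | More than 60 requests/minute |
| Regular request intervals | +30 | Fixed interval |
| High page repetition | +15 | More than 90% repeated |
| Strong time regularity | +15 | Looks like a scheduled job |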
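A short worked example shows how these weights add up. The sketch below mirrors the fingerprint-dimension scores from the table and adds a behavior score capped at 100, as the behavior analysis does; the `RiskScoreExample` class is a hypothetical name used only for this illustration.

```java
import java.util.Set;

/** Adds up the fingerprint-dimension weights from the table plus a behavior score. */
class RiskScoreExample {
    enum Factor {
        FINGERPRINT_BLACKLISTED(100),
        IP_BLACKLISTED(60),
        SUSPICIOUS_USER_AGENT(30),
        MISSING_SECURITY_HEADERS(20),
        ABNORMAL_HEADER_COMBINATION(25);

        final int points;

        Factor(int points) {
            this.points = points;
        }
    }

    static int total(Set<Factor> factors, int behaviorScore) {
        // The behavior dimension contributes between 0 and 100 points.
        int sum = Math.min(100, Math.max(0, behaviorScore));
        for (Factor f : factors) {
            sum += f.points;
        }
        return sum;
    }
}
```

A plain `curl` client with no behavior history hits two factors, a suspicious User-Agent (+30) and missing security headers (+20), for a total of 50, which already falls in the high-risk tier. A headless browser that sends every expected header but fetches more than 60 pages per minute (+40) at a fixed interval (+30) reaches 70 on behavior alone.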
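FAQ
Q: How do we handle false positives?
A: Several measures reduce wrongful blocks:
- Graduated response: CAPTCHA challenge first, then a temporary block, and only then a permanent block
- Whitelisting: add search engine crawlers and partners to a whitelist
- Appeals: give users a way to request manual review
- Automatic unblocking: temporary blocks expire on their own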
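Whitelisting search engine crawlers by User-Agent alone is unsafe, because that header is trivial to forge. A common approach is forward-confirmed reverse DNS: resolve the IP to a hostname, check that the hostname belongs to the search engine's domain, then resolve that hostname back and confirm it includes the original IP. The sketch below injects both lookups as functions so the logic can be tested offline; the `CrawlerVerifier` class is hypothetical, and the suffix list is a small example that should be checked against each search engine's current documentation.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

/** Forward-confirmed reverse DNS check for self-declared search engine crawlers. */
class CrawlerVerifier {
    // Example suffixes only; verify against the search engines' published guidance.
    private static final List<String> TRUSTED_SUFFIXES =
            List.of(".googlebot.com", ".google.com", ".search.msn.com");

    private final Function<String, Optional<String>> reverseLookup; // IP -> hostname
    private final Function<String, List<String>> forwardLookup;     // hostname -> IPs

    CrawlerVerifier(Function<String, Optional<String>> reverseLookup,
                    Function<String, List<String>> forwardLookup) {
        this.reverseLookup = reverseLookup;
        this.forwardLookup = forwardLookup;
    }

    boolean isVerifiedCrawler(String ip) {
        Optional<String> host = reverseLookup.apply(ip);
        if (host.isEmpty()) {
            return false;
        }
        String name = host.get().toLowerCase(Locale.ROOT);
        boolean trustedDomain = TRUSTED_SUFFIXES.stream().anyMatch(name::endsWith);
        // The forward lookup guards against attacker-controlled PTR records.
        return trustedDomain && forwardLookup.apply(name).contains(ip);
    }
}
```

In a real deployment the two functions could be backed by `java.net.InetAddress` (`getCanonicalHostName()` and `getAllByName(...)`), with the results cached because DNS lookups are slow relative to a request.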
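Q: How do we deal with advanced crawlers?
A: Advanced crawlers (for example, those built on Selenium or Playwright) can be identified through the following signals:
- Browser fingerprint: Canvas, WebGL, and Audio hashes
- Behavioral signals: mouse trajectories and click patterns
- TLS fingerprint: the JA3 fingerprint
- JavaScript execution: probing browser-specific APIs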
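As an illustration of the TLS signal, a JA3 fingerprint is the MD5 hash of five ClientHello fields written as decimal numbers: TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats, with fields separated by commas and list items by hyphens, and GREASE values left out. The sketch below covers only that string-building and hashing step; it assumes the ClientHello has already been parsed wherever TLS terminates (a servlet filter never sees it), and the `Ja3` class is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.stream.Collectors;

/** Builds a JA3 string and hash from already-parsed ClientHello fields. */
class Ja3 {
    /** GREASE values have the form 0x?A?A with both bytes equal (0x0A0A, 0x1A1A, ...). */
    static boolean isGrease(int value) {
        return (value & 0x0f0f) == 0x0a0a && (value >> 8) == (value & 0xff);
    }

    private static String join(List<Integer> values) {
        return values.stream()
                .filter(v -> !isGrease(v))
                .map(String::valueOf)
                .collect(Collectors.joining("-"));
    }

    static String ja3String(int tlsVersion, List<Integer> ciphers, List<Integer> extensions,
                            List<Integer> curves, List<Integer> pointFormats) {
        return tlsVersion + "," + join(ciphers) + "," + join(extensions) + ","
                + join(curves) + "," + join(pointFormats);
    }

    static String ja3Hash(String ja3String) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(ja3String.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

The resulting hash could be forwarded to the application in a request header by the TLS terminator and compared against the browser the User-Agent claims to be; a mismatch between the two is a useful risk factor.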
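Q: How do we balance security and user experience?
A: Use a tiered policy:
- Low risk (score 0-29): allow
- Medium risk (score 30-49): CAPTCHA challenge
- High risk (score 50-79): temporary block
- Blocked (score 80 and above): permanent block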
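Summary
The approach described in this article provides:
- Device fingerprinting: a fingerprint built from multiple features, which is harder to forge
- Behavior analysis: real-time analysis of access patterns to flag anomalies
- Dynamic blocking: a different response at each risk level
- Graduated response: fewer false positives and a protected experience for normal users
- Auditability: detection and blocking decisions are logged
Key components:
- DeviceFingerprint: the device fingerprint entity, which records features across multiple dimensions
- BehaviorAnalysisService: the behavior analysis service, which identifies anomalous patterns
- FingerprintRiskAssessor: the risk assessor, which combines the dimensions into one score
- BotDetectionFilter: the dynamic blocking filter, which acts on the risk level
- BlockManager: the block manager, which handles temporary and permanent blocks
In production, tune the risk thresholds and responses to your own traffic so that security and user experience stay in balance.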
Author: jiangyi
URL: http://jiangyi.space/articles/2026/05/15/1778386990789.html
WeChat official account: 服务端技术精选