Crawler Fingerprinting and Dynamic Blocking: Bypassing Rate Limits? Device Fingerprints + Behavior Analysis for Precise Blocking!

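In the internet era, data is an asset. Crawlers are a major way of obtaining it: some are legitimate, such as search engine crawlers, and some are malicious, such as competitor scrapers and data-theft bots. Malicious crawlers typically:

  • Disguise themselves as normal users to get around rate limits
  • Use large IP proxy pools to evade IP bans
  • Simulate browser behavior to get past basic detection
  • Crawl at high frequency in the early morning hours, competing for data resources
  • Work around anti-crawling mechanisms to keep harvesting data

Today, let's look at how to build a crawler fingerprinting and dynamic blocking system that combines device fingerprints with behavior analysis to identify and block malicious crawlers.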

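Background

Common characteristics of malicious crawlers

┌─────────────────────────────────────────────────────────────┐
│  Why malicious crawlers are hard to identify:               │
│                                                             │
│  1. Disguised as normal users:                              │
│     - User-Agent mimics a real browser                      │
│     - Driven by Selenium, Playwright, and similar tools     │
│     - Cookies and sessions look normal                      │
│                                                             │
│  2. Evading rate limits:                                    │
│     - IP proxy pools, a new IP for every request            │
│     - Distributed crawling across many machines             │
│     - Slow crawling that imitates human pacing              │
│                                                             │
│  3. Getting past basic detection:                           │
│     - JavaScript rendering defeats static checks            │
│     - TLS fingerprint spoofing mimics real clients          │
│     - CAPTCHA solving breaks challenge-based defenses       │
└─────────────────────────────────────────────────────────────┘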

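Limitations of traditional anti-crawling measures

┌─────────────────────────────────────────────────────────────┐
│  Traditional anti-crawling measures and their limits:       │
│                                                             │
│  1. IP blacklist:                                           │
│     - Problem: too many proxy IPs to ban them all           │
│     - Collateral: mobile users share NAT egress IPs         │
│                                                             │
│  2. Rate limiting:                                          │
│     - Problem: distributed crawlers pace their requests     │
│     - Collateral: real users on several devices at once     │
│                                                             │
│  3. User-Agent checks:                                      │
│     - Problem: trivially forged                             │
│     - Collateral: genuine browsers may be misjudged         │
│                                                             │
│  4. CAPTCHA:                                                │
│     - Problem: hurts UX and is cheap to break               │
│     - Collateral: elderly or less dexterous users           │
└─────────────────────────────────────────────────────────────┘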
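
To make the rate-limiting limitation concrete, here is a minimal sketch of a per-IP sliding-window limiter. The class name PerIpRateLimiter and its parameters are hypothetical and do not come from the article's code; the point is only to show that a limiter keyed on IP stops one noisy address but not a crawler that rotates through a proxy pool.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-IP sliding-window limiter, for illustration only. */
final class PerIpRateLimiter {
    private final int maxRequests;
    private final long windowMs;
    private final Map<String, Deque<Long>> hits = new HashMap<>();

    PerIpRateLimiter(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
    }

    /** Returns true if a request from this IP at time nowMs is allowed. */
    synchronized boolean allow(String ip, long nowMs) {
        Deque<Long> window = hits.computeIfAbsent(ip, k -> new ArrayDeque<>());
        // Drop timestamps that have fallen out of the window
        while (!window.isEmpty() && nowMs - window.peekFirst() >= windowMs) {
            window.pollFirst();
        }
        if (window.size() >= maxRequests) {
            return false;
        }
        window.addLast(nowMs);
        return true;
    }
}
```

With a limit of 3 requests per minute, a fourth request from the same IP is rejected, while the same number of requests spread over different IPs all pass, because each address has its own window.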

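Overall architecture

Core idea

┌─────────────────────────────────────────────────────────────┐
│  Crawler fingerprinting and dynamic blocking architecture:  │
│                                                             │
│  1. Device fingerprint: collect browser/client traits and   │
│     derive a near-unique identifier                         │
│  2. Behavior analysis: spot abnormal usage patterns         │
│  3. Risk assessment: combine signals into a risk score      │
│  4. Dynamic blocking: act according to the risk level       │
│                                                             │
│  Key design points:                                         │
│  - Passive collection: no impact on normal users            │
│  - Multi-dimensional fingerprint: harder to forge           │
│  - Real-time detection: millisecond-level response          │
│  - Graduated response: fewer false positives                │
└─────────────────────────────────────────────────────────────┘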

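Architecture flow

Request arrives
    ↓
Extract request features (IP, UA, headers, cookies)
    ↓
Generate device fingerprint
    ↓
Look up the fingerprint's risk profile
    ↓
┌─────────────────────────────────────────┐
│  Risk assessment:                       │
│  - Fingerprint on the blacklist?        │
│  - Abnormal behavior?                   │
│  - Missing request features?            │
│  - Abnormal request frequency?          │
└─────────────────────────────────────────┘
    ↓
Risk score >= threshold?
    ↓
┌─────────────────────────────┐
│  Low risk: allow            │
│  Medium risk: CAPTCHA       │
│  High risk: block + log     │
│  Blacklisted: reject        │
└─────────────────────────────┘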
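
The last step of the flow can be sketched as a single pure function. The class RiskTiers and the Action enum are hypothetical names; the thresholds (30, 50, 80) are the ones used by the article's FingerprintRiskAssessor, and "block + log" is mapped to a temporary block, as the filter does.

```java
/** Hypothetical mapping from risk score to action, for illustration only. */
final class RiskTiers {
    enum Action { ALLOW, CAPTCHA, TEMP_BLOCK, REJECT }

    /** Blacklisted fingerprints are rejected regardless of score. */
    static Action decide(int riskScore, boolean blacklisted) {
        if (blacklisted || riskScore >= 80) {
            return Action.REJECT;
        }
        if (riskScore >= 50) {
            return Action.TEMP_BLOCK;
        }
        if (riskScore >= 30) {
            return Action.CAPTCHA;
        }
        return Action.ALLOW;
    }
}
```

Keeping this mapping free of I/O makes the tier boundaries easy to unit test and to tune per business scenario.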

Core implementation

1. Device fingerprint entity

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
@Entity
@Table(name = "device_fingerprint")
public class DeviceFingerprint {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String fingerprint;

    private String ip;

    private String userAgent;

    private String acceptLanguage;

    private String acceptEncoding;

    private String accept;

    private String referer;

    private String cookie;

    private String secChUa;

    private String secChUaMobile;

    private String secChUaPlatform;

    private String secFetchDest;

    private String secFetchMode;

    private String secFetchSite;

    private String secFetchUser;

    private String upgradeInsecureRequests;

    private Integer screenWidth;

    private Integer screenHeight;

    private Integer colorDepth;

    private String timezone;

    private String language;

    private String platform;

    private String hardwareConcurrency;

    private String deviceMemory;

    private String touchPoints;

    private String canvasHash;

    private String webglHash;

    private String audioHash;

    private String fontsHash;

    private Integer riskScore;

    private String riskLevel;

    private String status;

    private LocalDateTime firstSeenTime;

    private LocalDateTime lastSeenTime;

    private Integer visitCount;

    private Integer blockCount;
}

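public enum RiskLevel {
    LOW("Low risk"),
    MEDIUM("Medium risk"),
    HIGH("High risk"),
    BLOCKED("Blocked");

    private final String description; // set by the constructor; without one the enum does not compile

    RiskLevel(String description) {
        this.description = description;
    }

    public String getDescription() {
        return description;
    }
}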

2. Device fingerprint service

@Service
@Slf4j
public class DeviceFingerprintService {

    @Autowired
    private DeviceFingerprintRepository repository;

    @Autowired
    private FingerprintRiskAssessor riskAssessor;

    private final Map<String, String> ipBlacklist = new ConcurrentHashMap<>();

    public String generateFingerprint(HttpServletRequest request) {
        // Hash only attributes that stay stable across requests from one device.
        // The client IP is deliberately left out: a proxy pool changes it on every
        // request, which would give each request a brand-new fingerprint.
        // Sec-Fetch-* headers are left out as well because they vary per request
        // (navigation vs. sub-resource), not per device.
        String[] stableHeaders = {
                "User-Agent",
                "Sec-CH-UA",
                // Custom headers, expected to be set by a client-side collection script
                "X-Canvas-Hash",
                "X-WebGL-Hash",
                "X-Audio-Hash",
                "X-Fonts-Hash"
        };

        StringBuilder rawFingerprint = new StringBuilder();
        for (String name : stableHeaders) {
            String value = request.getHeader(name);
            rawFingerprint.append(value != null ? value : "").append('|');
        }

        return hashFingerprint(rawFingerprint.toString());
    }

    public DeviceFingerprint getOrCreateFingerprint(HttpServletRequest request) {
        String fingerprint = generateFingerprint(request);

        Optional<DeviceFingerprint> existing = repository.findByFingerprint(fingerprint);

        if (existing.isPresent()) {
            DeviceFingerprint fp = existing.get();
            fp.setLastSeenTime(LocalDateTime.now());
            fp.setVisitCount(fp.getVisitCount() + 1);
            return repository.save(fp);
        }

        DeviceFingerprint fp = DeviceFingerprint.builder()
                .fingerprint(fingerprint)
                .ip(getClientIp(request))
                .userAgent(request.getHeader("User-Agent"))
                .acceptLanguage(request.getHeader("Accept-Language"))
                .acceptEncoding(request.getHeader("Accept-Encoding"))
                .accept(request.getHeader("Accept"))
                .referer(request.getHeader("Referer"))
                .secChUa(request.getHeader("Sec-CH-UA"))
                .secChUaMobile(request.getHeader("Sec-CH-UA-Mobile"))
                .secChUaPlatform(request.getHeader("Sec-CH-UA-Platform"))
                .secFetchDest(request.getHeader("Sec-Fetch-Dest"))
                .secFetchMode(request.getHeader("Sec-Fetch-Mode"))
                .secFetchSite(request.getHeader("Sec-Fetch-Site"))
                .secFetchUser(request.getHeader("Sec-Fetch-User"))
                .upgradeInsecureRequests(request.getHeader("Upgrade-Insecure-Requests"))
                .riskScore(0)
                .riskLevel(RiskLevel.LOW.name())
                .status("ACTIVE")
                .firstSeenTime(LocalDateTime.now())
                .lastSeenTime(LocalDateTime.now())
                .visitCount(1)
                .blockCount(0)
                .build();

        return repository.save(fp);
    }

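    public void updateRiskScore(String fingerprint, int score) {
        repository.findByFingerprint(fingerprint).ifPresent(fp -> {
            fp.setRiskScore(score);
            // riskLevel is stored as a String, so persist the enum name
            fp.setRiskLevel(calculateRiskLevel(score).name());
            fp.setLastSeenTime(LocalDateTime.now());
            repository.save(fp);
        });
    }

    // Thresholds match FingerprintRiskAssessor: 80+ BLOCKED, 50-79 HIGH, 30-49 MEDIUM
    private RiskLevel calculateRiskLevel(int score) {
        if (score >= 80) {
            return RiskLevel.BLOCKED;
        } else if (score >= 50) {
            return RiskLevel.HIGH;
        } else if (score >= 30) {
            return RiskLevel.MEDIUM;
        } else {
            return RiskLevel.LOW;
        }
    }

    private String hashFingerprint(String raw) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(raw.getBytes(StandardCharsets.UTF_8));
            // URL-safe Base64 keeps '+' and '/' out of the stored identifier
            return Base64.getUrlEncoder().withoutPadding().encodeToString(hash).substring(0, 32);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must provide SHA-256. A random fallback would
            // silently hand out a new fingerprint on each request.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }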

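    // Caution: X-Forwarded-For and the other proxy headers are client-controlled unless a
    // trusted reverse proxy overwrites them, so only rely on them behind such a proxy.
    private String getClientIp(HttpServletRequest request) {
        String ip = request.getHeader("X-Forwarded-For");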
        if (ip == null || ip.isEmpty() || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("Proxy-Client-IP");
        }
        if (ip == null || ip.isEmpty() || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("WL-Proxy-Client-IP");
        }
        if (ip == null || ip.isEmpty() || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getRemoteAddr();
        }
        if (ip != null && ip.contains(",")) {
            ip = ip.split(",")[0].trim();
        }
        return ip;
    }
}
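
As a standalone illustration of the hashing step, here is a sketch of a fingerprint that depends only on device attributes and not on the client IP, so that rotating proxies does not change it. StableFingerprint is a hypothetical helper, not part of the article's service; the sorted key order and the length prefix are design choices of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: SHA-256 over a canonical encoding of device attributes. */
final class StableFingerprint {

    /** Hashes names and values in sorted key order; null values count as empty. */
    static String of(Map<String, String> attributes) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(attributes).entrySet()) {
            String value = e.getValue() == null ? "" : e.getValue();
            // Length-prefix each value so that delimiters inside a value
            // cannot make two different attribute sets encode identically
            canonical.append(e.getKey()).append('=')
                     .append(value.length()).append(':').append(value).append(';');
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(Character.forDigit((b >> 4) & 0xF, 16))
                   .append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 not available", ex);
        }
    }
}
```

Two requests that carry the same User-Agent and canvas hash produce the same 64-character hex string no matter which IP they arrive from or in which order the attributes were collected.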

3. Behavior analysis service

@Service
@Slf4j
public class BehaviorAnalysisService {

    @Autowired
    private VisitRecordRepository visitRecordRepository;

    @Autowired
    private BehaviorPatternRepository patternRepository;

    private final Map<String, Deque<Long>> requestTimestamps = new ConcurrentHashMap<>();
    private final Map<String, Deque<Integer>> requestIntervals = new ConcurrentHashMap<>();

    private static final int WINDOW_SIZE = 20;
    private static final int MIN_HUMAN_INTERVAL_MS = 3000;

    public BehaviorAnalysis analyze(String fingerprint, String ip) {
        List<VisitRecord> recentVisits = visitRecordRepository
                .findRecentByFingerprint(fingerprint, LocalDateTime.now().minusMinutes(10));

        BehaviorAnalysis analysis = BehaviorAnalysis.builder()
                .fingerprint(fingerprint)
                .ip(ip)
                .visitCount(recentVisits.size())
                .analyzedAt(LocalDateTime.now())
                .build();

        if (recentVisits.isEmpty()) {
            analysis.setAnomalyScore(0);
            return analysis;
        }

        int score = 0;

        score += analyzeFrequency(recentVisits);
        score += analyzeIntervalPattern(recentVisits);
        score += analyzePageSequence(recentVisits);
        score += analyzeTimePattern(recentVisits);

        analysis.setAnomalyScore(Math.min(100, score));
        analysis.setDetails(buildDetails(recentVisits));

        log.debug("Behavior analysis result: fingerprint={}, score={}", fingerprint, score);

        return analysis;
    }

    private int analyzeFrequency(List<VisitRecord> visits) {
        if (visits.size() < 5) {
            return 0;
        }

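        // Assumes the repository returns visits ordered by visit time, oldest first
        LocalDateTime first = visits.get(0).getVisitTime();
        LocalDateTime last = visits.get(visits.size() - 1).getVisitTime();
        // Measure in milliseconds: toMinutes() truncates, so a 119-second burst
        // would be counted as a single minute
        long durationMs = java.time.Duration.between(first, last).toMillis();

        // Treat anything shorter than one minute as one minute
        double durationMinutes = Math.max(1.0, durationMs / 60000.0);

        double requestsPerMinute = visits.size() / durationMinutes;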

        if (requestsPerMinute > 60) {
            return 40;
        } else if (requestsPerMinute > 30) {
            return 25;
        } else if (requestsPerMinute > 10) {
            return 10;
        }
        return 0;
    }

    private int analyzeIntervalPattern(List<VisitRecord> visits) {
        if (visits.size() < 3) {
            return 0;
        }

        List<Integer> intervals = new ArrayList<>();
        for (int i = 1; i < visits.size(); i++) {
            long intervalMs = java.time.Duration
                    .between(visits.get(i - 1).getVisitTime(), visits.get(i).getVisitTime())
                    .toMillis();
            intervals.add((int) intervalMs);
        }

        double avgInterval = intervals.stream().mapToInt(Integer::intValue).average().orElse(0);
        double variance = intervals.stream()
                .mapToDouble(i -> Math.pow(i - avgInterval, 2))
                .average().orElse(0);
        double stdDev = Math.sqrt(variance);

        if (avgInterval < MIN_HUMAN_INTERVAL_MS && stdDev < 100) {
            return 30;
        }

        if (stdDev < 50) {
            return 20;
        }

        return 0;
    }

    private int analyzePageSequence(List<VisitRecord> visits) {
        Set<String> uniquePages = visits.stream()
                .map(VisitRecord::getUrl)
                .collect(Collectors.toSet());

        double pageRepetition = 1.0 - (uniquePages.size() / (double) visits.size());

        if (pageRepetition > 0.9) {
            return 15;
        } else if (pageRepetition > 0.7) {
            return 10;
        }
        return 0;
    }

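    // Caveat: at minute granularity inside a 10-minute window, almost any three
    // distinct visit minutes have near-equal gaps, so this check also fires for
    // ordinary users. Treat its 15 points as a weak signal.
    private int analyzeTimePattern(List<VisitRecord> visits) {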
        Set<HourMinute> timePoints = visits.stream()
                .map(v -> new HourMinute(v.getVisitTime().getHour(), v.getVisitTime().getMinute()))
                .collect(Collectors.toSet());

        if (timePoints.size() < 3) {
            return 0;
        }

        List<HourMinute> sorted = timePoints.stream().sorted().collect(Collectors.toList());
        boolean hasRegularPattern = false;

        for (int i = 1; i < sorted.size(); i++) {
            int diff1 = sorted.get(i).toMinutes() - sorted.get(i - 1).toMinutes();
            if (i > 1) {
                int diff2 = sorted.get(i - 1).toMinutes() - sorted.get(i - 2).toMinutes();
                if (Math.abs(diff1 - diff2) < 2) {
                    hasRegularPattern = true;
                    break;
                }
            }
        }

        return hasRegularPattern ? 15 : 0;
    }

    private Map<String, Object> buildDetails(List<VisitRecord> visits) {
        Map<String, Object> details = new HashMap<>();
        details.put("recentUrls", visits.stream()
                .map(VisitRecord::getUrl)
                .limit(10)
                .collect(Collectors.toList()));
        details.put("firstVisit", visits.get(0).getVisitTime());
        details.put("lastVisit", visits.get(visits.size() - 1).getVisitTime());
        return details;
    }

    public void recordVisit(String fingerprint, String ip, String url) {
        VisitRecord record = VisitRecord.builder()
                .fingerprint(fingerprint)
                .ip(ip)
                .url(url)
                .visitTime(LocalDateTime.now())
                .build();

        visitRecordRepository.save(record);

        List<VisitRecord> recentVisits = visitRecordRepository
                .findRecentByFingerprint(fingerprint, LocalDateTime.now().minusMinutes(10));
        if (recentVisits.size() > 100) {
            log.warn("High-frequency access alert: fingerprint={}, count={}", fingerprint, recentVisits.size());
        }
    }

    @Data
    @AllArgsConstructor
    private static class HourMinute implements Comparable<HourMinute> {
        private int hour;
        private int minute;

        public int toMinutes() {
            return hour * 60 + minute;
        }

        @Override
        public int compareTo(HourMinute o) {
            return Integer.compare(this.toMinutes(), o.toMinutes());
        }
    }
}

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class BehaviorAnalysis {
    private String fingerprint;
    private String ip;
    private int visitCount;
    private int anomalyScore;
    private LocalDateTime analyzedAt;
    private Map<String, Object> details;
}
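
The interval check in the service rests on the mean and standard deviation of the gaps between consecutive requests. The sketch below isolates that computation. IntervalStats is a hypothetical class, and the coefficient of variation is an addition of this sketch rather than something the article's code uses: it expresses regularity relative to the mean, so one threshold can serve both fast and slow crawlers.

```java
/** Hypothetical helper: statistics over the gaps between request timestamps. */
final class IntervalStats {
    final double mean;
    final double stdDev;

    private IntervalStats(double mean, double stdDev) {
        this.mean = mean;
        this.stdDev = stdDev;
    }

    /** timestampsMs must be sorted ascending and hold at least two entries. */
    static IntervalStats fromTimestamps(long[] timestampsMs) {
        if (timestampsMs.length < 2) {
            throw new IllegalArgumentException("need at least two timestamps");
        }
        int n = timestampsMs.length - 1; // number of gaps
        double sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += timestampsMs[i] - timestampsMs[i - 1];
        }
        double mean = sum / n;
        double squared = 0;
        for (int i = 1; i <= n; i++) {
            double d = (timestampsMs[i] - timestampsMs[i - 1]) - mean;
            squared += d * d;
        }
        // Population standard deviation, as in the service's variance calculation
        return new IntervalStats(mean, Math.sqrt(squared / n));
    }

    /** Standard deviation relative to the mean; 0 for perfectly regular gaps. */
    double coefficientOfVariation() {
        return mean == 0 ? 0 : stdDev / mean;
    }
}
```

A timer-driven crawler that fires every second yields a standard deviation of 0, while human browsing with gaps of 0.5 s, 3.5 s, 0.2 s, and 4.8 s yields a coefficient of variation well above 0.5.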

4. Risk assessor

@Component
@Slf4j
public class FingerprintRiskAssessor {

    @Autowired
    private DeviceFingerprintRepository fingerprintRepository;

    @Autowired
    private BlacklistRepository blacklistRepository;

    @Autowired
    private BehaviorAnalysisService behaviorAnalysisService;

    private final Pattern botUserAgents = Pattern.compile(
            "(curl|wget|python|scrapy|java|go-http|okhttp|apache-httpclient|libwww|perl|node-fetch|axios|got|reqwest|httpx)"
    );

    public RiskAssessment assess(HttpServletRequest request, DeviceFingerprint fingerprint) {
        int totalScore = 0;
        List<String> riskFactors = new ArrayList<>();

        if (isBlacklisted(fingerprint.getFingerprint())) {
            totalScore += 100;
            riskFactors.add("Device fingerprint is blacklisted");
        }

        if (isIpBlacklisted(fingerprint.getIp())) {
            totalScore += 60;
            riskFactors.add("IP address is blacklisted");
        }

        if (isSuspiciousUserAgent(fingerprint.getUserAgent())) {
            totalScore += 30;
            riskFactors.add("Suspicious User-Agent: " + fingerprint.getUserAgent());
        }

        if (isMissingSecurityHeaders(request)) {
            totalScore += 20;
            riskFactors.add("Missing security headers");
        }

        if (isAbnormalHeaderCombination(fingerprint)) {
            totalScore += 25;
            riskFactors.add("Abnormal header combination");
        }

        BehaviorAnalysis behavior = behaviorAnalysisService.analyze(
                fingerprint.getFingerprint(), fingerprint.getIp());
        totalScore += behavior.getAnomalyScore();

        if (behavior.getAnomalyScore() > 0) {
            riskFactors.add("Abnormal behavior: score=" + behavior.getAnomalyScore());
        }

        RiskLevel level = calculateRiskLevel(totalScore);
        RiskAssessment assessment = RiskAssessment.builder()
                .fingerprint(fingerprint.getFingerprint())
                .ip(fingerprint.getIp())
                .riskScore(totalScore)
                .riskLevel(level)
                .riskFactors(riskFactors)
                .behaviorAnalysis(behavior)
                .assessmentTime(LocalDateTime.now())
                .build();

        fingerprint.setRiskScore(totalScore);
        fingerprint.setRiskLevel(level.name());
        fingerprintRepository.save(fingerprint);

        if (level == RiskLevel.HIGH || level == RiskLevel.BLOCKED) {
            log.warn("High-risk request detected: fingerprint={}, ip={}, score={}, level={}",
                    fingerprint.getFingerprint(), fingerprint.getIp(), totalScore, level);
        }

        return assessment;
    }

    private boolean isBlacklisted(String fingerprint) {
        return blacklistRepository.findByFingerprintAndStatus(fingerprint, "BLACKLISTED").isPresent();
    }

    private boolean isIpBlacklisted(String ip) {
        return blacklistRepository.findByIpAndStatus(ip, "BLACKLISTED").isPresent();
    }

    private boolean isSuspiciousUserAgent(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            return true;
        }
        return botUserAgents.matcher(userAgent.toLowerCase()).find();
    }

    private boolean isMissingSecurityHeaders(HttpServletRequest request) {
        int missing = 0;
        if (request.getHeader("Sec-CH-UA") == null) missing++;
        if (request.getHeader("Sec-Fetch-Dest") == null) missing++;
        if (request.getHeader("Sec-Fetch-Mode") == null) missing++;
        if (request.getHeader("Sec-Fetch-Site") == null) missing++;
        if (request.getHeader("Sec-Fetch-User") == null) missing++;
        return missing >= 4;
    }

    private boolean isAbnormalHeaderCombination(DeviceFingerprint fp) {
        boolean hasSecChUa = fp.getSecChUa() != null;
        boolean hasSecFetch = fp.getSecFetchDest() != null && fp.getSecFetchMode() != null;

        if (hasSecChUa && !hasSecFetch) {
            return true;
        }

        if (fp.getUserAgent() != null && fp.getUserAgent().contains("Chrome")
                && (fp.getSecChUa() == null || fp.getSecChUa().isEmpty())) {
            return true;
        }

        return false;
    }

    private RiskLevel calculateRiskLevel(int score) {
        if (score >= 80) {
            return RiskLevel.BLOCKED;
        } else if (score >= 50) {
            return RiskLevel.HIGH;
        } else if (score >= 30) {
            return RiskLevel.MEDIUM;
        } else {
            return RiskLevel.LOW;
        }
    }
}

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class RiskAssessment {
    private String fingerprint;
    private String ip;
    private int riskScore;
    private RiskLevel riskLevel;
    private List<String> riskFactors;
    private BehaviorAnalysis behaviorAnalysis;
    private LocalDateTime assessmentTime;
}
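
The header-combination check can be extended with a cross-check between two values that a real Chromium browser keeps in agreement: the operating system named in User-Agent and the one in Sec-CH-UA-Platform. The sketch below is an assumption-laden illustration: PlatformConsistency is a hypothetical class, the substring tests are rough heuristics, and a mismatch is best treated as a soft signal, since some browser modes and extensions legitimately alter the User-Agent.

```java
import java.util.Locale;

/** Hypothetical check: does User-Agent contradict Sec-CH-UA-Platform? */
final class PlatformConsistency {

    /** Returns true only when both values are present and clearly disagree. */
    static boolean contradicts(String userAgent, String secChUaPlatform) {
        if (userAgent == null || secChUaPlatform == null) {
            return false; // nothing to compare
        }
        // The header value is a quoted string such as "Windows"
        String platform = secChUaPlatform.replace("\"", "").trim().toLowerCase(Locale.ROOT);
        String ua = userAgent.toLowerCase(Locale.ROOT);
        switch (platform) {
            case "windows":
                return !ua.contains("windows");
            case "macos":
                return !ua.contains("mac os x");
            case "android":
                return !ua.contains("android");
            case "linux":
                // Android User-Agents also contain "Linux", so exclude them
                return !ua.contains("linux") || ua.contains("android");
            default:
                return false; // unknown platform: no verdict
        }
    }
}
```

A crawler that copies a Windows Chrome User-Agent but runs headless Chromium on a Linux server, sending Sec-CH-UA-Platform: "Linux", would be flagged by this check.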

5. Dynamic blocking filter

@Component
@Slf4j
public class BotDetectionFilter extends OncePerRequestFilter {

    @Autowired
    private DeviceFingerprintService fingerprintService;

    @Autowired
    private FingerprintRiskAssessor riskAssessor;

    @Autowired
    private BehaviorAnalysisService behaviorAnalysisService;

    @Autowired
    private BlockManager blockManager;

    @Autowired
    private CaptchaService captchaService;

    @Value("${bot.detection.enabled:true}")
    private boolean detectionEnabled;

    @Value("${bot.detection.risk-threshold:50}")
    private int riskThreshold;

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain filterChain)
            throws ServletException, IOException {

        if (!detectionEnabled) {
            filterChain.doFilter(request, response);
            return;
        }

        String path = request.getRequestURI();
        if (isExcludedPath(path)) {
            filterChain.doFilter(request, response);
            return;
        }

        RiskAssessment assessment;
        try {
            DeviceFingerprint fingerprint = fingerprintService.getOrCreateFingerprint(request);

            if (blockManager.isBlocked(fingerprint.getFingerprint(), fingerprint.getIp())) {
                log.warn("Request blocked: fingerprint={}, ip={}",
                        fingerprint.getFingerprint(), fingerprint.getIp());
                writeBlockedResponse(response, "Request blocked");
                return;
            }

            assessment = riskAssessor.assess(request, fingerprint);

            behaviorAnalysisService.recordVisit(
                    fingerprint.getFingerprint(),
                    fingerprint.getIp(),
                    path
            );
        } catch (Exception e) {
            // Fail open: a failure in detection must not take the site down
            log.error("Bot detection failed", e);
            filterChain.doFilter(request, response);
            return;
        }

        // Dispatch outside the try block, so an exception thrown by the downstream
        // chain is not caught above and cannot trigger a second doFilter call
        switch (assessment.getRiskLevel()) {
            case LOW:
                filterChain.doFilter(request, response);
                break;

            case MEDIUM:
                handleMediumRisk(request, response, filterChain, assessment);
                break;

            case HIGH:
                handleHighRisk(request, response, filterChain, assessment);
                break;

            case BLOCKED:
                handleBlocked(request, response, filterChain, assessment);
                break;
        }
    }

    private void handleMediumRisk(HttpServletRequest request,
                                  HttpServletResponse response,
                                  FilterChain filterChain,
                                  RiskAssessment assessment) throws ServletException, IOException {
        log.info("Medium-risk request: fingerprint={}, score={}",
                assessment.getFingerprint(), assessment.getRiskScore());

        response.setHeader("X-Risk-Level", "MEDIUM");
        response.setHeader("X-Risk-Score", String.valueOf(assessment.getRiskScore()));

        // The header carries the token itself (a UUID), so test for its presence
        // instead of comparing it with the literal "true"
        String captchaToken = request.getHeader("X-Captcha-Token");
        if (captchaToken != null && !captchaToken.isEmpty()) {
            if (captchaService.verify(captchaToken)) {
                filterChain.doFilter(request, response);
            } else {
                writeBlockedResponse(response, "CAPTCHA verification failed");
            }
        } else {
            response.setHeader("X-Captcha-Required", "true");
            writeCaptchaChallenge(response, assessment);
        }
    }

    private void handleHighRisk(HttpServletRequest request,
                                HttpServletResponse response,
                                FilterChain filterChain,
                                RiskAssessment assessment) throws IOException {
        log.warn("High-risk request: fingerprint={}, ip={}, score={}",
                assessment.getFingerprint(), assessment.getIp(), assessment.getRiskScore());

        blockManager.tempBlock(assessment.getFingerprint(), assessment.getIp(), 5);

        writeBlockedResponse(response, "Unusual request, please try again later");
    }

    private void handleBlocked(HttpServletRequest request,
                               HttpServletResponse response,
                               FilterChain filterChain,
                               RiskAssessment assessment) throws IOException {
        log.error("Blocked request: fingerprint={}, ip={}, score={}",
                assessment.getFingerprint(), assessment.getIp(), assessment.getRiskScore());

        blockManager.permanentBlock(assessment.getFingerprint(), assessment.getIp());

        writeBlockedResponse(response, "Request rejected");
    }

    private void writeBlockedResponse(HttpServletResponse response, String message) throws IOException {
        response.setStatus(HttpServletResponse.SC_FORBIDDEN);
        response.setContentType("application/json");
        // Set the encoding before getWriter() so non-ASCII text is not garbled
        response.setCharacterEncoding("UTF-8");
        // message must be a fixed, JSON-safe string; it is not escaped here
        response.getWriter().write(String.format(
                "{\"success\":false,\"message\":\"%s\",\"code\":\"BLOCKED\"}", message));
    }

    private void writeCaptchaChallenge(HttpServletResponse response, RiskAssessment assessment) throws IOException {
        response.setStatus(HttpServletResponse.SC_UNAUTHORIZED);
        response.setContentType("application/json");
        response.setCharacterEncoding("UTF-8");
        response.getWriter().write(String.format(
                "{\"success\":false,\"message\":\"Please complete verification\",\"code\":\"CAPTCHA_REQUIRED\",\"riskScore\":%d}",
                assessment.getRiskScore()));
    }

    private boolean isExcludedPath(String path) {
        return path.startsWith("/static") ||
               path.startsWith("/health") ||
               path.startsWith("/actuator") ||
               path.equals("/favicon.ico");
    }
}

6. Block manager

@Component
@Slf4j
public class BlockManager {

    @Autowired
    private BlacklistRepository blacklistRepository;

    private final Map<String, Long> tempBlockCache = new ConcurrentHashMap<>();
    private final Map<String, Long> permanentBlockCache = new ConcurrentHashMap<>();

    private static final long TEMP_BLOCK_DURATION_MS = 5 * 60 * 1000;

    public boolean isBlocked(String fingerprint, String ip) {
        if (permanentBlockCache.containsKey(fingerprint) ||
            permanentBlockCache.containsKey(ip)) {
            return true;
        }

        Long blockTime = tempBlockCache.get(fingerprint);
        if (blockTime != null && System.currentTimeMillis() < blockTime) {
            return true;
        }

        blockTime = tempBlockCache.get(ip);
        if (blockTime != null && System.currentTimeMillis() < blockTime) {
            return true;
        }

        return blacklistRepository.findByFingerprintAndStatus(fingerprint, "BLACKLISTED").isPresent() ||
               blacklistRepository.findByIpAndStatus(ip, "BLACKLISTED").isPresent();
    }

    public void tempBlock(String fingerprint, String ip, int durationMinutes) {
        long expireTime = System.currentTimeMillis() + durationMinutes * 60 * 1000L;

        tempBlockCache.put(fingerprint, expireTime);
        tempBlockCache.put(ip, expireTime);

        log.info("Temporary block: fingerprint={}, ip={}, duration={}min",
                fingerprint, ip, durationMinutes);
    }

    public void permanentBlock(String fingerprint, String ip) {
        permanentBlockCache.put(fingerprint, System.currentTimeMillis());
        permanentBlockCache.put(ip, System.currentTimeMillis());

        saveToBlacklist(fingerprint, ip, "PERMANENT");

        log.warn("Permanent block: fingerprint={}, ip={}", fingerprint, ip);
    }

    private void saveToBlacklist(String fingerprint, String ip, String type) {
        blacklistRepository.findByFingerprint(fingerprint).ifPresentOrElse(
                record -> {
                    record.setStatus("BLACKLISTED");
                    record.setBlockType(type);
                    record.setBlockTime(LocalDateTime.now());
                    blacklistRepository.save(record);
                },
                () -> {
                    Blacklist blacklist = Blacklist.builder()
                            .fingerprint(fingerprint)
                            .ip(ip)
                            .status("BLACKLISTED")
                            .blockType(type)
                            .blockTime(LocalDateTime.now())
                            .reason("Automatic block: high risk")
                            .build();
                    blacklistRepository.save(blacklist);
                }
        );
    }

    public void unblock(String fingerprint, String ip) {
        tempBlockCache.remove(fingerprint);
        tempBlockCache.remove(ip);
        permanentBlockCache.remove(fingerprint);
        permanentBlockCache.remove(ip);

        blacklistRepository.findByFingerprintAndStatus(fingerprint, "BLACKLISTED")
                .ifPresent(record -> {
                    record.setStatus("REMOVED");
                    blacklistRepository.save(record);
                });

        log.info("Block lifted: fingerprint={}, ip={}", fingerprint, ip);
    }
}
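
One thing the temporary-block map above does not do is forget expired entries, so it grows for as long as the process runs. A possible shape for a self-cleaning variant is sketched below. ExpiringBlockList is a hypothetical class, and the injected clock is an assumption of this sketch that exists only to make expiry testable without sleeping.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.LongSupplier;

/** Hypothetical temporary-block store that drops entries once they expire. */
final class ExpiringBlockList {
    private final ConcurrentMap<String, Long> expiresAt = new ConcurrentHashMap<>();
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringBlockList(LongSupplier clock) {
        this.clock = clock;
    }

    void block(String key, long durationMs) {
        long candidate = clock.getAsLong() + durationMs;
        // If the key is already blocked, keep whichever expiry is later
        expiresAt.merge(key, candidate, Long::max);
    }

    boolean isBlocked(String key) {
        Long expiry = expiresAt.get(key);
        if (expiry == null) {
            return false;
        }
        if (clock.getAsLong() < expiry) {
            return true;
        }
        // Two-argument remove: delete only if the entry was not refreshed meanwhile
        expiresAt.remove(key, expiry);
        return false;
    }

    int size() {
        return expiresAt.size();
    }
}
```

In production the clock would be System::currentTimeMillis. Entries that are never looked up again still linger, so a periodic sweep or a cache library with built-in expiry may be a better fit at scale.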

7. CAPTCHA service

@Service
@Slf4j
public class CaptchaService {

    private final Map<String, CaptchaToken> tokenStore = new ConcurrentHashMap<>();

    public String generateToken(String fingerprint, String ip) {
        String token = UUID.randomUUID().toString();
        CaptchaToken captchaToken = CaptchaToken.builder()
                .token(token)
                .fingerprint(fingerprint)
                .ip(ip)
                .createTime(LocalDateTime.now())
                .expireTime(LocalDateTime.now().plusMinutes(5))
                .verified(false)
                .build();

        tokenStore.put(token, captchaToken);
        log.info("CAPTCHA token issued: token={}, fingerprint={}", token, fingerprint);

        return token;
    }

    public boolean verify(String token) {
        if (token == null || token.isEmpty()) {
            return false;
        }

        // remove() is atomic, so two concurrent requests cannot both redeem the
        // same token; a token that was already used is no longer in the store
        CaptchaToken captchaToken = tokenStore.remove(token);
        if (captchaToken == null) {
            log.warn("CAPTCHA token not found or already used: token={}", token);
            return false;
        }

        if (LocalDateTime.now().isAfter(captchaToken.getExpireTime())) {
            log.warn("CAPTCHA token expired: token={}", token);
            return false;
        }

        log.info("CAPTCHA token verified: token={}", token);
        return true;
    }

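    public boolean requiresCaptcha(RiskAssessment assessment) {
        // Only the MEDIUM tier (30-49) gets a challenge; a bare "score >= 30" test
        // would also match HIGH and BLOCKED, which are handled by blocking instead
        return assessment.getRiskLevel() == RiskLevel.MEDIUM;
    }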
}

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
class CaptchaToken {
    private String token;
    private String fingerprint;
    private String ip;
    private LocalDateTime createTime;
    private LocalDateTime expireTime;
    private boolean verified;
}
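
The service above stores the fingerprint with each token but never compares it at verification time, so a token solved on one device could be replayed from another. A sketch of a store that binds redemption to the fingerprint follows. OneTimeTokenStore and its method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical single-use token store bound to a device fingerprint. */
final class OneTimeTokenStore {

    private static final class Entry {
        final String fingerprint;
        final long expiresAtMs;

        Entry(String fingerprint, long expiresAtMs) {
            this.fingerprint = fingerprint;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final ConcurrentMap<String, Entry> tokens = new ConcurrentHashMap<>();

    String issue(String fingerprint, long nowMs, long ttlMs) {
        String token = UUID.randomUUID().toString();
        tokens.put(token, new Entry(fingerprint, nowMs + ttlMs));
        return token;
    }

    /**
     * Redeems a token at most once. Note that a failed attempt (wrong
     * fingerprint or expired) also consumes the token.
     */
    boolean redeem(String token, String fingerprint, long nowMs) {
        if (token == null || fingerprint == null) {
            return false;
        }
        Entry entry = tokens.remove(token); // atomic: only one caller gets the entry
        return entry != null
                && nowMs < entry.expiresAtMs
                && entry.fingerprint.equals(fingerprint);
    }
}
```

In a real deployment the token would be issued only after the CAPTCHA itself has been solved; that step is outside the scope of this sketch.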

Configuration

server:
  port: 8080

spring:
  application:
    name: bot-detection-demo

bot:
  detection:
    enabled: true
    risk-threshold: 50
    captcha-required-threshold: 30
    temp-block-duration-minutes: 5
    permanent-block-threshold: 80

  fingerprint:
    hash-algorithm: SHA-256
    min-header-count: 5

  behavior:
    analysis-window-minutes: 10
    max-requests-per-minute: 60
    min-human-interval-ms: 3000

  blacklist:
    auto-add-high-risk: true
    retention-days: 90

logging:
  level:
    com.example.bot: DEBUG
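
Property                                    Description                          Default
------------------------------------------  -----------------------------------  -------
bot.detection.enabled                       Enable crawler detection             true
bot.detection.risk-threshold                Risk score threshold                 50
bot.detection.temp-block-duration-minutes   Temporary block duration (minutes)   5
bot.detection.permanent-block-threshold     Permanent block threshold            80
bot.behavior.max-requests-per-minute        Maximum requests per minute          60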

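Risk assessment dimensions

Fingerprint dimension

Risk factor                      Score   Notes
-------------------------------  -----   ----------------------------
Device fingerprint blacklisted   +100    Blocked outright
IP address blacklisted           +60     High risk
Suspicious User-Agent            +30     Tool-based crawlers
Missing security headers         +20     Possibly a simulated request
Abnormal header combination      +25     Forged traits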

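Behavior dimension

Risk factor                 Score   Notes
--------------------------  -----   -----------------------
High request frequency      +40     > 60 requests/minute
Regular request intervals   +30     Fixed intervals
High page repetition        +15     > 90% repeated
Strong time regularity      +15     Scheduled-job signature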
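
As a worked example of how the two tables combine, the sketch below adds up the points for a set of triggered factors. ScoreExample and the Factor enum are hypothetical names; the point values are the ones listed above, and the behavior score is capped at 100 before it is added, as BehaviorAnalysisService does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical worked example of the additive risk score. */
final class ScoreExample {

    enum Factor {
        FINGERPRINT_BLACKLISTED(100),
        IP_BLACKLISTED(60),
        SUSPICIOUS_USER_AGENT(30),
        MISSING_SECURITY_HEADERS(20),
        ABNORMAL_HEADER_COMBINATION(25);

        final int points;

        Factor(int points) {
            this.points = points;
        }
    }

    /** Sums fingerprint-dimension points and adds the capped behavior score. */
    static int total(List<Factor> factors, int behaviorScore) {
        int sum = 0;
        for (Factor f : factors) {
            sum += f.points;
        }
        return sum + Math.min(100, behaviorScore);
    }

    // A plain curl request: tool User-Agent (+30) and no Sec-* headers (+20)
    static int curlRequestScore() {
        return total(Arrays.asList(Factor.SUSPICIOUS_USER_AGENT,
                                   Factor.MISSING_SECURITY_HEADERS), 0);
    }

    // A browser-like client with clean headers but >60 req/min (+40) at fixed intervals (+30)
    static int fastRegularClientScore() {
        return total(Collections.<Factor>emptyList(), 40 + 30);
    }
}
```

The curl request scores 30 + 20 = 50, which already lands in the high-risk tier on its first request. The header-perfect but mechanical client scores 40 + 30 = 70, also high risk, purely from behavior.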

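FAQ

Q: How do we handle false blocks?

A: Use the following strategies to reduce them:

  1. Graduated response: CAPTCHA challenge first, then a temporary block, and a permanent block only as a last resort
  2. Allowlist: add search engine crawlers and partners to an allowlist
  3. Appeal channel: provide an entry point for manual review
  4. Automatic unblocking: temporary blocks expire on their own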
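
For the allowlist item, a User-Agent that says "Googlebot" proves nothing, since it is trivially forged. Search engines generally recommend verifying a crawler by reverse DNS followed by a forward lookup. The sketch below covers only the hostname check in the middle of that procedure. CrawlerHostAllowlist is a hypothetical class, the suffix list is an example to be confirmed against each search engine's current documentation, and the DNS lookups themselves are omitted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical hostname check for the reverse-DNS step of crawler verification. */
final class CrawlerHostAllowlist {

    // Example suffixes only; confirm the current values in each engine's documentation
    private static final List<String> TRUSTED_SUFFIXES =
            Arrays.asList(".googlebot.com", ".google.com", ".search.msn.com");

    /** reverseDnsHost is the PTR name obtained for the client IP. */
    static boolean isTrustedHost(String reverseDnsHost) {
        if (reverseDnsHost == null) {
            return false;
        }
        String host = reverseDnsHost.toLowerCase(Locale.ROOT);
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1); // strip the trailing root dot
        }
        for (String suffix : TRUSTED_SUFFIXES) {
            // The leading dot in each suffix prevents "evilgooglebot.com" from matching
            if (host.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

A complete check would also resolve the returned hostname forward again and confirm that it maps back to the original IP, because PTR records are controlled by whoever owns the address block.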

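Q: How do we deal with advanced crawlers?

A: Advanced crawlers (for example, those driven by Selenium or Playwright) can be identified through the following signals:

  1. Browser fingerprint: Canvas, WebGL, and Audio hashes
  2. Behavioral signals: mouse trajectories and click patterns
  3. TLS fingerprint: JA3 fingerprint
  4. JavaScript execution: probing for browser-specific APIs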

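Q: How do we balance security and user experience?

A: A tiered strategy is recommended:

  1. Low risk (0-29 points): allow
  2. Medium risk (30-49 points): CAPTCHA challenge
  3. High risk (50-79 points): temporary block
  4. Blocked (80+ points): permanent block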

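Summary

With the approach described in this article, we get:

  1. Device fingerprinting: a fingerprint built from many attributes, which raises the cost of forgery
  2. Behavior analysis: access patterns analyzed in real time to spot anomalies
  3. Dynamic blocking: a different response for each risk level
  4. Graduated response: fewer false positives and a protected experience for normal users
  5. Full audit trail: every detection and enforcement action is logged

Key components

  • DeviceFingerprint: the device fingerprint entity, which records multi-dimensional attributes
  • BehaviorAnalysisService: the behavior analysis service, which identifies abnormal patterns
  • FingerprintRiskAssessor: the risk assessor, which combines signals into one score
  • BotDetectionFilter: the dynamic blocking filter, which acts according to risk level
  • BlockManager: the block manager, which handles temporary and permanent blocks

In production, adjust the risk thresholds and response strategies to fit your business, and aim for a balance between security and user experience.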


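Title: Crawler Fingerprinting and Dynamic Blocking: Bypassing Rate Limits? Device Fingerprints + Behavior Analysis for Precise Blocking!
Author: jiangyi
URL: http://jiangyi.space/articles/2026/05/15/1778386990789.html
WeChat official account: 服务端技术精选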