大数据量下第100万页怎么查?99%的程序员都搞错了!

大数据量下第100万页怎么查?99%的程序员都搞错了!

产品经理说要支持分页查询,用户可以查看第100万页的数据,你心想"这有什么难的,不就是LIMIT offset, size吗?"结果一上线,系统直接卡死,用户投诉不断...今天就来聊聊大数据量分页查询的正确姿势,让你的系统在百万级数据下依然丝滑如德芙!

一、传统分页的性能陷阱

在开始介绍解决方案之前,我们先来看看传统分页方式的问题。

1.1 LIMIT offset, size的问题

-- 看似简单的分页查询,背后隐藏着巨大的性能问题
SELECT * FROM user_behavior 
ORDER BY id 
LIMIT 10000000, 20;

让我们分析一下这个查询的执行过程:

// 传统分页的执行过程
public class TraditionalPagination {
    
    public void explainProblem() {
        System.out.println("=== 传统分页的执行过程 ===");
        System.out.println("1. 数据库扫描前10000020条记录");
        System.out.println("2. 丢弃前10000000条记录");
        System.out.println("3. 返回最后20条记录");
        System.out.println("4. 随着offset增大,性能急剧下降");
    }
}

1.2 性能测试对比

// 性能测试结果
public class PerformanceTest {
    
    public void testResults() {
        System.out.println("=== 不同offset下的查询耗时 ===");
        System.out.println("LIMIT 0, 20:     0.01秒");
        System.out.println("LIMIT 1000, 20:  0.05秒");
        System.out.println("LIMIT 10000, 20: 0.5秒");
        System.out.println("LIMIT 100000, 20: 5秒");
        System.out.println("LIMIT 1000000, 20: 50秒");
        System.out.println("LIMIT 10000000, 20: 500秒");
    }
}

二、大数据量分页的核心挑战

2.1 数据量与性能的关系

// 数据量对分页性能的影响
public class DataVolumeImpact {
    
    public void impactAnalysis() {
        System.out.println("=== 数据量对分页性能的影响 ===");
        System.out.println("数据量    |  查询第1页 |  查询第100页 |  查询第1000页");
        System.out.println("1万条    |  0.01秒   |  0.02秒     |  0.05秒");
        System.out.println("100万条  |  0.01秒   |  0.5秒      |  5秒");
        System.out.println("1亿条    |  0.01秒   |  50秒       |  500秒");
        System.out.println("结论:性能与offset成正比关系");
    }
}

2.2 用户行为分析

// 用户真实分页行为
public class UserBehaviorAnalysis {
    
    public void analysis() {
        System.out.println("=== 用户真实分页行为 ===");
        System.out.println("90%的用户只查看前10页");
        System.out.println("9%的用户查看10-100页");
        System.out.println("0.9%的用户查看100-1000页");
        System.out.println("0.1%的用户查看1000页以上");
        System.out.println("但一旦支持查看第100万页,系统就可能崩溃");
    }
}

三、正确的分页解决方案

3.1 游标分页(推荐)

游标分页是解决大数据量分页的最佳方案:

-- 游标分页实现
-- 第一页查询
SELECT * FROM user_behavior 
WHERE create_time >= '2023-01-01' 
ORDER BY create_time, id 
LIMIT 20;

-- 后续页查询(以上一页最后一条记录为游标)
SELECT * FROM user_behavior 
WHERE (create_time > '2023-01-01 10:30:45' 
       OR (create_time = '2023-01-01 10:30:45' AND id > 123456))
ORDER BY create_time, id 
LIMIT 20;
@Service
@Slf4j
public class CursorPaginationService {
    
    @Autowired
    private JdbcTemplate jdbcTemplate;
    
    /**
     * 游标分页查询
     */
    public PageResult<UserBehavior> cursorPagination(CursorPaginationRequest request) {
        String sql = "SELECT * FROM user_behavior WHERE ";
        
        // 构造查询条件
        if (request.getCursor() != null) {
            // 有游标的情况
            sql += "(create_time > ? OR (create_time = ? AND id > ?)) ";
        }
        sql += "ORDER BY create_time, id LIMIT ?";
        
        List<UserBehavior> data;
        if (request.getCursor() != null) {
            data = jdbcTemplate.query(sql, new UserBehaviorRowMapper(),
                    request.getCursor().getCreateTime(), 
                    request.getCursor().getCreateTime(), 
                    request.getCursor().getId(),
                    request.getPageSize());
        } else {
            // 第一页查询
            data = jdbcTemplate.query(sql, new UserBehaviorRowMapper(), 
                    request.getPageSize());
        }
        
        // 构造返回结果
        PageResult<UserBehavior> result = new PageResult<>();
        result.setData(data);
        result.setHasMore(data.size() == request.getPageSize());
        
        // 设置下一页游标
        if (!data.isEmpty()) {
            UserBehavior lastItem = data.get(data.size() - 1);
            Cursor nextCursor = new Cursor();
            nextCursor.setCreateTime(lastItem.getCreateTime());
            nextCursor.setId(lastItem.getId());
            result.setNextCursor(nextCursor);
        }
        
        return result;
    }
}

3.2 延迟关联分页

对于需要返回完整信息的场景,可以使用延迟关联:

-- 延迟关联分页
-- 第一步:只查询主键
SELECT id FROM user_behavior 
ORDER BY create_time DESC 
LIMIT 10000000, 20;

-- 第二步:根据主键查询完整信息
SELECT * FROM user_behavior 
WHERE id IN (123456, 123457, ...);
@Service
@Slf4j
public class DelayedJoinPaginationService {
    
    @Autowired
    private JdbcTemplate jdbcTemplate;
    
    /**
     * 延迟关联分页
     */
    public PageResult<UserBehavior> delayedJoinPagination(int pageNum, int pageSize) {
        long startTime = System.currentTimeMillis();
        
        try {
            // 第一步:查询主键列表
            String idSql = "SELECT id FROM user_behavior ORDER BY create_time DESC LIMIT ?, ?";
            List<Long> ids = jdbcTemplate.query(idSql, 
                (rs, rowNum) -> rs.getLong("id"), 
                (pageNum - 1) * pageSize, pageSize);
            
            if (ids.isEmpty()) {
                return PageResult.empty();
            }
            
            // 第二步:根据主键查询完整信息
            String dataSql = "SELECT * FROM user_behavior WHERE id IN (" + 
                           ids.stream().map(String::valueOf).collect(Collectors.joining(",")) + 
                           ") ORDER BY create_time DESC";
            List<UserBehavior> data = jdbcTemplate.query(dataSql, new UserBehaviorRowMapper());
            
            // 保证顺序与ID列表一致
            Map<Long, UserBehavior> dataMap = data.stream()
                .collect(Collectors.toMap(UserBehavior::getId, Function.identity()));
            List<UserBehavior> orderedData = ids.stream()
                .map(dataMap::get)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
            
            long endTime = System.currentTimeMillis();
            log.info("延迟关联分页查询完成: pageNum={}, pageSize={}, time={}ms", 
                    pageNum, pageSize, endTime - startTime);
            
            PageResult<UserBehavior> result = new PageResult<>();
            result.setData(orderedData);
            result.setPageNum(pageNum);
            result.setPageSize(pageSize);
            return result;
        } catch (Exception e) {
            log.error("延迟关联分页查询失败: pageNum={}, pageSize={}", pageNum, pageSize, e);
            throw new RuntimeException("分页查询失败", e);
        }
    }
}

3.3 覆盖索引分页

通过覆盖索引优化分页查询:

-- 创建覆盖索引
CREATE INDEX idx_user_behavior_cover ON user_behavior (create_time, id, user_id, action_type);

-- 使用覆盖索引的分页查询
SELECT id, user_id, action_type, create_time 
FROM user_behavior 
ORDER BY create_time DESC 
LIMIT 10000000, 20;
@Service
@Slf4j
public class CoverIndexPaginationService {
    
    @Autowired
    private JdbcTemplate jdbcTemplate;
    
    /**
     * 覆盖索引分页查询
     */
    public PageResult<UserBehaviorSummary> coverIndexPagination(int pageNum, int pageSize) {
        String sql = "SELECT id, user_id, action_type, create_time " +
                    "FROM user_behavior " +
                    "ORDER BY create_time DESC " +
                    "LIMIT ?, ?";
        
        List<UserBehaviorSummary> summaries = jdbcTemplate.query(sql, 
            new UserBehaviorSummaryRowMapper(), 
            (pageNum - 1) * pageSize, pageSize);
        
        PageResult<UserBehaviorSummary> result = new PageResult<>();
        result.setData(summaries);
        result.setPageNum(pageNum);
        result.setPageSize(pageSize);
        return result;
    }
    
    /**
     * 根据摘要信息查询完整详情(按需加载)
     */
    public List<UserBehavior> getFullDetails(List<Long> ids) {
        if (ids.isEmpty()) {
            return new ArrayList<>();
        }
        
        String sql = "SELECT * FROM user_behavior WHERE id IN (" + 
                    ids.stream().map(String::valueOf).collect(Collectors.joining(",")) + ")";
        
        return jdbcTemplate.query(sql, new UserBehaviorRowMapper());
    }
}

四、分页优化实战

4.1 分页缓存策略

@Service
@Slf4j
public class PaginationCacheService {
    
    @Autowired
    private RedisTemplate<String, Object> redisTemplate;
    
    // 分页结果缓存
    private final LoadingCache<String, PageResult<UserBehavior>> pageCache = 
        Caffeine.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .build(this::loadPageFromDatabase);
    
    /**
     * 带缓存的分页查询
     */
    public PageResult<UserBehavior> getCachedPage(PaginationRequest request) {
        String cacheKey = buildCacheKey(request);
        
        try {
            // 先从本地缓存获取
            PageResult<UserBehavior> result = pageCache.getIfPresent(cacheKey);
            if (result != null) {
                return result;
            }
            
            // 再从Redis获取
            result = (PageResult<UserBehavior>) redisTemplate.opsForValue().get(cacheKey);
            if (result != null) {
                pageCache.put(cacheKey, result);
                return result;
            }
            
            // 数据库查询
            result = loadPageFromDatabase(request);
            if (result != null) {
                // 回写缓存
                redisTemplate.opsForValue().set(cacheKey, result, Duration.ofMinutes(10));
                pageCache.put(cacheKey, result);
            }
            
            return result;
        } catch (Exception e) {
            log.error("分页缓存查询失败: cacheKey={}", cacheKey, e);
            return loadPageFromDatabase(request);
        }
    }
    
    private String buildCacheKey(PaginationRequest request) {
        return "pagination:" + request.getPageNum() + ":" + request.getPageSize() + 
               ":" + (request.getFilter() != null ? request.getFilter().hashCode() : "none");
    }
    
    private PageResult<UserBehavior> loadPageFromDatabase(PaginationRequest request) {
        // 实际的数据库查询逻辑
        return new PageResult<>();
    }
    
    private PageResult<UserBehavior> loadPageFromDatabase(String cacheKey) {
        // 根据缓存key解析查询参数并执行查询
        return new PageResult<>();
    }
}

4.2 异步预加载

@Service
@Slf4j
public class AsyncPreloadService {
    
    @Autowired
    private PaginationService paginationService;
    
    private final ExecutorService preloadExecutor = 
        Executors.newFixedThreadPool(5);
    
    /**
     * 异步预加载下一页数据
     */
    public void preloadNextPage(PaginationRequest currentRequest) {
        preloadExecutor.submit(() -> {
            try {
                PaginationRequest nextRequest = new PaginationRequest();
                nextRequest.setPageNum(currentRequest.getPageNum() + 1);
                nextRequest.setPageSize(currentRequest.getPageSize());
                nextRequest.setFilter(currentRequest.getFilter());
                
                // 预加载下一页数据到缓存
                paginationService.getCachedPage(nextRequest);
                
                log.info("预加载下一页数据完成: pageNum={}", nextRequest.getPageNum());
            } catch (Exception e) {
                log.error("预加载下一页数据失败", e);
            }
        });
    }
    
    /**
     * 批量预加载热门页面
     */
    public void preloadHotPages(List<Integer> pageNumbers, int pageSize) {
        pageNumbers.parallelStream().forEach(pageNum -> {
            try {
                PaginationRequest request = new PaginationRequest();
                request.setPageNum(pageNum);
                request.setPageSize(pageSize);
                
                paginationService.getCachedPage(request);
            } catch (Exception e) {
                log.error("预加载页面失败: pageNum={}", pageNum, e);
            }
        });
    }
}

五、特殊场景处理

5.1 按时间范围分页

@Service
@Slf4j
public class TimeRangePaginationService {
    
    @Autowired
    private JdbcTemplate jdbcTemplate;
    
    /**
     * 按时间范围分页
     */
    public PageResult<UserBehavior> timeRangePagination(TimeRangePaginationRequest request) {
        StringBuilder sql = new StringBuilder();
        sql.append("SELECT * FROM user_behavior WHERE 1=1 ");
        
        List<Object> params = new ArrayList<>();
        
        // 时间范围条件
        if (request.getStartTime() != null) {
            sql.append("AND create_time >= ? ");
            params.add(request.getStartTime());
        }
        if (request.getEndTime() != null) {
            sql.append("AND create_time <= ? ");
            params.add(request.getEndTime());
        }
        
        // 游标条件
        if (request.getCursor() != null) {
            sql.append("AND (create_time > ? OR (create_time = ? AND id > ?)) ");
            params.add(request.getCursor().getCreateTime());
            params.add(request.getCursor().getCreateTime());
            params.add(request.getCursor().getId());
        }
        
        sql.append("ORDER BY create_time, id LIMIT ?");
        params.add(request.getPageSize());
        
        List<UserBehavior> data = jdbcTemplate.query(sql.toString(), 
            new UserBehaviorRowMapper(), params.toArray());
        
        PageResult<UserBehavior> result = new PageResult<>();
        result.setData(data);
        result.setHasMore(data.size() == request.getPageSize());
        
        if (!data.isEmpty()) {
            UserBehavior lastItem = data.get(data.size() - 1);
            Cursor nextCursor = new Cursor();
            nextCursor.setCreateTime(lastItem.getCreateTime());
            nextCursor.setId(lastItem.getId());
            result.setNextCursor(nextCursor);
        }
        
        return result;
    }
}

5.2 复合条件分页

@Service
@Slf4j
public class ComplexConditionPaginationService {
    
    @Autowired
    private JdbcTemplate jdbcTemplate;
    
    /**
     * 复合条件分页查询
     */
    public PageResult<UserBehavior> complexPagination(ComplexPaginationRequest request) {
        StringBuilder countSql = new StringBuilder();
        StringBuilder dataSql = new StringBuilder();
        List<Object> params = new ArrayList<>();
        
        countSql.append("SELECT COUNT(*) FROM user_behavior WHERE 1=1 ");
        dataSql.append("SELECT * FROM user_behavior WHERE 1=1 ");
        
        // 用户ID条件
        if (request.getUserId() != null) {
            countSql.append("AND user_id = ? ");
            dataSql.append("AND user_id = ? ");
            params.add(request.getUserId());
        }
        
        // 行为类型条件
        if (request.getActionTypes() != null && !request.getActionTypes().isEmpty()) {
            String placeholders = request.getActionTypes().stream()
                .map(t -> "?")
                .collect(Collectors.joining(","));
            countSql.append("AND action_type IN (" + placeholders + ") ");
            dataSql.append("AND action_type IN (" + placeholders + ") ");
            params.addAll(request.getActionTypes());
        }
        
        // 时间范围条件
        if (request.getStartTime() != null) {
            countSql.append("AND create_time >= ? ");
            dataSql.append("AND create_time >= ? ");
            params.add(request.getStartTime());
        }
        if (request.getEndTime() != null) {
            countSql.append("AND create_time <= ? ");
            dataSql.append("AND create_time <= ? ");
            params.add(request.getEndTime());
        }
        
        // 游标条件
        if (request.getCursor() != null) {
            dataSql.append("AND (create_time > ? OR (create_time = ? AND id > ?)) ");
            params.add(request.getCursor().getCreateTime());
            params.add(request.getCursor().getCreateTime());
            params.add(request.getCursor().getId());
        }
        
        dataSql.append("ORDER BY create_time, id LIMIT ?");
        params.add(request.getPageSize());
        
        // 查询总数
        Long totalCount = jdbcTemplate.queryForObject(countSql.toString(), Long.class, params.toArray());
        
        // 查询数据
        List<UserBehavior> data = jdbcTemplate.query(dataSql.toString(), 
            new UserBehaviorRowMapper(), params.toArray());
        
        PageResult<UserBehavior> result = new PageResult<>();
        result.setData(data);
        result.setTotalCount(totalCount);
        result.setHasMore(data.size() == request.getPageSize());
        
        if (!data.isEmpty()) {
            UserBehavior lastItem = data.get(data.size() - 1);
            Cursor nextCursor = new Cursor();
            nextCursor.setCreateTime(lastItem.getCreateTime());
            nextCursor.setId(lastItem.getId());
            result.setNextCursor(nextCursor);
        }
        
        return result;
    }
}

六、监控与优化

6.1 性能监控

@Component
public class PaginationMetrics {
    
    private final MeterRegistry meterRegistry;
    private final Timer paginationTimer;
    private final Counter paginationErrorCounter;
    private final DistributionSummary pageSizeSummary;
    
    public PaginationMetrics(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        this.paginationTimer = Timer.builder("pagination.query.time")
                .description("分页查询耗时")
                .register(meterRegistry);
        this.paginationErrorCounter = Counter.builder("pagination.errors")
                .description("分页查询错误数")
                .register(meterRegistry);
        this.pageSizeSummary = DistributionSummary.builder("pagination.page.size")
                .description("分页大小分布")
                .register(meterRegistry);
    }
    
    public Sample startTimer() {
        return Timer.start(meterRegistry);
    }
    
    public void recordTimer(Sample sample, int pageSize, boolean success) {
        sample.stop(paginationTimer);
        pageSizeSummary.record(pageSize);
        
        if (!success) {
            paginationErrorCounter.increment();
        }
    }
    
    public void recordSlowQuery(int pageNum, long duration) {
        if (duration > 1000) { // 超过1秒的查询
            log.warn("慢查询警告: pageNum={}, duration={}ms", pageNum, duration);
        }
    }
}

6.2 数据库优化建议

-- 分页查询优化建议

-- 1. 为排序字段创建索引
CREATE INDEX idx_user_behavior_create_time ON user_behavior (create_time, id);

-- 2. 避免SELECT *,只查询需要的字段
SELECT id, user_id, action_type, create_time FROM user_behavior 
ORDER BY create_time DESC LIMIT 1000000, 20;

-- 3. 使用覆盖索引
CREATE INDEX idx_user_behavior_cover ON user_behavior (create_time, id, user_id, action_type);

-- 4. 分区表优化(按时间分区)
CREATE TABLE user_behavior (
    id BIGINT PRIMARY KEY,
    user_id BIGINT,
    action_type TINYINT,
    create_time DATETIME,
    -- 其他字段
) PARTITION BY RANGE (YEAR(create_time)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

七、最佳实践总结

7.1 分页方案选择指南

public class PaginationStrategyGuide {
    
    public String selectStrategy(PaginationScenario scenario) {
        if (scenario.getDataVolume() < 100000) {
            return "传统LIMIT分页";
        }
        
        if (scenario.getConcurrentUsers() > 1000) {
            return "游标分页";
        }
        
        if (scenario.getPageSize() > 100) {
            return "延迟关联分页";
        }
        
        if (scenario.getQueryFrequency() > 1000) {
            return "覆盖索引分页";
        }
        
        return "游标分页"; // 默认推荐
    }
}

7.2 性能优化清单

public class PerformanceOptimizationChecklist {
    
    public void checklist() {
        System.out.println("=== 分页查询性能优化清单 ===");
        System.out.println("✅ 索引优化:为排序字段创建合适的索引");
        System.out.println("✅ 查询优化:避免SELECT *,只查询需要的字段");
        System.out.println("✅ 缓存策略:合理使用缓存减少数据库压力");
        System.out.println("✅ 分页策略:选择合适的分页方案");
        System.out.println("✅ 数据库配置:调整数据库参数优化性能");
        System.out.println("✅ 监控告警:建立完善的监控体系");
        System.out.println("✅ 预加载机制:异步预加载热门数据");
        System.out.println("✅ 分区表:大数据量时使用分区表");
    }
}

八、常见问题解答

8.1 为什么不能直接查第100万页?

public class CommonQuestions {
    
    public void explainWhy() {
        System.out.println("=== 为什么不能直接查第100万页? ===");
        System.out.println("1. 性能问题:数据库需要扫描并丢弃前999999页的数据");
        System.out.println("2. 用户体验:等待时间过长,用户会失去耐心");
        System.out.println("3. 系统稳定性:大量资源被占用,影响其他请求");
        System.out.println("4. 实际需求:99%的用户不会查看到第100万页");
        System.out.println("解决方案:使用游标分页或限制最大页码");
    }
}

8.2 如何限制最大页码?

@Service
public class MaxPageLimitService {
    
    private static final int MAX_PAGE_NUM = 10000; // 最大1万页
    
    public void validatePageNum(int pageNum) {
        if (pageNum > MAX_PAGE_NUM) {
            throw new IllegalArgumentException("页码不能超过" + MAX_PAGE_NUM);
        }
        
        if (pageNum < 1) {
            throw new IllegalArgumentException("页码必须大于0");
        }
    }
    
    public PageResult<UserBehavior> safePagination(int pageNum, int pageSize) {
        validatePageNum(pageNum);
        
        // 检查是否还有数据
        long totalCount = getTotalCount();
        long maxPageNum = (totalCount + pageSize - 1) / pageSize;
        
        if (pageNum > maxPageNum) {
            return PageResult.empty();
        }
        
        // 执行分页查询
        return performPagination(pageNum, pageSize);
    }
    
    private long getTotalCount() {
        // 获取总记录数
        return 0;
    }
    
    private PageResult<UserBehavior> performPagination(int pageNum, int pageSize) {
        // 执行分页查询逻辑
        return new PageResult<>();
    }
}

结语

大数据量分页查询是每个后端开发都会遇到的挑战,选择合适的分页策略对系统性能至关重要。通过本文介绍的游标分页、延迟关联分页、覆盖索引分页等方案,相信你能解决百万级数据分页的性能问题。

关键要点总结:

  1. 避免传统LIMIT分页:offset越大性能越差
  2. 推荐游标分页:适合无限滚动场景
  3. 合理使用缓存:减少数据库查询压力
  4. 索引优化:为排序字段创建合适的索引
  5. 监控告警:及时发现和解决性能问题

记住,技术的价值在于解决实际问题。在面对大数据量分页查询时,不要盲目使用传统方案,而要根据具体业务场景选择最合适的技术方案。

如果你觉得这篇文章对你有帮助,欢迎分享给更多的朋友。在分页查询优化的路上,我们一起成长!


关注「服务端技术精选」,获取更多干货技术文章!


标题:大数据量下第100万页怎么查?99%的程序员都搞错了!
作者:jiangyi
地址:http://jiangyi.space/articles/2025/12/21/1766304283220.html

    0 评论
avatar