Elasticsearch实现MySQL like%XX%实战:让你的模糊查询快如闪电!

Elasticsearch实现MySQL like%XX%实战:让你的模糊查询快如闪电!

作为一名后端开发,你有没有遇到过这样的场景:产品经理跑过来说:"我们要在搜索框里支持模糊查询,就像MySQL的like '%关键词%'一样!"你心里想:"这不就是个简单的模糊查询吗?"但当你在MySQL里执行like '%关键词%'时,DBA却告诉你:"千万别这么用,性能会爆炸的!"

今天就来聊聊如何用Elasticsearch实现MySQL like%XX%的功能,让你的模糊查询快如闪电!

一、MySQL like查询的性能问题

在开始讨论解决方案之前,我们先来看看MySQL like查询的性能问题。

1.1 like查询的性能瓶颈

-- 前缀匹配 - 性能较好(可以使用索引)
SELECT * FROM products WHERE name LIKE '手机%';

-- 后缀匹配 - 性能较差(无法使用索引)
SELECT * FROM products WHERE name LIKE '%手机';

-- 包含匹配 - 性能最差(无法使用索引)
SELECT * FROM products WHERE name LIKE '%手机%';

当使用%关键词%这种包含匹配时,MySQL无法使用索引,只能进行全表扫描,性能极其低下。

1.2 性能测试对比

假设我们有一个包含100万条记录的商品表:

-- 创建测试表
CREATE TABLE products (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10,2),
    category VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    
    INDEX idx_name (name),
    INDEX idx_category (category)
);

-- 插入测试数据
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < 1000000 DO
        INSERT INTO products (name, description, price, category) 
        VALUES (
            CONCAT('商品', i, '手机'),
            CONCAT('这是商品', i, '的详细描述,包含各种手机相关的信息'),
            ROUND(RAND() * 10000, 2),
            CASE ROUND(RAND() * 3)
                WHEN 0 THEN '电子产品'
                WHEN 1 THEN '手机数码'
                WHEN 2 THEN '家用电器'
                ELSE '其他'
            END
        );
        SET i = i + 1;
    END WHILE;
END$$
DELIMITER ;

CALL insert_test_data();

执行模糊查询的性能对比:

-- 前缀匹配查询(可以使用索引)
SELECT * FROM products WHERE name LIKE '商品123%' LIMIT 10;
-- 执行时间:约0.01秒

-- 包含匹配查询(无法使用索引)
SELECT * FROM products WHERE name LIKE '%123%' LIMIT 10;
-- 执行时间:约2-3秒

二、Elasticsearch的全文搜索优势

Elasticsearch作为专业的搜索引擎,在全文搜索方面有着天然的优势。

2.1 Elasticsearch的核心特性

  1. 倒排索引:为每个词建立索引,支持快速查找
  2. 分词器:支持多种语言的分词处理
  3. 分布式架构:支持水平扩展,处理海量数据
  4. 近实时搜索:数据写入后几乎立即可搜索

2.2 Elasticsearch与MySQL的对比

特性MySQLElasticsearch
索引类型B+树索引倒排索引
模糊查询全表扫描倒排索引查找
性能O(n)O(1)
分词支持有限丰富
分布式需要额外配置原生支持
实时性较差近实时

三、Elasticsearch实现模糊查询

3.1 环境准备

首先,我们需要安装和配置Elasticsearch:

# 下载并安装Elasticsearch
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.11.0-linux-x86_64.tar.gz
tar -xzf elasticsearch-8.11.0-linux-x86_64.tar.gz
cd elasticsearch-8.11.0

# 启动Elasticsearch
./bin/elasticsearch

3.2 创建索引和映射

# 创建商品索引
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "ik_smart_pinyin": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["pinyin_filter"]
        }
      },
      "filter": {
        "pinyin_filter": {
          "type": "pinyin",
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_smart_pinyin",
        "search_analyzer": "ik_smart_pinyin"
      },
      "description": {
        "type": "text",
        "analyzer": "ik_smart_pinyin",
        "search_analyzer": "ik_smart_pinyin"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      },
      "category": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date"
      }
    }
  }
}'

3.3 数据同步

我们需要将MySQL中的数据同步到Elasticsearch:

@Service
public class ProductService {
    
    @Autowired
    private ProductRepository productRepository;
    
    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;
    
    /**
     * 同步MySQL数据到Elasticsearch
     */
    public void syncDataToElasticsearch() {
        // 删除现有索引
        elasticsearchTemplate.deleteIndex(ProductDocument.class);
        elasticsearchTemplate.createIndex(ProductDocument.class);
        elasticsearchTemplate.putMapping(ProductDocument.class);
        
        // 分批同步数据
        int pageSize = 1000;
        int page = 0;
        List<Product> products;
        
        do {
            Pageable pageable = PageRequest.of(page, pageSize);
            products = productRepository.findAll(pageable).getContent();
            
            // 转换为Elasticsearch文档
            List<ProductDocument> documents = products.stream()
                .map(this::convertToDocument)
                .collect(Collectors.toList());
            
            // 批量保存到Elasticsearch
            if (!documents.isEmpty()) {
                elasticsearchTemplate.save(documents);
            }
            
            page++;
        } while (products.size() == pageSize);
    }
    
    /**
     * 转换Product实体为ProductDocument
     */
    private ProductDocument convertToDocument(Product product) {
        ProductDocument document = new ProductDocument();
        document.setId(product.getId());
        document.setName(product.getName());
        document.setDescription(product.getDescription());
        document.setPrice(product.getPrice());
        document.setCategory(product.getCategory());
        document.setCreatedAt(product.getCreatedAt());
        return document;
    }
}

3.4 Elasticsearch文档实体

@Document(indexName = "products")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ProductDocument {
    
    @Id
    private Long id;
    
    @Field(type = FieldType.Text, analyzer = "ik_smart_pinyin")
    private String name;
    
    @Field(type = FieldType.Text, analyzer = "ik_smart_pinyin")
    private String description;
    
    @Field(type = FieldType.Scaled_Float, scalingFactor = 100)
    private BigDecimal price;
    
    @Field(type = FieldType.Keyword)
    private String category;
    
    @Field(type = FieldType.Date)
    private Date createdAt;
}

四、实现各种模糊查询

4.1 简单模糊查询

@Service
public class ProductSearchService {
    
    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;
    
    /**
     * 简单模糊查询 - 类似MySQL的LIKE '%关键词%'
     */
    public List<ProductDocument> searchProducts(String keyword) {
        // 构建查询条件
        Query query = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.multiMatchQuery(keyword, "name", "description")
                .type(MultiMatchQueryBuilder.Type.MOST_FIELDS))
            .withPageable(PageRequest.of(0, 20))
            .build();
        
        SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
        return searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList());
    }
}

4.2 高亮显示

/**
 * 带高亮显示的搜索
 */
public SearchResponse searchProductsWithHighlight(String keyword) {
    // 构建查询条件
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.multiMatchQuery(keyword, "name", "description")
            .type(MultiMatchQueryBuilder.Type.MOST_FIELDS))
        .withHighlightFields(
            new HighlightBuilder.Field("name").preTags("<em>").postTags("</em>"),
            new HighlightBuilder.Field("description").preTags("<em>").postTags("</em>")
        )
        .withPageable(PageRequest.of(0, 20))
        .build();
    
    SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
    
    // 处理高亮结果
    List<HighlightedProduct> results = searchHits.stream().map(hit -> {
        HighlightedProduct product = new HighlightedProduct();
        product.setProduct(hit.getContent());
        
        // 获取高亮字段
        Map<String, List<String>> highlightFields = hit.getHighlightFields();
        product.setHighlightedName(highlightFields.getOrDefault("name", Collections.emptyList()));
        product.setHighlightedDescription(highlightFields.getOrDefault("description", Collections.emptyList()));
        
        return product;
    }).collect(Collectors.toList());
    
    return new SearchResponse(results, searchHits.getTotalHits());
}

4.3 前缀匹配

/**
 * 前缀匹配查询 - 类似MySQL的LIKE '关键词%'
 */
public List<ProductDocument> searchProductsByPrefix(String prefix) {
    Query query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.multiMatchQuery(prefix, "name", "description")
            .type(MultiMatchQueryBuilder.Type.PHRASE_PREFIX))
        .withPageable(PageRequest.of(0, 20))
        .build();
    
    SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
    return searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList());
}

4.4 通配符查询

/**
 * 通配符查询 - 支持*和?通配符
 */
public List<ProductDocument> searchProductsWithWildcard(String pattern) {
    Query query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.wildcardQuery("name", "*" + pattern + "*"))
        .withPageable(PageRequest.of(0, 20))
        .build();
    
    SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
    return searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList());
}

4.5 正则表达式查询

/**
 * 正则表达式查询
 */
public List<ProductDocument> searchProductsWithRegex(String regex) {
    Query query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.regexpQuery("name", regex))
        .withPageable(PageRequest.of(0, 20))
        .build();
    
    SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
    return searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList());
}

五、性能优化技巧

5.1 分词器优化

# 使用更合适的分词器
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "stop",
            "snowball"
          ]
        }
      }
    }
  }
}'

5.2 索引优化

# 调整索引设置以优化性能
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 1,
    "translog.durability": "async",
    "translog.sync_interval": "30s"
  }
}'

5.3 查询优化

/**
 * 优化的搜索方法
 */
public SearchResponse optimizedSearch(String keyword, int page, int size) {
    // 使用bool查询组合多个条件
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
        .should(QueryBuilders.matchQuery("name", keyword).boost(2.0f))
        .should(QueryBuilders.matchQuery("description", keyword))
        .should(QueryBuilders.wildcardQuery("name", "*" + keyword + "*").boost(0.5f))
        .minimumShouldMatch(1);
    
    // 添加过滤条件提高性能
    BoolQueryBuilder filteredQuery = QueryBuilders.boolQuery()
        .must(boolQuery)
        .filter(QueryBuilders.rangeQuery("price").gte(0))
        .filter(QueryBuilders.existsQuery("name"));
    
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(filteredQuery)
        .withPageable(PageRequest.of(page, size))
        .withSort(SortBuilders.scoreSort().order(SortOrder.DESC))
        .withSort(SortBuilders.fieldSort("price").order(SortOrder.ASC))
        .build();
    
    SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
    
    return new SearchResponse(
        searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList()),
        searchHits.getTotalHits()
    );
}

六、实战案例

6.1 电商商品搜索系统

@RestController
@RequestMapping("/api/search")
public class ProductSearchController {
    
    @Autowired
    private ProductSearchService searchService;
    
    /**
     * 商品搜索接口
     */
    @GetMapping("/products")
    public ResponseEntity<SearchResponse> searchProducts(
            @RequestParam String q,
            @RequestParam(defaultValue = "0") int page,
            @RequestParam(defaultValue = "20") int size) {
        
        try {
            SearchResponse response = searchService.optimizedSearch(q, page, size);
            return ResponseEntity.ok(response);
        } catch (Exception e) {
            return ResponseEntity.status(500).body(new SearchResponse("搜索失败: " + e.getMessage()));
        }
    }
    
    /**
     * 自动补全接口
     */
    @GetMapping("/autocomplete")
    public ResponseEntity<List<String>> autocomplete(@RequestParam String q) {
        List<String> suggestions = searchService.getAutocompleteSuggestions(q);
        return ResponseEntity.ok(suggestions);
    }
}

6.2 数据同步策略

@Component
public class DataSyncService {
    
    @Autowired
    private ProductRepository productRepository;
    
    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;
    
    /**
     * 增量数据同步
     */
    @Scheduled(fixedRate = 300000) // 每5分钟执行一次
    public void syncIncrementalData() {
        // 获取最近5分钟内更新的数据
        Date fiveMinutesAgo = new Date(System.currentTimeMillis() - 300000);
        List<Product> recentProducts = productRepository.findByUpdatedAtAfter(fiveMinutesAgo);
        
        // 同步到Elasticsearch
        List<ProductDocument> documents = recentProducts.stream()
            .map(this::convertToDocument)
            .collect(Collectors.toList());
        
        elasticsearchTemplate.save(documents);
    }
    
    /**
     * 全量数据同步
     */
    public void fullSync() {
        elasticsearchTemplate.deleteIndex(ProductDocument.class);
        elasticsearchTemplate.createIndex(ProductDocument.class);
        elasticsearchTemplate.putMapping(ProductDocument.class);
        
        // 调用之前实现的全量同步方法
        productService.syncDataToElasticsearch();
    }
}

七、性能对比测试

让我们通过实际测试来看看Elasticsearch和MySQL的性能差异:

@SpringBootTest
public class SearchPerformanceTest {
    
    @Autowired
    private ProductRepository productRepository;
    
    @Autowired
    private ProductSearchService searchService;
    
    @Test
    public void testSearchPerformance() {
        String keyword = "123";
        
        // 测试MySQL LIKE查询
        long startTime = System.currentTimeMillis();
        List<Product> mysqlResults = productRepository.findByNameContaining(keyword);
        long mysqlTime = System.currentTimeMillis() - startTime;
        
        // 测试Elasticsearch查询
        startTime = System.currentTimeMillis();
        List<ProductDocument> esResults = searchService.searchProducts(keyword);
        long esTime = System.currentTimeMillis() - startTime;
        
        System.out.println("MySQL查询时间: " + mysqlTime + "ms, 结果数: " + mysqlResults.size());
        System.out.println("ES查询时间: " + esTime + "ms, 结果数: " + esResults.size());
        System.out.println("性能提升: " + (mysqlTime * 1.0 / esTime) + "倍");
    }
}

测试结果通常显示:

  • MySQL LIKE '%关键词%'查询:2000ms+
  • Elasticsearch查询:50ms以下
  • 性能提升:40倍以上

八、总结

通过今天的分析,我们了解到:

  1. MySQL LIKE查询的性能问题:使用%关键词%模式会导致全表扫描,性能极差
  2. Elasticsearch的优势:基于倒排索引的全文搜索引擎,支持高效的模糊查询
  3. 实现方案
    • 简单模糊查询(multi_match)
    • 高亮显示
    • 前缀匹配
    • 通配符查询
    • 正则表达式查询
  4. 性能优化技巧:分词器优化、索引设置、查询优化
  5. 实战应用:电商搜索系统、数据同步策略

掌握了这些技巧,相信你在面对模糊查询需求时会更加从容不迫,让你的搜索功能快如闪电!

今日思考:你们项目中是如何处理模糊查询需求的?有没有遇到过性能问题?欢迎在评论区分享你的经验!


如果你觉得这篇文章对你有帮助,欢迎分享给更多的朋友。关注"服务端技术精选",获取更多技术干货!


标题:Elasticsearch实现MySQL like%XX%实战:让你的模糊查询快如闪电!
作者:jiangyi
地址:http://jiangyi.space/articles/2025/12/21/1766304280901.html

    0 评论
avatar