SpringBoot + Whisper + FFmpeg: Integrating a Speech-to-Text Service and Generating Meeting Records Automatically

2026-01-10 / 2026-01-10 / SpringBoot Whisper FFmpeg Speech-to-Text

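Pain points of speech-to-text

Have you run into situations like these in day-to-day work and project development?

After a meeting, the record has to be written up by hand, which is slow and tedious
Recordings arrive in inconsistent formats and are hard to process
Speech recognition accuracy is low, so the output needs heavy manual correction
Many different audio formats must be handled, which causes compatibility problems

Writing records by hand is not only inefficient, it also tends to miss important information. With AI speech recognition, this work can be automated.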

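Solution approach

Today's goal is to build an efficient speech-to-text service with Whisper + FFmpeg.

The core idea is:

Audio preprocessing: use FFmpeg to normalize the audio format and improve recognition quality
Speech recognition: use a Whisper model for high-quality speech-to-text
Result processing: post-process and format the recognition output
Batch processing: support converting audio files in batches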

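Technology choices

SpringBoot: quickly scaffold the application
OpenAI Whisper: the speech recognition model
FFmpeg: audio format conversion and preprocessing
Python: runtime for the Whisper model (or use whisper.cpp, the optimized C/C++ port)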

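Core implementation

1. Environment setup

First install the required tools:

# Install FFmpeg
# Windows: download it and add it to PATH
# Debian/Ubuntu: sudo apt-get install ffmpeg
# macOS: brew install ffmpeg

# Install the Python dependency
pip install openai-whisper
# Or use whisper.cpp for better performance
# (the service code in section 4 calls a whisper.cpp executable with a ggml model)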

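2. Project configuration

Add the required dependencies to the SpringBoot project:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
    <!-- File upload handling; set a version explicitly if your parent POM does not manage commons-io -->
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.15.1</version>
    </dependency>
    <!-- Lombok, used by the @Slf4j annotations in the code below -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>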

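3. Audio preprocessing service

Create an audio preprocessing service that uses FFmpeg to normalize the audio format:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

@Service
@Slf4j
public class AudioPreprocessingService {

    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    @Value("${audio.preprocess.path:/tmp/audio}")
    private String tempPath;

    /**
     * Preprocess an audio file with FFmpeg and return the path of the converted file.
     */
    public String preprocessAudio(String inputFilePath) throws IOException {
        // Create the working directory if needed and pick a collision-free output name
        Path outputDir = Files.createDirectories(Paths.get(tempPath));
        String outputPath = outputDir.resolve("preprocessed_" + UUID.randomUUID() + ".wav").toString();

        // FFmpeg command: convert to the format used for Whisper (16 kHz, mono, 16-bit PCM WAV)
        String[] cmd = {
            "ffmpeg",
            "-y",                // overwrite the output without prompting
            "-i", inputFilePath,
            "-ar", "16000",      // 16 kHz sample rate
            "-ac", "1",          // mono
            "-c:a", "pcm_s16le", // 16-bit PCM encoding
            outputPath
        };

        runCommand(cmd);
        log.info("Audio preprocessing finished: {} -> {}", inputFilePath, outputPath);
        return outputPath;
    }

    /**
     * Read basic information about the first audio stream; returns null if the file has none.
     */
    public AudioFileInfo getAudioInfo(String filePath) throws IOException {
        String[] cmd = {
            "ffprobe",
            "-v", "quiet",
            "-show_format",
            "-show_streams",
            "-print_format", "json",
            filePath
        };

        JsonNode rootNode = OBJECT_MAPPER.readTree(runCommand(cmd));

        // path() returns a "missing" node instead of null, so absent fields fall back to 0 or ""
        for (JsonNode stream : rootNode.path("streams")) {
            if ("audio".equals(stream.path("codec_type").asText())) {
                return AudioFileInfo.builder()
                    .duration(stream.path("duration").asDouble())
                    .sampleRate(stream.path("sample_rate").asInt())
                    .channels(stream.path("channels").asInt())
                    .codec(stream.path("codec_name").asText())
                    .build();
            }
        }

        return null;
    }

    /**
     * Run an external command and return its combined stdout/stderr; throws on a non-zero exit code.
     */
    private String runCommand(String... cmd) throws IOException {
        // Merge stderr into stdout and drain it: an undrained pipe can fill up and block the child process
        Process process = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        try {
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException(cmd[0] + " failed, exit code: " + exitCode + ", output: " + output);
            }
            return output;
        } catch (InterruptedException e) {
            process.destroyForcibly();
            Thread.currentThread().interrupt();
            throw new IOException(cmd[0] + " was interrupted", e);
        }
    }
}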
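The preprocessing service waits for FFmpeg without any time limit, so a stuck process would block the calling thread. As a minimal sketch of one way to bound that wait (assuming Java 17; `ProcessRunner`, `Result`, `ffmpegNormalizeCommand`, and `run` are hypothetical names that do not come from the article), the helper below sends the process output to a temporary log file and kills the process when a timeout expires:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the article's code base.
final class ProcessRunner {

    /** Exit code plus the combined stdout/stderr text of a finished process. */
    record Result(int exitCode, String output) {}

    /** Builds the same FFmpeg arguments as the article: 16 kHz, mono, 16-bit PCM WAV. */
    static List<String> ffmpegNormalizeCommand(String input, String output) {
        return List.of("ffmpeg", "-y", "-i", input,
                "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", output);
    }

    /** Runs a command, killing it if it does not finish within the timeout. */
    static Result run(List<String> command, Duration timeout)
            throws IOException, InterruptedException {
        // Writing output to a file means no pipe can fill up while we wait
        Path logFile = Files.createTempFile("process-", ".log");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(logFile.toFile())
                    .start();
            if (!process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                process.destroyForcibly().waitFor();
                throw new IOException("Timed out after " + timeout + ": " + command.get(0));
            }
            String output = new String(Files.readAllBytes(logFile), StandardCharsets.UTF_8);
            return new Result(process.exitValue(), output);
        } finally {
            Files.deleteIfExists(logFile);
        }
    }
}
```

A call such as `ProcessRunner.run(ProcessRunner.ffmpegNormalizeCommand(in, out), Duration.ofMinutes(10))` would then fail fast instead of hanging; the ten-minute limit is only an illustrative value and should be sized to the longest recording you expect.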

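4. Whisper speech recognition service

Create a speech recognition service that invokes the Whisper model:

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

@Service
@Slf4j
public class WhisperTranscriptionService {

    @Value("${whisper.model.path:models/ggml-medium.bin}") // whisper.cpp model file
    private String modelPath;

    @Value("${whisper.executable.path:./whisper/whisper}")
    private String whisperExecutable;

    /**
     * Transcribe an audio file with Whisper.
     */
    public TranscriptionResult transcribeAudio(String audioFilePath, String language) {
        // Unique output prefix so concurrent calls never share a file; whisper.cpp appends ".txt"
        Path outputPrefix = Paths.get(System.getProperty("java.io.tmpdir"), "transcript_" + UUID.randomUUID());
        Path txtFile = Paths.get(outputPrefix + ".txt");
        try {
            // Build the Whisper command
            List<String> cmd = new ArrayList<>();
            cmd.add(whisperExecutable);
            cmd.add("--model");
            cmd.add(modelPath);
            cmd.add("--output-txt");
            cmd.add("--output-file");
            cmd.add(outputPrefix.toString());
            cmd.add("--language");
            cmd.add(language != null ? language : "zh"); // Chinese by default
            cmd.add(audioFilePath);

            // Merge stderr into stdout and drain it so the child process cannot block on a full pipe
            Process process = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            String processOutput = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);

            // Wait for the process to finish
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                log.error("Whisper failed, exit code: {}, output: {}", exitCode, processOutput);
                return TranscriptionResult.failure("Speech recognition failed, exit code: " + exitCode);
            }

            // Read the recognition result
            String transcript = Files.readString(txtFile, StandardCharsets.UTF_8);

            log.info("Speech recognition finished: {}", audioFilePath);
            return TranscriptionResult.success(transcript);

        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return TranscriptionResult.failure("Speech recognition was interrupted");
        } catch (Exception e) {
            log.error("Error during speech recognition", e);
            return TranscriptionResult.failure("Speech recognition failed: " + e.getMessage());
        } finally {
            try {
                Files.deleteIfExists(txtFile); // do not leave transcripts behind in the temp directory
            } catch (IOException e) {
                log.warn("Failed to delete temporary file: {}", txtFile, e);
            }
        }
    }

    /**
     * Transcribe a batch of audio files.
     * Note: parallelStream() runs on the shared common pool and starts several Whisper
     * processes at once, which can oversubscribe the CPU on small machines.
     */
    public List<TranscriptionResult> batchTranscribe(List<String> audioFiles, String language) {
        return audioFiles.parallelStream()
                .map(filePath -> transcribeAudio(filePath, language))
                .collect(Collectors.toList());
    }
}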
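Because each Whisper process is CPU-heavy, it can help to cap how many run at once rather than rely on `parallelStream()`. The following is a minimal sketch of that idea in plain Java, assuming a fixed-size thread pool is acceptable; `BoundedBatch` and `mapWithLimit` are hypothetical names introduced only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of the article's code base.
final class BoundedBatch {

    /** Applies task to every input, running at most maxConcurrent tasks at a time. */
    static <T, R> List<R> mapWithLimit(List<T> inputs, Function<T, R> task, int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T input : inputs) {
                Callable<R> call = () -> task.apply(input);
                futures.add(pool.submit(call));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // collected in input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Batch was interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A batch task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With the service above, a call might look like `BoundedBatch.mapWithLimit(audioFiles, f -> transcribeAudio(f, language), 2)`. The limit of 2 is an assumption; a sensible value depends on core count and model size.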

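5. Meeting record generation service

Create the service that generates and processes meeting records:

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.util.UUID;

@Service
@Slf4j
public class MeetingRecordService {

    @Autowired
    private AudioPreprocessingService preprocessingService;

    @Autowired
    private WhisperTranscriptionService transcriptionService;

    /**
     * Generate a meeting record.
     */
    public MeetingRecord generateMeetingRecord(MultipartFile audioFile, String meetingTitle) {
        String processedFilePath = null;
        try {
            // 1. Save the uploaded audio file
            String originalFilePath = saveUploadedFile(audioFile);

            // 2. Preprocess the audio
            processedFilePath = preprocessingService.preprocessAudio(originalFilePath);

            // 3. Speech recognition
            TranscriptionResult result = transcriptionService.transcribeAudio(processedFilePath, "zh");

            if (!result.isSuccess()) {
                throw new RuntimeException("Speech recognition failed: " + result.getErrorMessage());
            }

            // 4. Build the meeting record
            MeetingRecord record = new MeetingRecord();
            record.setTitle(meetingTitle);
            record.setOriginalAudioPath(originalFilePath);
            record.setProcessedAudioPath(processedFilePath); // kept for reference; the file itself is removed below
            record.setRawTranscript(result.getText());
            record.setProcessedTranscript(postProcessTranscript(result.getText()));
            record.setCreatedAt(LocalDateTime.now());

            return record;

        } catch (Exception e) {
            log.error("Failed to generate meeting record", e);
            throw new RuntimeException("Failed to generate meeting record: " + e.getMessage(), e);
        } finally {
            // 5. Clean up temporary files, on failure as well as on success
            if (processedFilePath != null) {
                cleanupTempFiles(processedFilePath);
            }
        }
    }

    /**
     * Post-process the recognition result.
     */
    private String postProcessTranscript(String rawTranscript) {
        // Remove timestamps such as [00:00:01.000 --> 00:00:04.500] (the hour field is optional)
        String processed = rawTranscript.replaceAll(
            "\\[(?:\\d{2}:)?\\d{2}:\\d{2}\\.\\d{3} --> (?:\\d{2}:)?\\d{2}:\\d{2}\\.\\d{3}\\]", "");

        // Collapse redundant whitespace
        processed = processed.replaceAll("\\s+", " ").trim();

        // Split after 。！？!? and after a period followed by whitespace, keeping the original
        // punctuation and leaving decimals such as 3.5 intact
        String[] sentences = processed.split("(?<=[。！？!?])|(?<=\\.)\\s+");

        StringBuilder formatted = new StringBuilder();
        for (String sentence : sentences) {
            sentence = sentence.trim();
            if (!sentence.isEmpty()) {
                formatted.append(sentence).append("\n");
            }
        }

        return formatted.toString();
    }

    private String saveUploadedFile(MultipartFile file) throws IOException {
        // The original filename is client-controlled: replace anything outside a safe character set
        // so it cannot contain path separators
        String original = file.getOriginalFilename() != null ? file.getOriginalFilename() : "upload";
        String safeName = original.replaceAll("[^A-Za-z0-9._-]", "_");
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "audio_" + UUID.randomUUID() + "_" + safeName);

        file.transferTo(target);

        return target.toString();
    }

    private void cleanupTempFiles(String... filePaths) {
        for (String filePath : filePaths) {
            try {
                Files.deleteIfExists(Paths.get(filePath));
            } catch (IOException e) {
                log.warn("Failed to delete temporary file: {}", filePath, e);
            }
        }
    }
}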
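The post-processing step is easy to get subtly wrong: whisper-style timestamps usually carry an hour field, and splitting on every period breaks decimals. As a standalone sketch of one way to handle both (an illustration under those assumptions, not the author's exact method; `TranscriptCleaner` and `toSentences` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the article's code base.
final class TranscriptCleaner {

    // Matches [mm:ss.SSS --> mm:ss.SSS] and [hh:mm:ss.SSS --> hh:mm:ss.SSS]
    private static final String TIMESTAMP =
            "\\[(?:\\d{2}:)?\\d{2}:\\d{2}\\.\\d{3} --> (?:\\d{2}:)?\\d{2}:\\d{2}\\.\\d{3}\\]";

    // Split after CJK full stop, full-width ! and ?, ASCII ! and ?,
    // or after an ASCII period that is followed by whitespace.
    // \u3002 \uFF01 \uFF1F are written as escapes so the source stays ASCII-only.
    private static final String SENTENCE_BREAK = "(?<=[\u3002\uFF01\uFF1F!?])|(?<=\\.)\\s+";

    /** Strips timestamps and returns the sentences with their punctuation intact. */
    static List<String> toSentences(String rawTranscript) {
        String text = rawTranscript.replaceAll(TIMESTAMP, " ")
                .replaceAll("\\s+", " ")
                .trim();
        List<String> sentences = new ArrayList<>();
        for (String part : text.split(SENTENCE_BREAK)) {
            String sentence = part.trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

Keeping the original punctuation means a question stays a question in the record, and a figure such as 3.5% is not cut in half.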

6. REST API

Expose an API for the front end or other services to call:

@RestController
@RequestMapping("/api/meeting-record")
@Slf4j
public class MeetingRecordController {

    @Autowired
    private MeetingRecordService meetingRecordService;

    /**
     * Upload an audio file and generate a meeting record.
     */
    @PostMapping("/generate")
    public ResponseEntity<Result<MeetingRecord>> generateRecord(
            @RequestParam("audio") MultipartFile audioFile,
            @RequestParam(value = "title", required = false) String title) {

        try {
            if (audioFile.isEmpty()) {
                return ResponseEntity.badRequest()
                    .body(Result.error("The audio file must not be empty"));
            }

            // Validate the file type (the content type is reported by the client, so this is only a first check)
            String contentType = audioFile.getContentType();
            if (!isValidAudioFormat(contentType)) {
                return ResponseEntity.badRequest()
                    .body(Result.error("Unsupported audio format; please upload mp3, wav, m4a or a similar format"));
            }

            MeetingRecord record = meetingRecordService.generateMeetingRecord(
                audioFile,
                title != null ? title : "Meeting record_" + LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
            );

            return ResponseEntity.ok(Result.success(record));

        } catch (Exception e) {
            log.error("Failed to generate meeting record", e);
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(Result.error("Failed to generate meeting record: " + e.getMessage()));
        }
    }

    /**
     * Process a batch of meeting recordings.
     * Files that fail are logged and skipped, so the response may contain fewer records than files uploaded.
     */
    @PostMapping("/batch-generate")
    public ResponseEntity<Result<List<MeetingRecord>>> batchGenerateRecords(
            @RequestParam("audioFiles") MultipartFile[] audioFiles) {

        List<MeetingRecord> records = new ArrayList<>();

        for (MultipartFile file : audioFiles) {
            try {
                MeetingRecord record = meetingRecordService.generateMeetingRecord(
                    file,
                    "Batch_" + file.getOriginalFilename()
                );
                records.add(record);
            } catch (Exception e) {
                log.error("Failed to process audio file: {}", file.getOriginalFilename(), e);
            }
        }

        return ResponseEntity.ok(Result.success(records));
    }

    /**
     * Get the details of a meeting record.
     */
    @GetMapping("/{id}")
    public ResponseEntity<Result<MeetingRecord>> getRecord(@PathVariable Long id) {
        // Record lookup goes here
        return ResponseEntity.ok(Result.success(null)); // simplified placeholder
    }

    private boolean isValidAudioFormat(String contentType) {
        return contentType != null && (
            contentType.startsWith("audio/") ||
            contentType.equals("video/mp4") ||  // MP4 files also contain audio
            contentType.equals("video/x-msvideo") // AVI files also contain audio
        );
    }
}
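The content type checked by the controller is supplied by the client, so a second, server-side check on the file name can be useful. The sketch below is one possible approach, assuming an extension whitelist built from the formats the article mentions (mp3, wav, m4a, mp4, avi); `UploadNames` and its methods are hypothetical names:

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the article's code base.
final class UploadNames {

    // Assumed whitelist, derived from the formats named in the article
    private static final Set<String> ALLOWED_EXTENSIONS = Set.of("mp3", "wav", "m4a", "mp4", "avi");

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String extensionOf(String filename) {
        if (filename == null) {
            return "";
        }
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return "";
        }
        return filename.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** True when the file name ends with one of the whitelisted extensions. */
    static boolean hasAllowedExtension(String filename) {
        return ALLOWED_EXTENSIONS.contains(extensionOf(filename));
    }

    /** Replaces every character outside [A-Za-z0-9._-], including path separators. */
    static String sanitize(String filename) {
        return filename.replaceAll("[^A-Za-z0-9._-]", "_");
    }
}
```

Neither check proves the bytes are really audio; FFmpeg or ffprobe failing on the file remains the final arbiter. The sanitizer simply guarantees that a name such as `../../etc/passwd` cannot escape the upload directory when it is used as part of a path.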

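7. Task queue and asynchronous processing

For large files or batch jobs, process asynchronously:

@Service
@Slf4j
public class AsyncTranscriptionService {

    @Autowired
    private MeetingRecordService meetingRecordService;

    @Autowired
    private ApplicationEventPublisher applicationEventPublisher;

    // Note: an uploaded MultipartFile is only guaranteed to be readable during the upload request,
    // so the event's audio file must be backed by storage that outlives it (for example, a copy on disk).
    @Async
    @EventListener
    public void handleTranscriptionRequest(TranscriptionEvent event) {
        try {
            MeetingRecord record = meetingRecordService.generateMeetingRecord(
                event.getAudioFile(),
                event.getTitle()
            );

            // Publish the completion event
            applicationEventPublisher.publishEvent(
                new TranscriptionCompletedEvent(record, event.getCallbackUrl())
            );

        } catch (Exception e) {
            log.error("Asynchronous transcription failed", e);
            applicationEventPublisher.publishEvent(
                new TranscriptionFailedEvent(event.getOriginalRequestId(), e.getMessage())
            );
        }
    }
}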
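With asynchronous processing, callers usually need a way to ask whether a job has finished. The article uses events and a callback URL; as a complementary sketch (an assumption, not the author's design), an in-memory registry built on `CompletableFuture` could look like this, where `TranscriptionJobs`, `Status`, `submit`, `status`, and `result` are hypothetical names:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical helper, not part of the article's code base.
final class TranscriptionJobs {

    enum Status { RUNNING, DONE, FAILED, UNKNOWN }

    // Jobs are never evicted in this sketch; a real service would expire or persist them
    private final Map<String, CompletableFuture<String>> jobs = new ConcurrentHashMap<>();
    private final Executor executor;

    TranscriptionJobs(Executor executor) {
        this.executor = executor;
    }

    /** Starts the work on the executor and returns a job id the caller can poll. */
    String submit(Supplier<String> work) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, CompletableFuture.supplyAsync(work, executor));
        return id;
    }

    /** Reports the current state of a job without blocking. */
    Status status(String id) {
        CompletableFuture<String> future = jobs.get(id);
        if (future == null) {
            return Status.UNKNOWN;
        }
        if (!future.isDone()) {
            return Status.RUNNING;
        }
        return future.isCompletedExceptionally() ? Status.FAILED : Status.DONE;
    }

    /** Blocks until the job finishes and returns its transcript. */
    String result(String id) {
        CompletableFuture<String> future = jobs.get(id);
        if (future == null) {
            throw new IllegalArgumentException("Unknown job id: " + id);
        }
        return future.join();
    }
}
```

A controller could return the id from `submit` immediately and expose `status` through a polling endpoint, which avoids holding the HTTP request open while a long recording is transcribed.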

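Performance optimization strategies

1. Caching

@Service
public class CachedTranscriptionService {

    @Autowired
    private WhisperTranscriptionService transcriptionService;

    // Requires @EnableCaching on a configuration class. The key combines file path and language,
    // so a hit only occurs when the same path is transcribed again; failed results are not cached.
    @Cacheable(value = "transcriptions", key = "#audioFilePath + '_' + #language", unless = "!#result.success")
    public TranscriptionResult getTranscription(String audioFilePath, String language) {
        return transcriptionService.transcribeAudio(audioFilePath, language);
    }
}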
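A cache keyed on the file path only helps when the same path is transcribed twice, and uploads in this service are saved under fresh temporary names. One alternative, sketched here as an assumption rather than the article's approach, is to derive the key from the audio content; `AudioCacheKeys` and `forFile` are hypothetical names, and the code assumes Java 17 for `HexFormat`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper, not part of the article's code base.
final class AudioCacheKeys {

    /** Returns "<sha256 of file bytes>_<language>", so identical audio shares one cache entry. */
    static String forFile(Path audioFile, String language) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            try (InputStream in = Files.newInputStream(audioFile)) {
                // Stream the file in chunks so large recordings are never fully loaded into memory
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    digest.update(buffer, 0, read);
                }
            }
            return HexFormat.of().formatHex(digest.digest()) + "_" + language;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

The hash should be taken from the original upload rather than the preprocessed WAV, so that a repeated upload can skip both FFmpeg and Whisper.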

2. Concurrent processing

@Configuration
@EnableAsync
public class AsyncConfig implements AsyncConfigurer {

    @Override
    public Executor getAsyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(2);    // threads kept ready for transcription jobs
        executor.setMaxPoolSize(5);     // extra threads are only created once the queue is full
        executor.setQueueCapacity(100); // jobs wait here while the core threads are busy
        executor.setThreadNamePrefix("transcription-");
        executor.initialize();
        return executor;
    }
}
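The interaction between core size, maximum size, and queue capacity is often misread: a pool backed by a bounded queue only grows beyond its core size after the queue is full. The sketch below demonstrates this with the JDK's `ThreadPoolExecutor`, which Spring's `ThreadPoolTaskExecutor` wraps; `PoolSizingDemo` and `observe` are hypothetical names:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the article's code base.
final class PoolSizingDemo {

    /**
     * Submits tasks that all block until released, then reports
     * {current pool size, number of queued tasks} while they are still pending.
     */
    static int[] observe(int core, int max, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until the measurement is taken
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] { pool.getPoolSize(), pool.getQueue().size() };
        } finally {
            release.countDown(); // let every task finish
            pool.shutdown();
        }
    }
}
```

With the article's settings (core 2, max 5, queue 100), ten pending jobs leave the pool at two threads with eight jobs queued; the other three threads would only appear after more than 100 jobs are waiting. If five concurrent transcriptions are the real goal, raising the core size or shrinking the queue is the setting to change.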

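Advantages

Compared with writing records by hand, this approach has clear advantages:

Efficiency: a meeting that runs for hours can be transcribed in minutes, depending on hardware and model size
Accuracy: modern speech recognition models can exceed 90% accuracy on clear audio
Unified format: meeting records are output in a standardized format automatically
Batch processing: multiple audio files can be processed in one request
Multilingual support: speech in many languages can be recognized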

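Things to watch out for

Hardware requirements: speech recognition needs a fair amount of compute
Audio quality: the quality of the original audio directly affects recognition accuracy
Privacy and security: sensitive meeting content is best processed locally
Network dependency: if you use an online API, a stable network connection is required
Model choice: pick a model that fits your accuracy and performance needs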

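Summary

By combining SpringBoot, Whisper, and FFmpeg, we can build an efficient and accurate speech-to-text service. It greatly speeds up the write-up of meeting records and frees people for more valuable work.

In real projects, tune the model parameters and the processing pipeline to your specific needs to find the best balance between accuracy and performance.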

服务端技术精选 shares practical backend development techniques to help you grow as an engineer!

Title: SpringBoot + Whisper + FFmpeg: Integrating a Speech-to-Text Service and Generating Meeting Records Automatically
Author: jiangyi
URL: http://jiangyi.space/articles/2026/01/10/1768029360917.html
WeChat official account: 服务端技术精选
