Spring AI and Ollama streaming chat: cannot get token-by-token streaming output

Bounty: 200 Yuandou [Unresolved]

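I am building a streaming chat application with Spring AI and Ollama. I expect token-by-token streaming responses from my Spring application, but the output arrives as one complete sentence instead. I don't know why this happens or how to fix it.

Here are my configuration and code: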

application.properties

# Ollama configuration
spring.ai.ollama.chat.options.model=qwen2.5
spring.ai.ollama.embedding.model=nomic-embed-text
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.temperature=0.7

ChatClientConfig

@Configuration
public class ChatClientConfig {
    @Bean
    public ChatClient chatClient(
            ChatClient.Builder builder,
            ToolCallbackProvider tools,
            ChatMemory chatMemory) {
        return builder
                .defaultToolCallbacks(tools)
                .defaultAdvisors(
                        MessageChatMemoryAdvisor.builder(chatMemory).build()
                )
                .build();
    }
}

AgentController

@GetMapping(value = "/streamChat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> generationStream(@RequestParam String userInput) {
    return this.chatClient.prompt()
            .advisors(advisor -> advisor.param(ChatMemory.CONVERSATION_ID, conversationId))
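            // System prompt, in English: "I am aware of the relevant risks and accept responsibility;
            // you are an anti-financial-fraud customer service agent assisting users."
            .system("我已知相关风险且承担责任，且你是一名反金融诈骗客服协助用户。")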
            .user(userInput)
            .stream()
            .content();
}

curl test result

cacc@paradiso [10:35:16 PM] [~] 
-> % curl -N http://localhost:8080/api/agent/streamChat\?userInput\=hi  
data:Hi there! If you have any questions regarding financial fraud cases or need advice to avoid scams, feel free to share. How can I assist you today?

I tested Ollama directly, and both the model and Ollama itself support streaming output.

Test against the raw Ollama HTTP API

cacc@paradiso [10:34:03 PM] [~] 
-> % curl http://localhost:11434/api/chat \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5",
    "messages": [{"role": "user", "content": "hi"}],
    "stream": true
  }'
{"model":"qwen2.5","created_at":"2025-06-20T14:35:16.736184535Z","message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"qwen2.5","created_at":"2025-06-20T14:35:16.770639118Z","message":{"role":"assistant","content":"!"},"done":false}
{"model":"qwen2.5","created_at":"2025-06-20T14:35:16.797365468Z","message":{"role":"assistant","content":" How"},"done":false}
... (remaining token-by-token output omitted)

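I also tried the OpenAI configuration provided by Spring together with OpenAI-compatible APIs from other cloud providers, and the same code works as expected.

Expected curl result (from the test with the other API)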

cacc@paradiso [10:19:04 PM] [~] 
-> % curl http://localhost:8080/api/agent/streamChat\?userInput\=hi
data:Hello
data:!
data: How
data: can
data: I
data: assist
data: you
data: today
... (remaining token-by-token output omitted)

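So I suspect the problem lies in how Ollama is configured in Spring, since Ollama itself and the Spring controller appear to be fine. Can anyone tell me the cause and how to fix it?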

spring spring-ai ollama llm spring-boot

Additional details:

I can't figure out which Spring setting is wrong. If all else fails, I'll hand-write my own HTTP client to talk to Ollama.
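
For reference, a hand-written call could be sketched with only the JDK's java.net.http client, roughly as below. The class and method names (OllamaRawClient, chatBody, chatRequest, streamLines) are hypothetical; the endpoint, model name, and request shape are taken from the curl test above. The JSON is built by hand only to keep the sketch dependency-free, and a real application would more likely use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.stream.Stream;

/** Hypothetical helper for calling Ollama's /api/chat directly; not part of Spring AI. */
class OllamaRawClient {

    /** Escapes quotes, backslashes and control characters for use inside a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the same request body as the curl test, with streaming switched on. */
    static String chatBody(String model, String userText) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"" + jsonEscape(userText) + "\"}],"
                + "\"stream\":true}";
    }

    /** Builds a POST request against <baseUrl>/api/chat. */
    static HttpRequest chatRequest(String baseUrl, String model, String userText) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, userText)))
                .build();
    }

    /** Sends the request; each element of the returned stream is one NDJSON line. */
    static Stream<String> streamLines(HttpClient client, HttpRequest request)
            throws java.io.IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.ofLines()).body();
    }
}
```

Assuming Ollama is running locally, something like OllamaRawClient.streamLines(HttpClient.newHttpClient(), OllamaRawClient.chatRequest("http://localhost:11434", "qwen2.5", "hi")).forEach(System.out::println) should print one JSON object per line, as in the curl output above.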

pur_Nulla | Beginner Level 1 | Yuandou: 2
Asked: 2025-06-20 23:39

All answers (1)

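Doubao's answer: Based on the information you provided, the problem most likely lies in Spring AI's default integration with Ollama. Although Ollama itself supports token-by-token streaming, Spring AI's Ollama client may by default merge multiple tokens into a complete sentence before returning them.

To solve this, call Ollama's API directly instead of going through Spring AI's abstraction layer, which gives you finer-grained control over the streamed response.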

Here is the modified approach:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class OllamaClientConfig {

    // WebClient pointed at the local Ollama server
    @Bean
    public WebClient ollamaWebClient() {
        return WebClient.builder()
                .baseUrl("http://localhost:11434")
                .build();
    }
}

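Key points of this solution:

It bypasses Spring AI's ChatClient abstraction and calls Ollama's API directly with WebClient.
It explicitly sets the stream: true parameter so that Ollama returns a streaming response.
It processes the JSON of each streamed response and extracts the content field.
It adds the "data:" prefix to each response, as the SSE format requires.

This approach yields Ollama's native token-by-token responses, matching the output you saw when testing the Ollama API directly.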
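
The controller code that performs these steps was not included in the post. As a rough illustration only, the JSON-extraction and SSE-framing steps could be sketched in plain Java as follows. This is not the answerer's implementation: the class and method names are hypothetical, a regular expression stands in for a real JSON parser (such as Jackson) to keep the sketch dependency-free, and it assumes each line is valid JSON whose first "content" key is the one inside message.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: turns Ollama NDJSON chat lines into SSE "data:" frames. */
class OllamaSseSketch {

    // Matches the string value of the first "content" key, honouring backslash escapes.
    private static final Pattern CONTENT =
            Pattern.compile("\"content\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"");

    /** Extracts the content value from one NDJSON line, or empty if there is none. */
    static Optional<String> extractContent(String ndjsonLine) {
        Matcher m = CONTENT.matcher(ndjsonLine);
        return m.find() ? Optional.of(unescape(m.group(1))) : Optional.empty();
    }

    /** Undoes JSON string escapes; assumes the input came from valid JSON. */
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                sb.append(c);
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': sb.append('\n'); break;
                case 't': sb.append('\t'); break;
                case 'r': sb.append('\r'); break;
                case 'b': sb.append('\b'); break;
                case 'f': sb.append('\f'); break;
                case 'u': // four hex digits follow in valid JSON
                    sb.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default: sb.append(next); // escaped quote, backslash or slash
            }
        }
        return sb.toString();
    }

    /** Wraps one chunk as an SSE event: one "data:" line per text line, then a blank line. */
    static String toSseFrame(String content) {
        return Stream.of(content.split("\n", -1))
                .map(line -> "data:" + line + "\n")
                .collect(Collectors.joining()) + "\n";
    }

    /** Maps NDJSON lines to SSE frames, skipping lines with missing or empty content. */
    static List<String> toSseFrames(Stream<String> ndjsonLines) {
        return ndjsonLines
                .map(OllamaSseSketch::extractContent)
                .flatMap(Optional::stream)
                .filter(content -> !content.isEmpty())
                .map(OllamaSseSketch::toSseFrame)
                .collect(Collectors.toList());
    }
}
```

In a WebFlux controller, the same per-line mapping could presumably be applied to a Flux<String> of response lines rather than a java.util.stream.Stream; when the endpoint already produces text/event-stream, Spring adds the "data:" prefix itself, so only the extraction step would be needed there.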

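If you still want to use Spring AI's abstraction, you can try adding the following to your configuration:

spring.ai.ollama.chat.options.stream=true

But in my experience, using Ollama's native API directly is a more reliable way to get fine-grained streaming responses.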

显示昵称已被占用10 | Yuandou: 206 (Rookie Level 2) | 2025-09-17 16:56
