I am trying to build a streaming chat application with Spring AI and Ollama. I expect token-by-token streaming responses from the Spring application, but the output arrives as complete sentences instead. I don't know why this happens or how to fix it.
Here are the configuration and code:
application.properties
# Ollama configuration
spring.ai.ollama.chat.options.model=qwen2.5
spring.ai.ollama.embedding.model=nomic-embed-text
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.temperature=0.7
ChatClientConfig
@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient chatClient(
            ChatClient.Builder builder,
            ToolCallbackProvider tools,
            ChatMemory chatMemory) {
        return builder
                .defaultToolCallbacks(tools)
                .defaultAdvisors(
                        MessageChatMemoryAdvisor.builder(chatMemory).build()
                )
                .build();
    }
}
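One difference between my setup and the plain curl test below is that my ChatClient registers tool callbacks and a memory advisor. As a diagnostic sketch (not my actual configuration), a stripped-down bean without defaultToolCallbacks could show whether the tool registration is what turns the response into a single chunk:

```java
@Configuration
public class ChatClientConfig {

    // Diagnostic variant: no tool callbacks, no memory advisor.
    // If streaming works with this bean, the tool registration is the suspect.
    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder.build();
    }
}
```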
AgentController
@GetMapping(value = "/streamChat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> generationStream(@RequestParam String userInput) {
    return this.chatClient.prompt()
            .advisors(advisor -> advisor.param(ChatMemory.CONVERSATION_ID, conversationId))
            .system("I understand the risks and accept responsibility; you are an anti-financial-fraud support agent helping users.")
            .user(userInput)
            .stream()
            .content();
}
curl test result
cacc@paradiso [10:35:16 PM] [~]
-> % curl -N http://localhost:8080/api/agent/streamChat\?userInput\=hi
data:Hi there! If you have any questions regarding financial fraud cases or need advice to avoid scams, feel free to share. How can I assist you today?
I tested Ollama directly; the model and Ollama itself support streaming output.
Raw Ollama HTTP API test
cacc@paradiso [10:34:03 PM] [~]
-> % curl http://localhost:11434/api/chat \
-X POST \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5",
"messages": [{"role": "user", "content": "hi"}],
"stream": true
}'
{"model":"qwen2.5","created_at":"2025-06-20T14:35:16.736184535Z","message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"qwen2.5","created_at":"2025-06-20T14:35:16.770639118Z","message":{"role":"assistant","content":"!"},"done":false}
{"model":"qwen2.5","created_at":"2025-06-20T14:35:16.797365468Z","message":{"role":"assistant","content":" How"},"done":false}
...(remaining token-by-token output omitted)
I also tried Spring's OpenAI configuration and other cloud providers' OpenAI-compatible APIs; the same code streams correctly with them.
Expected curl test result (from the other APIs)
cacc@paradiso [10:19:04 PM] [~]
-> % curl http://localhost:8080/api/agent/streamChat\?userInput\=hi
data:Hello
data:!
data: How
data: can
data: I
data: assist
data: you
data: today
...(remaining token-by-token output omitted)
So I suspect the Ollama configuration in Spring is the problem, since Ollama itself and the Spring controller appear to be fine. Can anyone tell me the cause and how to fix it?
If I can't figure out what is wrong with the Spring setup, I will fall back to hand-writing an HTTP client that talks to Ollama directly.
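For reference, that fallback might look something like the sketch below: it POSTs to Ollama's /api/chat with stream=true using only the JDK's java.net.http client, reads the NDJSON response line by line, and prints each token as it arrives. The extractContent helper is a naive string scan I wrote for this sketch (it assumes the chunk shape shown in the curl output above and does not handle escaped quotes); a real implementation would use a JSON parser.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class OllamaStreamDemo {

    // Pull the assistant "content" field out of one NDJSON chunk from /api/chat.
    // Naive scan: assumes the first "content" key is the message content and
    // that the value contains no escaped quotes.
    static String extractContent(String ndjsonLine) {
        String key = "\"content\":\"";
        int start = ndjsonLine.indexOf(key);
        if (start < 0) return "";
        start += key.length();
        int end = ndjsonLine.indexOf('"', start);
        return end < 0 ? "" : ndjsonLine.substring(start, end);
    }

    public static void main(String[] args) throws Exception {
        String body = """
                {"model":"qwen2.5","messages":[{"role":"user","content":"hi"}],"stream":true}""";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpClient client = HttpClient.newHttpClient();
        // Ollama emits one JSON object per line; read and print tokens as they arrive.
        HttpResponse<java.io.InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(response.body(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.print(extractContent(line));
                System.out.flush();
            }
        }
    }
}
```

Running main requires a local Ollama instance on port 11434; I would much rather keep the Spring AI abstraction if someone can point out what is misconfigured.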