Experiment - Streaming responses
Problem: The language model returns its full text response only after generation completes, so users see nothing until the entire output is ready. This causes noticeable delays and a less interactive experience.
Current Metrics: Response latency: 3 seconds per query; User engagement score: 60%
Issue: The model does not stream output tokens as they are generated. Perceived latency equals the full generation time rather than the time to the first token, which hurts responsiveness and user engagement.
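The fix is to yield tokens as they are decoded instead of buffering the whole response. Below is a minimal sketch of that pattern in Python; the token list and `time.sleep` delay are placeholders standing in for a real model's decode loop, and the function names are hypothetical. It measures time-to-first-token (TTFT) separately from total latency, which is the metric streaming improves.

```python
import time

def generate_tokens(prompt):
    # Hypothetical stand-in for the model's decode loop: yields one
    # token at a time instead of returning the full response at once.
    for token in ["Streaming", " lets", " users", " read", " output", " sooner."]:
        time.sleep(0.01)  # simulated per-token decode latency
        yield token

def stream_response(prompt):
    # Consume tokens incrementally, rendering each as it arrives,
    # and record time-to-first-token vs. total generation time.
    start = time.monotonic()
    first_token_at = None
    chunks = []
    for token in generate_tokens(prompt):
        if first_token_at is None:
            first_token_at = time.monotonic() - start
        chunks.append(token)
        print(token, end="", flush=True)  # user sees partial output immediately
    print()
    total = time.monotonic() - start
    return "".join(chunks), first_token_at, total

text, ttft, total = stream_response("example prompt")
```

With streaming, the user starts reading after `ttft` (one token's decode time) rather than waiting the full `total`; the engagement benefit comes from that gap, since total generation time is unchanged.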