Experiment - Streaming responses to users
Problem: You have a language model that generates answers to user questions. Currently, the model waits until the entire answer is generated before showing anything to the user, which delays feedback and makes the experience less engaging.
Current Metrics: Average response latency: 5 seconds; user engagement score: 60/100
Issue: The model does not stream partial outputs, so users see nothing until generation finishes. This inflates perceived latency (time to first visible token) and lowers engagement.
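A minimal sketch of the proposed fix, assuming a hypothetical `generate_tokens` generator standing in for the real model's decode loop: instead of buffering the full answer, each token is flushed to the user as soon as it is produced, so time-to-first-token, rather than total generation time, drives perceived latency.

```python
import sys
import time


def generate_tokens(prompt: str):
    """Hypothetical stand-in for the model's incremental decoder.

    A real implementation would yield tokens as the model produces
    them; here generation is simulated with a fixed answer and a
    per-token delay.
    """
    for token in "Streaming lets users read the answer while it is still being written.".split():
        time.sleep(0.1)  # simulated per-token decode time
        yield token + " "


def stream_answer(prompt: str) -> None:
    """Write each token to the client as soon as it is available."""
    for token in generate_tokens(prompt):
        sys.stdout.write(token)
        sys.stdout.flush()  # flush immediately so partial output is visible
    sys.stdout.write("\n")


if __name__ == "__main__":
    stream_answer("Why does streaming reduce perceived latency?")
```

With this pattern, the first token appears after roughly one decode step instead of after the full 5-second generation, which is the quantity the latency and engagement metrics above are expected to respond to.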