Model Pipeline - Streaming responses
This pipeline shows how a model generates answers step-by-step, sending parts of the response as soon as they are ready. This helps users get quick feedback instead of waiting for the full answer.
    Epoch 1 | ******************** (2.3)
    Epoch 2 | *************** (1.8)
    Epoch 3 | ********** (1.2)
    Epoch 4 | ******* (0.8)
    Epoch 5 | **** (0.5)
| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 2.3 | 0.15 | Model starts learning to predict next tokens, loss is high. |
| 2 | 1.8 | 0.30 | Loss decreases, model improves token prediction. |
| 3 | 1.2 | 0.50 | Model learns better context, accuracy rises. |
| 4 | 0.8 | 0.65 | Loss continues to drop, predictions more accurate. |
| 5 | 0.5 | 0.80 | Model converges well, ready for streaming generation. |
With stream=True, the response is an iterable that yields chunks as they arrive, so loop over it and print each chunk:

    response = model.generate(prompt, stream=True)
    for chunk in response:
        print(chunk)

Printing the response object directly does not display the generated text, because with stream=True the response is an iterable, not a complete string:

    response = model.generate(prompt, stream=True)
    print(response)

stream=True allows receiving data chunks as they are generated. Using stream=True but collecting all chunks in a list before printing defeats real-time display. Setting stream=False waits for the full response. Using a timer without streaming is inefficient.
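The difference between the two modes can be tried without a real model. The sketch below is only an illustration: `FakeModel`, its `delay` parameter, and the canned "Echo:" answer are hypothetical stand-ins, not part of any real library. It assumes a `generate(prompt, stream=...)` method shaped like the one used above.

```python
import time
from typing import Iterator, Union


class FakeModel:
    """Hypothetical stand-in for a model client; not a real library API."""

    def __init__(self, delay: float = 0.0) -> None:
        self.delay = delay  # simulated generation time per chunk, in seconds

    def _chunks(self, prompt: str) -> Iterator[str]:
        # Pretend each word of a canned answer is generated one at a time.
        for word in f"Echo: {prompt}".split(" "):
            time.sleep(self.delay)
            yield word + " "

    def generate(self, prompt: str, stream: bool = False) -> Union[str, Iterator[str]]:
        if stream:
            # Lazy iterable: each chunk is produced only when requested.
            return self._chunks(prompt)
        # Full string: returned only after every chunk has been generated.
        return "".join(self._chunks(prompt))


model = FakeModel()

# Streaming: display each chunk as soon as it is yielded.
# The list is kept only to compare with the non-streaming result afterwards.
streamed = []
for chunk in model.generate("hello world", stream=True):
    print(chunk, end="", flush=True)
    streamed.append(chunk)
print()

# Non-streaming: nothing is available until the whole answer is built.
full = model.generate("hello world", stream=False)
print(full)
```

Both modes produce the same text; only the timing of when it becomes visible differs. Raising `delay` (for example `FakeModel(delay=0.3)`) makes the gap between the two modes visible in a terminal.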