Model Pipeline - Document loaders
This pipeline shows how document loaders bring text data into a machine learning system. It starts with raw documents, processes them into clean text, and prepares them for further analysis or model training.
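The first two stages described above (reading a raw document, then turning it into clean text) can be sketched with the standard library alone. This is a minimal illustration, not the pipeline's actual implementation: the function names `load_text` and `clean_text` are hypothetical, and "cleaning" is assumed here to mean whitespace normalization only.

```python
import re
import tempfile
from pathlib import Path


def load_text(path: str, encoding: str = "utf-8") -> str:
    """Read a raw document from disk and return its contents as one string."""
    return Path(path).read_text(encoding=encoding)


def clean_text(raw: str) -> str:
    """Collapse runs of spaces/tabs, trim each line, and drop blank lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    # Write a small throwaway file so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_text("  Hello,\t\tworld!  \n\n\nSecond   line.\n", encoding="utf-8")
        print(clean_text(load_text(str(sample))))
```

Real pipelines usually clean more aggressively (markup removal, encoding repair, deduplication); whitespace handling is only the simplest case.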
Loss
1.0 |*****
0.8 |****
0.6 |***
0.4 |**
0.2 |*
0.0 +-----
1 2 3 4 5 Epochs
| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 0.85 | 0.60 | Initial training with raw loaded text |
| 2 | 0.65 | 0.72 | Improved after text cleaning and chunking |
| 3 | 0.50 | 0.80 | Model learns better representations from cleaned chunks |
| 4 | 0.40 | 0.85 | Continued improvement with more epochs |
| 5 | 0.35 | 0.88 | Training converges with good accuracy |
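The table attributes the epoch 2 improvement to text cleaning and chunking but does not say how the chunks are produced. One common approach, assumed here purely for illustration, is to cut the cleaned text into fixed-size character windows that overlap slightly so that context is not lost at chunk boundaries. The name `chunk_text` and its default sizes are hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into character windows of at most chunk_size,
    where consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text, so no
        # trailing chunk is emitted that only repeats the overlap.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Character windows are the simplest choice; splitting on sentence or paragraph boundaries tends to give more coherent chunks at the cost of uneven sizes.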
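documents after loading a text file?

from langchain.document_loaders import TextLoader

# Load a plain-text file into a list of Document objects
loader = TextLoader('sample.txt')
documents = loader.load()

from langchain.document_loaders import PyPDFLoader

# PyPDFLoader (requires the pypdf package) reads a PDF file, one Document per page
loader = PyPDFLoader('document.pdf')
docs = loader.load()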
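To make the result of `load()` concrete: in LangChain, a loader returns a list of `Document` objects, each carrying the text in `page_content` and bookkeeping such as the file path in `metadata`. The sketch below mimics that shape with the standard library so it runs without LangChain installed. `SimpleTextLoader` and this `Document` dataclass are hypothetical stand-ins, not LangChain's own classes.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class SimpleTextLoader:
    """Hypothetical minimal loader that mirrors the load() pattern shown above."""

    def __init__(self, file_path: str, encoding: str = "utf-8") -> None:
        self.file_path = file_path
        self.encoding = encoding

    def load(self) -> List[Document]:
        # A plain-text file becomes a single Document holding the whole file.
        text = Path(self.file_path).read_text(encoding=self.encoding)
        return [Document(page_content=text, metadata={"source": self.file_path})]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        path.write_text("Hello from a text file.", encoding="utf-8")
        documents = SimpleTextLoader(str(path)).load()
        print(len(documents))
        print(documents[0].page_content)
```

Keeping the source path in `metadata` lets later stages trace a chunk or a model prediction back to the file it came from.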