Introduction
A large body of text is hard to process or understand all at once. Breaking it into smaller, manageable pieces, called chunks, helps both computers and people handle the information more easily and accurately.
Imagine you have a long storybook to share with friends. You could cut it into stacks with the same number of pages, split it by chapter, group passages by theme, or give neighboring friends a few of the same sentences so the story stays connected from one piece to the next.
                              ┌────────────────┐
                              │   Full Text    │
                              └───────┬────────┘
                                      │
                                      ▼
┌────────────────┐  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐
│ Fixed-size     │  │ Sentence-based │  │ Semantic       │  │ Overlap        │
│ chunks         │  │ chunks         │  │ chunks         │  │ chunks         │
│ [equal parts]  │  │ [by sentences] │  │ [by meaning]   │  │ [shared text]  │
└────────────────┘  └────────────────┘  └────────────────┘  └────────────────┘
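
To make the diagram concrete, here is a minimal Python sketch of three of the four strategies. The function names, the character-based sizes, and the punctuation-based sentence split are illustrative assumptions, not a fixed API. Semantic chunking is left out because it needs some measure of meaning (for example, an embedding model), which this introduction does not specify.

```python
import re


def fixed_size_chunks(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def sentence_chunks(text, max_sentences=2):
    """Group whole sentences, `max_sentences` per chunk.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    Real text (abbreviations, quotes) needs a more careful splitter.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def overlap_chunks(text, size, overlap):
    """Fixed-size chunks in which each chunk starts with the last
    `overlap` characters of the previous chunk."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop early so the final chunk is not fully contained in the one before it.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


print(fixed_size_chunks("abcdefghij", 4))            # ['abcd', 'efgh', 'ij']
print(sentence_chunks("One. Two! Three? Four.", 2))  # ['One. Two!', 'Three? Four.']
print(overlap_chunks("abcdefghij", 4, 2))            # ['abcd', 'cdef', 'efgh', 'ghij']
```

Sizes here are counted in characters to keep the sketch short; counting words or tokens instead would follow the same pattern.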