Tokenization is the process of breaking a text string into smaller parts called tokens, most often words. A simple approach is to split the string on whitespace: splitting "Hello world!" this way yields two tokens, "Hello" and "world!". Note that punctuation such as the exclamation mark stays attached to its word, because split() separates only on whitespace, not on punctuation. Conventionally, a variable like 'text' holds the original string and 'tokens' holds the list of words after splitting. This step-by-step breakdown is a foundation for analyzing text in data science.
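
The whitespace-splitting step described above can be sketched in Python (the variable names 'text' and 'tokens' follow the convention mentioned in the text):

```python
# The original string to tokenize
text = "Hello world!"

# str.split() with no arguments splits on any run of whitespace
tokens = text.split()

print(tokens)  # ['Hello', 'world!'] -- punctuation stays attached to "world!"
```

Because split() does not separate punctuation, more thorough tokenizers typically add a step to strip or split off characters like "!" before further analysis.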