Experiment - Tokenization in spaCy
Problem: You want to split sentences into words or tokens using spaCy, but your current tokenization splits contractions incorrectly and emits punctuation as separate tokens.
Current Metrics: Example input: "I'm learning spaCy!" Current tokens: ['I', "'", 'm', 'learning', 'spaCy', '!']
Issue: The tokenizer splits the contraction "I'm" into three separate tokens ('I', "'", 'm') rather than the two linguistically meaningful pieces spaCy's default English rules produce ('I', "'m"), and it keeps punctuation as standalone tokens, which may not be desired for your application.
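A minimal sketch of the fix, assuming spaCy is installed: a blank English pipeline (no trained model needed) already handles contractions via tokenizer exceptions, and the retokenizer plus the `is_punct` flag cover the two remaining requirements. The two-token merge of `doc[0:2]` below assumes the contraction sits at the start of the sentence, as in the example input.

```python
import spacy

# A blank English pipeline: no model download required, but the
# English tokenizer rules (including contraction exceptions) apply.
nlp = spacy.blank("en")
doc = nlp("I'm learning spaCy!")

# spaCy keeps contractions as two meaningful tokens,
# "I" and "'m", rather than three characters.
print([t.text for t in doc])  # ['I', "'m", 'learning', 'spaCy', '!']

# If the application needs "I'm" as one token, merge the pair
# with the retokenizer.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])  # "I" + "'m" -> "I'm"

# Drop punctuation tokens when they are not wanted.
tokens = [t.text for t in doc if not t.is_punct]
print(tokens)  # ["I'm", 'learning', 'spaCy']
```

Merging via `doc.retokenize()` keeps the `Doc` object consistent (offsets, annotations) instead of post-processing a plain string list, so downstream pipeline components still work on the merged tokens.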