Experiment - Regular expressions for text cleaning
Problem: You have a text dataset with noisy data including extra spaces, special characters, and inconsistent capitalization. This noise makes it hard for your model to learn well.
Current Metrics: Text cleaning accuracy: 70% (measured by how well cleaned text matches expected clean text samples)
Issue: The current cleaning method misses many unwanted characters and does not normalize text well, causing poor data quality.
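
One possible starting point for addressing the three noise sources named above (extra spaces, special characters, inconsistent capitalization) is sketched below with Python's built-in re module. This is a minimal sketch under stated assumptions, not the experiment's reference solution: the function names clean_text and cleaning_accuracy are hypothetical, the sample strings are invented for illustration, and "accuracy" is assumed here to mean the fraction of cleaned samples that exactly match their expected clean text, which the experiment does not spell out.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase the text, drop special characters, and collapse whitespace."""
    # Normalize capitalization first so the character class below can stay simple.
    text = text.lower()
    # Assumption: anything other than a-z, 0-9, or whitespace counts as "special".
    # Replacing with a space (not the empty string) keeps neighboring words apart.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of spaces, tabs, and newlines into one space, then trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def cleaning_accuracy(raw_samples, expected_samples):
    """Fraction of raw samples whose cleaned form exactly equals the expected text.

    Assumption: exact string match is the scoring rule.
    """
    if len(raw_samples) != len(expected_samples):
        raise ValueError("raw_samples and expected_samples must have the same length")
    if not raw_samples:
        return 0.0
    matches = sum(
        clean_text(raw) == expected
        for raw, expected in zip(raw_samples, expected_samples)
    )
    return matches / len(raw_samples)


# Hypothetical samples, invented for illustration only.
raw = ["  Hello,   WORLD!!  ", "Data\t#Science  101"]
expected = ["hello world", "data science 101"]

print(clean_text(raw[0]))                # hello world
print(cleaning_accuracy(raw, expected))  # 1.0
```

Replacing special characters with a space is a design choice: it splits contractions ("don't" becomes "don t"), so if the expected clean samples keep apostrophes or accented letters, the character class would need to be widened to match them.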