Experiment - BERT tokenization (WordPiece)
Problem: You want to tokenize sentences using BERT's WordPiece tokenizer to prepare text data for a BERT model.
Current Metrics: Tokenization runs, but the produced tokens do not match the expected WordPiece tokens, degrading model input quality.
Issue: The current tokenization uses a simple whitespace split instead of WordPiece, so rare and out-of-vocabulary words are never broken into the subword units BERT expects, hurting model understanding.