Experiment - Why similarity measures find related text
Problem: We want to measure how well similarity measures identify related text pairs. The current baseline, cosine similarity on simple word-count vectors, sometimes fails to rank truly related texts above unrelated ones.
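As a rough illustration of the baseline described above, here is a minimal sketch of cosine similarity over word-count vectors. The function name `cosine_similarity`, the lowercase-and-split tokenization, and the example sentences are assumptions made for this sketch; the experiment's actual preprocessing is not specified.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    # Assumption: tokenize by lowercasing and splitting on whitespace.
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product only needs the words that occur in both texts.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical example pair, not taken from the experiment's data.
score = cosine_similarity("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))
```

The score depends only on which words the two texts share and how often, which is why it can be computed without any model of meaning.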
Current Metrics: Accuracy of identifying related text pairs: 65%. Precision: 60%. Recall: 70%.
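For reference, the three metrics can be computed from pair-level labels as sketched below. The helper `classification_metrics` and the toy label lists are hypothetical and only show the arithmetic; they are not the experiment's data and do not reproduce the figures above.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary related/unrelated labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Precision: of the pairs predicted related, how many truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly related pairs, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels (1 = related, 0 = unrelated), invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

In this setup a prediction would typically come from thresholding the similarity score, so the chosen threshold trades precision against recall.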
Issue: The similarity measure is too simple. Word-count vectors reflect only surface word overlap, not deeper meaning, which leads to moderate accuracy and some false matches.
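The two failure modes can be seen on small hand-made examples. In the sketch below, the helper `count_cosine` and both sentence pairs are hypothetical, chosen only to show a related pair that scores zero and an unrelated pair that scores high under word-count cosine similarity.

```python
import math
from collections import Counter


def count_cosine(text_a: str, text_b: str) -> float:
    """Word-count cosine similarity (whitespace tokenization is an assumption)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


# Related in meaning, but no shared words: the score is exactly zero.
missed = count_cosine("the film was excellent", "a superb movie")

# Different meanings of "bank", but heavy word overlap: the score is high.
false_match = count_cosine("the bank of the river", "the bank of england")

print(missed, round(false_match, 3))
```

The first pair is a missed match that lowers recall, and the second is a false match that lowers precision.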