Overview - Weight decay (L2 regularization)
What is it?
Weight decay, commonly identified with L2 regularization (the two coincide for plain gradient descent), is a technique used in machine learning to keep model weights small. It adds a penalty to the loss function proportional to the squared magnitude of the weights, scaled by a hyperparameter that controls the penalty's strength, encouraging the model to prefer simpler solutions. This helps prevent the model from fitting noise in the training data, which is called overfitting. By controlling weight sizes, the model generalizes better to new, unseen data.
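The idea above can be sketched numerically. This is a minimal illustration, not a production recipe: the data, the learning rate, and the penalty strength `lam` are all hypothetical choices. The penalized loss is L_total(w) = mean((Xw - y)^2) + (lam / 2) * ||w||^2, so its gradient gains an extra `lam * w` term, which is the "decay" pulling every weight toward zero at each step.

```python
import numpy as np

# Hypothetical example: linear regression trained by gradient descent,
# with and without an L2 penalty of strength `lam`.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

def train(lam, lr=0.1, steps=500):
    w = np.zeros(5)
    for _ in range(steps):
        # Gradient of the data loss, plus the weight-decay term lam * w.
        grad = 2 * X.T @ (X @ w - y) / len(y) + lam * w
        w -= lr * grad
    return w

w_plain = train(lam=0.0)
w_decayed = train(lam=1.0)

# The penalty shrinks the learned weight vector toward zero.
print(np.linalg.norm(w_decayed) < np.linalg.norm(w_plain))  # True
```

Raising `lam` shrinks the weights further; setting it to zero recovers ordinary, unregularized training. In practice the penalty strength is tuned on held-out validation data.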
Why it matters
Without weight decay, models can become too complex and memorize training data instead of learning general patterns. This degrades performance on new data, a serious problem in real-world applications like image recognition or speech processing. Weight decay helps models stay simple and reliable, making AI systems more trustworthy and effective in everyday tasks.
Where it fits
Before learning weight decay, you should understand basic neural networks, loss functions, and gradient descent optimization. After mastering weight decay, you can explore other regularization methods like dropout and batch normalization, and advanced optimization techniques that improve training stability and speed.