For regression tasks, common metrics are Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²). These metrics measure how close the predicted values are to the actual values. When handling non-linearity, these metrics help us see if the model captures complex patterns better than simple linear regression.
Why Metrics Matter
Regression does not use a confusion matrix. Instead, we look at error values. For example, if the actual values are [3, 5, 7] and the predictions are [2.8, 5.1, 6.9], the errors are small, indicating a good fit. A simple table of actual vs. predicted values helps visualize this:
Actual:     3.0   5.0   7.0
Predicted:  2.8   5.1   6.9
Error:      0.2  -0.1   0.1
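The three metrics can be computed directly from these errors. A minimal sketch with NumPy, using the hypothetical values from the table above:

```python
import numpy as np

# Hypothetical values from the table above.
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.8, 5.1, 6.9])

errors = actual - predicted         # 0.2, -0.1, 0.1

mse = np.mean(errors ** 2)          # mean of squared errors
rmse = np.sqrt(mse)                 # same units as the target

# R² = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  R²={r2:.4f}")
# MSE=0.0200  RMSE=0.1414  R²=0.9925
```

Note that RMSE is in the same units as the target, which makes it easier to interpret than MSE.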
In regression, the tradeoff is between bias and variance. Simple linear regression has high bias and low variance, so it misses non-linear patterns (underfitting). Advanced regression methods (like polynomial regression, decision trees, or kernel methods) reduce bias by fitting curves but can increase variance (overfitting). The goal is to balance this to capture non-linearity without fitting noise.
Example: predicting house prices that rise sharply after a certain size. Linear regression misses this curve; an advanced model fits it better, but may overfit if it is too complex.
Good regression model:
- Low MSE or RMSE (errors close to zero)
- High R² (close to 1), meaning predictions explain most of the variation
Bad regression model:
- High MSE or RMSE (large errors)
- Low or negative R², meaning predictions are worse than just guessing the average
Advanced regression models that handle non-linearity usually show better metrics on complex data than simple linear models.
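The "worse than guessing the average" case is worth seeing concretely: scikit-learn's `r2_score` returns exactly 0 for a model that always predicts the mean of the targets, and goes negative for anything worse. A small sketch with made-up numbers:

```python
import numpy as np
from sklearn.metrics import r2_score

actual = np.array([3.0, 5.0, 7.0, 9.0])

# Always predicting the mean of the targets is the R² = 0 baseline.
baseline = np.full_like(actual, actual.mean())

# Predictions worse than that baseline produce a negative R².
bad = np.array([9.0, 3.0, 9.0, 3.0])

print(r2_score(actual, baseline))   # 0.0
print(r2_score(actual, bad))        # -3.0
```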
Common pitfalls:
- Ignoring non-linearity: Using linear regression on non-linear data leads to a poor fit and misleading metrics.
- Overfitting: Advanced models may fit the training data perfectly but fail on new data, causing poor test performance.
- Data leakage: Using future or target information during training falsely inflates metrics.
- Relying on a single metric: Always check multiple metrics and visualize predictions to understand model behavior.
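A held-out test set is the standard way to catch the overfitting pitfall: score the model on data it never saw during training and compare. A sketch on synthetic data (a noisy sine wave, an assumption for illustration), comparing polynomial degrees:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic non-linear data: noisy sine wave.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, 60).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(0, 0.3, 60)

# Hold out a test set so overfitting shows up as a train/test gap.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    scores[degree] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"degree {degree:>2}: train R²={scores[degree][0]:.2f}, "
          f"test R²={scores[degree][1]:.2f}")
```

The pattern to look for: training R² only ever improves as the degree grows, while test R² improves until the model starts fitting noise, then drops. A large gap between the two columns is the overfitting signal.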
Your advanced regression model has an R² of 0.95 on training data but only 0.60 on test data. Is it good at handling non-linearity? Why or why not?
Answer: The model fits the training data well, capturing non-linearity, but the large drop on test data suggests overfitting. It handles non-linearity on the training set but needs better generalization, for example through regularization, a simpler model, or more data.