Python ML Program to Predict Stock Price Using sklearn
This program uses scikit-learn's LinearRegression to train a model on historical stock prices and predict future prices, with code like `model = LinearRegression(); model.fit(X_train, y_train); predictions = model.predict(X_test)`.
How to Think About It
Treat each day's price as a function of the previous day's price: learn how today's price maps to tomorrow's, then apply that mapping to the most recent price to project one step ahead.
Algorithm
1. Collect historical prices as a 1-D array.
2. Build features X = prices[:-1] (each previous day's price) and targets y = prices[1:] (each next day's price).
3. Fit a LinearRegression model on (X, y).
4. Pass the last known price to model.predict to estimate the next price.
Code
```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample historical prices
prices = np.array([100, 101, 102, 103, 104, 105, 106])

# Prepare features (previous day's price) and targets (current day's price)
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

# Train model
model = LinearRegression()
model.fit(X, y)

# Predict the next price given the last known price
last_price = np.array([[prices[-1]]])
predicted_price = model.predict(last_price)
print(f"Predicted next price: {predicted_price[0]:.2f}")
```
Dry Run
Let's trace the example with prices [100, 101, 102, 103, 104, 105, 106] through the code.
Step 1: Prepare features and targets
X = [[100], [101], [102], [103], [104], [105]], y = [101, 102, 103, 104, 105, 106]
Step 2: Train the Linear Regression model
The model learns the relationship next_price ≈ previous_price + 1.
Step 3: Predict the next price
With input last_price = [[106]], the model predicts ≈ 107.
| Previous Price (X) | Next Price (y) |
|---|---|
| 100 | 101 |
| 101 | 102 |
| 102 | 103 |
| 103 | 104 |
| 104 | 105 |
| 105 | 106 |
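The dry run can be checked directly by inspecting the fitted coefficients. Because this toy series is perfectly linear (each price is the previous price plus 1), ordinary least squares recovers slope 1 and intercept 1 essentially exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

prices = np.array([100, 101, 102, 103, 104, 105, 106])
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

model = LinearRegression()
model.fit(X, y)

# The data follows next_price = previous_price + 1 exactly,
# so the fit recovers slope ~1.0 and intercept ~1.0
print(f"slope: {model.coef_[0]:.4f}, intercept: {model.intercept_:.4f}")
print(f"prediction for 106: {model.predict([[106]])[0]:.2f}")
```

On real price data the fit will not be exact, and the learned slope and intercept describe only an average one-step relationship.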
Why This Works
Step 1: Data Preparation
We use the previous day's price as input X and the current day's price as target y to teach the model the price trend.
Step 2: Model Training
The LinearRegression model learns the linear relationship between past and future prices by minimizing prediction errors.
Step 3: Prediction
After training, the model predicts the next price by applying the learned pattern to the latest known price.
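Before trusting a forecast, it also helps to hold out the most recent prices and measure error on them. A minimal sketch using mean_squared_error (the two-point holdout size is an arbitrary choice for this toy series):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

prices = np.array([100, 101, 102, 103, 104, 105, 106])
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

# Time-series split: train on the earliest points, test on the latest
# (never shuffle time-ordered data)
X_train, X_test = X[:-2], X[-2:]
y_train, y_test = y[:-2], y[-2:]

model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Test MSE: {mse:.6f}")
```

Here the MSE is essentially zero only because the toy data is perfectly linear; on real prices the holdout error indicates how far one-step forecasts typically miss.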
Alternative Approaches
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

prices = np.array([100, 101, 102, 103, 104, 105, 106])
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

# Random Forest: an ensemble of decision trees
model = RandomForestRegressor(n_estimators=10, random_state=42)
model.fit(X, y)

last_price = np.array([[prices[-1]]])
predicted_price = model.predict(last_price)
print(f"Predicted next price: {predicted_price[0]:.2f}")
```
```python
import numpy as np
from sklearn.svm import SVR

prices = np.array([100, 101, 102, 103, 104, 105, 106])
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

# Support Vector Regression with an RBF kernel
model = SVR(kernel='rbf')
model.fit(X, y)

last_price = np.array([[prices[-1]]])
predicted_price = model.predict(last_price)
print(f"Predicted next price: {predicted_price[0]:.2f}")
```
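SVR is sensitive to feature scale, so in practice it is usually combined with a scaler. A minimal sketch using a StandardScaler pipeline (the pipeline setup, not the original article's code):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

prices = np.array([100, 101, 102, 103, 104, 105, 106])
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

# Standardize the feature before the RBF kernel sees it;
# the pipeline applies the same scaling at predict time
model = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
model.fit(X, y)

predicted = model.predict([[prices[-1]]])
print(f"Predicted next price: {predicted[0]:.2f}")
```

With a single feature on this tiny range the effect is small, but on real multi-feature data unscaled inputs can dominate the RBF kernel and degrade SVR badly.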
Complexity: O(n) time, O(n) space
Time Complexity
Fitting ordinary least squares with a single feature takes O(n) time, where n is the number of data points: one pass over the data computes the sums needed for the closed-form slope and intercept.
Space Complexity
The model stores coefficients and input data, so space is O(n) for data and O(1) for model parameters.
Which Approach is Fastest?
Linear regression is fastest and simplest; Random Forest and SVR are slower but can capture complex patterns.
| Approach | Time | Space | Best For |
|---|---|---|---|
| Linear Regression | O(n) | O(n) | Simple linear trends, fast training |
| Random Forest Regression | O(n * trees) | O(n * trees) | Non-linear patterns, better accuracy |
| Support Vector Regression | O(n^2) to O(n^3) | O(n^2) | Complex relationships, small datasets |
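One caveat when comparing the approaches on trending data: a tree ensemble predicts by averaging training targets, so it cannot predict outside the range of prices it was trained on, while linear regression extrapolates the trend. A quick illustration on the same toy series:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

prices = np.array([100, 101, 102, 103, 104, 105, 106])
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

last_price = np.array([[prices[-1]]])

lr = LinearRegression().fit(X, y)
rf = RandomForestRegressor(n_estimators=10, random_state=42).fit(X, y)

# Linear regression extrapolates the +1 trend to 107; the forest
# averages training targets, so its prediction cannot exceed 106
print(f"LinearRegression: {lr.predict(last_price)[0]:.2f}")
print(f"RandomForest:     {rf.predict(last_price)[0]:.2f}")
```

This is why "better accuracy" for Random Forest in the table above applies to non-linear patterns within the observed range, not to extrapolating a steady trend.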