
Python ML Program to Predict Stock Price Using sklearn

Use sklearn's LinearRegression to train a model on historical stock prices and predict future prices with code like model = LinearRegression(); model.fit(X_train, y_train); predictions = model.predict(X_test).
📋 Examples

Input:  Historical prices: [100, 101, 102, 103, 104]
Output: Predicted next price: approximately 105

Input:  Historical prices: [200, 198, 197, 195, 193]
Output: Predicted next price: approximately 191

Input:  Historical prices: [50, 50, 50, 50, 50]
Output: Predicted next price: approximately 50
🧠 How to Think About It

To predict stock prices, first collect past price data and prepare it as input features and target values. Then, choose a simple model like linear regression to learn the pattern from past prices. Finally, use the trained model to predict future prices based on recent data.
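As a quick sketch of that first preparation step, the snippet below (using the same toy numbers as the examples above) turns a plain price list into feature/target pairs, where each day's price predicts the next day's:

```python
import numpy as np

prices = [100, 101, 102, 103, 104]
# Each day's price becomes a feature; the following day's price is the target
X = np.array(prices[:-1]).reshape(-1, 1)  # [[100], [101], [102], [103]]
y = np.array(prices[1:])                  # [101, 102, 103, 104]
print(X.ravel().tolist(), y.tolist())
```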
📐 Algorithm

1. Collect historical stock price data.
2. Prepare the data by creating input features and target labels.
3. Split the data into training and testing sets.
4. Train a Linear Regression model on the training data.
5. Use the trained model to predict stock prices on the test data.
6. Evaluate the model's performance using a metric such as mean squared error.
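The six steps can be sketched end-to-end as below; the price series is made up for illustration, and `shuffle=False` keeps chronological order so training never sees future prices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Step 1-2: collect prices and build (previous price -> next price) pairs
prices = np.array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109], dtype=float)
X = prices[:-1].reshape(-1, 1)  # previous day's price
y = prices[1:]                  # next day's price

# Step 3: split chronologically (no shuffling, to avoid leaking future data)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False
)

# Steps 4-6: train, predict, evaluate
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Test MSE: {mse:.6f}")
```

Because this toy series is perfectly linear, the test MSE comes out essentially zero; on real prices it would not.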
💻 Code

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample historical prices
prices = np.array([100, 101, 102, 103, 104, 105, 106])

# Prepare features (previous day's price) and targets (current day's price)
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

# Train model
model = LinearRegression()
model.fit(X, y)

# Predict next price given last price
last_price = np.array([[prices[-1]]])
predicted_price = model.predict(last_price)

print(f"Predicted next price: {predicted_price[0]:.2f}")
Output
Predicted next price: 107.00
🔍 Dry Run

Let's trace the example with prices [100, 101, 102, 103, 104, 105, 106] through the code.

1. Prepare features and targets
   X = [[100], [101], [102], [103], [104], [105]], y = [101, 102, 103, 104, 105, 106]

2. Train Linear Regression model
   The model learns the relationship: next_price ≈ previous_price + 1

3. Predict next price
   Input last_price = [[106]]; the model predicts ≈ 107

Previous Price (X) | Next Price (y)
100                | 101
101                | 102
102                | 103
103                | 104
104                | 105
105                | 106
💡 Why This Works

Step 1: Data Preparation

We use the previous day's price as input X and the current day's price as target y to teach the model the price trend.

Step 2: Model Training

The LinearRegression model learns the linear relationship between past and future prices by minimizing prediction errors.

Step 3: Prediction

After training, the model predicts the next price by applying the learned pattern to the latest known price.
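To see the learned pattern directly, you can inspect the fitted coefficients. For the perfectly linear toy series used above, the slope and intercept come out as roughly 1 and 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

prices = np.array([100, 101, 102, 103, 104, 105, 106], dtype=float)
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

model = LinearRegression().fit(X, y)
# The series rises by exactly 1 per day, so the fit is next = 1.0 * prev + 1.0
print(f"slope = {model.coef_[0]:.2f}, intercept = {model.intercept_:.2f}")
# prints: slope = 1.00, intercept = 1.00
```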

🔄 Alternative Approaches

Random Forest Regression
import numpy as np
from sklearn.ensemble import RandomForestRegressor

prices = np.array([100, 101, 102, 103, 104, 105, 106])
X = prices[:-1].reshape(-1, 1)
y = prices[1:]
model = RandomForestRegressor(n_estimators=10, random_state=42)
model.fit(X, y)
last_price = np.array([[prices[-1]]])
predicted_price = model.predict(last_price)
print(f"Predicted next price: {predicted_price[0]:.2f}")
Random Forest can capture non-linear patterns but is slower and less interpretable than linear regression.
Support Vector Regression (SVR)
import numpy as np
from sklearn.svm import SVR

prices = np.array([100, 101, 102, 103, 104, 105, 106])
X = prices[:-1].reshape(-1, 1)
y = prices[1:]
model = SVR(kernel='rbf')
model.fit(X, y)
last_price = np.array([[prices[-1]]])
predicted_price = model.predict(last_price)
print(f"Predicted next price: {predicted_price[0]:.2f}")
SVR can model complex relationships but requires tuning and more computation.

Complexity: O(n) time, O(n) space

Time Complexity

Training linear regression takes O(n) time where n is the number of data points, as it fits a simple linear model.

Space Complexity

The model stores coefficients and input data, so space is O(n) for data and O(1) for model parameters.

Which Approach is Fastest?

Linear regression is fastest and simplest; Random Forest and SVR are slower but can capture complex patterns.

Approach                  | Time           | Space          | Best For
Linear Regression         | O(n)           | O(n)           | Simple linear trends, fast training
Random Forest Regression  | O(n · trees)   | O(n · trees)   | Non-linear patterns, better accuracy
Support Vector Regression | O(n²) to O(n³) | O(n²)          | Complex relationships, small datasets
💡 Tip: Always scale or normalize your input features when using models sensitive to feature scale.
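For example, one minimal way to bake scaling into the SVR approach above is sklearn's pipeline, which fits the scaler on the training data and reuses the same scaling at predict time:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

prices = np.array([100, 101, 102, 103, 104, 105, 106], dtype=float)
X = prices[:-1].reshape(-1, 1)
y = prices[1:]

# The pipeline standardizes X during fit and applies the identical transform
# when predicting, so the SVR never sees raw, unscaled prices
model = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
model.fit(X, y)
predicted_price = model.predict([[prices[-1]]])[0]
print(f"Predicted next price: {predicted_price:.2f}")
```

The exact value depends on the SVR hyperparameters; the point is that scaling happens automatically inside the pipeline.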
⚠️ Warning: Beginners often forget to reshape input data to 2-D arrays before fitting sklearn models, which raises a ValueError.
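A minimal demonstration of that pitfall: a 1-D array of prices is rejected by `fit`, while the same data reshaped to one column per feature works fine.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

prices = np.array([100, 101, 102, 103, 104])
X_1d = prices[:-1]            # shape (4,): 1-D, sklearn rejects this
X_2d = X_1d.reshape(-1, 1)    # shape (4, 1): one row per sample, one column per feature

model = LinearRegression()
try:
    model.fit(X_1d, prices[1:])   # raises ValueError ("Expected 2D array...")
    caught = False
except ValueError:
    caught = True

model.fit(X_2d, prices[1:])       # succeeds
print(f"1-D rejected: {caught}, 2-D shape: {X_2d.shape}")
```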