
SciPy with scikit-learn pipeline - Step-by-Step Execution

Concept Flow - scikit-learn pipeline
Load Data
Define Pipeline Steps
Create Pipeline Object
Fit Pipeline on Training Data
Predict or Transform Data
Evaluate or Use Results
This flow shows how data is loaded, a pipeline is assembled from its steps, and the pipeline is then fitted and used for prediction or transformation.
Execution Sample
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
  ('scaler', StandardScaler()),
  ('logreg', LogisticRegression())
])

pipe.fit(X_train, y_train)
preds = pipe.predict(X_test)
This code builds a pipeline that scales the data and fits a logistic regression model, then predicts on the test data.
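The execution sample above leaves X_train, y_train, and X_test undefined. A minimal runnable sketch, assuming synthetic arrays with the shapes from the Execution Table ((100, 4) training data, (20, 4) test data); these arrays are illustrative placeholders, not part of the original sample:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data matching the shapes in the Execution Table:
# 100 training samples and 20 test samples, 4 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = rng.integers(0, 2, size=100)
X_test = rng.normal(size=(20, 4))

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('logreg', LogisticRegression())
])

pipe.fit(X_train, y_train)    # fits the scaler, transforms, then fits the model
preds = pipe.predict(X_test)  # applies the same scaling before predicting

print(preds.shape)  # (20,)
```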
Execution Table
Step | Action | Input Data Shape | Output Data Shape | Notes
1 | Load X_train, y_train | (100, 4), (100,) | (100, 4), (100,) | Data loaded with 100 samples, 4 features
2 | Create Pipeline with scaler and logistic regression | N/A | Pipeline object created | Pipeline ready with two steps
3 | Fit pipeline on X_train, y_train | (100, 4), (100,) | Model fitted | Scaler fit and applied, logistic regression trained
4 | Predict on X_test | (20, 4) | (20,) | Predictions generated for 20 test samples
5 | Output predictions | (20,) | (20,) | Final predicted labels array
6 | End | N/A | N/A | Pipeline execution complete
💡 All steps completed; pipeline fit and predictions done
Variable Tracker
Variable | Start | After Step 1 | After Step 3 | After Step 4 | Final
X_train | undefined | (100, 4) | (100, 4) | (100, 4) | (100, 4)
y_train | undefined | (100,) | (100,) | (100,) | (100,)
pipe | undefined | Pipeline object | Fitted pipeline | Fitted pipeline | Fitted pipeline
preds | undefined | undefined | undefined | (20,) | (20,)
Key Moments - 3 Insights
Why do we fit the pipeline instead of fitting the scaler and model separately?
Fitting the pipeline (see Execution Table step 3) ensures the scaler is fit only on the training data and that the same scaling is applied during prediction, avoiding data leakage.
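This can be checked directly: after fitting, the scaler's learned statistics depend only on the training data. A sketch assuming hypothetical synthetic data with the shapes from the Execution Table; the test set is deliberately shifted to make the point visible:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_train = rng.normal(loc=0.0, size=(100, 4))
y_train = rng.integers(0, 2, size=100)
X_test = rng.normal(loc=5.0, size=(20, 4))  # deliberately shifted test set

pipe = Pipeline([('scaler', StandardScaler()),
                 ('logreg', LogisticRegression())])
pipe.fit(X_train, y_train)

# The scaler's statistics reflect only the training data, so the shifted
# test set cannot leak into them.
assert np.allclose(pipe.named_steps['scaler'].mean_, X_train.mean(axis=0))
```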
What shape does the data have after scaling inside the pipeline?
The shape remains (100, 4): scaling changes values but not dimensions, as shown in the Variable Tracker for X_train after step 3.
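A quick check of the shape claim, using a hypothetical (100, 4) array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.random.default_rng(2).normal(size=(100, 4))
X_scaled = StandardScaler().fit_transform(X_train)

# Values are standardized per feature, but dimensions are unchanged.
print(X_scaled.shape)  # (100, 4)
```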
Why do predictions have shape (20,) after step 4?
Predictions are labels, one per test sample, so 20 test samples produce 20 predicted labels, as shown in Execution Table step 4.
Visual Quiz - 3 Questions
Test your understanding
Looking at the Execution Table at step 3, what happens during pipeline fitting?
A. Scaler is fit on test data
B. Only logistic regression is fit, scaler is ignored
C. Scaler and logistic regression are both fit on training data
D. Pipeline predicts without fitting
💡 Hint
Refer to the Execution Table row for Step 3, which describes the fitting process.
According to the Variable Tracker, what is the shape of preds after step 4?
A. (20,)
B. (100, 4)
C. (100,)
D. (20, 4)
💡 Hint
Check the 'After Step 4' column for preds in the Variable Tracker.
If we skip scaling in the pipeline, how would Execution Table step 3 change?
A. Pipeline would fail to fit
B. Scaler fit step would be missing, model fit remains the same
C. Predictions would be shape (100, 4)
D. Data shape would change to (20,)
💡 Hint
Think about what happens if the scaler step is removed from the pipeline steps.
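The effect of removing the scaler can be verified directly. A sketch with hypothetical synthetic data: the pipeline still fits with a single step, and predictions keep shape (20,):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X_train = rng.normal(size=(100, 4))
y_train = rng.integers(0, 2, size=100)
X_test = rng.normal(size=(20, 4))

# Pipeline with the scaler step removed: fitting still works; only the
# scaling step is skipped.
pipe = Pipeline([('logreg', LogisticRegression())])
pipe.fit(X_train, y_train)
preds = pipe.predict(X_test)

print(preds.shape)  # still (20,): one label per test sample
```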
Concept Snapshot
scikit-learn pipeline:
- Use Pipeline to chain steps like scaling and modeling
- Fit the pipeline on training data to avoid data leakage
- Predict or transform data using the pipeline
- Keeps code clean and reproducible
- Data shape stays consistent through the pipeline steps
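For the reproducibility point, a common pattern is to pass the whole pipeline to cross-validation, so the scaler is re-fit inside each training fold and no information leaks across folds. A sketch with hypothetical synthetic data, using make_pipeline and cross_val_score:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# make_pipeline names the steps automatically; cross_val_score re-fits the
# scaler inside each of the 5 training folds.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)

print(scores.shape)  # (5,): one score per fold
```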
Full Transcript
This visual execution shows how to use a scikit-learn pipeline. First, data is loaded with 100 samples and 4 features. Then a pipeline is created with two steps: StandardScaler and LogisticRegression. The pipeline is fit on the training data, which fits the scaler and the model in order. After fitting, predictions are made on test data with 20 samples. Variables such as X_train, y_train, the pipeline object, and the predictions change state through the steps. The key moments clarify why fitting the pipeline matters for avoiding data leakage, why the data shape stays the same after scaling, and why the prediction shape matches the number of test samples. The quiz tests understanding of pipeline fitting, prediction shapes, and the effect of skipping scaling. The snapshot summarizes pipeline usage for clean, reproducible modeling.