
SciPy with scikit-learn pipeline - Mini Project: Build & Apply

Build a SciPy and scikit-learn Pipeline for Data Transformation and Modeling
📖 Scenario: You are working as a data analyst for a small company. You have some data about customers' ages and incomes, and you want to predict their spending score. To do this, you will prepare the data using SciPy and then build a simple model using scikit-learn's pipeline feature.
🎯 Goal: Create a dictionary of customer data, define an income threshold as a configuration variable, build a scikit-learn pipeline that chains a SciPy logarithm transform with a simple linear regression model, then print the transformed data and the model's predictions.
📋 What You'll Learn
Create a dictionary called customer_data with keys 'age' and 'income' and the exact lists of values provided.
Create a variable called income_threshold and set it to the exact value 50000.
Build a scikit-learn pipeline named pipeline that applies a SciPy logarithm transformation to income and then fits a simple linear regression model.
Print the transformed income data and the model predictions exactly as specified.
💡 Why This Matters
🌍 Real World
Data scientists often need to preprocess data using mathematical functions from libraries like SciPy before feeding it into machine learning models. Pipelines help organize these steps cleanly.
💼 Career
Understanding how to combine data transformations and models in a pipeline is a key skill for data analysts and data scientists working on predictive modeling tasks.
1. DATA SETUP: Create the customer data dictionary
Create a dictionary called customer_data with two keys: 'age' and 'income'. Set 'age' to the list [25, 32, 47, 51, 62] and 'income' to the list [40000, 52000, 61000, 58000, 72000].
Need a hint?

Use curly braces to create a dictionary. The keys are 'age' and 'income'. Assign the exact lists to each key.
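
As a minimal sketch of this step, the dictionary can be written with the exact keys and lists given above. The formatting across several lines is only a readability choice.

```python
# Step 1: customer data stored as a plain dictionary of lists.
# The keys and values are the exact ones given in the step description.
customer_data = {
    'age': [25, 32, 47, 51, 62],
    'income': [40000, 52000, 61000, 58000, 72000],
}

# Both lists describe the same five customers, so their lengths must match.
print(len(customer_data['age']), len(customer_data['income']))
```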

2. CONFIGURATION: Set the income threshold
Create a variable called income_threshold and set it to the integer 50000.
Need a hint?

Just assign the number 50000 to the variable named income_threshold.
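
A short sketch of this step follows. The project steps never say how the threshold is used, so the filter at the end, including the name above_threshold, is a hypothetical illustration and not part of the required solution.

```python
# Step 2: configuration value, set to the exact integer required.
income_threshold = 50000

# Hypothetical illustration only: keep the incomes above the threshold.
# The income list is repeated here so the snippet runs on its own.
income = [40000, 52000, 61000, 58000, 72000]
above_threshold = [value for value in income if value > income_threshold]
print(above_threshold)
```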

3. CORE LOGIC: Build the SciPy and scikit-learn pipeline
Import Pipeline from sklearn.pipeline, FunctionTransformer from sklearn.preprocessing, LinearRegression from sklearn.linear_model, and log1p from scipy.special. (scipy.special does not provide a plain log function; log1p computes log(1 + x), which is nearly identical to log(x) for incomes of this size.) Then create a pipeline called pipeline with two steps: a FunctionTransformer wrapping log1p, named 'log_transform', followed by a LinearRegression model.
Need a hint?

Use Pipeline with two named steps: a FunctionTransformer that applies the logarithm function, named 'log_transform' (the name Step 4 looks up), and a LinearRegression model.
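
One way the pipeline could be assembled is sketched below. It rests on two assumptions: scipy.special does not export a plain log, so log1p (which computes log(1 + x)) stands in for the logarithm, and the step name 'regressor' is a hypothetical choice. The name 'log_transform' is the one the Step 4 hint expects.

```python
from scipy.special import log1p
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Two-step pipeline: log-transform the feature, then fit a linear model.
# log1p is an assumption standing in for "log"; 'regressor' is a
# hypothetical step name chosen for this sketch.
pipeline = Pipeline([
    ('log_transform', FunctionTransformer(log1p)),
    ('regressor', LinearRegression()),
])

# List the step names in the order they will run.
print([name for name, _ in pipeline.steps])
```

Wrapping the function in FunctionTransformer is what lets a plain array function sit inside a Pipeline, since every step before the last must expose fit and transform.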

4. OUTPUT: Transform income and predict spending score
Fit the pipeline using the income data, reshaped as a 2D array, as the feature and the age list as the target. The data contains no spending-score column, so age stands in as the target, as the hint indicates. Then store the output of the log transform step in transformed_income and the model's predictions in predictions, and print both using print(transformed_income) and print(predictions) exactly.
Need a hint?

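Import numpy as np, then use np.array and reshape(-1, 1) to turn the income list into a 2D feature array. Fit the pipeline with income as the feature X and age as the target y. Use pipeline.named_steps['log_transform'].transform() on the income array to get the transformed income. Use pipeline.predict() on the same array for predictions. Print both results.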
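
The four steps can be put together as in the sketch below. As assumptions, log1p from scipy.special stands in for the logarithm (scipy.special has no plain log), 'regressor' is a hypothetical step name, and age is used as the regression target because the hint says to fit with income and age.

```python
import numpy as np
from scipy.special import log1p
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

customer_data = {
    'age': [25, 32, 47, 51, 62],
    'income': [40000, 52000, 61000, 58000, 72000],
}

# scikit-learn expects a 2D feature array: one row per customer, one column.
X = np.array(customer_data['income']).reshape(-1, 1)
# Target values; age is used because the data has no spending-score column.
y = np.array(customer_data['age'])

pipeline = Pipeline([
    ('log_transform', FunctionTransformer(log1p)),  # log1p assumed for "log"
    ('regressor', LinearRegression()),              # hypothetical step name
])
pipeline.fit(X, y)

# Output of the first step alone, then predictions from the whole pipeline.
transformed_income = pipeline.named_steps['log_transform'].transform(X)
predictions = pipeline.predict(X)

print(transformed_income)
print(predictions)
```

Calling pipeline.predict(X) on the raw incomes is enough: the pipeline applies the log transform internally before the regression, so the transformed array is only needed for printing.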