Build a SciPy and scikit-learn Pipeline for Data Transformation and Modeling
📖 Scenario: You are working as a data analyst for a small company. You have some data about customers' ages and incomes, and you want to predict their spending score. To do this, you will prepare the data using SciPy and then build a simple model using scikit-learn's pipeline feature.
🎯 Goal: Create a dictionary of customer data, define a configuration variable for an income threshold, build a scikit-learn pipeline that applies a SciPy log transformation to income and fits a simple model, then print the transformed data and the model's predictions.
📋 What You'll Learn
Create a dictionary called customer_data with keys 'age' and 'income' and the exact lists of values provided.
Create a variable called income_threshold and set it to the exact value 50000.
Build a scikit-learn pipeline named pipeline that uses a SciPy function to apply a logarithm transformation to income and a simple linear regression model.
Print the transformed income data and the model predictions exactly as specified.
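The steps above could be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution: the age, income, and spending-score values are hypothetical placeholders (the exact lists are supplied by the exercise steps), the choice of scipy.special.log1p as the SciPy logarithm function is an assumption, and the way income_threshold is used at the end is only one plausible reading, since the overview does not say how the threshold is applied.

```python
import numpy as np
from scipy.special import log1p  # SciPy ufunc computing log(1 + x)
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical placeholder values; the exercise provides its own exact lists.
customer_data = {
    "age": [25, 32, 47, 51, 62],
    "income": [30000, 45000, 60000, 80000, 120000],
}
spending_score = [35, 48, 55, 70, 82]  # hypothetical target values

# Configuration variable named in the exercise.
income_threshold = 50000

# Feature matrix: column 0 is age, column 1 is income.
X = np.column_stack([customer_data["age"], customer_data["income"]]).astype(float)
y = np.array(spending_score, dtype=float)

# Apply the log transform to the income column only and pass age through.
# ColumnTransformer places transformed columns first, so the output
# columns are [log_income, age].
log_income = ColumnTransformer(
    [("log_income", FunctionTransformer(log1p), [1])],
    remainder="passthrough",
)

pipeline = Pipeline([
    ("log_income", log_income),
    ("model", LinearRegression()),
])
pipeline.fit(X, y)

# Transformed features from the fitted first step, then model predictions.
transformed = pipeline.named_steps["log_income"].transform(X)
predictions = pipeline.predict(X)

# Assumed use of the threshold: flag customers whose income exceeds it.
high_income = X[:, 1] > income_threshold

print("Transformed income:", transformed[:, 0])
print("Predictions:", predictions)
```

Wrapping the SciPy function in FunctionTransformer lets it act as a regular pipeline step, so the same transformation is applied automatically whenever pipeline.predict is called on new data.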
💡 Why This Matters
🌍 Real World
Data scientists often need to preprocess data using mathematical functions from libraries like SciPy before feeding it into machine learning models. Pipelines help organize these steps cleanly.
💼 Career
Understanding how to combine data transformations and models in a pipeline is a key skill for data analysts and data scientists working on predictive modeling tasks.