
NumPy with machine learning libraries - Mini Project: Build & Apply

Using NumPy with Machine Learning Libraries
📖 Scenario: You are working on a simple machine learning task where you need to prepare data using NumPy arrays before feeding it into a machine learning library. Imagine you have collected data about students' study hours and their exam scores. You want to organize this data and calculate the average score for students who studied more than a certain number of hours.
🎯 Goal: Build a small program that creates a NumPy array with students' study hours and scores, sets a threshold for study hours, filters the students who studied more than the threshold, and calculates the average score of those students.
📋 What You'll Learn
Create a NumPy array with exact data for study hours and scores
Create a variable for the study hours threshold
Use NumPy to filter the array based on the threshold
Calculate the average score of filtered students
Print the average score
💡 Why This Matters
🌍 Real World
Data scientists often use NumPy to prepare and filter data before applying machine learning models. Filtering rows based on a condition is a common step in data cleaning and preparation.
💼 Career
Understanding how to manipulate data with NumPy and integrate it with machine learning workflows is essential for roles like data analyst, data scientist, and machine learning engineer.
1
Create the NumPy array with study hours and scores
Import NumPy as np and create a NumPy array called data with these exact rows: [2, 50], [5, 80], [1, 40], [7, 90], [3, 60]. Each row represents [study_hours, score].
Need a hint?

Use np.array() to create the array with the exact rows inside a list.
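One possible way to complete this step is sketched below. The variable name data and the row values come from the instructions; the shape check at the end is only an illustrative extra, not part of the task.

```python
import numpy as np

# Each row is [study_hours, score], exactly as listed in the step.
data = np.array([[2, 50], [5, 80], [1, 40], [7, 90], [3, 60]])

# Illustrative check: 5 students (rows), 2 values per student (columns).
print(data.shape)
```

Passing a list of equal-length lists to np.array() produces a two-dimensional array, which is what makes the column slicing in the later steps possible.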

2
Set the study hours threshold
Create a variable called threshold and set it to the integer 4. Only students who studied more than this many hours will be kept when filtering.
Need a hint?

Just assign the number 4 to the variable threshold.
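A minimal sketch of this step follows. The two comparisons are illustrative additions that show how a strict greater-than test treats values at and above the threshold.

```python
# Students must have studied strictly more than this many hours.
threshold = 4

# Illustrative only: 5 hours passes a strict comparison, 4 hours does not.
print(5 > threshold)
print(4 > threshold)
```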

3
Filter students who studied more than the threshold
Use NumPy boolean indexing to create a new array called filtered_scores that contains only the scores of students whose study hours are greater than threshold. Use data[:, 0] to access study hours and data[:, 1] to access scores.
Need a hint?

Use data[:, 0] > threshold to get a boolean mask, then select scores with data[mask, 1].
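The filtering step could look like the sketch below, which repeats the setup from steps 1 and 2 so it runs on its own. The intermediate name mask is taken from the hint; storing it in a separate variable is a readability choice, not a requirement.

```python
import numpy as np

data = np.array([[2, 50], [5, 80], [1, 40], [7, 90], [3, 60]])
threshold = 4

# Compare the study-hours column against the threshold.
# The result is one boolean per row: True where hours > threshold.
mask = data[:, 0] > threshold

# Use the mask to pick rows, and column index 1 to pick the scores.
filtered_scores = data[mask, 1]

print(mask)
print(filtered_scores)
```

With the data from step 1, only the students who studied 5 and 7 hours satisfy the condition, so filtered_scores holds their two scores, 80 and 90.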

4
Calculate and print the average score
Calculate the average of filtered_scores using np.mean() and store it in average_score. Then print average_score.
Need a hint?

Use np.mean(filtered_scores) to get the average, then print it.
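Putting the four steps together, a complete program might look like this sketch. It uses only the names given in the instructions (data, threshold, filtered_scores, average_score).

```python
import numpy as np

# Step 1: rows are [study_hours, score].
data = np.array([[2, 50], [5, 80], [1, 40], [7, 90], [3, 60]])

# Step 2: keep students who studied more than this many hours.
threshold = 4

# Step 3: boolean indexing on the hours column, selecting the score column.
filtered_scores = data[data[:, 0] > threshold, 1]

# Step 4: average the remaining scores and print the result.
average_score = np.mean(filtered_scores)
print(average_score)  # 85.0
```

The two remaining scores are 80 and 90, so the mean is (80 + 90) / 2 = 85.0. Note that np.mean() returns a floating-point value even though the input array holds integers.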