Before diving deeper into real data, let's practice calculating statistics with a simple list of test scores.
Remember, you don't have to write these functions perfectly. Once we move to data science libraries such as pandas, these functions will be provided for you! Still, being comfortable with lists and understanding what these aggregates mean is important.
# Our toy dataset - test scores from a small class
scores = [72, 85, 90, 78, 92, 88, 75, 95, 82, 88, 91, 79, 86, 90, 83]
# Let's see what we're working with
print(f"Number of scores: {len(scores)}")
print(f"Scores: {scores}")
# TODO: Calculate the mean score
# Step 1: Get the sum of all scores
# Step 2: Divide by the number of scores
# Step 3: Print the result
mean_score = # YOUR CODE HERE
print(f"Mean score: {mean_score}")
# TODO: Calculate the median score
# Step 1: Sort the scores (hint: sorted() returns a new sorted list)
# Step 2: Find the middle position
# Step 3: Get the value at that position
sorted_scores = # YOUR CODE HERE
middle_index = # YOUR CODE HERE
median_score = # YOUR CODE HERE
print(f"Median score: {median_score}")
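Once you have filled in the blanks, one way to check your mean and median is Python's built-in statistics module. This is a sketch for self-checking only, not part of the exercise. The names check_mean and check_median are made up for this example, and the scores list is repeated so the snippet runs on its own.

```python
import statistics

# Same toy dataset as above, repeated so this snippet is self-contained
scores = [72, 85, 90, 78, 92, 88, 75, 95, 82, 88, 91, 79, 86, 90, 83]

# statistics.mean adds the values and divides by how many there are
check_mean = statistics.mean(scores)

# statistics.median sorts the values and picks the middle one
# (for an even number of values it averages the two middle ones)
check_median = statistics.median(scores)

print(f"Mean (statistics module): {check_mean:.2f}")
print(f"Median (statistics module): {check_median}")
```

If your own mean_score or median_score differs from these values, recheck the sum, the sort, and the middle index.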
Quartiles split the data into 4 equal parts. Find Q1 (25th percentile), Q2 (50th percentile), and Q3 (75th percentile).
Use // to get whole-number indices.
// automatically floors: 15 // 4 → 3 (not 3.75).
3 * n // 4 gives you the index three-quarters of the way through the list.
# TODO: Calculate quartiles
# First, make sure you have sorted scores from Practice 2!
n = len(sorted_scores)
# Q1 is at position n//4
q1_index = # YOUR CODE HERE
q1 = # YOUR CODE HERE
# Q2 is at position n//2 (this is the median!)
q2_index = # YOUR CODE HERE
q2 = # YOUR CODE HERE
# Q3 is at position 3*n//4
q3_index = # YOUR CODE HERE
q3 = # YOUR CODE HERE
print(f"Q1 (25th percentile): {q1}")
print(f"Q2 (50th percentile): {q2}")
print(f"Q3 (75th percentile): {q3}")
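To see why the quartile indices use // rather than /, here is a small sketch on a made-up list of letters (letters is hypothetical and not part of the exercise). It also shows that 3 * n // 4 multiplies first and then floors, which is not the same as 3 * (n // 4).

```python
# Hypothetical 8-item list, already in sorted order
letters = ["a", "b", "c", "d", "e", "f", "g", "h"]
n = len(letters)

# / always produces a float, even when the division is exact
print(n / 4)    # 2.0
# // produces an int, which is what list indexing needs
print(n // 4)   # 2

quarter = letters[n // 4]             # index 2
half = letters[n // 2]                # index 4
three_quarters = letters[3 * n // 4]  # index 6
print(quarter, half, three_quarters)

# With a 15-item list, the order of operations matters:
mult_first = 3 * 15 // 4     # (3 * 15) // 4 = 45 // 4
floor_first = 3 * (15 // 4)  # 3 * 3
print(mult_first, floor_first)
```

Because * and // have the same precedence and are evaluated left to right, writing 3 * n // 4 without parentheses multiplies before flooring.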
Write code that can find ANY percentile (like the 10th or 90th percentile).
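Compute the position with (percentile / 100) * (len(data) - 1).
The position may be a decimal, and scores[3.75] gives an error!
Use int() to convert: int(3.75) → 3 (drops the decimal part, which rounds down for positive numbers).
Or import math, then use math.floor(3.75) → 3.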
def get_percentile(data, percentile):
    """
    Find the value at a given percentile.
    percentile should be between 0 and 100.
    """
    sorted_data = sorted(data)
    # TODO: Calculate the index for this percentile
    # If percentile is 25, we want the value 25% through the sorted list
    # IMPORTANT: This calculation might give you a decimal like 3.75
    # But list indices must be whole numbers!
    position = # YOUR CODE HERE (this might be a float)
    index = # YOUR CODE HERE (convert to an integer)
    return sorted_data[index]
# Test it out
print(f"10th percentile: {get_percentile(scores, 10)}")
print(f"90th percentile: {get_percentile(scores, 90)}")
print(f"50th percentile: {get_percentile(scores, 50)}") # Should match median!
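The point about decimal positions can be sketched on its own. The example below uses a made-up five-item list (values is hypothetical) to show the error a float index raises and two ways to turn a position into a usable index. It only illustrates the conversion step and is not meant as the full get_percentile solution.

```python
import math

# Hypothetical sorted data, just for illustration
values = [10, 20, 30, 40, 50]

position = 3.75  # a position like the one the percentile formula can produce

# Indexing a list with a float raises TypeError
try:
    values[position]
    float_index_failed = False
except TypeError as err:
    float_index_failed = True
    print(f"TypeError: {err}")

# Both conversions drop the fractional part for non-negative numbers
index_from_int = int(position)           # 3
index_from_floor = math.floor(position)  # 3
print(values[index_from_int], values[index_from_floor])

# They differ only for negative numbers: int() truncates toward zero,
# while math.floor() always rounds down
print(int(-3.75), math.floor(-3.75))
```

Since percentile positions are never negative here, either conversion should behave the same way in this exercise.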
Given a score, figure out what percentile it's at (the reverse of Practice 4).
Use count_below += 1 inside the if statement.
Use :.1f in f-strings to show 1 decimal place.
def find_percentile_rank(data, value):
    """
    Find what percentile a value is at.
    Example: If 80% of scores are below 90, then 90 is at the 80th percentile.
    """
    sorted_data = sorted(data)
    # TODO: Count how many values are less than our target value
    count_below = 0
    for score in sorted_data:
        if score < value:
            # YOUR CODE HERE
    # TODO: Calculate the percentile
    # percentile = (count_below / total_count) * 100
    percentile = # YOUR CODE HERE
    return percentile
# Test it
print(f"A score of 85 is at the {find_percentile_rank(scores, 85):.1f} percentile")
print(f"A score of 90 is at the {find_percentile_rank(scores, 90):.1f} percentile")
print(f"A score of 75 is at the {find_percentile_rank(scores, 75):.1f} percentile")
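The docstring example (80% of scores below 90) can be worked through on a tiny made-up list. The list small_scores and the other names here are hypothetical, and this sketch counts with sum() over a generator instead of the loop the exercise asks for, so treat it as an illustration of the arithmetic rather than the intended solution.

```python
# Hypothetical list: 4 of the 5 values are below 90
small_scores = [60, 70, 80, 85, 90]
target = 90

# Count the values strictly below the target
count_below = sum(1 for s in small_scores if s < target)

# Share of values below the target, as a percentage
rank = 100 * count_below / len(small_scores)

# :.1f formats the number with one decimal place
label = f"{rank:.1f}"
print(f"A score of {target} is at the {label} percentile")
```

Note that only values strictly less than the target are counted, so the highest score in a list never reaches the 100th percentile under this definition.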