Assignment: California House Price Analysis

Your Task

You'll analyze real California house prices to understand wealth distribution. You will:

Calculate the mean and median house prices
Find the quartiles (Q1, Q2, Q3) to understand the distribution
Create a histogram using print statements
Write a function that tells you what percentile a house price falls in

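This will show you why the median home price is often more meaningful than the mean (average): a few very expensive homes pull the mean upward, while the median stays close to a typical home!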
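
The sketch below makes this concrete before you touch the real data. It uses five made-up prices (not taken from the dataset) and Python's built-in statistics module; a single very expensive home is enough to separate the two numbers.

```python
import statistics

# Made-up prices for illustration only: four modest homes and one mansion
toy_prices = [200_000, 250_000, 300_000, 350_000, 5_000_000]

toy_mean = statistics.mean(toy_prices)
toy_median = statistics.median(toy_prices)

print(f"Mean:   ${toy_mean:,.2f}")
print(f"Median: ${toy_median:,.2f}")
```

The same two functions can also serve as a check on the values you compute by hand in Task 1.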

Setup

Step 1: Getting Access to the Dataset

First, you'll need to access our shared datasets folder and add it to your Drive.

Open this link to our shared datasets folder in a new tab
Click the dropdown arrow next to "datasets" at the top
Select "Organize" → "Add shortcut"
Choose "All locations" and then select "My Drive" as the location
Click "Add"

You should now see "datasets" in your Google Drive!

Step 2: Connect Colab to Your Drive

In your Colab notebook, run this code to access your Google Drive:

# Connect to Google Drive
from google.colab import drive
drive.mount("/content/gdrive")

When prompted:

Click the link
Choose your Google account
Allow access
If Colab shows an authorization code, copy it back into the notebook
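
Once the mount finishes, it may be worth confirming that the shortcut from Step 1 is visible before loading anything. A minimal check, assuming the shortcut is named "datasets" and sits at the top level of My Drive (the same path Step 3 uses):

```python
import os

# Path used in Step 3; adjust it if your shortcut lives somewhere else
DATA_PATH = "/content/gdrive/MyDrive/datasets/house_prices.txt"

file_found = os.path.exists(DATA_PATH)
if file_found:
    print("Found the house price file.")
else:
    print("File not found. Check that Drive is mounted and the shortcut was added.")
```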

Step 3: Load the House Price Data

# Open and read the file (the "with" block closes it automatically)
with open("/content/gdrive/MyDrive/datasets/house_prices.txt", "r") as f:
    file_content = f.read()

# Convert file lines to numbers
prices = []
for line in file_content.strip().split('\n'):
    prices.append(float(line))

print(f"Loaded {len(prices)} house prices")
print(f"First 5 prices: {prices[:5]}")
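
The loop above raises a ValueError if the file contains a blank line in the middle, because float("") is not a valid conversion. The sketch below shows one way to guard against that. It writes a tiny made-up file to a temporary location so it can run anywhere; the helper name load_prices is hypothetical and not part of the assignment.

```python
import os
import tempfile


def load_prices(path):
    """Read one price per line, skipping blank lines."""
    loaded = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore empty lines
                loaded.append(float(line))
    return loaded


# Made-up sample file, for illustration only (note the blank line)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("100000\n250000.5\n\n300000\n")
    sample_path = tmp.name

sample_prices = load_prices(sample_path)
os.remove(sample_path)  # clean up the temporary file

print(f"Loaded {len(sample_prices)} house prices: {sample_prices}")
```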

Assignment

Task 1: Calculate Mean and Median

# Calculate the mean
# TODO: Sum all prices and divide by count

# Calculate the median  
# TODO: Sort the prices first
# TODO: Find the middle value (remember even vs odd count!)

# Print your results
print(f"Mean house price: ${mean_price:,.2f}")
print(f"Median house price: ${median_price:,.2f}")

What do you notice about the difference?
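
The even/odd hint in the TODO is easiest to see on tiny lists. Here is a sketch with made-up numbers (the helper name middle_value is hypothetical): with an odd count there is a single middle element, and with an even count the median is the average of the two middle elements.

```python
def middle_value(values):
    """Median of a list: the middle element, or the mean of the two middle elements."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = middle_value([5, 1, 3])      # sorted: [1, 3, 5]
even_example = middle_value([7, 1, 5, 3])  # sorted: [1, 3, 5, 7]
print(odd_example, even_example)
```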

Task 2: Find the Quartiles

Quartiles divide your data into four equal parts:

Q1: 25th percentile (25% of houses cost less)
Q2: 50th percentile (the median)
Q3: 75th percentile (75% of houses cost less)

# Sort the data first!
sorted_prices = sorted(prices)
n = len(sorted_prices)

# Calculate quartile positions
# TODO: Q1 is at position n//4
# TODO: Q2 is at position n//2  
# TODO: Q3 is at position 3*n//4

print(f"Q1 (25th percentile): ${q1:,.2f}")
print(f"Q2 (50th percentile): ${q2:,.2f}")
print(f"Q3 (75th percentile): ${q3:,.2f}")
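
The positions in the TODOs are one simple convention for quartiles. The sketch below applies them to eight made-up values and, for comparison, prints what statistics.quantiles returns. That function interpolates between neighbouring values by default, so its results can differ slightly from the index-based ones.

```python
import statistics

# Made-up, already sorted values for illustration only
toy_sorted = [10, 20, 30, 40, 50, 60, 70, 80]
n = len(toy_sorted)

toy_q1 = toy_sorted[n // 4]      # index 2
toy_q2 = toy_sorted[n // 2]      # index 4
toy_q3 = toy_sorted[3 * n // 4]  # index 6

print("Index-based quartiles:", toy_q1, toy_q2, toy_q3)
print("statistics.quantiles: ", statistics.quantiles(toy_sorted, n=4))
```

Note that with an even count, the value at position n//2 is the upper of the two middle values, so Q2 found this way may differ slightly from the median you computed in Task 1.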

Task 3: Percentile Calculator

Write a function that tells you what percentile a given house price is at:

def find_percentile(house_price, all_prices):
    sorted_prices = sorted(all_prices)

    # Count how many prices are below this price
    # TODO: Loop through sorted_prices and count

    # Calculate the percentile
    # TODO: (count_below / total_count) * 100

    return percentile

# Test your function
my_house = 450000
result = find_percentile(my_house, prices)
print(f"A ${my_house:,} house is at the {result:.1f} percentile")
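
To see the counting idea in isolation, here is a sketch on five made-up prices. The name percent_below is hypothetical, and it uses a generator expression in place of the explicit loop the TODO asks for. It assumes "below" means strictly less than, so houses at exactly the same price are not counted.

```python
def percent_below(value, values):
    """Percentage of entries in `values` that are strictly less than `value`."""
    count_below = sum(1 for v in values if v < value)
    return count_below / len(values) * 100


# Made-up prices for illustration only
toy_prices = [100_000, 200_000, 300_000, 400_000, 500_000]
toy_result = percent_below(350_000, toy_prices)
print(f"A $350,000 house is at the {toy_result:.1f} percentile")
```

Three of the five toy prices are below $350,000, so the function reports three fifths of the list as a percentage.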

Task 4: Create a Histogram

Visualize the distribution using print statements, where each █ represents 20 houses.

# Define price ranges (bins)
bin_0_50k = 0
bin_50_100k = 0
bin_100_150k = 0
# TODO: Add a variable for each remaining price range


# Loop through the prices and, for each one,
# use an if/elif/else statement to count the house
# in the correct bin


# This function, which makes a bar of the correct length,
# has been written for you.
def make_bar(count):
    # One █ for every 20 houses (any leftover houses are dropped)
    bar = ""
    for i in range(count // 20):
        bar += "█"
    return bar


# Print the histogram
print("California House Price Distribution:")
print(f"$0-50k:    {make_bar(bin_0_50k)}")
print(f"$50-100k:  {make_bar(bin_50_100k)}")
print(f"$100-150k: {make_bar(bin_100_150k)}")
# TODO: Print the other bins
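
As an alternative to a long if/elif chain, the bin for a price can be computed with integer division. The sketch below is one possible approach, not the one the assignment asks for: it uses made-up prices, a dictionary of counts, and one block per house (rather than per 20 houses) so that the tiny example produces visible bars. The names BIN_WIDTH and bin_counts are hypothetical.

```python
BIN_WIDTH = 50_000  # each bin covers a $50k range, as in the bins above

# Made-up prices for illustration only
toy_prices = [30_000, 75_000, 80_000, 120_000, 60_000, 149_999]

# Count prices per bin: index 0 is $0-50k, index 1 is $50-100k, and so on
bin_counts = {}
for price in toy_prices:
    index = int(price // BIN_WIDTH)
    bin_counts[index] = bin_counts.get(index, 0) + 1

for index in sorted(bin_counts):
    low = index * BIN_WIDTH // 1000
    high = (index + 1) * BIN_WIDTH // 1000
    label = f"${low}-{high}k:"
    # One block per house here, because the toy list is so small
    print(f"{label:<11}{'█' * bin_counts[index]}")
```

Padding the label to a fixed width keeps the bars aligned even when the range labels have different lengths.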