Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, with stays ranging from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types: .csv, .tsv, and .xlsx.
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv: This is a CSV file containing data on Airbnb listing prices and locations.
- listing_id: unique identifier of listing
- price: nightly listing price in USD
- nbhood_full: name of borough and neighborhood where listing is located

data/airbnb_room_type.xlsx: This is an Excel file containing data on Airbnb listing descriptions and room types.
- listing_id: unique identifier of listing
- description: listing description
- room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

data/airbnb_last_review.tsv: This is a TSV file containing data on Airbnb host names and review dates.
- listing_id: unique identifier of listing
- host_name: name of listing host
- last_review: date when the listing was last reviewed

# We've loaded your first package for you! You can add as many cells as you need.
import numpy as np
import pandas as pd
# Begin coding here ...
price = pd.read_csv("datasets/airbnb_price.csv")
room = pd.read_excel("datasets/airbnb_room_type.xlsx")
review = pd.read_csv("datasets/airbnb_last_review.tsv", sep='\t')
price.head()
| | listing_id | price | nbhood_full |
|---|---|---|---|
| 0 | 2595 | 225 dollars | Manhattan, Midtown |
| 1 | 3831 | 89 dollars | Brooklyn, Clinton Hill |
| 2 | 5099 | 200 dollars | Manhattan, Murray Hill |
| 3 | 5178 | 79 dollars | Manhattan, Hell's Kitchen |
| 4 | 5238 | 150 dollars | Manhattan, Chinatown |
room.head()
| | listing_id | description | room_type |
|---|---|---|---|
| 0 | 2595 | Skylit Midtown Castle | Entire home/apt |
| 1 | 3831 | Cozy Entire Floor of Brownstone | Entire home/apt |
| 2 | 5099 | Large Cozy 1 BR Apartment In Midtown East | Entire home/apt |
| 3 | 5178 | Large Furnished Room Near B'way | private room |
| 4 | 5238 | Cute & Cozy Lower East Side 1 bdrm | Entire home/apt |
review.head()
| | listing_id | host_name | last_review |
|---|---|---|---|
| 0 | 2595 | Jennifer | May 21 2019 |
| 1 | 3831 | LisaRoxanne | July 05 2019 |
| 2 | 5099 | Chris | June 22 2019 |
| 3 | 5178 | Shunichi | June 24 2019 |
| 4 | 5238 | Ben | June 09 2019 |
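All three tables share a `listing_id` column, which is what makes combining them possible. The following is a minimal sketch of such a join, not a step the analysis here performs. It uses small stand-in frames built from the first rows shown above rather than the full files, and the `*_demo` names are hypothetical.

```python
import pandas as pd

# Stand-in frames built from the first three rows shown above (not the full files).
price_demo = pd.DataFrame({
    "listing_id": [2595, 3831, 5099],
    "price": ["225 dollars", "89 dollars", "200 dollars"],
})
room_demo = pd.DataFrame({
    "listing_id": [2595, 3831, 5099],
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt"],
})
review_demo = pd.DataFrame({
    "listing_id": [2595, 3831, 5099],
    "last_review": ["May 21 2019", "July 05 2019", "June 22 2019"],
})

# Inner joins keep only listings that appear in all three tables.
combined_demo = (
    price_demo
    .merge(room_demo, on="listing_id", how="inner")
    .merge(review_demo, on="listing_id", how="inner")
)
print(combined_demo)
```

An inner join is only one option; `how="outer"` would instead keep listings that are missing from one of the files.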
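# Parse strings such as "May 21 2019" into datetimes; values that cannot be parsed become NaT.
# infer_datetime_format is deprecated as of pandas 2.0 and is not needed for this parse.
review["last_review"] = pd.to_datetime(review["last_review"], errors='coerce')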
review["last_review"].head()
0   2019-05-21
1   2019-07-05
2   2019-06-22
3   2019-06-24
4   2019-06-09
Name: last_review, dtype: datetime64[ns]
first_review_date = review["last_review"].min()
last_review_date = review["last_review"].max()
display(first_review_date, last_review_date)
Timestamp('2019-01-01 00:00:00')
Timestamp('2019-07-09 00:00:00')
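Because the dates are parsed with `errors='coerce'`, any value that cannot be parsed turns into `NaT` rather than raising an error, and `min()` and `max()` skip those entries. The sketch below illustrates this on a hypothetical three-element series; the explicit `format="%B %d %Y"` is an assumption based on the sample rows shown earlier, and `"not a date"` is an invented bad value.

```python
import pandas as pd

# Two dates in the format seen in the sample rows, plus one invented bad value.
dates_demo = pd.Series(["May 21 2019", "July 05 2019", "not a date"])

# errors="coerce" turns unparseable strings into NaT instead of raising.
parsed_demo = pd.to_datetime(dates_demo, format="%B %d %Y", errors="coerce")

# Counting NaT values shows how many rows failed to parse.
n_unparsed = parsed_demo.isna().sum()

# min() and max() ignore NaT by default.
earliest_demo = parsed_demo.min()
latest_demo = parsed_demo.max()
print(n_unparsed, earliest_demo, latest_demo)
```

Checking `isna().sum()` after parsing is a cheap way to confirm that coercion did not silently discard a large share of the rows.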
room["room_type"] = room["room_type"].str.lower()
room["room_type"].value_counts()
entire home/apt    13266
private room       11356
shared room          587
Name: room_type, dtype: int64
private_rooms = room["room_type"].value_counts()["private room"]
display(private_rooms)
11356
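Lowercasing matters because `value_counts()` treats differently capitalized strings as separate categories. Here is a small sketch of the effect; the series is hypothetical, and the `"Private room"` and `"PRIVATE ROOM"` spellings are assumed for illustration rather than taken from the file.

```python
import pandas as pd

# Hypothetical raw values with inconsistent capitalization.
raw_types = pd.Series(["Entire home/apt", "private room", "Private room", "PRIVATE ROOM"])

# Before cleaning, each spelling counts as its own category.
n_raw_categories = raw_types.nunique()

# After lowercasing, the three private-room spellings collapse into one.
counts_demo = raw_types.str.lower().value_counts()
n_private_demo = counts_demo["private room"]
print(n_raw_categories, n_private_demo)
```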
price["price"].head()
0    225 dollars
1     89 dollars
2    200 dollars
3     79 dollars
4    150 dollars
Name: price, dtype: object
price["price"] = price["price"].str.replace(" dollars", "")
price["price"].head()
0    225
1     89
2    200
3     79
4    150
Name: price, dtype: object
price["price"] = price["price"].astype("float64")
average_price = round(price["price"].mean(), 2)
display(average_price)
141.78
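The price cleanup can be checked on a handful of values. The sketch below strips the suffix and converts with `pd.to_numeric`, which is one possible alternative to `astype("float64")` and not the method used above. The five prices are the sample rows shown earlier, so the resulting mean describes only those rows, not the full dataset.

```python
import pandas as pd

# The five sample prices shown earlier, still stored as strings.
price_strings = pd.Series(
    ["225 dollars", "89 dollars", "200 dollars", "79 dollars", "150 dollars"]
)

# regex=False treats " dollars" as a literal substring rather than a pattern.
stripped = price_strings.str.replace(" dollars", "", regex=False)

# to_numeric raises by default if any value is still non-numeric.
numeric_prices = pd.to_numeric(stripped)

sample_average = round(numeric_prices.mean(), 2)
print(numeric_prices.dtype, sample_average)
```

Passing `errors="coerce"` to `pd.to_numeric` would instead turn malformed prices into `NaN`, which `mean()` then skips.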
results = {
    "first_reviewed": [first_review_date],
    "last_reviewed": [last_review_date],
    "nb_private_rooms": [private_rooms],
    "avg_price": [average_price],
}
review_dates = pd.DataFrame(results)
display(review_dates)
| | first_reviewed | last_reviewed | nb_private_rooms | avg_price |
|---|---|---|---|---|
| 0 | 2019-01-01 | 2019-07-09 | 11356 | 141.78 |