In this project, we will use SQL to answer business questions on the Chinook company database. Chinook is an online music store that sells music in a variety of formats. This database contains information about Chinook's sales, employees, and customers.
We will answer the following questions:
%%capture
%load_ext sql
%sql sqlite:///chinook.db
'Connected: None@chinook.db'
%%sql
SELECT
name,
type
FROM sqlite_master
WHERE type in ('table', 'VIEW');
Done.
name | type |
---|---|
album | table |
artist | table |
customer | table |
employee | table |
genre | table |
invoice | table |
invoice_line | table |
media_type | table |
playlist | table |
playlist_track | table |
track | table |
Chinook's database contains 11 tables, as listed above.
%%sql
SELECT *
FROM invoice_line
LIMIT 5;
Done.
invoice_line_id | invoice_id | track_id | unit_price | quantity |
---|---|---|---|---|
1 | 1 | 1158 | 0.99 | 1 |
2 | 1 | 1159 | 0.99 | 1 |
3 | 1 | 1160 | 0.99 | 1 |
4 | 1 | 1161 | 0.99 | 1 |
5 | 1 | 1162 | 0.99 | 1 |
The Chinook record store has just signed a deal with a new record label, and you've been tasked with selecting the first three albums that will be added to the store, from a list of four. All four albums are by artists that don't have any tracks in the store right now - we have the artist names, and the genre of music they produce. The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA.
%%sql
WITH usa AS
(
SELECT *
FROM customer c
INNER JOIN invoice i ON i.customer_id = c.customer_id
INNER JOIN invoice_line il ON il.invoice_id = i.invoice_id
WHERE country = 'USA'
)
SELECT
g.name AS genre,
COUNT(usa.invoice_line_id) AS total_tracks,
CAST(COUNT(usa.invoice_line_id) as float) / (SELECT COUNT(*) FROM usa) * 100 percentage
FROM usa
INNER JOIN track t ON t.track_id = usa.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC;
Done.
genre | total_tracks | percentage |
---|---|---|
Rock | 561 | 53.37773549000951 |
Alternative & Punk | 130 | 12.369172216936251 |
Metal | 124 | 11.798287345385347 |
R&B/Soul | 53 | 5.042816365366318 |
Blues | 36 | 3.425309229305423 |
Alternative | 35 | 3.3301617507136063 |
Latin | 22 | 2.093244529019981 |
Pop | 22 | 2.093244529019981 |
Hip Hop/Rap | 20 | 1.9029495718363463 |
Jazz | 14 | 1.3320647002854424 |
Easy Listening | 13 | 1.236917221693625 |
Reggae | 6 | 0.570884871550904 |
Electronica/Dance | 5 | 0.47573739295908657 |
Classical | 4 | 0.3805899143672693 |
Heavy Metal | 3 | 0.285442435775452 |
Soundtrack | 2 | 0.19029495718363465 |
TV Shows | 1 | 0.09514747859181732 |
The table above shows Chinook's best selling genres in the US. The genres of the potential new artists are Hip Hop, Punk, Pop, and Blues. Since this data shows that Punk, Blues, and Pop are the 3 best selling of these 4 genres, Chinook may want to purchase the albums from Red Tone (Punk), Meteor and the Girls (Pop), and Slim Jim Bites (Blues). It is worth noting that there was very little difference in the popularity of Pop vs. Hip Hop (.15%), and each of these genres made up about 2% of total sales. Together, these 4 genres made up only about 20% of total US sales. However, Rock is by far the best selling genre, accounting for 53% of sales alone, so if a Rock album became available in the future Chinook may want to consider purchasing it.
%%sql
WITH total_per_agent AS
(
SELECT
c.customer_id,
c.support_rep_id,
SUM(i.total) agent_total
FROM customer c
LEFT JOIN invoice i ON i.customer_id = c.customer_id
GROUP BY 2
)
SELECT
e.first_name || ' ' || e.last_name AS employee_name,
ROUND(t.agent_total, 2) AS total_sales,
e.title AS employee_role,
e.hire_date AS hire_date
FROM employee e
LEFT JOIN total_per_agent t ON t.support_rep_id = e.employee_id
ORDER BY 2 DESC;
Done.
employee_name | total_sales | employee_role | hire_date |
---|---|---|---|
Jane Peacock | 1731.51 | Sales Support Agent | 2017-04-01 00:00:00 |
Margaret Park | 1584.0 | Sales Support Agent | 2017-05-03 00:00:00 |
Steve Johnson | 1393.92 | Sales Support Agent | 2017-10-17 00:00:00 |
Andrew Adams | None | General Manager | 2016-08-14 00:00:00 |
Nancy Edwards | None | Sales Manager | 2016-05-01 00:00:00 |
Michael Mitchell | None | IT Manager | 2016-10-17 00:00:00 |
Robert King | None | IT Staff | 2017-01-02 00:00:00 |
Laura Callahan | None | IT Staff | 2017-03-04 00:00:00 |
Of Chinook's 8 employees, 3 are Sales Support Agents. The difference in sales between the employees may be a result of their hire date - the employee with the earliest hire date has the most sales, and the employee with the most recent hire date has the least sales, with a total difference of about 20%.
%%sql
WITH country_or_other AS
(
SELECT
CASE
WHEN (SELECT COUNT(*)
FROM customer
WHERE country = c.country) = 1
THEN 'Other'
ELSE c.country
END AS country,
c.customer_id,
il.*
FROM invoice_line il
INNER JOIN invoice i ON i.invoice_id = il.invoice_id
INNER JOIN customer c on c.customer_id = i.customer_id
)
SELECT
country,
num_cust,
total_sales,
avg_total_sales_per_cust,
avg_order_val
FROM
(SELECT
country,
COUNT(DISTINCT customer_id) num_cust,
ROUND(SUM(unit_price), 2) total_sales,
ROUND(CAST(SUM(unit_price) as float) / CAST (COUNT(DISTINCT customer_id) as float), 2) avg_total_sales_per_cust,
ROUND(CAST(SUM(unit_price) as float) / CAST(COUNT(DISTINCT invoice_id) as float), 2) avg_order_val,
CASE
WHEN country = 'Other' THEN 1
ELSE 0
END AS sort
FROM country_or_other
GROUP BY 1
ORDER BY sort, 5 DESC);
Done.
country | num_cust | total_sales | avg_total_sales_per_cust | avg_order_val |
---|---|---|---|---|
Czech Republic | 2 | 273.24 | 136.62 | 9.11 |
United Kingdom | 3 | 245.52 | 81.84 | 8.77 |
India | 2 | 183.15 | 91.58 | 8.72 |
Germany | 4 | 334.62 | 83.66 | 8.16 |
USA | 13 | 1040.49 | 80.04 | 7.94 |
France | 5 | 389.07 | 77.81 | 7.78 |
Canada | 8 | 535.59 | 66.95 | 7.05 |
Brazil | 5 | 427.68 | 85.54 | 7.01 |
Portugal | 2 | 185.13 | 92.57 | 6.38 |
Other | 15 | 1094.94 | 73.0 | 7.45 |
The largest portion of Chinook's customers and total sales is concentrated in the North America region, specifically the USA and Canada. Since this market accounts for a large volume of sales, it may be important to consider their purchasing patterns when looking at new artists or albums.
However, other specific countries may offer good business opportunites. The Czech Republic, Portugal, and India have the highest average lifetime values per customer. The Czech Republic, UK, India, and Germany have the highest average order value. These observations are based on limited data, as each of these countries represents only 2-4 total customers.
%%sql
WITH invoice_info AS
(
SELECT
il.invoice_id,
t.album_id,
COUNT(il.track_id) num_tracks
FROM invoice_line il
LEFT JOIN track t on t.track_id = il.track_id
GROUP BY invoice_id, album_id
),
album_info AS
(
SELECT
album_id,
COUNT(track_id) num_tracks
FROM track
GROUP BY 1
)
SELECT
CASE
WHEN (SELECT num_tracks FROM album_info WHERE album_id = iv.album_id) = iv.num_tracks AND al.num_tracks > 2
THEN 'Album'
ELSE 'Not album'
END AS album_purchase,
COUNT(DISTINCT invoice_id) num_invoices,
ROUND(CAST(COUNT(DISTINCT invoice_id) as float) / (SELECT COUNT(DISTINCT invoice_id) FROM invoice_info) * 100, 2) percent_invoices
FROM invoice_info iv
LEFT JOIN album_info al on al.album_id = iv.album_id
GROUP BY album_purchase
LIMIT 20;
Done.
album_purchase | num_invoices | percent_invoices |
---|---|---|
Album | 111 | 18.08 |
Not album | 503 | 81.92 |
Purchases of full albums account for about 20% of total sales. Based on this, I would recommend to continue selling full albums, as this accounts for 1/5 of total purchases. Removing whole albums might signficantly cut revene as a result.
%%sql
SELECT
a.name artist_name,
COUNT(a.name) num_tracks,
g.name genre
FROM artist a
LEFT JOIN album al on al.artist_id = a.artist_id
LEFT JOIN track t on t.album_id = al.album_id
LEFT JOIN playlist_track pt on pt.track_id = t.track_id
LEFT JOIN genre g on g.genre_id = t.genre_id
GROUP BY artist_name
ORDER BY 2 DESC
LIMIT 10;
Done.
artist_name | num_tracks | genre |
---|---|---|
Iron Maiden | 516 | Rock |
U2 | 333 | Pop |
Metallica | 296 | Metal |
Led Zeppelin | 252 | Rock |
Deep Purple | 226 | Rock |
Lost | 184 | Drama |
Pearl Jam | 177 | Rock |
Eric Clapton | 145 | Latin |
Faith No More | 145 | Alternative & Punk |
Lenny Kravitz | 143 | Metal |
%%sql
SELECT
a.name artist_name,
COUNT(DISTINCT pt.playlist_id) num_playlists,
g.name genre,
COUNT(pt.track_id) num_tracks
FROM artist a
LEFT JOIN album al on al.artist_id = a.artist_id
LEFT JOIN track t on t.album_id = al.album_id
LEFT JOIN playlist_track pt on pt.track_id = t.track_id
LEFT JOIN genre g on g.genre_id = t.genre_id
GROUP BY artist_name
ORDER BY 2 DESC
LIMIT 10;
Done.
artist_name | num_playlists | genre | num_tracks |
---|---|---|---|
Eugene Ormandy | 7 | Classical | 15 |
Academy of St. Martin in the Fields & Sir Neville Marriner | 6 | Classical | 10 |
Berliner Philharmoniker & Herbert Von Karajan | 6 | Classical | 13 |
English Concert & Trevor Pinnock | 6 | Classical | 9 |
The King's Singers | 6 | Classical | 10 |
Academy of St. Martin in the Fields, Sir Neville Marriner & Thurston Dart | 5 | Classical | 5 |
Adrian Leaper & Doreen de Feis | 5 | Classical | 5 |
Alberto Turco & Nova Schola Gregoriana | 5 | Classical | 5 |
Antal Doráti & London Symphony Orchestra | 5 | Classical | 5 |
Barry Wordsworth & BBC Concert Orchestra | 5 | Classical | 5 |
%%sql
SELECT
pl.playlist_id,
pl.name playlist_name,
ar.name artist,
COUNT(pt.track_id) num_tracks
FROM playlist pl
LEFT JOIN playlist_track pt on pt.playlist_id = pl.playlist_id
LEFT JOIN track t on t.track_id = pt.track_id
LEFT JOIN album a on a.album_id = t.album_id
LEFT JOIN artist ar on ar.artist_id = a.artist_id
WHERE ar.name = 'Iron Maiden'
GROUP BY pt.playlist_id;
Done.
playlist_id | playlist_name | artist | num_tracks |
---|---|---|---|
1 | Music | Iron Maiden | 213 |
5 | 90’s Music | Iron Maiden | 84 |
8 | Music | Iron Maiden | 213 |
17 | Heavy Metal Classic | Iron Maiden | 6 |
%%sql
SELECT
pl.playlist_id,
pl.name,
COUNT(pt.track_id)
FROM playlist pl
LEFT JOIN playlist_track pt on pt.playlist_id = pl.playlist_id
GROUP BY pl.playlist_id
Done.
playlist_id | name | COUNT(pt.track_id) |
---|---|---|
1 | Music | 3290 |
2 | Movies | 0 |
3 | TV Shows | 213 |
4 | Audiobooks | 0 |
5 | 90’s Music | 1477 |
6 | Audiobooks | 0 |
7 | Movies | 0 |
8 | Music | 3290 |
9 | Music Videos | 1 |
10 | TV Shows | 213 |
11 | Brazilian Music | 39 |
12 | Classical | 75 |
13 | Classical 101 - Deep Cuts | 25 |
14 | Classical 101 - Next Steps | 25 |
15 | Classical 101 - The Basics | 25 |
16 | Grunge | 15 |
17 | Heavy Metal Classic | 26 |
18 | On-The-Go 1 | 1 |
We can measure the most popular artists in playlists in different ways. First, we can look at which artist's tracks appear most frequently. We can also consider the number of playlists an artist appears in.
The most frequently appearing artist in playlists is Iron Maiden, whose tracks appeared 516 times. They are followed by U2, Metallica, Led Zeppelin, and Deep Purple. Of the top 10 most popular playlist artists, 4 are Rock and 2 are Metal; the other 6 are from a range of different genres. However, if we look at all the playlists in the Chinook database, we see that there are only 14 playlists with tracks added to them; their sizes range from 1 track to over 3,000 tracks. As a result, those large numbers of tracks may only be coming from several specific playlists. For example, Iron Maiden appears only on 4 playlists, including 213 tracks on two of those playlists. This may respresent big Iron Maiden fans who are adding all available Iron Maiden tracks to their playlist.
If we measure artist popularity in playlists by the number of playlists an artist appears on, we get very different results. The top 10 artists who appeared in the largest number of playlists were all classical music, appearing on 5-7 playlists, with 5-15 tracks each. Four playlists have "Classical" in their title. There may be demand for general classical music playlists, as opposed to fans of a specific classical artist.
%%sql
WITH no_purchased AS
(
SELECT
track_id,
COUNT(track_id) no_tracks
FROM invoice_line
GROUP BY 1
)
SELECT
(SELECT COUNT(*) FROM no_purchased) tracks_purchased,
(SELECT COUNT(*) FROM track) - (SELECT COUNT(*) FROM no_purchased) as tracks_not_purchased,
ROUND(CAST((SELECT COUNT(*) FROM no_purchased) as float) / CAST((SELECT COUNT(*) FROM track) as float) * 100, 2) percent_purchased,
ROUND(CAST((SELECT COUNT(*) FROM track) - (SELECT COUNT(*) FROM no_purchased) as float) / CAST((SELECT COUNT(*) FROM track) as float) * 100, 2) percent_not_purchased
FROM track t
LEFT JOIN no_purchased np on np.track_id = t.track_id
GROUP BY 1;
Done.
tracks_purchased | tracks_not_purchased | percent_purchased | percent_not_purchased |
---|---|---|---|
1806 | 1697 | 51.56 | 48.44 |
%%sql
SELECT
COUNT(*)
FROM track
Done.
COUNT(*) |
---|
3503 |
Out of 3503 tracks available in Chinook's store, 1806 have been purchased, and 1697 have not been. Almost half of the tracks available have yet to be purchased by any customer. Chinook may want to remove some of these tracks from its inventory.
%%sql
SELECT
t.name track_name,
ar.name artist,
COUNT(il.track_id) number_sold,
g.name genre
FROM track t
LEFT JOIN invoice_line il ON il.track_id = t.track_id
LEFT JOIN album al on al.album_id = t.album_id
LEFT JOIN artist ar on ar.artist_id = al.artist_id
LEFT JOIN genre g on g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 3 DESC
LIMIT 10
Done.
track_name | artist | number_sold | genre |
---|---|---|---|
War Pigs | Cake | 33 | Alternative |
Are You Experienced? | Jimi Hendrix | 14 | Rock |
Changes | Godsmack | 14 | Metal |
Highway Chile | Jimi Hendrix | 14 | Rock |
Hey Joe | Jimi Hendrix | 13 | Rock |
Put The Finger On You | AC/DC | 13 | Rock |
Third Stone From The Sun | Jimi Hendrix | 13 | Rock |
Drain You | Nirvana | 12 | Rock |
Love Or Confusion | Jimi Hendrix | 12 | Rock |
Radio/Video | System Of A Down | 12 | Metal |
The best selling track is War Pigs by Cake, with 33 having been sold. The rest of the ten best selling tracks sold between 12 and 14 copies. 5 of the 10 best selling tracks are by Jimi Hendrix. 7 out of 10 are Rock, 2 are Metal, and 1 is Alternative. Next, we will look at the best selling artists and albums.
%%sql
CREATE VIEW track_info AS
SELECT
il.track_id,
t.name track,
ar.name artist,
al.title album,
g.name genre,
t.media_type_id
FROM invoice_line il
LEFT JOIN track t on t.track_id = il.track_id
LEFT JOIN album al on al.album_id = t.album_id
LEFT JOIN artist ar on ar.artist_id = al.artist_id
LEFT JOIN genre g on g.genre_id = t.genre_id
(sqlite3.OperationalError) table track_info already exists [SQL: CREATE VIEW track_info AS SELECT il.track_id, t.name track, ar.name artist, al.title album, g.name genre, t.media_type_id FROM invoice_line il LEFT JOIN track t on t.track_id = il.track_id LEFT JOIN album al on al.album_id = t.album_id LEFT JOIN artist ar on ar.artist_id = al.artist_id LEFT JOIN genre g on g.genre_id = t.genre_id] (Background on this error at: http://sqlalche.me/e/e3q8)
%%sql
SELECT
artist,
COUNT(track_id) tracks_sold,
ROUND(CAST(COUNT(track_id) as float) / (SELECT COUNT(track_id) FROM track_info) * 100, 2) percentage_total_sales,
genre
FROM track_info
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
Done.
artist | tracks_sold | percentage_total_sales | genre |
---|---|---|---|
Queen | 192 | 4.04 | Rock |
Jimi Hendrix | 187 | 3.93 | Rock |
Nirvana | 130 | 2.73 | Rock |
Red Hot Chili Peppers | 130 | 2.73 | Rock |
Pearl Jam | 129 | 2.71 | Rock |
AC/DC | 124 | 2.61 | Rock |
Guns N' Roses | 124 | 2.61 | Metal |
Foo Fighters | 121 | 2.54 | Rock |
The Rolling Stones | 117 | 2.46 | Rock |
Metallica | 106 | 2.23 | Metal |
The table above shows the top 10 best selling artists at Chinook. 3 of these artists also have top-10 selling tracks at Chinook: Jimi Hendrix, Nirvana, and AC/DC. Each of these top-10 artists represents 2-4% of total track purchases at the store. 8 out of 10 of these artists are from the Rock genre; the remaining 2 are Metal. Chinook may want to add more tracks by these top artists to the store, as a number of customers are fans of these artists and would likely by more of their music.
%%sql
SELECT
album,
artist,
COUNT(track_id) tracks_sold,
ROUND(CAST(COUNT(track_id) as float) / (SELECT COUNT(track_id) FROM track_info) * 100, 2) percentage_total_sales,
genre
FROM track_info
GROUP BY 1
ORDER BY 3 DESC
LIMIT 10
Done.
album | artist | tracks_sold | percentage_total_sales | genre |
---|---|---|---|---|
Are You Experienced? | Jimi Hendrix | 187 | 3.93 | Rock |
Faceless | Godsmack | 96 | 2.02 | Metal |
Mezmerize | System Of A Down | 93 | 1.96 | Metal |
Get Born | JET | 90 | 1.89 | Alternative & Punk |
The Doors | The Doors | 83 | 1.74 | Rock |
Big Ones | Aerosmith | 80 | 1.68 | Rock |
Greatest Hits I | Queen | 80 | 1.68 | Rock |
The Police Greatest Hits | The Police | 80 | 1.68 | Rock |
From The Muddy Banks Of The Wishkah [live] | Nirvana | 78 | 1.64 | Rock |
My Generation - The Very Best Of The Who | The Who | 76 | 1.6 | Rock |
The table above shows the top selling albums, measured by total number of tracks sold from the album.
Once again, Jimi Hendrix is at the top of the list. He is Chinook's overall best selling artist, so opportunities to sell more of his music will likely be good for the business.
Rock, followed by Metal, once again dominate this list. As the top songs, albums, and artist are Rock musicians, followed by Metal musicians, Chinook may want to consider offering more music from these genres.
%%sql
SELECT
g.name genre,
COUNT(DISTINCT t.track_id) tracks_available,
ROUND(CAST(COUNT(DISTINCT t.track_id) as float) / CAST((SELECT COUNT(*) FROM track) as float) * 100, 2) percentage_tracks_avail,
COUNT(DISTINCT il.track_id) track_has_sold,
ROUND(CAST(COUNT(DISTINCT il.track_id) as float) / CAST(COUNT(DISTINCT t.track_id) as float) * 100, 2) as percent_has_sold,
COUNT(il.track_id) number_sold,
ROUND(CAST(COUNT(il.track_id) as float) / CAST((SELECT COUNT(track_id) FROM invoice_line) as float) * 100, 2) percentage_total_sales
FROM track t
LEFT JOIN genre g ON g.genre_id = t.genre_id
LEFT JOIN invoice_line il ON il.track_id = t.track_id
GROUP BY g.name
ORDER BY 6 DESC
Done.
genre | tracks_available | percentage_tracks_avail | track_has_sold | percent_has_sold | number_sold | percentage_total_sales |
---|---|---|---|---|---|---|
Rock | 1297 | 37.03 | 915 | 70.55 | 2635 | 55.39 |
Metal | 374 | 10.68 | 238 | 63.64 | 619 | 13.01 |
Alternative & Punk | 332 | 9.48 | 176 | 53.01 | 492 | 10.34 |
Latin | 579 | 16.53 | 119 | 20.55 | 167 | 3.51 |
R&B/Soul | 61 | 1.74 | 55 | 90.16 | 159 | 3.34 |
Blues | 81 | 2.31 | 56 | 69.14 | 124 | 2.61 |
Jazz | 130 | 3.71 | 61 | 46.92 | 121 | 2.54 |
Alternative | 40 | 1.14 | 34 | 85.0 | 117 | 2.46 |
Easy Listening | 24 | 0.69 | 24 | 100.0 | 74 | 1.56 |
Pop | 48 | 1.37 | 25 | 52.08 | 63 | 1.32 |
Electronica/Dance | 30 | 0.86 | 29 | 96.67 | 55 | 1.16 |
Classical | 74 | 2.11 | 16 | 21.62 | 47 | 0.99 |
Reggae | 58 | 1.66 | 22 | 37.93 | 35 | 0.74 |
Hip Hop/Rap | 35 | 1.0 | 21 | 60.0 | 33 | 0.69 |
Heavy Metal | 28 | 0.8 | 7 | 25.0 | 8 | 0.17 |
Soundtrack | 43 | 1.23 | 5 | 11.63 | 5 | 0.11 |
TV Shows | 93 | 2.65 | 2 | 2.15 | 2 | 0.04 |
Drama | 64 | 1.83 | 1 | 1.56 | 1 | 0.02 |
Bossa Nova | 15 | 0.43 | 0 | 0.0 | 0 | 0.0 |
Comedy | 17 | 0.49 | 0 | 0.0 | 0 | 0.0 |
Opera | 1 | 0.03 | 0 | 0.0 | 0 | 0.0 |
Rock And Roll | 12 | 0.34 | 0 | 0.0 | 0 | 0.0 |
Sci Fi & Fantasy | 26 | 0.74 | 0 | 0.0 | 0 | 0.0 |
Science Fiction | 13 | 0.37 | 0 | 0.0 | 0 | 0.0 |
World | 28 | 0.8 | 0 | 0.0 | 0 | 0.0 |
In general, the range of tracks available for sale was reflective of that genre's popularity.
For example, the 3 top selling genres had a larger variety of tracks available than other genres
Within the Rock genre, 70% of tracks available have sold. The same is true of 63% of Metal tracks and 53% of Alternative & Punk tracks. Rock was by far the most represented genre in the top ten selling artists, albums, and tracks, followed by metal. Chinook may want to continue to expand its music offerings in Rock and possibly Metal, or advertize its current catalog in these genres, as these are the best selling genres in the store. A relatively high percentage of Rock and Metal tracks sell, so additional purchases in these areas may be low-risk opportunities to increase business.
One noteable exception to this is the Latin genre, which represented only 3% of total sales but had 579 tracks available (16% of total tracks available for purchase). Moreover, only 20% of Latin tracks available at Chinook have ever been sold. As a result, Chinook may want to consider dropping some of these tracks from its collection.
There are 7 genres that have not sold any tracks. 4 additional genres account for less than .5% of sales each. Chinook may want to consider eliminating these offerings as well.
There are 4 genres where Chinook may want to experiment with expanding its offerings: Easy Listening, Electronic/Dance, R&B/Soul, and Alternative. Each of these genres currently accounts for a small percentage of total offerings (.69-1.74%) and sales (1-3% each). However, these genres are selling a disproportionately high number and percentage of their tracks as compared to other offerings in the store. Chinook could experiment with a small expansion of offerings and/or advertising in these genres.
%%sql
WITH media_info AS
(
SELECT
COUNT(DISTINCT track_id) num_tracks,
t.media_type_id,
mt.name media_type,
t.track_id
FROM track t
LEFT JOIN media_type mt on mt.media_type_id = t.media_type_id
GROUP BY 2
)
SELECT
mi.media_type media_type,
COUNT(ti.track_id) tracks_sold,
ROUND(CAST(COUNT(ti.track_id) as float) / (SELECT COUNT(*) FROM track_info) * 100, 2) percentage_total_sold,
ROUND(CAST(mi.num_tracks as float) / (SELECT COUNT(*) FROM track) * 100, 2) percentage_available_tracks,
CASE
WHEN mi.media_type LIKE '%Protected%'
THEN 'Protected'
ELSE 'Non-Protected'
END AS protected
FROM track_info ti
LEFT JOIN media_info mi on ti.media_type_id = mi.media_type_id
LEFT JOIN track t on t.track_id = ti.track_id
GROUP BY 1
ORDER BY 2 DESC
Done.
media_type | tracks_sold | percentage_total_sold | percentage_available_tracks | protected |
---|---|---|---|---|
MPEG audio file | 4259 | 89.53 | 86.61 | Non-Protected |
Protected AAC audio file | 439 | 9.23 | 6.77 | Protected |
Purchased AAC audio file | 35 | 0.74 | 0.2 | Non-Protected |
AAC audio file | 21 | 0.44 | 0.31 | Non-Protected |
Protected MPEG-4 video file | 3 | 0.06 | 6.11 | Protected |
Out of the 5 media types available, MPEG is by far the most popular, representing 89% of tracks sold and 86% of tracks available for purchase. This file format is non-protected. About 90% of tracks sold were non-protected, compared with about 87% of tracks available for purchase. Looking at this data, it is unclear WHY this format is by far the most popular. It could be because MPEG is the most common file type for sale at Chinook or because customers specifically like MPEG files. Since the other non-protected file formats made up only a tiny portion of Chinook's sales, customers may not care if a file is protected or not when purchasing music.
In this project, we analyzed sales data from the Chinook music store's database, using SQL queries.
I gained a lot of experience writing and tweaking SQL queries. I felt my fluency with SQL improving as a moved through this project, especially as I was able to ask and answer some of my own questions on this data. I learned how to construct more complex queries peice by peice, as an iterative process. I am now more confident in understanding exactly what a query will produce, and better able to tell if it is producing what I intended vs. an unreasonable response. I am beginning to understand how and when to use WITH, VIEW and CASE better, including to loop through a dataset looking for matches to a condition. Next, I'd like to continue to gain more fluency with SQL, especially with writing more complex queries and WITH, VIEW and CASE. I'd also like to gain confidence using SQL with a larger dataset.