Using SQL To Answer Business Questions on the Chinook Database

In this project, we will use SQL to answer business questions on the Chinook company database. Chinook is an online music store that sells music in a variety of formats. This database contains information about Chinook's sales, employees, and customers.

We will answer the following questions:

  • We have a choice of 4 new albums to purchase, each from a different genre. Which ones should we choose?
  • How can we compare the performance of our company's sales agents?
  • Where are most of our sales? Which countries offer the best business opportunities?
  • Should we keep selling full albums, or just sell individual tracks?
  • Which artist is used most in playlists?
  • How many tracks have been purchased vs. not purchased?
  • What tracks, albums, and artists are most popular?
  • Is the range of tracks in the store reflective of their sales popularity? Are there any under valued genres to expand in, or genres to cut back Chinook's offerings?
  • Do protected vs. non-protected music formats impact sales?

Read and Explore Database

In [ ]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db
Out[ ]:
'Connected: [email protected]'
In [ ]:
%%sql
SELECT 
    name, 
    type
FROM sqlite_master
WHERE type in ('table', 'VIEW');
Done.
Out[ ]:
name type
album table
artist table
customer table
employee table
genre table
invoice table
invoice_line table
media_type table
playlist table
playlist_track table
track table

Chinook's database contains 11 tables, as listed above.

In [ ]:
%%sql
SELECT * 
FROM invoice_line
LIMIT 5;
Done.
Out[ ]:
invoice_line_id invoice_id track_id unit_price quantity
1 1 1158 0.99 1
2 1 1159 0.99 1
3 1 1160 0.99 1
4 1 1161 0.99 1
5 1 1162 0.99 1

Selecting Albums to Purchase

The Chinook record store has just signed a deal with a new record label, and you've been tasked with selecting the first three albums that will be added to the store, from a list of four. All four albums are by artists that don't have any tracks in the store right now - we have the artist names, and the genre of music they produce. The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA.

In [ ]:
%%sql
WITH usa AS
    (
    SELECT *
    FROM customer c
    INNER JOIN invoice i ON i.customer_id = c.customer_id
    INNER JOIN invoice_line il ON il.invoice_id = i.invoice_id
    WHERE country = 'USA'
    )
    
SELECT 
    g.name AS genre,
    COUNT(usa.invoice_line_id) AS total_tracks,
    CAST(COUNT(usa.invoice_line_id) as float) / (SELECT COUNT(*) FROM usa) * 100 percentage
FROM usa
INNER JOIN track t ON t.track_id = usa.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC;
Done.
Out[ ]:
genre total_tracks percentage
Rock 561 53.37773549000951
Alternative & Punk 130 12.369172216936251
Metal 124 11.798287345385347
R&B/Soul 53 5.042816365366318
Blues 36 3.425309229305423
Alternative 35 3.3301617507136063
Latin 22 2.093244529019981
Pop 22 2.093244529019981
Hip Hop/Rap 20 1.9029495718363463
Jazz 14 1.3320647002854424
Easy Listening 13 1.236917221693625
Reggae 6 0.570884871550904
Electronica/Dance 5 0.47573739295908657
Classical 4 0.3805899143672693
Heavy Metal 3 0.285442435775452
Soundtrack 2 0.19029495718363465
TV Shows 1 0.09514747859181732

The table above shows Chinook's best selling genres in the US. The genres of the potential new artists are Hip Hop, Punk, Pop, and Blues. Since this data shows that Punk, Blues, and Pop are the 3 best selling of these 4 genres, Chinook may want to purchase the albums from Red Tone (Punk), Meteor and the Girls (Pop), and Slim Jim Bites (Blues). It is worth noting that there was very little difference in the popularity of Pop vs. Hip Hop (.15%), and each of these genres made up about 2% of total sales. Together, these 4 genres made up only about 20% of total US sales. However, Rock is by far the best selling genre, accounting for 53% of sales alone, so if a Rock album became available in the future Chinook may want to consider purchasing it.

Analyzing Employee Sales Performance

In [ ]:
%%sql
WITH total_per_agent AS
    (
    SELECT 
        c.customer_id,
        c.support_rep_id,
        SUM(i.total) agent_total
    FROM customer c
    LEFT JOIN invoice i ON i.customer_id = c.customer_id
    GROUP BY 2
    )

SELECT 
    e.first_name || ' ' || e.last_name AS employee_name,
    ROUND(t.agent_total, 2) AS total_sales,
    e.title AS employee_role,
    e.hire_date AS hire_date
FROM employee e
LEFT JOIN total_per_agent t ON t.support_rep_id = e.employee_id
ORDER BY 2 DESC;
Done.
Out[ ]:
employee_name total_sales employee_role hire_date
Jane Peacock 1731.51 Sales Support Agent 2017-04-01 00:00:00
Margaret Park 1584.0 Sales Support Agent 2017-05-03 00:00:00
Steve Johnson 1393.92 Sales Support Agent 2017-10-17 00:00:00
Andrew Adams None General Manager 2016-08-14 00:00:00
Nancy Edwards None Sales Manager 2016-05-01 00:00:00
Michael Mitchell None IT Manager 2016-10-17 00:00:00
Robert King None IT Staff 2017-01-02 00:00:00
Laura Callahan None IT Staff 2017-03-04 00:00:00

Of Chinook's 8 employees, 3 are Sales Support Agents. The difference in sales between the employees may be a result of their hire date - the employee with the earliest hire date has the most sales, and the employee with the most recent hire date has the least sales, with a total difference of about 20%.

Analyzing Sales by Country

In [ ]:
%%sql

WITH country_or_other AS 
    (
    SELECT 
        CASE
            WHEN (SELECT COUNT(*) 
                  FROM customer
                  WHERE country = c.country) = 1
            THEN 'Other'
            ELSE c.country
        END AS country,
        c.customer_id,
        il.*
    FROM invoice_line il
    INNER JOIN invoice i ON i.invoice_id = il.invoice_id
    INNER JOIN customer c on c.customer_id = i.customer_id
    )
    
SELECT 
    country,
    num_cust,
    total_sales,
    avg_total_sales_per_cust,
    avg_order_val
FROM
    (SELECT
        country,
        COUNT(DISTINCT customer_id) num_cust,
        ROUND(SUM(unit_price), 2) total_sales,
        ROUND(CAST(SUM(unit_price) as float) / CAST (COUNT(DISTINCT customer_id) as float), 2) avg_total_sales_per_cust,
        ROUND(CAST(SUM(unit_price) as float) / CAST(COUNT(DISTINCT invoice_id) as float), 2) avg_order_val,
        CASE
            WHEN country = 'Other' THEN 1
            ELSE 0
        END AS sort
    FROM country_or_other
    GROUP BY 1
    ORDER BY sort, 5 DESC);
Done.
Out[ ]:
country num_cust total_sales avg_total_sales_per_cust avg_order_val
Czech Republic 2 273.24 136.62 9.11
United Kingdom 3 245.52 81.84 8.77
India 2 183.15 91.58 8.72
Germany 4 334.62 83.66 8.16
USA 13 1040.49 80.04 7.94
France 5 389.07 77.81 7.78
Canada 8 535.59 66.95 7.05
Brazil 5 427.68 85.54 7.01
Portugal 2 185.13 92.57 6.38
Other 15 1094.94 73.0 7.45

The largest portion of Chinook's customers and total sales is concentrated in the North America region, specifically the USA and Canada. Since this market accounts for a large volume of sales, it may be important to consider their purchasing patterns when looking at new artists or albums.

However, other specific countries may offer good business opportunites. The Czech Republic, Portugal, and India have the highest average lifetime values per customer. The Czech Republic, UK, India, and Germany have the highest average order value. These observations are based on limited data, as each of these countries represents only 2-4 total customers.

Albums vs. Individual Tracks

In [ ]:
%%sql
WITH invoice_info AS 
    (
    SELECT 
        il.invoice_id,
        t.album_id,
        COUNT(il.track_id) num_tracks
    FROM invoice_line il
    LEFT JOIN track t on t.track_id = il.track_id
    GROUP BY invoice_id, album_id
    ),
    
album_info AS
    (
    SELECT 
        album_id,
        COUNT(track_id) num_tracks
    FROM track
    GROUP BY 1
    )
    
SELECT 
    CASE
        WHEN (SELECT num_tracks FROM album_info WHERE album_id = iv.album_id) = iv.num_tracks AND al.num_tracks > 2
        THEN 'Album'
        ELSE 'Not album'
        END AS album_purchase,
    COUNT(DISTINCT invoice_id) num_invoices,
    ROUND(CAST(COUNT(DISTINCT invoice_id) as float) / (SELECT COUNT(DISTINCT invoice_id) FROM invoice_info) * 100, 2) percent_invoices
FROM invoice_info iv
LEFT JOIN album_info al on al.album_id = iv.album_id
GROUP BY album_purchase
LIMIT 20;
Done.
Out[ ]:
album_purchase num_invoices percent_invoices
Album 111 18.08
Not album 503 81.92

Purchases of full albums account for about 20% of total sales. Based on this, I would recommend to continue selling full albums, as this accounts for 1/5 of total purchases. Removing whole albums might signficantly cut revene as a result.

Additional Questions

Which artist is used most in playlists?

In [ ]:
%%sql
SELECT 
    a.name artist_name,
    COUNT(a.name) num_tracks,
    g.name genre
FROM artist a
LEFT JOIN album al on al.artist_id = a.artist_id
LEFT JOIN track t on t.album_id = al.album_id
LEFT JOIN playlist_track pt on pt.track_id = t.track_id
LEFT JOIN genre g on g.genre_id = t.genre_id
GROUP BY artist_name
ORDER BY 2 DESC
LIMIT 10;
Done.
Out[ ]:
artist_name num_tracks genre
Iron Maiden 516 Rock
U2 333 Pop
Metallica 296 Metal
Led Zeppelin 252 Rock
Deep Purple 226 Rock
Lost 184 Drama
Pearl Jam 177 Rock
Eric Clapton 145 Latin
Faith No More 145 Alternative & Punk
Lenny Kravitz 143 Metal
In [ ]:
%%sql
SELECT 
    a.name artist_name,
    COUNT(DISTINCT pt.playlist_id) num_playlists,
    g.name genre,
    COUNT(pt.track_id) num_tracks
FROM artist a
LEFT JOIN album al on al.artist_id = a.artist_id
LEFT JOIN track t on t.album_id = al.album_id
LEFT JOIN playlist_track pt on pt.track_id = t.track_id
LEFT JOIN genre g on g.genre_id = t.genre_id
GROUP BY artist_name
ORDER BY 2 DESC
LIMIT 10;
Done.
Out[ ]:
artist_name num_playlists genre num_tracks
Eugene Ormandy 7 Classical 15
Academy of St. Martin in the Fields & Sir Neville Marriner 6 Classical 10
Berliner Philharmoniker & Herbert Von Karajan 6 Classical 13
English Concert & Trevor Pinnock 6 Classical 9
The King's Singers 6 Classical 10
Academy of St. Martin in the Fields, Sir Neville Marriner & Thurston Dart 5 Classical 5
Adrian Leaper & Doreen de Feis 5 Classical 5
Alberto Turco & Nova Schola Gregoriana 5 Classical 5
Antal Doráti & London Symphony Orchestra 5 Classical 5
Barry Wordsworth & BBC Concert Orchestra 5 Classical 5
In [ ]:
%%sql
SELECT 
    pl.playlist_id,
    pl.name playlist_name,
    ar.name artist,
    COUNT(pt.track_id) num_tracks
FROM playlist pl
LEFT JOIN playlist_track pt on pt.playlist_id = pl.playlist_id
LEFT JOIN track t on t.track_id = pt.track_id
LEFT JOIN album a on a.album_id = t.album_id
LEFT JOIN artist ar on ar.artist_id = a.artist_id
WHERE ar.name = 'Iron Maiden'
GROUP BY pt.playlist_id;
Done.
Out[ ]:
playlist_id playlist_name artist num_tracks
1 Music Iron Maiden 213
5 90’s Music Iron Maiden 84
8 Music Iron Maiden 213
17 Heavy Metal Classic Iron Maiden 6
In [ ]:
%%sql
SELECT 
    pl.playlist_id,
    pl.name,
    COUNT(pt.track_id)
FROM playlist pl
LEFT JOIN playlist_track pt on pt.playlist_id = pl.playlist_id
GROUP BY pl.playlist_id
Done.
Out[ ]:
playlist_id name COUNT(pt.track_id)
1 Music 3290
2 Movies 0
3 TV Shows 213
4 Audiobooks 0
5 90’s Music 1477
6 Audiobooks 0
7 Movies 0
8 Music 3290
9 Music Videos 1
10 TV Shows 213
11 Brazilian Music 39
12 Classical 75
13 Classical 101 - Deep Cuts 25
14 Classical 101 - Next Steps 25
15 Classical 101 - The Basics 25
16 Grunge 15
17 Heavy Metal Classic 26
18 On-The-Go 1 1

We can measure the most popular artists in playlists in different ways. First, we can look at which artist's tracks appear most frequently. We can also consider the number of playlists an artist appears in.

The most frequently appearing artist in playlists is Iron Maiden, whose tracks appeared 516 times. They are followed by U2, Metallica, Led Zeppelin, and Deep Purple. Of the top 10 most popular playlist artists, 4 are Rock and 2 are Metal; the other 6 are from a range of different genres. However, if we look at all the playlists in the Chinook database, we see that there are only 14 playlists with tracks added to them; their sizes range from 1 track to over 3,000 tracks. As a result, those large numbers of tracks may only be coming from several specific playlists. For example, Iron Maiden appears only on 4 playlists, including 213 tracks on two of those playlists. This may respresent big Iron Maiden fans who are adding all available Iron Maiden tracks to their playlist.

If we measure artist popularity in playlists by the number of playlists an artist appears on, we get very different results. The top 10 artists who appeared in the largest number of playlists were all classical music, appearing on 5-7 playlists, with 5-15 tracks each. Four playlists have "Classical" in their title. There may be demand for general classical music playlists, as opposed to fans of a specific classical artist.

How many tracks have been purchased vs not purchased?

In [ ]:
%%sql
WITH no_purchased AS
    (
    SELECT 
        track_id,
        COUNT(track_id) no_tracks
    FROM invoice_line
    GROUP BY 1
    )
    
SELECT
    (SELECT COUNT(*) FROM no_purchased) tracks_purchased,
    (SELECT COUNT(*) FROM track) - (SELECT COUNT(*) FROM no_purchased) as tracks_not_purchased,
    ROUND(CAST((SELECT COUNT(*) FROM no_purchased) as float) / CAST((SELECT COUNT(*) FROM track) as float) * 100, 2) percent_purchased,
    ROUND(CAST((SELECT COUNT(*) FROM track) - (SELECT COUNT(*) FROM no_purchased) as float) / CAST((SELECT COUNT(*) FROM track) as float) * 100, 2) percent_not_purchased
FROM track t
LEFT JOIN no_purchased np on np.track_id = t.track_id
GROUP BY 1;
Done.
Out[ ]:
tracks_purchased tracks_not_purchased percent_purchased percent_not_purchased
1806 1697 51.56 48.44
In [ ]:
%%sql
SELECT
    COUNT(*)
FROM track
Done.
Out[ ]:
COUNT(*)
3503

Out of 3503 tracks available in Chinook's store, 1806 have been purchased, and 1697 have not been. Almost half of the tracks available have yet to be purchased by any customer. Chinook may want to remove some of these tracks from its inventory.

What tracks, albums, and artists are most popular?

In [ ]:
%%sql
SELECT 
    t.name track_name,
    ar.name artist,
    COUNT(il.track_id) number_sold,
    g.name genre
FROM track t
LEFT JOIN invoice_line il ON il.track_id = t.track_id
LEFT JOIN album al on al.album_id = t.album_id
LEFT JOIN artist ar on ar.artist_id = al.artist_id
LEFT JOIN genre g on g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 3 DESC
LIMIT 10
Done.
Out[ ]:
track_name artist number_sold genre
War Pigs Cake 33 Alternative
Are You Experienced? Jimi Hendrix 14 Rock
Changes Godsmack 14 Metal
Highway Chile Jimi Hendrix 14 Rock
Hey Joe Jimi Hendrix 13 Rock
Put The Finger On You AC/DC 13 Rock
Third Stone From The Sun Jimi Hendrix 13 Rock
Drain You Nirvana 12 Rock
Love Or Confusion Jimi Hendrix 12 Rock
Radio/Video System Of A Down 12 Metal

The best selling track is War Pigs by Cake, with 33 having been sold. The rest of the ten best selling tracks sold between 12 and 14 copies. 5 of the 10 best selling tracks are by Jimi Hendrix. 7 out of 10 are Rock, 2 are Metal, and 1 is Alternative. Next, we will look at the best selling artists and albums.

In [ ]:
%%sql
CREATE VIEW track_info AS
    SELECT 
        il.track_id,
        t.name track,
        ar.name artist,
        al.title album,
        g.name genre,
        t.media_type_id
    FROM invoice_line il
    LEFT JOIN track t on t.track_id = il.track_id
    LEFT JOIN album al on al.album_id = t.album_id
    LEFT JOIN artist ar on ar.artist_id = al.artist_id
    LEFT JOIN genre g on g.genre_id = t.genre_id
(sqlite3.OperationalError) table track_info already exists
[SQL: CREATE VIEW track_info AS
    SELECT 
        il.track_id,
        t.name track,
        ar.name artist,
        al.title album,
        g.name genre,
        t.media_type_id
    FROM invoice_line il
    LEFT JOIN track t on t.track_id = il.track_id
    LEFT JOIN album al on al.album_id = t.album_id
    LEFT JOIN artist ar on ar.artist_id = al.artist_id
    LEFT JOIN genre g on g.genre_id = t.genre_id]
(Background on this error at: http://sqlalche.me/e/e3q8)
In [ ]:
%%sql
SELECT 
    artist,
    COUNT(track_id) tracks_sold,
    ROUND(CAST(COUNT(track_id) as float) / (SELECT COUNT(track_id) FROM track_info) * 100, 2) percentage_total_sales,
    genre
FROM track_info
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
Done.
Out[ ]:
artist tracks_sold percentage_total_sales genre
Queen 192 4.04 Rock
Jimi Hendrix 187 3.93 Rock
Nirvana 130 2.73 Rock
Red Hot Chili Peppers 130 2.73 Rock
Pearl Jam 129 2.71 Rock
AC/DC 124 2.61 Rock
Guns N' Roses 124 2.61 Metal
Foo Fighters 121 2.54 Rock
The Rolling Stones 117 2.46 Rock
Metallica 106 2.23 Metal

The table above shows the top 10 best selling artists at Chinook. 3 of these artists also have top-10 selling tracks at Chinook: Jimi Hendrix, Nirvana, and AC/DC. Each of these top-10 artists represents 2-4% of total track purchases at the store. 8 out of 10 of these artists are from the Rock genre; the remaining 2 are Metal. Chinook may want to add more tracks by these top artists to the store, as a number of customers are fans of these artists and would likely by more of their music.

In [ ]:
%%sql
SELECT 
    album, 
    artist,
    COUNT(track_id) tracks_sold,
    ROUND(CAST(COUNT(track_id) as float) / (SELECT COUNT(track_id) FROM track_info) * 100, 2) percentage_total_sales,
    genre
FROM track_info
GROUP BY 1
ORDER BY 3 DESC
LIMIT 10
Done.
Out[ ]:
album artist tracks_sold percentage_total_sales genre
Are You Experienced? Jimi Hendrix 187 3.93 Rock
Faceless Godsmack 96 2.02 Metal
Mezmerize System Of A Down 93 1.96 Metal
Get Born JET 90 1.89 Alternative & Punk
The Doors The Doors 83 1.74 Rock
Big Ones Aerosmith 80 1.68 Rock
Greatest Hits I Queen 80 1.68 Rock
The Police Greatest Hits The Police 80 1.68 Rock
From The Muddy Banks Of The Wishkah [live] Nirvana 78 1.64 Rock
My Generation - The Very Best Of The Who The Who 76 1.6 Rock

The table above shows the top selling albums, measured by total number of tracks sold from the album.

Once again, Jimi Hendrix is at the top of the list. He is Chinook's overall best selling artist, so opportunities to sell more of his music will likely be good for the business.

Rock, followed by Metal, once again dominate this list. As the top songs, albums, and artist are Rock musicians, followed by Metal musicians, Chinook may want to consider offering more music from these genres.

Is the range of tracks in the store reflective of their sales popularity?

In [ ]:
%%sql
SELECT
    g.name genre,
    COUNT(DISTINCT t.track_id) tracks_available,
    ROUND(CAST(COUNT(DISTINCT t.track_id) as float) / CAST((SELECT COUNT(*) FROM track) as float) * 100, 2) percentage_tracks_avail,
    COUNT(DISTINCT il.track_id) track_has_sold,
    ROUND(CAST(COUNT(DISTINCT il.track_id) as float) / CAST(COUNT(DISTINCT t.track_id) as float) * 100, 2) as percent_has_sold,
    COUNT(il.track_id) number_sold,
    ROUND(CAST(COUNT(il.track_id) as float) / CAST((SELECT COUNT(track_id) FROM invoice_line) as float) * 100, 2) percentage_total_sales
FROM track t
LEFT JOIN genre g ON g.genre_id = t.genre_id
LEFT JOIN invoice_line il ON il.track_id = t.track_id
GROUP BY g.name
ORDER BY 6 DESC
Done.
Out[ ]:
genre tracks_available percentage_tracks_avail track_has_sold percent_has_sold number_sold percentage_total_sales
Rock 1297 37.03 915 70.55 2635 55.39
Metal 374 10.68 238 63.64 619 13.01
Alternative & Punk 332 9.48 176 53.01 492 10.34
Latin 579 16.53 119 20.55 167 3.51
R&B/Soul 61 1.74 55 90.16 159 3.34
Blues 81 2.31 56 69.14 124 2.61
Jazz 130 3.71 61 46.92 121 2.54
Alternative 40 1.14 34 85.0 117 2.46
Easy Listening 24 0.69 24 100.0 74 1.56
Pop 48 1.37 25 52.08 63 1.32
Electronica/Dance 30 0.86 29 96.67 55 1.16
Classical 74 2.11 16 21.62 47 0.99
Reggae 58 1.66 22 37.93 35 0.74
Hip Hop/Rap 35 1.0 21 60.0 33 0.69
Heavy Metal 28 0.8 7 25.0 8 0.17
Soundtrack 43 1.23 5 11.63 5 0.11
TV Shows 93 2.65 2 2.15 2 0.04
Drama 64 1.83 1 1.56 1 0.02
Bossa Nova 15 0.43 0 0.0 0 0.0
Comedy 17 0.49 0 0.0 0 0.0
Opera 1 0.03 0 0.0 0 0.0
Rock And Roll 12 0.34 0 0.0 0 0.0
Sci Fi & Fantasy 26 0.74 0 0.0 0 0.0
Science Fiction 13 0.37 0 0.0 0 0.0
World 28 0.8 0 0.0 0 0.0

In general, the range of tracks available for sale was reflective of that genre's popularity.

For example, the 3 top selling genres had a larger variety of tracks available than other genres

  • Rock - 55% of total sales, 1297 tracks available (37% of tracks available for purchase)
  • Metal - 13% of total sales, 374 tracks available (10% of tracks available for purchase)
  • Alternative & Punk - 10% of total sales, 332 tracks available (9% of tracks available for purchase) Within the Rock genre, 70% of tracks available have sold. The same is true of 63% of Metal tracks and 53% of Alternative & Punk tracks. Rock was by far the most represented genre in the top ten selling artists, albums, and tracks, followed by metal. Chinook may want to continue to expand its music offerings in Rock and possibly Metal, or advertize its current catalog in these genres, as these are the best selling genres in the store. A relatively high percentage of Rock and Metal tracks sell, so additional purchases in these areas may be low-risk opportunities to increase business.

One noteable exception to this is the Latin genre, which represented only 3% of total sales but had 579 tracks available (16% of total tracks available for purchase). Moreover, only 20% of Latin tracks available at Chinook have ever been sold. As a result, Chinook may want to consider dropping some of these tracks from its collection.

There are 7 genres that have not sold any tracks. 4 additional genres account for less than .5% of sales each. Chinook may want to consider eliminating these offerings as well.

There are 4 genres where Chinook may want to experiment with expanding its offerings: Easy Listening, Electronic/Dance, R&B/Soul, and Alternative. Each of these genres currently accounts for a small percentage of total offerings (.69-1.74%) and sales (1-3% each). However, these genres are selling a disproportionately high number and percentage of their tracks as compared to other offerings in the store. Chinook could experiment with a small expansion of offerings and/or advertising in these genres.

Do protected vs non-protected media types have an effect on popularity?

In [ ]:
%%sql
WITH media_info AS
    (
    SELECT
        COUNT(DISTINCT track_id) num_tracks,
        t.media_type_id,
        mt.name media_type,
        t.track_id
    FROM track t
    LEFT JOIN media_type mt on mt.media_type_id = t.media_type_id
    GROUP BY 2
    )

SELECT 
    mi.media_type media_type,
    COUNT(ti.track_id) tracks_sold,
    ROUND(CAST(COUNT(ti.track_id) as float) / (SELECT COUNT(*) FROM track_info) * 100, 2) percentage_total_sold,
    ROUND(CAST(mi.num_tracks as float) / (SELECT COUNT(*) FROM track) * 100, 2) percentage_available_tracks, 
    CASE
        WHEN mi.media_type LIKE '%Protected%'
            THEN 'Protected'
        ELSE 'Non-Protected'
        END AS protected
FROM track_info ti
LEFT JOIN media_info mi on ti.media_type_id = mi.media_type_id
LEFT JOIN track t on t.track_id = ti.track_id
GROUP BY 1
ORDER BY 2 DESC
Done.
Out[ ]:
media_type tracks_sold percentage_total_sold percentage_available_tracks protected
MPEG audio file 4259 89.53 86.61 Non-Protected
Protected AAC audio file 439 9.23 6.77 Protected
Purchased AAC audio file 35 0.74 0.2 Non-Protected
AAC audio file 21 0.44 0.31 Non-Protected
Protected MPEG-4 video file 3 0.06 6.11 Protected

Out of the 5 media types available, MPEG is by far the most popular, representing 89% of tracks sold and 86% of tracks available for purchase. This file format is non-protected. About 90% of tracks sold were non-protected, compared with about 87% of tracks available for purchase. Looking at this data, it is unclear WHY this format is by far the most popular. It could be because MPEG is the most common file type for sale at Chinook or because customers specifically like MPEG files. Since the other non-protected file formats made up only a tiny portion of Chinook's sales, customers may not care if a file is protected or not when purchasing music.

Project Summary

In this project, we analyzed sales data from the Chinook music store's database, using SQL queries.

Some specific business questions we answered include:

  • What new albums should Chinook purchase, based on sales data for different genres of music?
  • What artists, tracks, and albums are most popular?
  • Is the range of tracks for sale reflective of their popularity? Are there areas where Chinook should add to or reduce its offerings?
  • What does sales data tell us about the performance of different employees in the sales department?
  • Should Chinook continue selling full albums, or just sell select tracks from albums?
  • What can we learn from comparing sales data across different countries? Where is the highest volume of sales, and average sales per customer? Are there areas where Chinook could expand its business?

How this project helped me grow my skills:

I gained a lot of experience writing and tweaking SQL queries. I felt my fluency with SQL improving as a moved through this project, especially as I was able to ask and answer some of my own questions on this data. I learned how to construct more complex queries peice by peice, as an iterative process. I am now more confident in understanding exactly what a query will produce, and better able to tell if it is producing what I intended vs. an unreasonable response. I am beginning to understand how and when to use WITH, VIEW and CASE better, including to loop through a dataset looking for matches to a condition. Next, I'd like to continue to gain more fluency with SQL, especially with writing more complex queries and WITH, VIEW and CASE. I'd also like to gain confidence using SQL with a larger dataset.