1 Introduction and Preparing Jupyter Environment for SQL

In [37]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db
Out[37]:
'Connected: [email protected]'

Overview of our data:

  • We set out to explore out date with some SQL commands to show us the table names and the content of some of the tables:
In [38]:
%%sql
SELECT
    name,
    type
FROM sqlite_master
WHERE type in ("table", "view");
Done.
Out[38]:
name type
album table
artist table
customer table
employee table
genre table
invoice table
invoice_line table
media_type table
playlist table
playlist_track table
track table
In [39]:
%%sql
SELECT * FROM customer
LIMIT 5;
Done.
Out[39]:
customer_id first_name last_name company address city state country postal_code phone fax email support_rep_id
1 Luís Gonçalves Embraer - Empresa Brasileira de Aeronáutica S.A. Av. Brigadeiro Faria Lima, 2170 São José dos Campos SP Brazil 12227-000 +55 (12) 3923-5555 +55 (12) 3923-5566 [email protected] 3
2 Leonie Köhler None Theodor-Heuss-Straße 34 Stuttgart None Germany 70174 +49 0711 2842222 None [email protected] 5
3 François Tremblay None 1498 rue Bélanger Montréal QC Canada H2G 1A7 +1 (514) 721-4711 None [email protected] 3
4 Bjørn Hansen None Ullevålsveien 14 Oslo None Norway 0171 +47 22 44 22 22 None [email protected] 4
5 František Wichterlová JetBrains s.r.o. Klanova 9/506 Prague None Czech Republic 14700 +420 2 4172 5555 +420 2 4172 5555 [email protected] 4
In [40]:
%%sql
SELECT *
FROM invoice
LIMIT 5;
Done.
Out[40]:
invoice_id customer_id invoice_date billing_address billing_city billing_state billing_country billing_postal_code total
1 18 2017-01-03 00:00:00 627 Broadway New York NY USA 10012-2612 15.84
2 30 2017-01-03 00:00:00 230 Elgin Street Ottawa ON Canada K2P 1L7 9.9
3 40 2017-01-05 00:00:00 8, Rue Hanovre Paris None France 75002 1.98
4 18 2017-01-06 00:00:00 627 Broadway New York NY USA 10012-2612 7.92
5 27 2017-01-07 00:00:00 1033 N Park Ave Tucson AZ USA 85719 16.83

3 Selecting Albums to Add to our Inventory:

  • We need to check out database for records on which genres are best selling from among our inventory.
In [41]:
%%sql
WITH total_sales_usa AS 
    (
    SELECT SUM(quantity) 
    FROM invoice_line
    INNER JOIN invoice ON invoice_line.invoice_id = invoice.invoice_id
    WHERE invoice.billing_country = "USA"
    )

SELECT g.name, SUM(il.quantity) total_sales, (CAST(SUM(il.quantity) AS FLOAT)/(SELECT * FROM total_sales_usa)) percentage_sales
FROM invoice i
INNER JOIN invoice_line il ON i.invoice_id = il.invoice_id
INNER JOIN track t ON il.track_id = t.track_id
INNER JOIN genre g ON t.genre_id = g.genre_id
WHERE i.billing_country = "USA"
GROUP BY g.genre_id
ORDER BY 2 DESC;
Done.
Out[41]:
name total_sales percentage_sales
Rock 561 0.5337773549000951
Alternative & Punk 130 0.12369172216936251
Metal 124 0.11798287345385347
R&B/Soul 53 0.05042816365366318
Blues 36 0.03425309229305423
Alternative 35 0.03330161750713606
Latin 22 0.02093244529019981
Pop 22 0.02093244529019981
Hip Hop/Rap 20 0.019029495718363463
Jazz 14 0.013320647002854425
Easy Listening 13 0.012369172216936251
Reggae 6 0.005708848715509039
Electronica/Dance 5 0.004757373929590866
Classical 4 0.003805899143672693
Heavy Metal 3 0.0028544243577545195
Soundtrack 2 0.0019029495718363464
TV Shows 1 0.0009514747859181732
  • We can see that of the 4 genres of albums/artists that we are considering, the 3 genres with the highest sales percentages in our inventory are:

2nd) Alternative and Punk

5th) Blues

8th) Pop

  • based on our findings, we recommend adding the following artists to our inventory:

1) Red Tone

2) Slim Jim Bites

3) Meteor and the Girls

4 Analyzing Employee Sales Performance

  • We would like to check sales employee performance:
  • We start by checking the distinct employee types in our employee table:
In [42]:
%%sql
SELECT DISTINCT title
FROM employee;
Done.
Out[42]:
title
General Manager
Sales Manager
Sales Support Agent
IT Manager
IT Staff
In [43]:
%%sql
SELECT e.employee_id employee_id, 
        e.first_name || " " || e.last_name employee_name,
        e.hire_date employee_hire_date,
        e.country employee_country,
        TOTAL(i.total) total_sales
FROM employee e 
LEFT JOIN customer c ON e.employee_id = c.support_rep_id
INNER JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 5 DESC;
Done.
Out[43]:
employee_id employee_name employee_hire_date employee_country total_sales
3 Jane Peacock 2017-04-01 00:00:00 Canada 1731.510000000004
4 Margaret Park 2017-05-03 00:00:00 Canada 1584.0000000000032
5 Steve Johnson 2017-10-17 00:00:00 Canada 1393.9200000000028
  • We can see that Jane had the highest sales, but she is also the oldest employee

  • We can also see that Steve has a sales amount close to Margaret despite the fact that he was hired 5 months after her.

5 Analyzing Sales By Country

In [44]:
%%sql
WITH sales_by_country AS
(
SELECT c.country country,
        COUNT(DISTINCT(c.customer_id)) total_customers,
        TOTAL(i.total) total_sales,
        (TOTAL(i.total)/COUNT(DISTINCT(c.customer_id))) average_sales_pc,
        AVG(i.total) average_order_value,
        0 country_or_other
FROM customer c
INNER JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY c.country
HAVING total_customers > 1

UNION

SELECT country,
        SUM(total_customers) total_customers,
        TOTAL(total_sales) total_sales,
        TOTAL(total_sales)/SUM(total_customers) average_sales_pc,
        AVG(average_order_value) average_order_value,
        country_or_other
FROM( 
    SELECT "Other" country,
        COUNT(DISTINCT(c.customer_id)) total_customers,
        TOTAL(i.total) total_sales,
        (TOTAL(i.total)/COUNT(DISTINCT(c.customer_id))) average_sales_pc,
        AVG(i.total) average_order_value,
        1 country_or_other
    FROM customer c
    INNER JOIN invoice i ON c.customer_id = i.customer_id
    GROUP BY c.country
    HAVING total_customers = 1
    )
GROUP BY country
ORDER BY country_or_other ASC, total_sales DESC
)

SELECT country,
        total_customers,
        total_sales,
        average_sales_pc,
        average_order_value
FROM sales_by_country;
Done.
Out[44]:
country total_customers total_sales average_sales_pc average_order_value
USA 13 1040.4899999999998 80.0376923076923 7.942671755725189
Canada 8 535.5900000000001 66.94875000000002 7.047236842105265
Brazil 5 427.68000000000006 85.53600000000002 7.0111475409836075
France 5 389.0699999999999 77.81399999999998 7.781399999999998
Germany 4 334.62 83.655 8.161463414634147
Czech Republic 2 273.24000000000007 136.62000000000003 9.108000000000002
United Kingdom 3 245.52 81.84 8.768571428571429
Portugal 2 185.13000000000002 92.56500000000001 6.383793103448276
India 2 183.14999999999998 91.57499999999999 8.72142857142857
Other 15 1094.9399999999998 72.996 7.445071062271063

6 Albums Vs Individual Tracks

  • We need to determine how many invoices are for complete albums and their percentage out of the complete amount of purchases:
In [45]:
%%sql

WITH 

album_distinct_tracks AS 
(
SELECT a.album_id album_id, 
        a.title title, 
        COUNT(DISTINCT(t.track_id)) album_track_num
FROM album a
INNER JOIN track t ON a.album_id = t.album_id
GROUP BY 1
),

album_purchase_invoice AS 
(
SELECT il.invoice_id invoice_id, 
        COUNT(DISTINCT(t.album_id)) invoice_album_count,
        COUNT(DISTINCT(t.track_id)) invoice_track_count,
        adt.album_track_num,
        t.album_id album_id,
        a.title album_title
FROM invoice_line il
INNER JOIN track t ON il.track_id = t.track_id
INNER JOIN album a ON t.album_id = a.album_id
INNER JOIN album_distinct_tracks adt ON adt.album_id = a.album_id
GROUP BY 1
HAVING ((invoice_album_count = 1) AND (invoice_track_count = adt.album_track_num))
),

total_invoice_count AS
(
SELECT COUNT(i.invoice_id)
FROM invoice i
),

album_invoice_count AS
(
SELECT COUNT(api.invoice_id)
FROM album_purchase_invoice api
),

non_album_invoice_count AS
(
SELECT album_invoice_count.*
FROM album_invoice_count
)

SELECT COUNT(api.invoice_id) album_invoices,
        CAST(COUNT(api.invoice_id) AS FLOAT)/(SELECT * FROM total_invoice_count) percentage_album_invoices,
        (SELECT * FROM total_invoice_count) - COUNT(api.invoice_id) non_album_invoices,
        (SELECT * FROM total_invoice_count) total_invoice_count,
        (((SELECT * FROM total_invoice_count) - COUNT(api.invoice_id))/CAST((SELECT * FROM total_invoice_count) AS FLOAT)) percentage_non_album_invoices
FROM album_purchase_invoice api;
Done.
Out[45]:
album_invoices percentage_album_invoices non_album_invoices total_invoice_count percentage_non_album_invoices
114 0.18566775244299674 500 614 0.8143322475570033
  • The percentage of Invoices where albums where pruchases was roughly 18.6%. We can recommend that they do not end their policy of purchasing full albums from the record companies as it could cause them to lose this percentage of their revenue.