The dataset contains sample data of credit and debit/checking transactions, income and demographic details of UK banking consumers. All the details are at a transaction level. How do I get the data? Snowflake Data Marketplace. It provides you with free or paid datasets that you can use for your projects.
Columns | Descriptions |
---|---|
TRANSACTION_REFERENCE | The transaction identifier for each transaction made by consumer |
USER_REFERENCE | The user identifier of the consumer |
AGE_BAND | The consumer age range |
SALARY_BAND | The consumer salary range. |
POSTCODE | The postcode of where the consumer lives. |
LSOA | Geographical hierarchy: small areas that has similar population size (average of approximately 1,500 residents or 650 households). |
MSOA | Geographical hierarchy: medium areas where the minimum population size is 5000 (average of 7200). |
DERIVED_GENDER | The consumer gender identity. |
TRANSACTION_DATE | The transaction date made by the consumer. |
ACCOUNT_REFERENCE | The consumer bank account identifier. |
PROVIDER_GROUP_NAME | The consumer's bank for executing his/her transactions. |
ACCOUNT_TYPE | The account type: current, savings, etc. |
CREDIT_DEBIT | Type of transaction made by consumer: debit or credit. |
AMOUNT | The amount of transaction. |
AUTO_PURPOSE_TAG_NAME | The transaction purpose. |
MERCHANT_NAME | The merchant's name. |
MERCHANT_BUSINESS_LINE | The merchant's business category. |
ACCOUNT_CREATED_DATE | The date of when the account first created. |
ACCOUNT_LAST_REFRESHED | The date of when the account last updated. |
1. Select the business process.
If we look at the dataset and think of credit-debit transaction in general, I can identify the following business process.
⮕ The user or cardholder perform credit/debit transaction --- inflow and outflow transaction
⮕ The bank will deliver the money (outflow) to merchant's account (in case of debit transaction) or save the money in the current/saving account (in case of credit transaction)
⮕ The merchant received the money (in case of debit transaction) or the cardholder received the deposits.
And we also can see that there are 3 participants in the above processes; user (accountholder), bank and merchant.
2. Declare the grain.
The desired level of granularity is at transaction level, which already provided by the dataset.
3. Identify the dimensions.
If we refer to the business process and its participants, there should be 4 kinds of information that adding context to the business quantitative measure, the transaction amount. They are:
In this step I have identified and created 5 dimension tables for users, accounts, merchants, transactions and dates.
%config SqlMagic.autopandas=True
%config SqlMagic.displaycon=False
%config SqlMagic.feedback=False
%config SqlMagic.displaylimit=5
%reload_ext sql
import boto3
import json
def get_secret(secret_name='wysde', region_name = "us-east-1"):
session = boto3.session.Session()
client = session.client(
service_name='secretsmanager',
region_name=region_name)
get_secret_value_response = client.get_secret_value(SecretId=secret_name)
get_secret_value_response = json.loads(get_secret_value_response['SecretString'])
return get_secret_value_response
db_credentials = get_secret()
USERNAME = db_credentials["SNOWFLAKE_USERNAME"]
PASSWORD = db_credentials["SNOWFLAKE_PASSWORD"]
ACCOUNT = db_credentials["SNOWFLAKE_ACCOUNT"]
WAREHOUSE = db_credentials["SNOWFLAKE_WAREHOUSE"]
ROLE = db_credentials["SNOWFLAKE_ROLE"]
DBNAME = "SPARSH"
SCHEMA_NAME = "MARTS_CREDIT_DEBIT_TRANSACTION"
CONN = f"snowflake://{USERNAME}:{PASSWORD}@{ACCOUNT}/?warehouse={WAREHOUSE}&role={ROLE}"
%sql {CONN}
%%sql
USE {DBNAME}
status | |
---|---|
0 | Statement executed successfully. |
%%sql
CREATE SCHEMA IF NOT EXISTS {SCHEMA_NAME};
USE SCHEMA {SCHEMA_NAME};
status | |
---|---|
0 | Statement executed successfully. |
%%sql
CREATE OR REPLACE TABLE dim_users AS (
SELECT
DISTINCT user_reference AS user_id,
age_band,
salary_band,
postcode,
LSOA,
MSOA,
derived_gender AS gender
FROM UK_ROWLEVEL_TRANSACTION_DATA__SAMPLE.public_listing.transactions
);
CREATE OR REPLACE TABLE dim_accounts AS (
SELECT
DISTINCT account_reference AS account_id,
provider_group_name AS bank_name,
account_type,
account_created_date,
account_last_refreshed
FROM UK_ROWLEVEL_TRANSACTION_DATA__SAMPLE.public_listing.transactions
);
CREATE OR REPLACE TABLE dim_merchants AS (
SELECT
DISTINCT HASH(merchant_name, merchant_business_line)::VARCHAR AS merchant_id,
merchant_name,
merchant_business_line
FROM UK_ROWLEVEL_TRANSACTION_DATA__SAMPLE.public_listing.transactions
);
CREATE OR REPLACE TABLE dim_transactions AS (
SELECT
transaction_reference AS transaction_id,
credit_debit AS transaction_type,
auto_purpose_tag_name AS transaction_purpose
FROM UK_ROWLEVEL_TRANSACTION_DATA__SAMPLE.public_listing.transactions
);
CREATE OR REPLACE TABLE dim_dates AS (
SELECT
DISTINCT transaction_date,
DAY(transaction_date)::VARCHAR as day_of_month,
DAYNAME(transaction_date) as day_name,
MONTH(transaction_date)::VARCHAR as month_of_year,
MONTHNAME(transaction_date) as month_name,
YEAR(transaction_date)::VARCHAR as year
FROM UK_ROWLEVEL_TRANSACTION_DATA__SAMPLE.public_listing.transactions
);
CREATE OR REPLACE TABLE fct_transactions AS (
SELECT
transaction_date AS transaction_date,
transaction_reference AS transaction_id,
user_reference AS user_id,
account_reference AS account_id,
HASH(merchant_name, merchant_business_line)::VARCHAR AS merchant_id,
amount::NUMBER as amount
FROM UK_ROWLEVEL_TRANSACTION_DATA__SAMPLE.public_listing.transactions
);
status | |
---|---|
0 | Table FCT_TRANSACTIONS successfully created. |
%sql SELECT * FROM fct_transactions LIMIT 10
transaction_date | transaction_id | user_id | account_id | merchant_id | amount | |
---|---|---|---|---|---|---|
0 | 2022-08-04 | 61343940-11e3-4ef9-900d-78dee080a8792C94E69F96... | 59618cc99ed8c11e072c03f91696aff805f5d27d5d2b6e... | 61343940-11e3-4ef9-900d-78dee080a879 | -3563482234593244164 | 20 |
1 | 2022-09-28 | 61343940-11e3-4ef9-900d-78dee080a87936CAE325FD... | 59618cc99ed8c11e072c03f91696aff805f5d27d5d2b6e... | 61343940-11e3-4ef9-900d-78dee080a879 | 8475122763193443513 | 25 |
2 | 2022-07-04 | 61343940-11e3-4ef9-900d-78dee080a8795A5BA4AEC0... | 59618cc99ed8c11e072c03f91696aff805f5d27d5d2b6e... | 61343940-11e3-4ef9-900d-78dee080a879 | 6204893564810900869 | 24 |
3 | 2022-08-27 | 5819bc0b-d64e-4e7b-89b3-a0e1c5e224f9D9DADCB088... | c53091f1c07f415f4b824dc7857e2c71803d5d8c2984b2... | 5819bc0b-d64e-4e7b-89b3-a0e1c5e224f9 | 8475122763193443513 | 20 |
4 | 2022-08-30 | 61343940-11e3-4ef9-900d-78dee080a8797D0707EE51... | 59618cc99ed8c11e072c03f91696aff805f5d27d5d2b6e... | 61343940-11e3-4ef9-900d-78dee080a879 | -4671496543306980058 | 100 |
5 | 2022-09-08 | 5819bc0b-d64e-4e7b-89b3-a0e1c5e224f9A7AA1A7347... | c53091f1c07f415f4b824dc7857e2c71803d5d8c2984b2... | 5819bc0b-d64e-4e7b-89b3-a0e1c5e224f9 | 6204893564810900869 | 40 |
6 | 2022-07-29 | 61343940-11e3-4ef9-900d-78dee080a879CCBACAFB40... | 59618cc99ed8c11e072c03f91696aff805f5d27d5d2b6e... | 61343940-11e3-4ef9-900d-78dee080a879 | 6204893564810900869 | 50 |
7 | 2022-08-02 | 61343940-11e3-4ef9-900d-78dee080a8791A089E81C0... | 59618cc99ed8c11e072c03f91696aff805f5d27d5d2b6e... | 61343940-11e3-4ef9-900d-78dee080a879 | 6204893564810900869 | 800 |
8 | 2022-08-28 | 5819bc0b-d64e-4e7b-89b3-a0e1c5e224f98EC57DDC03... | c53091f1c07f415f4b824dc7857e2c71803d5d8c2984b2... | 5819bc0b-d64e-4e7b-89b3-a0e1c5e224f9 | 8475122763193443513 | 7 |
9 | 2022-10-03 | 9219ccc1-8148-486d-bff1-891693d2fafdE514B58207... | 1959eef564cbc0832dbd8a2680471d69f60214399ccf73... | 9219ccc1-8148-486d-bff1-891693d2fafd | 3098642881933701501 | 8 |