dbt (data build tool) is an open source Python package that enables data analysts and engineers to transform the data in their warehouses using the same practices that software engineers use to build applications. With dbt, you express each step of your transformation pipeline as a SQL query. In this scenario, you will install the dbt Python package and create a sample dbt project. You will configure a database connection for dbt, write your first dbt models, and run a dbt pipeline. Finally, you will gain insight into writing and running tests with dbt.
%%capture
!pip install psycopg2-binary
!pip install boto3
!pip install dbt
!pip install awscli
!apt-get --quiet install tree
!pip install ipython-sql
%reload_ext sql
# !mkdir -p ~/.aws
# %%writefile ~/.aws/credentials
# [default]
# aws_access_key_id=
# aws_secret_access_key=
# %%writefile ~/.aws/config
# [default]
# region=us-east-1
# output=json
!mkdir -p ~/.aws && cp /content/drive/MyDrive/AWS/d01_admin/* ~/.aws
!chmod 600 ~/.aws/credentials
import os
import boto3
import json
from sqlalchemy import create_engine
import pandas as pd
def get_secret(secret_name):
    region_name = "us-east-1"
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name)
    get_secret_value_response = client.get_secret_value(SecretId=secret_name)
    get_secret_value_response = json.loads(get_secret_value_response['SecretString'])
    return get_secret_value_response
db_credentials = get_secret(secret_name='dev/detraining/rds')
USERNAME = db_credentials["sparsh_rds_postgres_username"]
PASSWORD = db_credentials["sparsh_rds_postgres_password"]
HOST = db_credentials["sparsh_rds_postgres_host"]
PORT = db_credentials["sparsh_rds_postgres_port"]
DBNAME = "postgres"
CONN = f"postgresql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DBNAME}"
In this step, we will download the data, load it into a pandas DataFrame, and then load it into the PostgreSQL database.
!aws s3 ls s3://nyc-tlc/trip\ data/
# download the data
# data source - https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
!wget -q --show-progress https://s3.amazonaws.com/nyc-tlc/trip\ data/yellow_tripdata_2019-01.parquet
!wget -q --show-progress https://s3.amazonaws.com/nyc-tlc/misc/taxi+_zone_lookup.csv
yellow_tripdata_201 100%[===================>] 105.32M  72.5MB/s    in 1.5s
taxi+_zone_lookup.c 100%[===================>]  12.03K  --.-KB/s    in 0s
# read the data into a pandas dataframe
import pandas as pd
yellow_tripdata_2019_df = pd.read_parquet('yellow_tripdata_2019-01.parquet')
yellow_tripdata_2019_df.head()
 | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 2019-01-01 00:46:40 | 2019-01-01 00:53:20 | 1.0 | 1.5 | 1.0 | N | 151 | 239 | 1 | 7.0 | 0.5 | 0.5 | 1.65 | 0.0 | 0.3 | 9.95 | NaN | None |
1 | 1 | 2019-01-01 00:59:47 | 2019-01-01 01:18:59 | 1.0 | 2.6 | 1.0 | N | 239 | 246 | 1 | 14.0 | 0.5 | 0.5 | 1.00 | 0.0 | 0.3 | 16.30 | NaN | None |
2 | 2 | 2018-12-21 13:48:30 | 2018-12-21 13:52:40 | 3.0 | 0.0 | 1.0 | N | 236 | 236 | 1 | 4.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 5.80 | NaN | None |
3 | 2 | 2018-11-28 15:52:25 | 2018-11-28 15:55:45 | 5.0 | 0.0 | 1.0 | N | 193 | 193 | 2 | 3.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 7.55 | NaN | None |
4 | 2 | 2018-11-28 15:56:57 | 2018-11-28 15:58:33 | 5.0 | 0.0 | 2.0 | N | 193 | 193 | 2 | 52.0 | 0.0 | 0.5 | 0.00 | 0.0 | 0.3 | 55.55 | NaN | None |
# let's also see the schema
yellow_tripdata_2019_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7696617 entries, 0 to 7696616
Data columns (total 19 columns):
 #   Column                 Dtype
---  ------                 -----
 0   VendorID               int64
 1   tpep_pickup_datetime   datetime64[ns]
 2   tpep_dropoff_datetime  datetime64[ns]
 3   passenger_count        float64
 4   trip_distance          float64
 5   RatecodeID             float64
 6   store_and_fwd_flag     object
 7   PULocationID           int64
 8   DOLocationID           int64
 9   payment_type           int64
 10  fare_amount            float64
 11  extra                  float64
 12  mta_tax                float64
 13  tip_amount             float64
 14  tolls_amount           float64
 15  improvement_surcharge  float64
 16  total_amount           float64
 17  congestion_surcharge   float64
 18  airport_fee            object
dtypes: datetime64[ns](2), float64(11), int64(4), object(2)
memory usage: 1.1+ GB
# select only the few columns that we are interested in
yellow_tripdata_2019_df = yellow_tripdata_2019_df[['VendorID',
'tpep_pickup_datetime',
'tpep_dropoff_datetime',
'passenger_count',
'PULocationID',
'DOLocationID',
'fare_amount']]
yellow_tripdata_2019_df.columns = ['vendor_id',
'pickup_datetime',
'dropoff_datetime',
'passenger_count',
'pickup_location_id',
'dropoff_location_id',
'fare_amount']
# check the schema again
yellow_tripdata_2019_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7696617 entries, 0 to 7696616
Data columns (total 7 columns):
 #   Column               Dtype
---  ------               -----
 0   vendor_id            int64
 1   pickup_datetime      datetime64[ns]
 2   dropoff_datetime     datetime64[ns]
 3   passenger_count      float64
 4   pickup_location_id   int64
 5   dropoff_location_id  int64
 6   fare_amount          float64
dtypes: datetime64[ns](2), float64(2), int64(3)
memory usage: 411.0 MB
# let's also load the taxi zone lookup data into a pandas dataframe
taxi_zone_lookup = pd.read_csv('taxi+_zone_lookup.csv')
taxi_zone_lookup.head()
 | LocationID | Borough | Zone | service_zone
---|---|---|---|---
0 | 1 | EWR | Newark Airport | EWR |
1 | 2 | Queens | Jamaica Bay | Boro Zone |
2 | 3 | Bronx | Allerton/Pelham Gardens | Boro Zone |
3 | 4 | Manhattan | Alphabet City | Yellow Zone |
4 | 5 | Staten Island | Arden Heights | Boro Zone |
# rename the columns
taxi_zone_lookup.columns = ['locationid','borough','zone','service_zone']
# check the schema
taxi_zone_lookup.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 265 entries, 0 to 264
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   locationid    265 non-null    int64
 1   borough       265 non-null    object
 2   zone          264 non-null    object
 3   service_zone  263 non-null    object
dtypes: int64(1), object(3)
memory usage: 8.4+ KB
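The info() output shows that zone and service_zone contain missing values (264 and 263 non-null out of 265 rows). It is worth a quick look at those rows now, since we will add not_null tests on these columns later:
# inspect the lookup rows with missing zone or service_zone values
taxi_zone_lookup[taxi_zone_lookup['zone'].isnull() | taxi_zone_lookup['service_zone'].isnull()]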
# load the data into our postgres database
from sqlalchemy import create_engine
import psycopg2
alchemyEngine = create_engine(CONN, pool_recycle=3600)
postgreSQLConnection = alchemyEngine.connect()
yellow_tripdata_2019_df.sample(100000).to_sql('yellow_tripdata_sample_2019_01', postgreSQLConnection, if_exists='replace')
taxi_zone_lookup.to_sql('taxi_zone_lookup', postgreSQLConnection, if_exists='replace')
postgreSQLConnection.close()
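Note that we only load a 100,000-row sample of the trip data to keep the demo fast. If you wanted to load the full ~7.7M rows, to_sql supports chunked, multi-row inserts (a sketch with a hypothetical target table name; expect this to take a while on a small RDS instance):
# hypothetical: load the full dataframe in chunks instead of a sample
# yellow_tripdata_2019_df.to_sql('yellow_tripdata_2019_01', alchemyEngine,
#                                if_exists='replace', chunksize=50000, method='multi')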
# confirm that the data is loaded
%sql {CONN}
%sql select * from yellow_tripdata_sample_2019_01 limit 10;
 * postgresql://postgres:***@database-1.ciykztisaaxg.us-east-1.rds.amazonaws.com:5432/postgres
10 rows affected.
index | vendor_id | pickup_datetime | dropoff_datetime | passenger_count | pickup_location_id | dropoff_location_id | fare_amount |
---|---|---|---|---|---|---|---|
6317675 | 1 | 2019-01-26 19:14:29 | 2019-01-26 19:36:30 | 1.0 | 138 | 24 | 24.0 |
3641213 | 2 | 2019-01-16 09:39:38 | 2019-01-16 10:00:58 | 1.0 | 263 | 87 | 23.5 |
5593383 | 1 | 2019-01-24 08:02:05 | 2019-01-24 08:09:41 | 1.0 | 162 | 137 | 6.5 |
3754727 | 1 | 2019-01-16 17:07:03 | 2019-01-16 17:16:20 | 1.0 | 125 | 231 | 7.0 |
6127276 | 2 | 2019-01-26 00:10:28 | 2019-01-26 00:21:10 | 1.0 | 79 | 234 | 9.0 |
277605 | 2 | 2019-01-02 14:51:35 | 2019-01-02 15:05:33 | 2.0 | 170 | 113 | 11.0 |
5182683 | 2 | 2019-01-22 16:16:13 | 2019-01-22 16:29:47 | 1.0 | 246 | 186 | 9.5 |
1247101 | 1 | 2019-01-06 19:19:28 | 2019-01-06 19:29:39 | 1.0 | 79 | 113 | 9.0 |
871245 | 2 | 2019-01-05 03:49:43 | 2019-01-05 04:09:31 | 3.0 | 137 | 87 | 14.0 |
1869997 | 2 | 2019-01-09 14:57:02 | 2019-01-09 15:01:59 | 1.0 | 144 | 79 | 5.0 |
%sql select * from taxi_zone_lookup limit 10;
 * postgresql://postgres:***@database-1.ciykztisaaxg.us-east-1.rds.amazonaws.com:5432/postgres
10 rows affected.
index | locationid | borough | zone | service_zone |
---|---|---|---|---|
0 | 1 | EWR | Newark Airport | EWR |
1 | 2 | Queens | Jamaica Bay | Boro Zone |
2 | 3 | Bronx | Allerton/Pelham Gardens | Boro Zone |
3 | 4 | Manhattan | Alphabet City | Yellow Zone |
4 | 5 | Staten Island | Arden Heights | Boro Zone |
5 | 6 | Staten Island | Arrochar/Fort Wadsworth | Boro Zone |
6 | 7 | Queens | Astoria | Boro Zone |
7 | 8 | Queens | Astoria Park | Boro Zone |
8 | 9 | Queens | Auburndale | Boro Zone |
9 | 10 | Queens | Baisley Park | Boro Zone |
# initialize a dbt project
!dbt init dbt_demo
Running with dbt=0.21.1
Creating dbt configuration folder at /root/.dbt
With sample profiles.yml for redshift
Your new dbt project "dbt_demo" was created! If this is your first time
using dbt, you'll need to set up your profiles.yml file -- this file will
tell dbt how to connect to your database. You can find this file by running:

  xdg-open /root/.dbt

For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:

  https://docs.getdbt.com/docs/configure-your-profile

One more thing:

Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:

  https://community.getdbt.com/

Happy modeling!
# go into the newly created directory
%cd dbt_demo
/content/dbt_demo
# let's see what's all there
!tree --du -h -C .
.
├── [4.0K]  analysis
├── [4.0K]  data
├── [1.3K]  dbt_project.yml
├── [4.0K]  macros
├── [9.0K]  models
│   └── [5.0K]  example
│       ├── [ 475]  my_first_dbt_model.sql
│       ├── [ 115]  my_second_dbt_model.sql
│       └── [ 437]  schema.yml
├── [ 571]  README.md
├── [4.0K]  snapshots
└── [4.0K]  tests

  35K used in 7 directories, 5 files
# let's also see what's inside the `.dbt` folder
!tree --du -h -C ~/.dbt
/root/.dbt
└── [ 456] profiles.yml
4.4K used in 0 directories, 1 file
profiles = f"""
default:
  outputs:
    dev:
      type: postgres
      threads: 1
      host: {HOST}
      port: {PORT}
      user: {USERNAME}
      pass: {PASSWORD}
      dbname: {DBNAME}
      schema: public
    prod:
      type: postgres
      threads: 1
      host: {HOST}
      port: {PORT}
      user: {USERNAME}
      pass: {PASSWORD}
      dbname: {DBNAME}
      schema: public
  target: dev
"""
%store profiles > ~/.dbt/profiles.yml
Writing 'profiles' (str) to file '/root/.dbt/profiles.yml'.
!dbt debug
Running with dbt=0.21.1
dbt version: 0.21.1
python version: 3.7.13
python path: /usr/bin/python3
os info: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
Using profiles.yml file at /root/.dbt/profiles.yml
Using dbt_project.yml file at /content/dbt_demo/dbt_project.yml

Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
 - git [OK found]

Connection:
  host: database-1.ciykztisaaxg.us-east-1.rds.amazonaws.com
  port: 5432
  user: postgres
  database: postgres
  schema: public
  search_path: None
  keepalives_idle: 0
  sslmode: None
  Connection test: [OK connection ok]

All checks passed!
!tree ./models
./models
└── example
    ├── my_first_dbt_model.sql
    ├── my_second_dbt_model.sql
    └── schema.yml

1 directory, 3 files
Note: Model files in dbt look very much like regular SQL queries, but they may contain additional instructions written in Jinja template syntax, i.e., inside double curly braces {{ }}. For example, by using {{ ref('my_first_dbt_model') }} you can reference a model called my_first_dbt_model and use it in a SQL query as you would any regular database table.
Navigate to the dbt_demo/models/example folder. You will see two .sql files and one .yml file. These are example models generated by the dbt init step:
my_first_dbt_model.sql creates some basic data consisting of two rows. It also contains the statement {{ config(materialized='table') }}, which instructs dbt to materialize the model as a table in the database.
my_second_dbt_model.sql references the first model using the {{ ref('my_first_dbt_model') }} syntax and selects all rows with id = 1 from the first model.
schema.yml contains metadata about both models, such as a human-readable description, the column schema, and tests for each model.
!cat ./models/example/my_first_dbt_model.sql
/*
    Welcome to your first dbt model!
    Did you know that you can also configure models directly within SQL files?
    This will override configurations stated in dbt_project.yml

    Try changing "table" to "view" below
*/

{{ config(materialized='table') }}

with source_data as (

    select 1 as id
    union all
    select null as id

)

select *
from source_data

/*
    Uncomment the line below to remove records with null `id` values
*/

-- where id is not null
!cat ./models/example/my_second_dbt_model.sql
-- Use the `ref` function to select from other models

select *
from {{ ref('my_first_dbt_model') }}
where id = 1
!cat ./models/example/schema.yml
version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null
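Before running anything against the database, you can ask dbt to render the Jinja templates into plain SQL. dbt compile writes the rendered queries under target/compiled (the path below assumes the default project name my_new_project from the generated dbt_project.yml):
!dbt compile
!cat target/compiled/my_new_project/models/example/my_second_dbt_model.sql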
!dbt run
Running with dbt=0.21.1
Found 2 models, 4 tests, 0 snapshots, 0 analyses, 162 macros, 0 operations, 0 seed files, 0 sources, 0 exposures

08:48:57 | Concurrency: 1 threads (target='dev')
08:48:57 |
08:48:57 | 1 of 2 START table model public.my_first_dbt_model................... [RUN]
08:48:58 | 1 of 2 OK created table model public.my_first_dbt_model.............. [SELECT 2 in 0.12s]
08:48:58 | 2 of 2 START view model public.my_second_dbt_model................... [RUN]
08:48:58 | 2 of 2 OK created view model public.my_second_dbt_model.............. [CREATE VIEW in 0.09s]
08:48:58 |
08:48:58 | Finished running 1 table model, 1 view model in 0.53s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
This means that dbt successfully parsed the two SQL queries in the model files, interpreted the instructions in the Jinja statements, and materialized the results in your PostgreSQL database as a table (my_first_dbt_model) and a view (my_second_dbt_model). Any consumer of the data can now query them to access the transformed data.
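You can verify this directly from the notebook; the ipython-sql connection opened earlier still points at the same database:
%sql select * from my_second_dbt_model;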
In this step, you will create your own dbt models to transform the sample data in the PostgreSQL database and run the pipeline.
Staging
These files are so-called staging models, a pattern commonly used in dbt to avoid having downstream models read raw data directly. Staging models typically just select from the source data and, where needed, apply light transformations such as column renames.
!mkdir -p ./models/taxi/staging
%%writefile ./models/taxi/staging/schema.yml
version: 2

sources:
  - name: source
    schema: public
    tables:
      - name: taxi_zone_lookup
      - name: yellow_tripdata_sample_2019_01

models:
  - name: taxi_zone_lookup_model
    description: "A list of all taxi zones with codes in NYC"
    columns:
      - name: locationid
        tests:
          - not_null
      - name: borough
        tests:
          - not_null
      - name: zone
        tests:
          - not_null
      - name: service_zone
        tests:
          - not_null
  - name: taxi_trips_model
    description: "A reduced version of yellow taxi trip data in NYC"
    columns:
      - name: vendor_id
        tests:
          - not_null
          - accepted_values:
              values: ['1', '2', '4']
      - name: pickup_datetime
        tests:
          - not_null
      - name: dropoff_datetime
        tests:
          - not_null
      - name: passenger_count
        tests:
          - not_null
      - name: pickup_location_id
        tests:
          - not_null
      - name: dropoff_location_id
        tests:
          - not_null
      - name: fare_amount
        tests:
          - not_null
Overwriting ./models/taxi/staging/schema.yml
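The accepted_values test will fail if vendor_id ever takes a value outside 1, 2, and 4. Since the raw sample table is already in Postgres, you can preview the actual distribution before the tests run:
%sql select vendor_id, count(*) from yellow_tripdata_sample_2019_01 group by vendor_id order by vendor_id;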
%%writefile ./models/taxi/staging/taxi_trips_model.sql
select
vendor_id,
pickup_datetime,
dropoff_datetime,
passenger_count,
pickup_location_id,
dropoff_location_id,
fare_amount
from {{ source('source', 'yellow_tripdata_sample_2019_01') }}
Overwriting ./models/taxi/staging/taxi_trips_model.sql
%%writefile ./models/taxi/staging/taxi_zone_lookup_model.sql
select
locationid,
borough,
zone,
service_zone
from {{ source('source', 'taxi_zone_lookup') }}
Overwriting ./models/taxi/staging/taxi_zone_lookup_model.sql
We will now create another dbt model that combines data from the two staging models. Suppose we want to join the staging tables on the location ID fields so that the taxi ride data carries the actual borough names for its pickup and dropoff locations.
%%writefile ./models/taxi/trips_with_borough_name_model.sql
select
t.vendor_id,
t.pickup_datetime,
t.dropoff_datetime,
z1.borough as pickup_borough,
z2.borough as dropoff_borough,
t.passenger_count,
t.fare_amount
from {{ ref('taxi_trips_model') }} t
left join {{ ref('taxi_zone_lookup_model') }} z1
on t.pickup_location_id = z1.locationid
left join {{ ref('taxi_zone_lookup_model') }} z2
on t.dropoff_location_id = z2.locationid
Overwriting ./models/taxi/trips_with_borough_name_model.sql
%%writefile ./models/taxi/schema.yml
version: 2

models:
  - name: trips_with_borough_name_model
    description: "Combines taxi rides with the borough names for pickup and dropoff locations."
    columns:
      - name: vendor_id
      - name: pickup_datetime
      - name: dropoff_datetime
      - name: pickup_borough
      - name: dropoff_borough
      - name: passenger_count
      - name: fare_amount
Overwriting ./models/taxi/schema.yml
The dbt_project.yml file defines, among other things, how the models in each directory are materialized. Modify its models: section to replace the example entry with our new taxi directory:
%%writefile dbt_project.yml
name: 'my_new_project'
version: '1.0.0'
config-version: 2
profile: 'default'
source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
target-path: "target"
clean-targets:
  - "target"
  - "dbt_modules"

models:
  my_new_project:
    taxi:
      materialized: view
Overwriting dbt_project.yml
!dbt run
Running with dbt=0.21.1
Found 5 models, 16 tests, 0 snapshots, 0 analyses, 162 macros, 0 operations, 0 seed files, 2 sources, 0 exposures

08:57:16 | Concurrency: 1 threads (target='dev')
08:57:16 |
08:57:16 | 1 of 5 START table model public.my_first_dbt_model................... [RUN]
08:57:16 | 1 of 5 OK created table model public.my_first_dbt_model.............. [SELECT 2 in 0.13s]
08:57:16 | 2 of 5 START view model public.taxi_trips_model...................... [RUN]
08:57:16 | 2 of 5 OK created view model public.taxi_trips_model................. [CREATE VIEW in 0.09s]
08:57:16 | 3 of 5 START view model public.taxi_zone_lookup_model................ [RUN]
08:57:16 | 3 of 5 OK created view model public.taxi_zone_lookup_model........... [CREATE VIEW in 0.06s]
08:57:16 | 4 of 5 START view model public.my_second_dbt_model................... [RUN]
08:57:16 | 4 of 5 OK created view model public.my_second_dbt_model.............. [CREATE VIEW in 0.07s]
08:57:16 | 5 of 5 START view model public.trips_with_borough_name_model......... [RUN]
08:57:16 | 5 of 5 OK created view model public.trips_with_borough_name_model.... [CREATE VIEW in 0.06s]
08:57:17 |
08:57:17 | Finished running 1 table model, 4 view models in 0.72s.

Completed successfully

Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5
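As a quick sanity check, you can query the new joined view directly (using the ipython-sql connection from earlier):
%sql select pickup_borough, dropoff_borough, count(*) as trips from trips_with_borough_name_model group by pickup_borough, dropoff_borough order by trips desc limit 5;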
Run the schema tests defined for the taxi_trips_model staging model:
!dbt test -m taxi_trips_model
Running with dbt=0.21.1
Found 5 models, 16 tests, 0 snapshots, 0 analyses, 162 macros, 0 operations, 0 seed files, 2 sources, 0 exposures

08:58:12 | Concurrency: 1 threads (target='dev')
08:58:12 |
08:58:12 | 1 of 8 START test accepted_values_taxi_trips_model_vendor_id__1__2__4 [RUN]
08:58:12 | 1 of 8 FAIL 1 accepted_values_taxi_trips_model_vendor_id__1__2__4.... [FAIL 1 in 0.08s]
08:58:12 | 2 of 8 START test not_null_taxi_trips_model_dropoff_datetime......... [RUN]
08:58:12 | 2 of 8 PASS not_null_taxi_trips_model_dropoff_datetime............... [PASS in 0.06s]
08:58:12 | 3 of 8 START test not_null_taxi_trips_model_dropoff_location_id...... [RUN]
08:58:12 | 3 of 8 PASS not_null_taxi_trips_model_dropoff_location_id............ [PASS in 0.05s]
08:58:12 | 4 of 8 START test not_null_taxi_trips_model_fare_amount.............. [RUN]
08:58:12 | 4 of 8 PASS not_null_taxi_trips_model_fare_amount.................... [PASS in 0.05s]
08:58:12 | 5 of 8 START test not_null_taxi_trips_model_passenger_count.......... [RUN]
08:58:12 | 5 of 8 FAIL 381 not_null_taxi_trips_model_passenger_count............ [FAIL 381 in 0.05s]
08:58:12 | 6 of 8 START test not_null_taxi_trips_model_pickup_datetime.......... [RUN]
08:58:12 | 6 of 8 PASS not_null_taxi_trips_model_pickup_datetime................ [PASS in 0.06s]
08:58:12 | 7 of 8 START test not_null_taxi_trips_model_pickup_location_id....... [RUN]
08:58:12 | 7 of 8 PASS not_null_taxi_trips_model_pickup_location_id............. [PASS in 0.06s]
08:58:12 | 8 of 8 START test not_null_taxi_trips_model_vendor_id................ [RUN]
08:58:12 | 8 of 8 PASS not_null_taxi_trips_model_vendor_id...................... [PASS in 0.05s]
08:58:12 |
08:58:12 | Finished running 8 tests in 0.63s.

Completed with 2 errors and 0 warnings:

Failure in test accepted_values_taxi_trips_model_vendor_id__1__2__4 (models/taxi/staging/schema.yml)
  Got 1 result, configured to fail if != 0

  compiled SQL at target/compiled/my_new_project/models/taxi/staging/schema.yml/schema_test/accepted_values_taxi_trips_model_vendor_id__1__2__4.sql

Failure in test not_null_taxi_trips_model_passenger_count (models/taxi/staging/schema.yml)
  Got 381 results, configured to fail if != 0

  compiled SQL at target/compiled/my_new_project/models/taxi/staging/schema.yml/schema_test/not_null_taxi_trips_model_passenger_count.sql

Done. PASS=6 WARN=0 ERROR=2 SKIP=0 TOTAL=8
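Both failures point at quirks in the raw data: at least one trip has a vendor_id outside the accepted list, and 381 trips have a null passenger_count. You can inspect the offending rows yourself (dbt also saves the compiled test SQL at the target/compiled paths shown above):
%sql select distinct vendor_id from taxi_trips_model where vendor_id not in (1, 2, 4);
%sql select count(*) from taxi_trips_model where passenger_count is null;
To make the tests pass, we filter these rows out in the staging models: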
%%writefile ./models/taxi/staging/taxi_trips_model.sql
select
vendor_id,
pickup_datetime,
dropoff_datetime,
passenger_count,
pickup_location_id,
dropoff_location_id,
fare_amount
from {{ source('source', 'yellow_tripdata_sample_2019_01') }} tt
where tt.vendor_id in ('1','2','4')
and tt.passenger_count is not null
Overwriting ./models/taxi/staging/taxi_trips_model.sql
%%writefile ./models/taxi/staging/taxi_zone_lookup_model.sql
select
locationid,
borough,
zone,
service_zone
from {{ source('source', 'taxi_zone_lookup') }} lk
where lk.zone is not null
and lk.service_zone is not null
Overwriting ./models/taxi/staging/taxi_zone_lookup_model.sql
!dbt run
Running with dbt=0.21.1
Found 5 models, 16 tests, 0 snapshots, 0 analyses, 162 macros, 0 operations, 0 seed files, 2 sources, 0 exposures

08:59:07 | Concurrency: 1 threads (target='dev')
08:59:07 |
08:59:07 | 1 of 5 START table model public.my_first_dbt_model................... [RUN]
08:59:08 | 1 of 5 OK created table model public.my_first_dbt_model.............. [SELECT 2 in 0.12s]
08:59:08 | 2 of 5 START view model public.taxi_trips_model...................... [RUN]
08:59:08 | 2 of 5 OK created view model public.taxi_trips_model................. [CREATE VIEW in 0.09s]
08:59:08 | 3 of 5 START view model public.taxi_zone_lookup_model................ [RUN]
08:59:08 | 3 of 5 OK created view model public.taxi_zone_lookup_model........... [CREATE VIEW in 0.07s]
08:59:08 | 4 of 5 START view model public.my_second_dbt_model................... [RUN]
08:59:08 | 4 of 5 OK created view model public.my_second_dbt_model.............. [CREATE VIEW in 0.06s]
08:59:08 | 5 of 5 START view model public.trips_with_borough_name_model......... [RUN]
08:59:08 | 5 of 5 OK created view model public.trips_with_borough_name_model.... [CREATE VIEW in 0.07s]
08:59:08 |
08:59:08 | Finished running 1 table model, 4 view models in 0.73s.

Completed successfully

Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5
!dbt test -m taxi_trips_model
Running with dbt=0.21.1
Found 5 models, 16 tests, 0 snapshots, 0 analyses, 162 macros, 0 operations, 0 seed files, 2 sources, 0 exposures

08:59:19 | Concurrency: 1 threads (target='dev')
08:59:19 |
08:59:19 | 1 of 8 START test accepted_values_taxi_trips_model_vendor_id__1__2__4 [RUN]
08:59:20 | 1 of 8 PASS accepted_values_taxi_trips_model_vendor_id__1__2__4...... [PASS in 0.25s]
08:59:20 | 2 of 8 START test not_null_taxi_trips_model_dropoff_datetime......... [RUN]
08:59:20 | 2 of 8 PASS not_null_taxi_trips_model_dropoff_datetime............... [PASS in 0.18s]
08:59:20 | 3 of 8 START test not_null_taxi_trips_model_dropoff_location_id...... [RUN]
08:59:20 | 3 of 8 PASS not_null_taxi_trips_model_dropoff_location_id............ [PASS in 0.25s]
08:59:20 | 4 of 8 START test not_null_taxi_trips_model_fare_amount.............. [RUN]
08:59:20 | 4 of 8 PASS not_null_taxi_trips_model_fare_amount.................... [PASS in 0.08s]
08:59:20 | 5 of 8 START test not_null_taxi_trips_model_passenger_count.......... [RUN]
08:59:20 | 5 of 8 PASS not_null_taxi_trips_model_passenger_count................ [PASS in 0.07s]
08:59:20 | 6 of 8 START test not_null_taxi_trips_model_pickup_datetime.......... [RUN]
08:59:20 | 6 of 8 PASS not_null_taxi_trips_model_pickup_datetime................ [PASS in 0.07s]
08:59:20 | 7 of 8 START test not_null_taxi_trips_model_pickup_location_id....... [RUN]
08:59:20 | 7 of 8 PASS not_null_taxi_trips_model_pickup_location_id............. [PASS in 0.06s]
08:59:20 | 8 of 8 START test not_null_taxi_trips_model_vendor_id................ [RUN]
08:59:21 | 8 of 8 PASS not_null_taxi_trips_model_vendor_id...................... [PASS in 0.06s]
08:59:21 |
08:59:21 | Finished running 8 tests in 1.25s.

Completed successfully

Done. PASS=8 WARN=0 ERROR=0 SKIP=0 TOTAL=8