Following the example I used in Python, where I want to conditionally join two dataframes with 'wildcard' categories, I'll demonstrate the same technique in Julia and the DataFramesMeta
package.
using DataFramesMeta
# Create our two dataframes, starting with the left dataframe
sales_volume_table = DataFrame("date"=>["2021-11-15", "2021-11-20", "2021-11-25", "2021-11-26"],
"quantity"=>[1,2,3,2],
"brand"=>["Outdoor", "Leisure", "Athletic", "Outdoor"])
4 rows × 3 columns
date | quantity | brand | |
---|---|---|---|
String | Int64 | String | |
1 | 2021-11-15 | 1 | Outdoor |
2 | 2021-11-20 | 2 | Leisure |
3 | 2021-11-25 | 3 | Athletic |
4 | 2021-11-26 | 2 | Outdoor |
# right dataframe
promos_table = DataFrame("start_date"=>["2021-11-01", "2021-11-25"],
"end_date"=>["2021-11-25", "2021-11-26"],
"brand"=>["ANY", "Outdoor"],
"rebate_per_unit"=>[3,5])
2 rows × 4 columns
start_date | end_date | brand | rebate_per_unit | |
---|---|---|---|---|
String | String | String | Int64 | |
1 | 2021-11-01 | 2021-11-25 | ANY | 3 |
2 | 2021-11-25 | 2021-11-26 | Outdoor | 5 |
Now we want to left join both dataframes where there is either a brand match or brand is 'ANY'. Julia has a crossjoin
function that does exactly what we're looking for without the need to create an intermediate column like we did in pandas.
results = crossjoin(sales_volume_table, promos_table, makeunique=true)
8 rows × 7 columns
date | quantity | brand | start_date | end_date | brand_1 | rebate_per_unit | |
---|---|---|---|---|---|---|---|
String | Int64 | String | String | String | String | Int64 | |
1 | 2021-11-15 | 1 | Outdoor | 2021-11-01 | 2021-11-25 | ANY | 3 |
2 | 2021-11-15 | 1 | Outdoor | 2021-11-25 | 2021-11-26 | Outdoor | 5 |
3 | 2021-11-20 | 2 | Leisure | 2021-11-01 | 2021-11-25 | ANY | 3 |
4 | 2021-11-20 | 2 | Leisure | 2021-11-25 | 2021-11-26 | Outdoor | 5 |
5 | 2021-11-25 | 3 | Athletic | 2021-11-01 | 2021-11-25 | ANY | 3 |
6 | 2021-11-25 | 3 | Athletic | 2021-11-25 | 2021-11-26 | Outdoor | 5 |
7 | 2021-11-26 | 2 | Outdoor | 2021-11-01 | 2021-11-25 | ANY | 3 |
8 | 2021-11-26 | 2 | Outdoor | 2021-11-25 | 2021-11-26 | Outdoor | 5 |
# Filter the results where the two columns match, OR the right column is 'ANY'
@rsubset(results, (:brand==:brand_1) || (:brand_1=="ANY"),
(:date>=:start_date) && (:date <=:end_date))
4 rows × 7 columns
date | quantity | brand | start_date | end_date | brand_1 | rebate_per_unit | |
---|---|---|---|---|---|---|---|
String | Int64 | String | String | String | String | Int64 | |
1 | 2021-11-15 | 1 | Outdoor | 2021-11-01 | 2021-11-25 | ANY | 3 |
2 | 2021-11-20 | 2 | Leisure | 2021-11-01 | 2021-11-25 | ANY | 3 |
3 | 2021-11-25 | 3 | Athletic | 2021-11-01 | 2021-11-25 | ANY | 3 |
4 | 2021-11-26 | 2 | Outdoor | 2021-11-25 | 2021-11-26 | Outdoor | 5 |