Arel is a very popular ruby gem that is one of the major components of the most popular ruby frameworks, Rails. It is an ORM-helper of sorts that exposes a beatiful and intuitive syntax for creating SQL strings by chaining Ruby methods.
In daru, we have successfully adopted this syntax and the result is a very intuitive and readable syntax for obtaining any sort of data from a DataFrame or Vector.
Lets see how this syntax works with some examples:
require 'daru'
true
To use this syntax we call any of the comparator methods defined on Daru::Vector
and pass the results from these to the #where
method available for both DataFrame
and Vector
.
To demonstrate with a quick example:
vector = Daru::Vector.new([2,4,5,51,5,16,2,5,3,2,1,5,2,5,2,1,56,234,6,21])
vector.where((vector.eq(5) | vector.eq(1)) & vector.mt(2))
Daru::Vector:75332930 size: 5 | |
---|---|
nil | |
2 | 5 |
4 | 5 |
7 | 5 |
11 | 5 |
13 | 5 |
** Note that we use union OR (|
) and union AND (&
) and not logical OR (||
) or logical AND (&&
) **
Thus the results returned by the Vector#eq
method are evaluated by Vector#where
and the generated Vector is returned. The index is also preserved.
The where
clause can also be used with DataFrame, with similar results.
df = Daru::DataFrame.new({
a: [1,2,3,4,5,6]*100,
b: ['a','b','c','d','e','f']*100,
c: [11,22,33,44,55,66]*100
}, index: (1..600).to_a.shuffle)
df.where(df[:a].eq(2) | df[:c].eq(55))
Daru::DataFrame:81901580 rows: 200 cols: 3 | |||
---|---|---|---|
a | b | c | |
33 | 2 | b | 22 |
465 | 5 | e | 55 |
527 | 2 | b | 22 |
114 | 5 | e | 55 |
48 | 2 | b | 22 |
593 | 5 | e | 55 |
416 | 2 | b | 22 |
415 | 5 | e | 55 |
334 | 2 | b | 22 |
165 | 5 | e | 55 |
533 | 2 | b | 22 |
49 | 5 | e | 55 |
360 | 2 | b | 22 |
164 | 5 | e | 55 |
240 | 2 | b | 22 |
369 | 5 | e | 55 |
414 | 2 | b | 22 |
565 | 5 | e | 55 |
525 | 2 | b | 22 |
76 | 5 | e | 55 |
390 | 2 | b | 22 |
322 | 5 | e | 55 |
187 | 2 | b | 22 |
400 | 5 | e | 55 |
341 | 2 | b | 22 |
288 | 5 | e | 55 |
348 | 2 | b | 22 |
62 | 5 | e | 55 |
421 | 2 | b | 22 |
336 | 5 | e | 55 |
474 | 2 | b | 22 |
229 | 5 | e | 55 |
... | ... | ... | ... |
196 | 5 | e | 55 |
The comparator methods on Vector return an object of type Daru::Core::Query::BoolArray
which lets us perform OR and AND operations on it. See this blog post for more information on BoolArray and other comparator methods.
As a convenience for readability, you can also you the #and
or #or
methods instead of #&
and #|
.
vector.where(vector.eq(2).or(vector.eq(5)).and(vector.mt(2)))
Daru::Vector:82473630 size: 5 | |
---|---|
nil | |
2 | 5 |
4 | 5 |
7 | 5 |
11 | 5 |
13 | 5 |
One of the major advantages of using the where
clause over other more robust methods like DataFrame#filter
or Vector#keep_if
is that it is much faster (though not destructive). These benchmarks prove my point.
Daru::DataFrame offers the #join
method for performing SQL style joins between two DataFrames. Currently #join supports inner, left outer, right outer and full outer joins between DataFrames.
To demonstrate:
left = Daru::DataFrame.new({
:id => [1,2,3,4],
:name => ['Pirate', 'Monkey', 'Ninja', 'Spaghetti']
})
right = Daru::DataFrame.new({
:id => [1,2,3,4],
:name => ['Rutabaga', 'Pirate', 'Darth Vader', 'Ninja']
})
nil
To perform an inner join on the :name
column:
left.join(right, on: [:name], how: :inner)
Daru::DataFrame:82190730 rows: 2 cols: 3 | |||
---|---|---|---|
id_1 | name | id_2 | |
0 | 1 | Pirate | 2 |
1 | 3 | Ninja | 4 |
An outer left join can be done with:
left.join(right, on: [:name], how: :left)
Daru::DataFrame:82015360 rows: 4 cols: 3 | |||
---|---|---|---|
id_1 | name | id_2 | |
0 | 1 | Pirate | 2 |
1 | 2 | Monkey | |
2 | 3 | Ninja | 4 |
3 | 4 | Spaghetti |
An outer right join can be done like so:
left.join(right, on: [:name], how: :right)
Daru::DataFrame:81791190 rows: 4 cols: 3 | |||
---|---|---|---|
id_1 | name | id_2 | |
0 | Rutabaga | 1 | |
1 | 1 | Pirate | 2 |
2 | Darth Vader | 3 | |
3 | 3 | Ninja | 4 |
And finally, a full outer join:
left.join(right, on: [:name], how: :outer)
Daru::DataFrame:81527400 rows: 6 cols: 3 | |||
---|---|---|---|
id_1 | name | id_2 | |
0 | 1 | Pirate | 2 |
1 | 2 | Monkey | |
2 | 3 | Ninja | 4 |
3 | 4 | Spaghetti | |
4 | Rutabaga | 1 | |
5 | Darth Vader | 3 |