Julia is getting more and more interesting as a way to translate math into beautiful-looking code. After some halted attempts earlier at Julia 1.0 and 1.4, I'm giving it a try again at 1.6 after it seems like Julia is looking more and more stable. So here I'll write down a simple task that would be trivial to do in Python, except I'll do it in Julia instead.

I want to:

Read a CSV file from the web (i.e. in a github repository) as a dataframe
Do some data cleaning - text replacement, fill na's, etc

Load a CSV File¶

In [1]:

using HTTP, CSV, DataFrames, PyCall, DataFramesMeta

In [2]:

# Use a GET command to get some data, in this case it's a String
# r = HTTP.get("https://api.github.com/users/banditkings").body |> String;

In [3]:

url = "https://ocw.mit.edu/courses/sloan-school-of-management/15-071-the-analytics-edge-spring-2017/linear-regression/assignment-2/climate_change.csv";

Docstrings must be placed before a function call. On the plus side, it seems to render the docstring really nicely:

In [4]:

"""
Basic function call example. In this case we're using it to fetch a CSV file from
github here.

You can even use latex in here! \$f(x) = Ax + b\$

<span style="color:red">and HTML!!</span>

Parameters
----------
url : String
* url address for the CSV file that you want to read in as a dataframe

Returns
-------
dataframe
"""    
function get_data(url)
    body = HTTP.get(url).body
    df = CSV.read(body, DataFrame);
end;

In [5]:

?get_data() # display the docstring we just generated

Out[5]:

Basic function call example. In this case we're using it to fetch a CSV file from github here.

You can even use latex in here! $f(x) = Ax + b$

and HTML!!

Parameters¶

url : String

url address for the CSV file that you want to read in as a dataframe

Returns¶

dataframe

Control Flows and `try`:`catch`¶

Our little get_data function is fine but I don't want to constantly be downloading this file if I don't have to...how about something like python's try:except and learn a little about how Julia handles control flows?

Goal: I want to try to load from CSV file if it's available, and load from a url if it doesn't exist.
Reference: Control Flows and try:catch here!

In [6]:

try
    df = CSV.read("climate_change.csv", DataFrame);
catch
    println("No CSV Found, Downloading from source")
    df = get_data(url);
    # We can also save the csv file locally
    CSV.write("climate_change.csv", df)
end;

df = CSV.read("climate_change.csv", DataFrame);

Python vs Julia: appending an item to a list (and PyCall in Jupyter)¶

Another key difference from python is appending lists together. In python, you'll create a list like:

>>> foo = [1]
>>> foo = foo + ['a', 'b']
>>> foo
[1, 'a', 'b']

Actually why don't we use the PyCall module to demonstrate the above? With PyCall, you can call Python and get functions and variables out. I tried this a while back and it seemed to be really bad at supporting pandas dataframes but things like numpy arrays are fine.

In [7]:

# Example of using Python within a jupyter notebook cell with a Julia kernel

py"""
foo = [1]
foo = foo + ['a', 'b']
"""

# Reference an existing variable or function
py"foo"

Out[7]:

3-element Vector{Any}:
 1
  "a"
  "b"

Now here's the same thing in Julia:

In [8]:

foo = Any[1]; # need to set to Any or it'll assume it's an Integer and it'll fail on the two strings there
append!(foo, ["a"])

Out[8]:

2-element Vector{Any}:
 1
  "a"

What is that `!` symbol?¶

Okay notice the exclamation mark ! after append - this is special Julia notation for functions that modify something inplace. In this case, append! adds an item to the list in place. You could use push! if adding a single item. And there's no append function, only append!. How do you do something where you don't modify it in place? You can't, but you can make a copy:

In [9]:

bar = foo |> copy
append!(bar, ["b"])

Out[9]:

3-element Vector{Any}:
 1
  "a"
  "b"

In [10]:

foo == bar

Out[10]:

false

DataFrame Manipulation¶

With Base Julia: Documentation

head and tail become first and last with Julia.

In [11]:

first(df, 5)

Out[11]:

5 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	345.96	1638.59	303.677	191.324	350.113	1366.1	0.0863
2	1983	6	2.167	345.52	1633.71	303.746	192.057	351.848	1366.12	0.0794
3	1983	7	1.741	344.15	1633.22	303.795	192.818	353.725	1366.29	0.0731
4	1983	8	1.13	342.25	1631.35	303.839	193.602	355.633	1366.42	0.0673
5	1983	9	0.428	340.17	1648.4	303.901	194.392	357.465	1366.23	0.0619

You can also pipe commands as well with Julia's base pipe |>

In [12]:

df |> last

Out[12]:

DataFrameRow (11 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
308	2008	12	-0.666	385.56	1812.88	322.182	244.204	535.005	1365.69	0.0046

How about the last 5 by pipe |>? Well it's not as good as R's dplyr unfortunately...the base pipe function in Julia can't handle multiple parameters. There are other libraries like Pipe and Chain that make use of macros, but compared to dplyr or even method chaining in Python, they both are too clunky.

Libraries like the Query from the Queryverse attempt to address this but one of the many issues of trying to do work in a new language that's decades behind Python and R.

Indexing and Slicing¶

Indexing is just like pandas like:

df[rows, columns]

In [13]:

# get Rows 5-10 (INCLUSIVE), and all columns
df[5:10, :]

Out[13]:

6 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	9	0.428	340.17	1648.4	303.901	194.392	357.465	1366.23	0.0619
2	1983	10	0.002	340.3	1663.79	303.97	195.171	359.174	1366.06	0.0569
3	1983	11	-0.176	341.53	1658.23	304.032	195.921	360.758	1366.11	0.0524
4	1983	12	-0.176	343.07	1654.31	304.082	196.609	362.174	1366.06	0.0486
5	1984	1	-0.339	344.05	1658.98	304.13	197.219	363.359	1365.43	0.0451
6	1984	2	-0.565	344.77	1656.48	304.194	197.759	364.296	1365.66	0.0416

You can exclude columns with Not(), and you can use regex, too. What a dream!!

In [14]:

# Exclude all columns that have a capital C in it
df[1, Not(r"C.*")]

Out[14]:

DataFrameRow (7 columns)

	Year	Month	MEI	N2O	TSI	Aerosols	Temp
	Int64	Int64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	303.677	1366.1	0.0863	0.109

Checking out the data:

size(df) : To get dataframe dimensions you need to use size instead of length or len or shape:
describe(df) : Gives you summary info

In [15]:

size(df)

Out[15]:

(308, 11)

In [16]:

describe(df)

Out[16]:

11 rows × 7 columns

	variable	mean	min	median	max	nmissing	eltype
	Symbol	Float64	Real	Float64	Real	Int64	DataType
1	Year	1995.66	1983	1996.0	2008	0	Int64
2	Month	6.55195	1	7.0	12	0	Int64
3	MEI	0.275555	-1.635	0.2375	3.001	0	Float64
4	CO2	363.227	340.17	361.735	388.5	0	Float64
5	CH4	1749.82	1629.89	1764.04	1814.18	0	Float64
6	N2O	312.392	303.677	311.507	322.182	0	Float64
7	CFC-11	251.973	191.324	258.344	271.494	0	Float64
8	CFC-12	497.525	350.113	528.356	543.813	0	Float64
9	TSI	1366.07	1365.43	1365.98	1367.32	0	Float64
10	Aerosols	0.0166571	0.0016	0.00575	0.1494	0	Float64
11	Temp	0.256776	-0.282	0.248	0.739	0	Float64

Filtering DataFrames Based on Conditions¶

How do you conditionally filter a dataframe? One way to do it is through indexing, similar to how you would do it in Pandas.

Note that you can reference a dataframe column (like a pandas series) both with and without quotes, so df.Year is the same as df."Year". Handy for dealing with spaces!

In [17]:

df[(df.Year.==1985) .& (df."Month".<=4), :]

Out[17]:

4 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1985	1	-0.561	345.25	1663.42	305.301	208.537	373.623	1365.61	0.0164
2	1985	2	-0.602	346.06	1666.21	305.243	209.543	374.681	1365.71	0.016
3	1985	3	-0.737	347.66	1678.34	305.165	210.368	376.004	1365.66	0.0141
4	1985	4	-0.484	348.2	1675.24	305.093	211.111	377.635	1365.51	0.0138

What is that `.` doing in front of the `<=`, `==` and `&` sign?¶

The . sign there is doing some heavy lifting. It's an 'elementwise' operator, meaning that it does the comparison across every element in the vector df.Year and df.Month. If you don't have them this is what it would return for the df.Year comparison:

In [18]:

df.Year==1985 # Returns true if ALL values of Year are equal to 1985

Out[18]:

false

When what we really want is something like a list (vector) of values that compares each element and returns either true or false for each element in each vector.

In [19]:

println((df.Year.==1985 )[1:5])

Bool[0, 0, 0, 0, 0]

Another example

In [20]:

println(([1,1,0].==[1,0,1]))

Bool[1, 0, 0]

In [21]:

println((df.Year.==1985) .& (df."Month".<=4))

Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

DataFrames also has a filter and subset option to index based on row values, but they're a little messy

The filter(f, df) function lets you filter by passing in a function, like:

In [22]:

filter(:Year=> ==(1985), df) |> first

Out[22]:

DataFrameRow (11 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1985	1	-0.561	345.25	1663.42	305.301	208.537	373.623	1365.61	0.0164

What is the colon `:` doing in front of `Year`?¶

Another nuance with Julia is this little colon in front of a column name, and that's because Year is a Symbol type. This is a special thing you can do for performance, but you can still index columns by string too - handy if your column name has spaces in it.

In [23]:

filter(:"Year"=> ==(1985), df) |> first

Out[23]:

DataFrameRow (11 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1985	1	-0.561	345.25	1663.42	305.301	208.537	373.623	1365.61	0.0164

What the heck is the `=>` doing after the column?¶

More nuance, and this is usually the part where I go back to Pandas or Dplyr. What's actually happening here is that it's assigning a Pair like key=>value.

Filtering Rows¶

The row logic makes sense to me more, but it's not as performant

In [24]:

@time begin
    filter(row -> (row.Year in [1983, 1984]), df) |> first
end

  0.190525 seconds (834.12 k allocations: 43.820 MiB, 99.68% compilation time)

Out[24]:

DataFrameRow (11 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	345.96	1638.59	303.677	191.324	350.113	1366.1	0.0863

Remember that -> is an anonymous function and filter is expecting a function as its first argument.

See Julia for Data Science for a good introductiont to filter and subset: julia for data science
Another Good writeup here in this article why do we have both subset and filter functions?

This row filtering is equivalent to:

In [25]:

@time begin
    df[(df.Year .==1983) .| (df.Year .==1984), :] |> first
end

  0.098306 seconds (370.27 k allocations: 18.121 MiB, 99.74% compilation time)

Out[25]:

DataFrameRow (11 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	345.96	1638.59	303.677	191.324	350.113	1366.1	0.0863

Or also:

In [26]:

# using a list comprehension works for the `in` operator here
@time begin
    df[[ year in [1983,1984] for year in df.Year], :] |> first
end

  0.071469 seconds (159.67 k allocations: 8.747 MiB, 99.50% compilation time)

Out[26]:

DataFrameRow (11 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	345.96	1638.59	303.677	191.324	350.113	1366.1	0.0863

And you can combine indexing like this:

In [27]:

@time begin 
    filter(row -> (row.Year in [1983, 1984]) & (row.Month <= 4), df)
end

  0.084675 seconds (455.85 k allocations: 23.343 MiB, 99.34% compilation time)

Out[27]:

4 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1984	1	-0.339	344.05	1658.98	304.13	197.219	363.359	1365.43	0.0451
2	1984	2	-0.565	344.77	1656.48	304.194	197.759	364.296	1365.66	0.0416
3	1984	3	0.131	345.46	1655.77	304.285	198.249	365.044	1366.17	0.0383
4	1984	4	0.331	346.77	1657.68	304.389	198.723	365.692	1365.57	0.0352

Or equivalently with plain indexing:

In [28]:

# using a list comprehension works because there's no elementwise `.in` or `.∈` operator here
@time begin
    condition1 = [year in [1983,1984] for year in df.Year]
    condition2 = [month <= 5 for month in df.Month]
    df[condition1 .& condition2, :] |> first
end

  0.206541 seconds (533.77 k allocations: 27.184 MiB, 13.27% gc time, 99.64% compilation time)

Out[28]:

DataFrameRow (11 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	345.96	1638.59	303.677	191.324	350.113	1366.1	0.0863

So the nice thing about filter is that you can apply functions and stuff to it, like you could do calculations and then do a boolean off of the calculation, etc etc. Either way it's not too slow.

Subset(df, args)¶

Returns a copy of the dataframe where args are true

In [29]:

subset(df, :Year=> ByRow(==(1983)))

Out[29]:

8 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	345.96	1638.59	303.677	191.324	350.113	1366.1	0.0863
2	1983	6	2.167	345.52	1633.71	303.746	192.057	351.848	1366.12	0.0794
3	1983	7	1.741	344.15	1633.22	303.795	192.818	353.725	1366.29	0.0731
4	1983	8	1.13	342.25	1631.35	303.839	193.602	355.633	1366.42	0.0673
5	1983	9	0.428	340.17	1648.4	303.901	194.392	357.465	1366.23	0.0619
6	1983	10	0.002	340.3	1663.79	303.97	195.171	359.174	1366.06	0.0569
7	1983	11	-0.176	341.53	1658.23	304.032	195.921	360.758	1366.11	0.0524
8	1983	12	-0.176	343.07	1654.31	304.082	196.609	362.174	1366.06	0.0486

Subset with multiple conditions¶

To do multiple conditions (including an OR operator) we need to use a function. Both filter and subset are powerful because of the ability to pass functions as arguments.

In [30]:

function my_filter(year, month)
    year_match = (year in [1983, 1984])
    month_match = (month<=4)
    return year_match & month_match
end;

# apply this filter for each row
subset(df, [:Year, :Month]=> ByRow(my_filter))

Out[30]:

4 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1984	1	-0.339	344.05	1658.98	304.13	197.219	363.359	1365.43	0.0451
2	1984	2	-0.565	344.77	1656.48	304.194	197.759	364.296	1365.66	0.0416
3	1984	3	0.131	345.46	1655.77	304.285	198.249	365.044	1366.17	0.0383
4	1984	4	0.331	346.77	1657.68	304.389	198.723	365.692	1365.57	0.0352

subset is nice because it can handle missing values, whereas filter won't. So in short, for most of your use cases you should stick with indexing or using subset.

In [31]:

@time begin
    subset(df, [:Year, :Month]=> ByRow(my_filter))
end

  0.000168 seconds (507 allocations: 39.328 KiB)

Out[31]:

4 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1984	1	-0.339	344.05	1658.98	304.13	197.219	363.359	1365.43	0.0451
2	1984	2	-0.565	344.77	1656.48	304.194	197.759	364.296	1365.66	0.0416
3	1984	3	0.131	345.46	1655.77	304.285	198.249	365.044	1366.17	0.0383
4	1984	4	0.331	346.77	1657.68	304.389	198.723	365.692	1365.57	0.0352

`@subset`¶

Instead of the filter and subset stuff in base dataframes, we can use DataFramesMeta:

In [32]:

@subset(df, (:Year .<1984), (:Month .==5))

Out[32]:

1 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	345.96	1638.59	303.677	191.324	350.113	1366.1	0.0863

You can do @rsubset to filter across rows so you don't need to do the elementwise . every time:

In [33]:

@rsubset(df, :Year<1984, :Month==5)

Out[33]:

1 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	345.96	1638.59	303.677	191.324	350.113	1366.1	0.0863

So in the above example we are using an AND statement, it'll keep doing the ANDs across each item there. So if you want to do something fancier like an OR statement you'll need to wrap it in parenthesis:

In [34]:

@time begin
    @rsubset(df, (:Year<1984 && :Month==5) || (:Year==1985 && :Month==3))
end

  0.134172 seconds (490.13 k allocations: 26.135 MiB, 99.18% compilation time)

Out[34]:

2 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1983	5	2.556	345.96	1638.59	303.677	191.324	350.113	1366.1	0.0863
2	1985	3	-0.737	347.66	1678.34	305.165	210.368	376.004	1365.66	0.0141

Or you can always pass a custom function too, just like before (this is fast!)

In [35]:

@time begin
    @rsubset(df, my_filter(:Year, :Month))
end

  0.000125 seconds (508 allocations: 39.375 KiB)

Out[35]:

4 rows × 11 columns (omitted printing of 1 columns)

	Year	Month	MEI	CO2	CH4	N2O	CFC-11	CFC-12	TSI	Aerosols
	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1984	1	-0.339	344.05	1658.98	304.13	197.219	363.359	1365.43	0.0451
2	1984	2	-0.565	344.77	1656.48	304.194	197.759	364.296	1365.66	0.0416
3	1984	3	0.131	345.46	1655.77	304.285	198.249	365.044	1366.17	0.0383
4	1984	4	0.331	346.77	1657.68	304.389	198.723	365.692	1365.57	0.0352

Nice, it's super fast too! I think this is the way.

@chain from DataFramesMeta¶

Instead of the clunky filtering and indexing, you can try out some of the querying libraries that Julia has to offer. They are all going to be different than pandas and dplyr, but that might not be a bad thing. The downside is that there isn't one obvious/clear standard yet in Julia, so for the practical business user or accidental coder it's still tough sledding.

It's November 2021 and we've had stuff like Query and Queryverse for a while now, with DataFramesMeta in the mix. I'll show off @chain macro stuff here from the DataFramesMeta package, and we've already been familiar with @rsubset from earlier.

In [36]:

df2 = rename(df,:CO2 => "Carbon Dioxide");

But there's more! Here's a guide for people coming from dplyr from the DataFramesMeta folks.

In [37]:

@chain df2 begin
    @rsubset(:Year==1985, :Month<=4)
    @select(:MEI, :Aerosols, :"Carbon Dioxide")
end

Out[37]:

4 rows × 3 columns

	MEI	Aerosols	Carbon Dioxide
	Float64	Float64	Float64
1	-0.561	0.0164	345.25
2	-0.602	0.016	346.06
3	-0.737	0.0141	347.66
4	-0.484	0.0138	348.2

Using our previous OR filter:

In [38]:

@chain df2 begin
    @rsubset(my_filter(:Year, :Month))
    @select(:MEI, :Aerosols, :"Carbon Dioxide")
end

Out[38]:

4 rows × 3 columns

	MEI	Aerosols	Carbon Dioxide
	Float64	Float64	Float64
1	-0.339	0.0451	344.05
2	-0.565	0.0416	344.77
3	0.131	0.0383	345.46
4	0.331	0.0352	346.77

In [ ]:

Load a CSV File¶

Parameters¶

Returns¶

Control Flows and try:catch¶

Python vs Julia: appending an item to a list (and PyCall in Jupyter)¶

What is that ! symbol?¶

DataFrame Manipulation¶

Indexing and Slicing¶

Filtering DataFrames Based on Conditions¶

What is that . doing in front of the <=, == and & sign?¶

What is the colon : doing in front of Year?¶

What the heck is the => doing after the column?¶

Filtering Rows¶

Subset(df, args)¶

Subset with multiple conditions¶

@subset¶

@chain from DataFramesMeta¶

Control Flows and `try`:`catch`¶

What is that `!` symbol?¶

What is that `.` doing in front of the `<=`, `==` and `&` sign?¶

What is the colon `:` doing in front of `Year`?¶

What the heck is the `=>` doing after the column?¶

`@subset`¶