Author: Fernando Pérez.
A demonstration of how to use Python, Julia, Fortran and R cooperatively to analyze data, in the same process.
This is supported by the IPython kernel and a few extensions that take advantage of IPython's magic system to provide low-level integration between Python and other languages.
See the companion notebook for data preparation and setup.
Used for a lecture at the Berkeley Institute for Data Science. The lecture video has a live demo of this material.
License: CC-BY.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
plt.style.use("seaborn")
plt.rcParams["figure.figsize"] = (10, 6)
sns.set_context("talk", font_scale=1.4)
Let's begin by reading our dataset and having a quick look:
data = pd.read_csv('data.csv')
print(data.shape)
data.head(3)
Ah, it looks like we have a quantitative dataset with $(x,y)$ pairs - a scatterplot is a decent starting point to get a feel for these data:
data.plot.scatter(x='x', y='y');
Mmh, what to do?
Let's try to build a simple linear model for these data, with some features we'll extract from the data. There's probably:
In summary, let's try
$$ y \sim \theta_1 x + \theta_2 x^2 + \theta_3 \sin(x^2) $$Maybe Julia can help us efficiently compute that nasty non-linear feature, $x^2$?
%load_ext julia.magic
jxsq = %julia xsq(x) = x.^2 # Simplest way to define a function in Julia
We've defined the function xsq
in Julia, and in Python we have it available as jxsq
, which we can call as a normal Python function:
x = data['x']
f2 = jxsq(x)
x.shape == f2.shape # simple sanity check
%load_ext fortranmagic
%%fortran
subroutine sinx2(x, y, n)
real, intent(in), dimension(n) :: x
real, intent(out), dimension(n) :: y
!intent(hide) :: n
y = sin(x**2)
end subroutine sinx2
Now, sinx2
can be used as a plain Python function too:
f3 = sinx2(x)
f3.shape == x.shape # same sanity check
We now have our data y
and our features $x$, $f_2 = x^2$ and $f_3 = \sin(x^2)$. This is a classic linear modeling problem, and R is awesome at fitting those!
Let's put our features together in a nice design matrix and load up R:
A = np.column_stack([x, f2, f3])
A.shape
%load_ext rpy2.ipython
y = data['y']
In R, this can be written as a linear model lm(y ~ 0 + A)
.
Note that we'll ask for the fit coefficients fitc
to keep moving forward:
%%R -i y,A -o fitc
ylm = lm(y ~ 0 + A)
fitc = coef(ylm)
print(summary(ylm))
par(mfrow=c(2,2))
plot(ylm)
R gave us our fit coefficient vector fitc
, we can now proceed using it:
fitc
We construct our fitted model and visualize our results:
yfit = A @ fitc
plt.plot(x, y, 'o', label='data')
plt.plot(x, yfit, label='fit', color='orange', lw=4)
plt.title('Julia, Python and R working in Jupyter')
plt.legend();
def f(x):
return x**2-x
def integrate_f(a, b, N):
s = 0; dx = (b-a)/N
for i in range(N):
s += f(a+i*dx)
return s * dx
%load_ext Cython
%%cython -a
cdef double fcy(double x) except? -2:
return x**2-x
def integrate_fcy(double a, double b, int N):
cdef int i
cdef double s, dx
s = 0; dx = (b-a)/N
for i in range(N):
s += fcy(a+i*dx)
return s * dx
tpy = %timeit -o -n 1000 integrate_f(0, 1, 100)
tcy = %timeit -o -n 1000 integrate_fcy(0, 1, 100)
tpy.best/tcy.best
%julia @pyimport numpy as np
%julia @pyimport matplotlib.pyplot as plt
%%julia
# Note how we mix numpy and julia:
t = linspace(0,2*pi,1000); # use the julia linspace
s = sin(3*t + 4*np.cos(2*t)); # use the numpy cosine and julia sine
fig = plt.gcf()
plt.plot(t, s, color="red", linewidth=2.0, linestyle="--")
fig = %julia fig
fig.suptitle("Adding a title!")
fig
%%fortran
subroutine f1(x, y, n)
real, intent(in), dimension(n) :: x
real, intent(out), dimension(n) :: y
!intent(hide) :: n
y = sin(x**2) - cos(x)
end subroutine f1
t = np.linspace(0,2* np.pi, 1000)
plt.plot(f1(t));
%%bash
echo $HOME
%%perl
@months = ("Jan", "Feb", "Mar");
print($months[1])