This small Jupyter notebook shows a simple benchmark comparing several Python implementations and one Julia implementation of a specific numerical algorithm, the Romberg integration method.
For Python, we will need numpy, scipy and numba, plus an external PyPy interpreter; for Julia, a working julia binary (called through the %%script cell magic).
We will use the scipy.integrate.quad function to check the results of our manual implementations.
from scipy.integrate import quad
Let's try it with this function $f(x)$ on $[a,b]=[1993,2017]$:
$$ f(x) := \frac{12x+1}{1+\cos(x)^2} $$
import math
f = lambda x: (12*x+1)/(1+math.cos(x)**2)
a, b = 1993, 2017
quad(f, a, b)
(409937.57869797916, 8.482168070998719e-05)
The first value is the numerical value of the integral $\int_{a}^{b} f(x) \mathrm{d}x$ and the second value is an estimate of the numerical error.
The estimated absolute error, about $8.5 \times 10^{-5}$, corresponds to a relative error of roughly $2 \times 10^{-10}$: essentially exact.
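We can double-check that relative error from the two values returned by quad (a quick check, not in the original notebook):

integral, abs_err = quad(f, a, b)  # sanity check: relative error of quad's estimate
print(abs_err / integral)          # about 2.1e-10, a tiny relative error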
See https://mec-cs101-integrals.readthedocs.io/en/latest/_modules/integrals.html#romberg_rec for the code and https://mec-cs101-integrals.readthedocs.io/en/latest/integrals.html#integrals.romberg_rec for the documentation.
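For reference, these are the recurrences that the code below implements (they can be read off directly from the code): trapezoidal estimates $R(i, 0)$ on $2^i$ sub-intervals, then Richardson extrapolation:

$$ R(0,0) = \frac{b-a}{2}\bigl(f(a)+f(b)\bigr), \qquad R(i,0) = \frac{R(i-1,0)}{2} + h_i \sum_{k=1}^{2^{i-1}} f\bigl(a+(2k-1)h_i\bigr), \quad h_i = \frac{b-a}{2^i}, $$
$$ R(i,j) = \frac{4^j \, R(i,j-1) - R(i-1,j-1)}{4^j - 1}, \qquad 1 \le j \le i \le n. $$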
def romberg_rec(f, xmin, xmax, n=8, m=None):
    if m is None:  # testing "not m" instead would wrongly treat m=0 as None
        m = n
    assert n >= m
    if n == 0 and m == 0:
        return ((xmax - xmin) / 2.0) * (f(xmin) + f(xmax))
    elif m == 0:
        h = (xmax - xmin) / float(2**n)
        N = (2**(n - 1)) + 1
        term = math.fsum(f(xmin + ((2 * k) - 1) * h) for k in range(1, N))
        return (term * h) + 0.5 * romberg_rec(f, xmin, xmax, n - 1, 0)
    else:
        return (1.0 / ((4**m) - 1)) * ((4**m) * romberg_rec(f, xmin, xmax, n, m - 1) - romberg_rec(f, xmin, xmax, n - 1, m - 1))
romberg_rec(f, a, b, n=0) # really not accurate!
romberg_rec(f, a, b, n=1) # already pretty good!
romberg_rec(f, a, b, n=2)
romberg_rec(f, a, b, n=3)
romberg_rec(f, a, b, n=8) # Almost the exact value.
romberg_rec(f, a, b, n=10) # Almost the exact value.
romberg_rec(f, a, b, n=12) # Almost the exact value.
404122.6272072803
372300.32065714896
373621.9973380798
373650.4784348298
409937.6105242601
409937.57869815244
409937.57869797904
It converges quite quickly to the "true" value as given by scipy.integrate.quad.
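To make "quite quickly" concrete, here is a small check one could run at this point (not in the original notebook; it uses quad's value as the reference):

true_value, _ = quad(f, a, b)  # reference value from scipy
for n in (0, 2, 4, 6, 8, 10, 12):
    # absolute error of the recursive Romberg estimate at each depth
    print(n, abs(romberg_rec(f, a, b, n=n) - true_value))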
See https://mec-cs101-integrals.readthedocs.io/en/latest/_modules/integrals.html#romberg for the code and https://mec-cs101-integrals.readthedocs.io/en/latest/integrals.html#integrals.romberg for the documentation.
It is not hard to make this function non-recursive: just store the intermediate results.
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = {(0, 0): 0.5 * (xmax - xmin) * (f(xmax) + f(xmin))}
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / float(2**i)
        xsamples = [xmin + ((2 * k - 1) * h_i) for k in range(1, 1 + 2**(i - 1))]
        r[(i, 0)] = (0.5 * r[(i - 1, 0)]) + h_i * math.fsum(f(x) for x in xsamples)
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            try:
                r[(i, j)] = (((4**j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / float((4**j) - 1)
            except KeyError:
                raise ValueError("romberg() failed with n = {}, m = {} at i = {}, j = {}.".format(n, m, i, j))
    return r[(n, m)]
romberg(f, a, b, n=0) # really not accurate!
romberg(f, a, b, n=1) # already pretty good!
romberg(f, a, b, n=2)
romberg(f, a, b, n=3)
romberg(f, a, b, n=8) # Almost the exact value.
romberg(f, a, b, n=10) # Almost the exact value.
romberg(f, a, b, n=12) # Almost the exact value.
404122.6272072803
372300.32065714896
373621.99733807985
373650.47843482986
409937.61052426015
409937.5786981525
409937.5786979791
It converges quite quickly to the "true" value as given by scipy.integrate.quad.
Instead of using a dictionary, which gets filled up dynamically (and so, slowly), let us use a NumPy array, as we already know the size of the array we need: $(n+1) \times (m+1)$.
Note that only half of the array is used, so we could perhaps try a sparse representation for triangular matrices? From what I know, it is not worth it: if the sparsity is only $\simeq 1/2$, you gain nothing (in memory) from LIL or other sparse-matrix representations. We could use numpy.tri, but that builds a dense array too, so no.
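To see why a sparsity of $\simeq 1/2$ is not worth the trouble, compare the dense storage cost with the number of entries actually written (the sizes here are illustrative, not from the original notebook):

import numpy as np

n = m = 10
dense_entries = (n + 1) * (m + 1)       # full (n+1) x (m+1) array
used_entries = (n + 1) * (n + 2) // 2   # lower triangle actually filled
print(dense_entries, used_entries)      # 121 vs 66: we "waste" barely half
print(np.zeros((n + 1, m + 1)).nbytes)  # 968 bytes in total, negligible anyway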
import numpy as np
def romberg_better(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n + 1, m + 1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)
    return r[n, m]
romberg_better(f, a, b, n=0) # really not accurate!
romberg_better(f, a, b, n=1) # already pretty good!
romberg_better(f, a, b, n=2)
romberg_better(f, a, b, n=3)
romberg_better(f, a, b, n=8) # Almost the exact value.
romberg_better(f, a, b, n=10) # Almost the exact value.
romberg_better(f, a, b, n=12) # Almost the exact value.
404122.62720728031
372300.32065714896
373621.99733807985
373650.47843482986
409937.61052426015
409937.5786981525
409937.5786979791
It converges quite quickly to the "true" value as given by scipy.integrate.quad.
%timeit quad(f, a, b)
390 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit romberg_rec(f, a, b, n=10)
52.8 ms ± 1.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit romberg(f, a, b, n=10)
819 µs ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit romberg_better(f, a, b, n=10)
844 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
We already see that the recursive version is much slower than the dynamic-programming one! But there is not much difference between the version using a dictionary (romberg()) and the one using a NumPy array of known size (romberg_better()).
Let us now repeat the computation on 100,000 random intervals, first with CPython:
%%time
import numpy as np
import math
import random
f = lambda x: (12*x+1)/(1+math.cos(x)**2)
# Same code
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n + 1, m + 1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)
    return r[n, m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg(f, a, b)
CPU times: user 24.1 s, sys: 0 ns, total: 24.1 s
Wall time: 24.1 s
And now the same code executed by an external PyPy interpreter (Python 2.7.13 and PyPy 5.8.0 with GCC 5.4.0).
%%time
%%pypy
import math
import random
f = lambda x: (12*x+1)/(1+math.cos(x)**2)
# Same code
def romberg_pypy(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = [[0.0 for _ in range(m + 1)] for _ in range(n + 1)]  # (n+1) rows, (m+1) columns
    r[0][0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i][0] = (0.5 * r[i - 1][0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i][j] = (((4.**j) * r[i][j - 1]) - r[i - 1][j - 1]) / ((4.**j) - 1.)
    return r[n][m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg_pypy(f, a, b)
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 7.09 s
This version uses the improved memoization trick (no dictionary), but with nested lists instead of NumPy arrays: I didn't bother to install numpy in my PyPy installation (even though it should be possible). Let us also try Numba, a JIT compiler for (numerical) Python code:
from numba import jit
@jit
def romberg_numba(f, xmin, xmax, n=8):
    assert xmin <= xmax
    m = n
    # First value:
    r = {(0, 0): 0.5 * (xmax - xmin) * (f(xmax) + f(xmin))}
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / float(2**i)
        xsamples = [xmin + ((2 * k - 1) * h_i) for k in range(1, 1 + 2**(i - 1))]
        r[(i, 0)] = (0.5 * r[(i - 1, 0)]) + h_i * math.fsum(f(x) for x in xsamples)
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            try:
                r[(i, j)] = (((4**j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / float((4**j) - 1)
            except KeyError:
                raise ValueError("romberg() failed with n = {}, m = {} at i = {}, j = {}.".format(n, m, i, j))
    return r[(n, m)]
romberg_numba(f, a, b, n=8) # Almost the exact value.
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-18-b4426dd668a3> in <module>()
----> 1 romberg_numba(f, a, b, n=8) # Almost the exact value.
[... long Numba-internal traceback elided ...]
AssertionError: Failed at object (analyzing bytecode) SETUP_EXCEPT(arg=82, lineno=17)
It fails! As almost always when I try Numba, it fails with a cryptic error; too bad. I don't want to spend time debugging this.
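For what it's worth, the traceback points at the try/except block (the SETUP_EXCEPT opcode), and the tuple-keyed dictionary is also hard for Numba's type inference. Here is a sketch of what a more Numba-friendly rewrite could look like, with the integrand jitted too since nopython mode cannot call an arbitrary Python function; this is an untested guess, not something benchmarked in this notebook:

import math
import numpy as np
from numba import njit

@njit
def f_jit(x):
    return (12 * x + 1) / (1 + math.cos(x) ** 2)

@njit
def romberg_numba_array(xmin, xmax, n=8):
    # Untested sketch: same triangle as romberg_better(),
    # but with no dict and no try/except.
    m = n
    r = np.zeros((n + 1, m + 1))
    r[0, 0] = 0.5 * (xmax - xmin) * (f_jit(xmax) + f_jit(xmin))
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.0 ** i
        acc = 0.0
        for k in range(1, 1 + 2 ** (i - 1)):
            acc += f_jit(xmin + (2 * k - 1) * h_i)
        r[i, 0] = 0.5 * r[i - 1, 0] + h_i * acc
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = ((4.0 ** j) * r[i, j - 1] - r[i - 1, j - 1]) / (4.0 ** j - 1.0)
    return r[n, m]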
Thanks to this page for a nice and short introduction to Julia.
%%script julia
function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

a = 1993
b = 2017

function romberg_julia(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = Dict()
    r[(0, 0)] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[(i, 0)] = (r[(i - 1, 0)] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[(i, j)] = (((4^j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / (4^j - 1.)
        end
    end
    r[(n, m)]
end

println(romberg_julia(f, a, b, n=0)) # really not accurate!
println(romberg_julia(f, a, b, n=1)) # already pretty good!
println(romberg_julia(f, a, b, n=2))
println(romberg_julia(f, a, b, n=3))
println(romberg_julia(f, a, b, n=8)) # Almost the exact value.
println(romberg_julia(f, a, b, n=10)) # Almost the exact value.
println(romberg_julia(f, a, b, n=12)) # Almost the exact value.
f (generic function with 1 method)
1993
2017
romberg_julia (generic function with 1 method)
404122.6272072803
372300.32065714896
373621.99733807985
373650.47843482986
409937.6105242601
409937.57869815256
409937.5786979804
It seems to work well, like the Python implementation. We get the same numerical result:
f = lambda x: (12*x+1)/(1+math.cos(x)**2)
a, b = 1993, 2017
quad(f, a, b)
romberg(f, a, b, n=12)
(409937.57869797916, 8.482168070998719e-05)
409937.5786979791
Let's try a less naive version using a fixed-size array instead of a dictionary (as we did before for the Python version).
%%script julia
function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

a = 1993
b = 2017

function romberg_julia_better(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = zeros((n+1, m+1))  # https://docs.julialang.org/en/latest/stdlib/arrays/#Base.zeros
    r[1, 1] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[i + 1, 1] = (r[i, 1] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[i + 1, j + 1] = (((4.0^j) * r[i + 1, j]) - r[i, j]) / (4.0^j - 1.)
        end
    end
    r[n + 1, m + 1]
end

println(romberg_julia_better(f, a, b, n=0)) # really not accurate!
println(romberg_julia_better(f, a, b, n=1)) # already pretty good!
println(romberg_julia_better(f, a, b, n=2))
println(romberg_julia_better(f, a, b, n=3))
println(romberg_julia_better(f, a, b, n=8)) # Almost the exact value.
println(romberg_julia_better(f, a, b, n=10)) # Almost the exact value.
println(romberg_julia_better(f, a, b, n=12)) # Almost the exact value.
f (generic function with 1 method)
1993
2017
romberg_julia_better (generic function with 1 method)
404122.6272072803
372300.32065714896
373621.99733807985
373650.47843482986
409937.6105242601
409937.57869815256
409937.5786979804
Now for the benchmark between the three contenders: CPython, PyPy, and Julia. First with Python:
%%time
import numpy as np
import math
import random
f = lambda x: (12*x+1)/(1+math.cos(x)**2)
# Same code
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n + 1, m + 1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)
    return r[n, m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg(f, a, b)
CPU times: user 25.6 s, sys: 0 ns, total: 25.6 s
Wall time: 25.6 s
And now the same code executed by an external PyPy interpreter (Python 2.7.13 and PyPy 5.8.0 with GCC 5.4.0).
%%time
%%pypy
import math
import random
f = lambda x: (12*x+1)/(1+math.cos(x)**2)
# Same code
def romberg_pypy(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = [[0.0 for _ in range(m + 1)] for _ in range(n + 1)]  # (n+1) rows, (m+1) columns
    r[0][0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i][0] = (0.5 * r[i - 1][0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i][j] = (((4.**j) * r[i][j - 1]) - r[i - 1][j - 1]) / ((4.**j) - 1.)
    return r[n][m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg_pypy(f, a, b)
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 7.12 s
And finally with Julia:
%%time
%%script julia
function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

function romberg_julia(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = Dict()
    r[(0, 0)] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[(i, 0)] = (r[(i - 1, 0)] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[(i, j)] = (((4^j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / (4^j - 1.)
        end
    end
    r[(n, m)]
end

for _ in 1:100000
    a = rand(-2000:2000)
    b = a + rand(0:100)
    romberg_julia(f, a, b)
end
f (generic function with 1 method)
romberg_julia (generic function with 1 method)
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 8.1 s
On this first test, it doesn't look faster than PyPy... But what if we use the improved version, with an array instead of a dictionary?
%%time
%%script julia
function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

function romberg_julia_better(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = zeros((n+1, m+1))  # https://docs.julialang.org/en/latest/stdlib/arrays/#Base.zeros
    r[1, 1] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[i + 1, 1] = (r[i, 1] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[i + 1, j + 1] = (((4.0^j) * r[i + 1, j]) - r[i, j]) / (4.0^j - 1.)
        end
    end
    r[n + 1, m + 1]
end

for _ in 1:100000
    a = rand(-2000:2000)
    b = a + rand(0:100)
    romberg_julia_better(f, a, b)
end
f (generic function with 1 method)
romberg_julia_better (generic function with 1 method)
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 2.7 s
Oh, this time it finally seems faster. Really faster? Yes, about 3 to 4 times faster than PyPy.
Note also that these last cells were run through the %%pypy and %%script julia magics, so both include a warm-up time (opening the pipe, starting the sub-process, initializing the JIT compiler, etc.). But it is fair to compare PyPy to Julia this way.
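To get a rough idea of that warm-up overhead, one could time an essentially empty sub-process call (a quick check, not done in the original notebook; numbers will obviously vary by machine):

import subprocess
import time

t0 = time.time()
subprocess.call(["julia", "-e", "0"])  # start julia, evaluate nothing, exit
print(time.time() - t0)  # fixed cost paid by every %%script julia cell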
Let's try the same numerical algorithm but with a different integrand function:
$$\int_{0}^{1} \frac{2}{\sqrt{\pi}} e^{-x^2} \,\mathrm{d}x = \operatorname{erf}(1) \approx 0.842700792949715$$
First with Python:
%%time
import numpy as np
import math
import random
f = lambda x: (2.0 / math.sqrt(math.pi)) * math.exp(-x**2)
# Same code
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n + 1, m + 1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)
    return r[n, m]

for _ in range(100000):
    a = 0
    b = 1
    romberg(f, a, b)
print(romberg(f, a, b))
0.84270079295
CPU times: user 20.7 s, sys: 8 ms, total: 20.8 s
Wall time: 20.8 s
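As a double-check, the target value is exactly $\operatorname{erf}(1)$, which is available in Python's standard library (this verification is an addition, not in the original notebook):

import math
print(math.erf(1))  # sanity check: 0.8427007929497149, matching the result above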
And now the same code executed by an external PyPy interpreter (Python 2.7.13 and PyPy 5.8.0 with GCC 5.4.0).
%%time
%%pypy
import math
import random
f = lambda x: (2.0 / math.sqrt(math.pi)) * math.exp(-x**2)
# Same code
def romberg_pypy(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = [[0.0 for _ in range(m + 1)] for _ in range(n + 1)]  # (n+1) rows, (m+1) columns
    r[0][0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i][0] = (0.5 * r[i - 1][0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i][j] = (((4.**j) * r[i][j - 1]) - r[i - 1][j - 1]) / ((4.**j) - 1.)
    return r[n][m]

for _ in range(100000):
    a = 0
    b = 1
    romberg_pypy(f, a, b)
print(romberg_pypy(f, a, b))
0.84270079295
CPU times: user 8 ms, sys: 0 ns, total: 8 ms
Wall time: 6.35 s
And finally with Julia:
%%time
%%script julia
function f(x)
    (2.0 / sqrt(pi)) * exp(-x^2)
end

function romberg_julia(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = Dict()
    r[(0, 0)] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[(i, 0)] = (r[(i - 1, 0)] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[(i, j)] = (((4^j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / (4^j - 1.)
        end
    end
    r[(n, m)]
end

for _ in 1:100000
    a = 0
    b = 1
    romberg_julia(f, a, b)
end
println(romberg_julia(f, 0, 1))
f (generic function with 1 method)
romberg_julia (generic function with 1 method)
0.8427007929497148
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 7.19 s
Still not faster than PyPy... So what is the point of Julia?
%%time
%%script julia
function f(x)
    (2.0 / sqrt(pi)) * exp(-x^2)
end

function romberg_julia_better(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = zeros((n+1, m+1))  # https://docs.julialang.org/en/latest/stdlib/arrays/#Base.zeros
    r[1, 1] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[i + 1, 1] = (r[i, 1] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[i + 1, j + 1] = (((4.0^j) * r[i + 1, j]) - r[i, j]) / (4.0^j - 1.)
        end
    end
    r[n + 1, m + 1]
end

for _ in 1:100000
    a = 0
    b = 1
    romberg_julia_better(f, a, b)
end
println(romberg_julia_better(f, 0, 1))
f (generic function with 1 method)
romberg_julia_better (generic function with 1 method)
0.8427007929497148
CPU times: user 8 ms, sys: 0 ns, total: 8 ms
Wall time: 2.42 s
This one is again faster than PyPy, but not by that much...
$\implies$ On this (baby) example of a real-world numerical algorithm, tested on thousands of random inputs or thousands of runs on the same input, the speed-up is in favor of Julia, but it doesn't seem impressive enough to make me want to use it (at least for now).
If I have to use 1-based indexing and a slightly different language, just to maybe gain a speed-up of 2 to 3× (compared to PyPy), or even a 10× speed-up compared to naive Python, why bother?
Of course, this was a baby benchmark, on a small algorithm, and probably sub-optimally implemented in both Python and Julia.
But still, I am surprised to see that the naive Julia version was slower than the naive Python version executed with PyPy... For the less naive version (without a dictionary), the Julia version was about 2 to 3 times faster than the Python version run with PyPy.