Lecture 2: Code specialization

In lecture 1, we learned how the semantics of types affect performance: it was not possible to implement a C-speed sum function, for example, unless (a) the element type is attached to the array as a whole so that (b) the element data can be stored consecutively in memory and (c) the computational code can be specialized at compile time for + on this element type.

For code to be fast, however, it must also be possible to take advantage of this information in your own code. This is obviously possible in statically typed languages like C, which are compiled to machine instructions ahead of time with type information declared explicitly to the compiler. It is more subtle in a language like Julia, where the programmer does not usually explicitly declare type information.

In particular, we will talk about how Julia achieves both fast code and type-generic code (e.g. a sum function that works on any container of any type supporting +, even user-defined types), by aggressive automated code specialization by the Julia compiler.

A type-generic sum function

Let's again look at our simple hand-written sum function.

In [1]:
function mysum(a)
    s = zero(eltype(a))
    for x in a
        s += x
    end
    return s
end
Out[1]:
mysum (generic function with 1 method)
In [2]:
a = rand(10^7)
mysum(a)
Out[2]:
5.0008825087619815e6
In [3]:
mysum(a) ≈ sum(a)
Out[3]:
true
In [4]:
using BenchmarkTools
@btime sum($a)
@btime mysum($a)
  3.908 ms (0 allocations: 0 bytes)
  12.330 ms (0 allocations: 0 bytes)
Out[4]:
5.0008825087619815e6

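Not bad: within a factor of about three of Julia's built-in sum function (and numpy.sum), and it only required 7 lines of code and some care with types. Most of the remaining gap can be closed by a very minor bit of wizardry: adding the @simd tag to the loop, which allows the compiler to reorder the floating-point additions and use SIMD instructions.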
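A minimal sketch of the @simd variant, assuming the same loop body as mysum; the name mysum_simd is hypothetical, and actual timings will depend on your machine. Because @simd lets the compiler reassociate the floating-point additions, the result can differ from mysum in the last few digits, so we compare with ≈ rather than ==.

```julia
# Hypothetical @simd variant of mysum: same logic, but the loop is annotated with @simd.
function mysum_simd(a)
    s = zero(eltype(a))
    @simd for x in a     # permits reordering the additions so the loop can use SIMD instructions
        s += x
    end
    return s
end

a = rand(10^6)
mysum_simd(a) ≈ sum(a)   # true (equal up to floating-point roundoff)
```

To compare speeds, you could run `@btime mysum_simd($a)` from BenchmarkTools next to the benchmarks above. Note that @simd is intended for simple loops over arrays and ranges, so this variant is less general than mysum.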

Moreover, the code is still type generic: it can sum any container of any type that works with addition. For example, it works for complex numbers. Since each complex addition requires two real additions, you might expect roughly twice the cost of a well-optimized real sum; here it takes about 12 ms, about the same as mysum on real numbers above:

In [5]:
z = rand(Complex{Float64}, length(a));
@btime mysum($z)
  12.041 ms (0 allocations: 0 bytes)
Out[5]:
4.997905404810494e6 + 4.999604643441703e6im

It even works for containers other than arrays, such as a Set (an unordered collection of distinct elements):

In [6]:
s = Set([2, 17, 6 , 24])
Out[6]:
Set([2, 17, 6, 24])
In [7]:
typeof(s)
Out[7]:
Set{Int64}
In [8]:
2 in s, 13 in s
Out[8]:
(true, false)
In [9]:
mysum(s)
Out[9]:
49

And we didn't have to declare the types of any arguments or variables; the compiler figured everything out. How?

Type inference and specialization

To go any further, you need to understand something very basic about how Julia works. Suppose we define a very simple function:

In [10]:
f(x) = x + 1
Out[10]:
f (generic function with 1 method)

We didn't declare the type of x, and so our function f(x) will work with any type of x (as long as the + 1 operation is defined for that type):

In [11]:
f(3) # x is an integer (technically, a 64-bit integer)
Out[11]:
4
In [12]:
f(3.1) # x is a floating-point value (Float64)
Out[12]:
4.1
In [13]:
f([1,2,3]) # x is an array of integers
MethodError: no method matching +(::Array{Int64,1}, ::Int64)
Closest candidates are:
  +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:502
  +(!Matched::Complex{Bool}, ::Real) at complex.jl:292
  +(!Matched::Missing, ::Number) at missing.jl:93
  ...

Stacktrace:
 [1] f(::Array{Int64,1}) at ./In[10]:1
 [2] top-level scope at In[13]:1
In [14]:
f("hello")
MethodError: no method matching +(::String, ::Int64)
Closest candidates are:
  +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:502
  +(!Matched::Complex{Bool}, ::Real) at complex.jl:292
  +(!Matched::Missing, ::Number) at missing.jl:93
  ...

Stacktrace:
 [1] f(::String) at ./In[10]:1
 [2] top-level scope at In[14]:1

How can a function like f(x) work for any type? In Python, x would be a "box" that could contain anything, and it would then look up at runtime how to compute x + 1. But we just saw that untyped Julia sum code could be fast.

The secret is just-in-time (JIT) compilation. The first time you call f(x) with a new type of argument x, Julia compiles a new version of f specialized for that type. The next time you call f(x) with the same argument type, it reuses the already-compiled version.

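So, right now, after evaluating the above code, we have four versions of f compiled and sitting in memory: one for x of type Int (written x::Int in Julia), one for x::Float64, one for x::Vector{Int}, and one for x::String. (The last two simply throw a MethodError when called, since + 1 is not defined for those types.)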
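One way to observe this per-type specialization is to ask the compiler which return type it infers for each argument type. This sketch assumes Julia's reflection utility Base.return_types, which is not exported, so its exact output format may change between versions; it returns one inferred type per matching method (here there is only one method).

```julia
f(x) = x + 1

# Each argument-type signature gets its own inferred (and compiled) specialization.
Base.return_types(f, (Int,))               # [Int64]
Base.return_types(f, (Float64,))           # [Float64]
Base.return_types(f, (Complex{Float64},))  # [Complex{Float64}] (printed as ComplexF64 in recent versions)
```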

We can even see what the compiled code for f(x::Int) looks like, either as the LLVM intermediate representation (IR) that Julia's compiler generates, or as the low-level (below C!) native assembly code:

In [15]:
@code_llvm f(1)
; Function f
; Location: In[10]:1
define i64 @julia_f_35737(i64) {
top:
; Function +; {
; Location: int.jl:53
  %1 = add i64 %0, 1
;}
  ret i64 %1
}
In [16]:
@code_native f(1)
	.section	__TEXT,__text,regular,pure_instructions
; Function f {
; Location: In[10]:1
; Function +; {
; Location: In[10]:1
	decl	%eax
	leal	1(%edi), %eax
;}
	retl
	nopw	%cs:(%eax,%eax)
;}

Let's break this down. When Julia's compiler specializes f for an x that is an Int, it:

  • Knows that x fits into a 64-bit CPU register (and is passed to the function via a register).

  • Looks at x + 1. Since x and 1 are both Int, it knows it should call the + function for two Int values. This corresponds to a single machine instruction that adds the constant 1 to the 64-bit register holding x (add i64 in the LLVM code above, implemented with leaq in the native code; the disassembly displays it as decl followed by leal, apparently because the disassembler decoded the 64-bit instruction as 32-bit code).

  • Inlines the (+)(Int,Int) function into the compiled f(x) code rather than making a function call, since this + function is so simple.

  • Knows that the result of the + is also an Int (since it knows exactly which + method it is calling), and returns it via a register.

This process works recursively if we define a new function g(x) that calls f(x):

In [17]:
g(x) = f(x) * 2
g(1)
Out[17]:
4
In [18]:
@code_llvm g(1)
; Function g
; Location: In[17]:1
define i64 @julia_g_35851(i64) {
top:
; Function *; {
; Location: int.jl:54
  %1 = shl i64 %0, 1
  %2 = add i64 %1, 2
;}
  ret i64 %2
}

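When it specialized g for x::Int, the compiler not only figured out which f function to call and inlined f(x) into g, but was also smart enough to simplify (x + 1) * 2 into 2x + 2, computed with a left shift (shl by 1, a cheap multiplication by 2) and a single addition. Specializing h(x) = g(x) * 2 similarly reduces it to 4x + 4: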

In [19]:
h(x) = g(x) * 2
@code_llvm h(1)
; Function h
; Location: In[19]:1
define i64 @julia_h_35853(i64) {
top:
; Function g; {
; Location: In[17]:1
; Function *; {
; Location: int.jl:54
  %1 = shl i64 %0, 2
;}}
; Function *; {
; Location: int.jl:54
  %2 = add i64 %1, 4
;}
  ret i64 %2
}
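As a quick check of the simplifications in the LLVM output above, the following sketch (redefining f, g, and h as in the text) verifies that g(x) equals (x << 1) + 2 and h(x) equals (x << 2) + 4 over a sample of Int values. Because Julia's Int arithmetic wraps around on overflow, the identities hold even at the extremes typemin(Int) and typemax(Int), which is what lets the compiler apply them unconditionally.

```julia
f(x) = x + 1
g(x) = f(x) * 2
h(x) = g(x) * 2

# Sample values, including ones where the arithmetic overflows and wraps around.
samples = [collect(-1000:1000); typemin(Int); typemax(Int)]

all(x -> g(x) == (x << 1) + 2, samples)   # true: matches "shl 1, add 2" in the LLVM code
all(x -> h(x) == (x << 2) + 4, samples)   # true: matches "shl 2, add 4"
```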

Julia's type inference is smart enough that it can figure out the return type even for recursive functions:

In [20]:
fib(n::Integer) = n < 3 ? 1 : fib(n-1) + fib(n-2)
Out[20]:
fib (generic function with 1 method)
In [21]:
[fib(n) for n = 1:10]
Out[21]:
10-element Array{Int64,1}:
  1
  1
  2
  3
  5
  8
 13
 21
 34
 55
In [22]:
fib.(1:10)
Out[22]:
10-element Array{Int64,1}:
  1
  1
  2
  3
  5
  8
 13
 21
 34
 55
In [23]:
@code_warntype fib(1)
Body::Int64
│╻  <1 1 ─ %1  = (Base.slt_int)(n, 3)::Bool
  └──       goto #3 if not %1
  2 ─       return 1
│╻  -  3 ─ %4  = (Base.sub_int)(n, 1)::Int64
││╻  <  │   %5  = (Base.slt_int)(%4, 3)::Bool
││   └──       goto #5 if not %5
││   4 ─       goto #6
││╻  -  5 ─ %8  = (Base.sub_int)(%4, 1)::Int64
││   │   %9  = invoke Main.fib(%8::Int64)::Int64
││╻  -  │   %10 = (Base.sub_int)(%4, 2)::Int64
││   │   %11 = invoke Main.fib(%10::Int64)::Int64
││╻  +  │   %12 = (Base.add_int)(%9, %11)::Int64
││   └──       goto #6
  6 ┄ %14 = φ (#4 => 1, #5 => %12)::Int64
│╻  -  │   %15 = (Base.sub_int)(n, 2)::Int64
││╻  <  │   %16 = (Base.slt_int)(%15, 3)::Bool
││   └──       goto #8 if not %16
││   7 ─       goto #9
││╻  -  8 ─ %19 = (Base.sub_int)(%15, 1)::Int64
││   │   %20 = invoke Main.fib(%19::Int64)::Int64
││╻  -  │   %21 = (Base.sub_int)(%15, 2)::Int64
││   │   %22 = invoke Main.fib(%21::Int64)::Int64
││╻  +  │   %23 = (Base.add_int)(%20, %22)::Int64
││   └──       goto #9
  9 ┄ %25 = φ (#7 => 1, #8 => %23)::Int64
│╻  +  │   %26 = (Base.add_int)(%14, %25)::Int64
  └──       return %26

Dispatch on the argument type

It is often useful to declare the argument types in Julia. For example, above we defined fib(n::Integer). This says that the argument must be some type of integer. If we pass it a non-integer number, we now get an error:

In [24]:
fib(3.7)
MethodError: no method matching fib(::Float64)
Closest candidates are:
  fib(!Matched::Integer) at In[20]:1

Stacktrace:
 [1] top-level scope at In[24]:1

Integer is an "abstract" type in Julia. There are many subtypes of Integer in Julia, including Int64 (64-bit signed integers), UInt8 (8-bit unsigned integers), and BigInt (arbitrary-precision integers: the number of digits grows as the numbers get larger and larger).

Declaring the argument type has no effect on performance in Julia: the compiler automatically specializes the function when it is called, even if we declare no type at all. There are three main reasons to declare an argument type in Julia:

  • Clarity: Declaring the argument type can help readers understand the code. (However, over-specifying the type may make the function less general than it has to be!)

  • Correctness: Our fib function above would have given some answer if we allowed the user to pass 3.7, but it probably wouldn't be the intended answer.

  • Dispatch: Julia allows you to define different versions of a function (different methods) for different argument types.
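To see the correctness point concretely, here is a hypothetical untyped variant, fib_untyped, with the same body as fib but no ::Integer declaration. It happily accepts 3.7 and returns a number that is probably not what anyone meant, whereas the typed fib rejects it:

```julia
fib(n::Integer) = n < 3 ? 1 : fib(n-1) + fib(n-2)                 # typed, as in the text
fib_untyped(n) = n < 3 ? 1 : fib_untyped(n-1) + fib_untyped(n-2)  # hypothetical untyped variant

fib_untyped(10)    # 55, same as fib(10)
fib_untyped(3.7)   # 2, since 3.7 -> fib_untyped(2.7) + fib_untyped(1.7) = 1 + 1
# fib(3.7)         # throws a MethodError, as shown above
```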

For example, let's use this to define two methods of a factorial function myfact. The first, for integers, computes the factorial recursively, multiplying n by myfact(n-1):

In [25]:
function myfact(n::Integer) 
    n < 0 && throw(DomainError(n, "n must be nonnegative"))
    return n < 2 ? one(n) : n * myfact(n-1)
end
Out[25]:
myfact (generic function with 1 method)
In [26]:
myfact(10)
Out[26]:
3628800

You need BigInt for factorials of larger arguments, since factorials grow rapidly (faster than exponentially with n):

In [27]:
myfact(BigInt(100))
Out[27]:
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
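To see why BigInt is needed, the sketch below repeats myfact (with the nonnegative-argument check) and compares Int and BigInt arguments. Julia's Int arithmetic wraps around silently on overflow, so 21!, which exceeds typemax(Int64), comes out negative; the specific values assume a 64-bit Int.

```julia
function myfact(n::Integer)
    n < 0 && throw(DomainError(n, "n must be nonnegative"))
    return n < 2 ? one(n) : n * myfact(n-1)   # one(n) keeps the result in n's integer type
end

myfact(20)           # 2432902008176640000, still fits in an Int64
myfact(21)           # -4249290049419214848: the true 21! overflows and wraps around
myfact(BigInt(21))   # 51090942171709440000, exact
```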

Since we said n::Integer, this will give an error for a floating-point argument:

In [28]:
myfact(3.7)
MethodError: no method matching myfact(::Float64)
Closest candidates are:
  myfact(!Matched::Integer) at In[25]:2

Stacktrace:
 [1] top-level scope at In[28]:1

However, it turns out that there is a very natural extension of the factorial to arbitrary real and complex numbers, via the gamma function: $n! = \Gamma(n+1)$. The SpecialFunctions package provides a gamma(x) function, so we can define myfact for other number types based on that:

In [29]:
using SpecialFunctions
myfact(x::Number) = gamma(x+1)
Out[29]:
myfact (generic function with 2 methods)
In [30]:
myfact(5.0)
Out[30]:
120.0

The "factorial" of $-\frac{1}{2}$ is then $\Gamma(\frac{1}{2}) = \sqrt{\pi}$, so its square should be $\pi$ (up to roundoff):

In [31]:
myfact(-0.5)^2
Out[31]:
3.1415926535897936

Now there are two different methods of myfact. You can get a list of the methods of any function by calling methods:

In [32]:
methods(myfact)
Out[32]:
2 methods for generic function myfact:
  • myfact(n::Integer) in Main at In[25]:2
  • myfact(x::Number) in Main at In[29]:2

Multiple dispatch

A key property of Julia is that it is based around the principle of multiple dispatch: when you call a function f(arguments...) in Julia, it picks the most specific method of f based on the types of all of the arguments.

This can be thought of as a generalization of object-oriented programming (OOP). In an OOP language like Python, you would typically write object.method(arguments...): based on the type of object, it will decide which version of method to call (it "dispatches" to the correct method). The Julia analogue is method(object, arguments...). But whereas an OOP language would only look at the type of object (single dispatch), Julia looks at both the types of object and the other arguments (multiple dispatch).
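A small sketch with hypothetical Shape types and a hypothetical describe_pair function, in which the method is chosen from the types of both arguments. In a single-dispatch language, the method would be selected from the first argument only, and choosing based on the second argument would need extra machinery such as the double-dispatch (visitor) pattern.

```julia
abstract type Shape end
struct Circle <: Shape
    r::Float64
end
struct Square <: Shape
    side::Float64
end

# Hypothetical function: Julia picks the method using the types of BOTH arguments.
describe_pair(a::Circle, b::Circle) = "two circles"
describe_pair(a::Circle, b::Square) = "circle, then square"
describe_pair(a::Square, b::Circle) = "square, then circle"
describe_pair(a::Shape,  b::Shape)  = "some other pair of shapes"   # least specific fallback

describe_pair(Circle(1), Square(2))   # "circle, then square"
describe_pair(Square(2), Circle(1))   # "square, then circle"
describe_pair(Square(1), Square(2))   # "some other pair of shapes"
```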

Multiple dispatch is very natural for mathematical operations like a + b, which is just a call to the function + in Julia. In a single-dispatch OOP approach (for example, a member operator+ in C++ or the __add__ method in Python), a + b decides which + function to call based on the type of a, which is rather weird: the function is "owned" by its first argument. In Julia, it looks at both arguments, and there are in fact a huge number of + methods for different argument types:

In [33]:
methods(+)
Out[33]:
163 methods for generic function +:

You can use the @which macro to see which method is called for a specific set of arguments:

In [34]:
@which (+)(3,4)
Out[34]:
+{T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}}(x::T, y::T) in Base at int.jl:53

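Here is a somewhat artificial example where we define three methods of f(x,y) for different argument types, so that we can see how Julia picks the most specific applicable one. (Julia reports 4 methods because our earlier one-argument f(x) is also a method of f.)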

In [35]:
f(x,y) = 3
f(x::Integer,y::Integer) = 4
f(x::Number,y::Integer) = 5
Out[35]:
f (generic function with 4 methods)
In [36]:
f(3,4), f(3.5,4), f(4,3.5)
Out[36]:
(4, 5, 3)

One nice thing about Julia's "verb-centric" multiple-dispatch approach (as opposed to OOP's "noun-centric" approach) is that we can add new methods and functions to existing types (unlike OOP where you need to create a subclass to add new methods).

For example, string concatenation in Julia is performed by the * operator:

In [37]:
"3" * "7"
Out[37]:
"37"

The + operator is not defined for strings:

In [38]:
"3" + "7"
MethodError: no method matching +(::String, ::String)
Closest candidates are:
  +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:502

Stacktrace:
 [1] top-level scope at In[38]:1

But we can define it if we want! Here, instead of defining our own function like myfact, we are adding a new method of an existing function that was defined in Base (Julia's standard library). So, we need to explicitly tell Julia that we are extending Base.+, not defining a new function that happens to be called +:

In [39]:
Base.:+(a::AbstractString, b::AbstractString) = a * " + " * b
In [40]:
"3" + "7"
Out[40]:
"3 + 7"

Now that we have defined +, we can even sum arrays of strings, because sum works for anything that defines + (and zero for summing empty arrays):

In [41]:
sum(["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog."])
Out[41]:
"The + quick + brown + fox + jumps + over + the + lazy + dog."

Defeating type inference: Type instabilities

To get good performance, there are some fairly simple rules that you need to follow in Julia code to avoid defeating the compiler's type inference. See also the performance tips section of the Julia manual.

Three of the most important are:

  • Don't use (non-constant) global variables in critical code — put your critical code into a function (this is good advice anyway, from a software-engineering standpoint). The compiler assumes that a global variable can change type at any time, so it is always stored in a "box", and "taints" anything that depends on it.

  • Local variables should be "type-stable": don't change the type of a variable inside a function. Use a new variable instead.

  • Functions should be "type-stable": a function's return type should only depend on the argument types, not on the argument values.
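Here is a sketch of the local-variable rule, using a hypothetical pair of functions that differ only in how the accumulator is initialized. It assumes the reflection utility Base.return_types (not exported) to show what the compiler infers; running @code_warntype unstable_sum(rand(3)) should similarly flag s with a Union type.

```julia
# Hypothetical pair of functions that differ only in how the accumulator starts.
function unstable_sum(a)
    s = 0                  # s starts as an Int ...
    for x in a
        s += x             # ... but becomes a Float64 once a Float64 is added
    end
    return s               # Int for an empty array, Float64 otherwise: type depends on values
end

function stable_sum(a)
    s = zero(eltype(a))    # s has the array's element type from the start
    for x in a
        s += x
    end
    return s
end

Base.return_types(unstable_sum, (Vector{Float64},))   # a Union of Int64 and Float64
Base.return_types(stable_sum,   (Vector{Float64},))   # [Float64]
```

This is exactly why mysum starts from zero(eltype(a)) rather than from 0.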

To diagnose all of these problems, the @code_warntype macro that we used above is your friend. If it labels any variables (or the function's return value) as Any or Union{...}, it means that the compiler couldn't infer a single concrete type.

The third point, type-stability of functions, leads to lots of important but subtle choices in library design. For example, consider the (built-in) sqrt(x) function, which computes $\sqrt{x}$:

In [42]:
sqrt(4)
Out[42]:
2.0

You might think that sqrt(-1) should return $i$ (or im, in Julia syntax). (Matlab's sqrt function does this.) Instead, we get:

In [43]:
sqrt(-1)
DomainError with -1.0:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).

Stacktrace:
 [1] throw_complex_domainerror(::Symbol, ::Float64) at ./math.jl:31
 [2] sqrt at ./math.jl:492 [inlined]
 [3] sqrt(::Int64) at ./math.jl:518
 [4] top-level scope at In[43]:1
In [44]:
sqrt(-1 + 0im)
Out[44]:
0.0 + 1.0im

Why did Julia implement sqrt in this silly way, throwing an error for negative arguments unless you add a zero imaginary part? Any reasonable person wants an imaginary result from sqrt(-1), surely?

The problem is that defining sqrt to return an imaginary result from sqrt(-1) would not be type stable: sqrt(x) would return a real result for non-negative real x, and a complex result for negative real x, so the return type would depend on the value of x and not just its type.

That would defeat type inference, not just for the sqrt function, but for anything the sqrt function touches. Unless the compiler can somehow figure out that x ≥ 0, it will have to either store the result in a "box" or generate code for both possible result types. Let's see how that works by defining our own square-root function:

In [45]:
mysqrt(x::Complex) = sqrt(x)
mysqrt(x::Real) = x < 0 ? sqrt(complex(x)) : sqrt(x)
Out[45]:
mysqrt (generic function with 2 methods)

This definition uses multiple dispatch, as described above. The ::Complex and ::Real argument-type declarations do not affect performance; they act as a "filter" so that we have one version of mysqrt for complex arguments and another for real arguments.

In [46]:
mysqrt(2)
Out[46]:
1.4142135623730951
In [47]:
mysqrt(-2)
Out[47]:
0.0 + 1.4142135623730951im
In [48]:
mysqrt(-2+0im)
Out[48]:
0.0 + 1.4142135623730951im

Looks great, right? But let's see what happens to type inference in a function that calls mysqrt instead of sqrt:

In [49]:
slowfun(x) = mysqrt(x) + 1
@code_warntype slowfun(2)
Body::Union{Complex{Float64}, Float64}
│╻╷     mysqrt1 1 ── %1  = (Base.slt_int)(x, 0)::Bool
││       └───       goto #3 if not %1
││╻╷╷╷   sqrt  2 ── %3  = (Base.sitofp)(Float64, x)::Float64
│││╻╷     float  │    %4  = (Base.sitofp)(Float64, 0)::Float64
││││╻      Type  │    %5  = %new(Complex{Float64}, %3, %4)::Complex{Float64}
│││      │    %6  = invoke Base.sqrt(%5::Complex{Float64})::Complex{Float64}
││       └───       goto #8
│││╻╷╷    float  3 ── %8  = (Base.sitofp)(Float64, x)::Float64
││││╻      <  │    %9  = (Base.lt_float)(%8, 0.0)::Bool
││││     └───       goto #5 if not %9
││││     4 ──       invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, %8::Float64)
││││     └───       $(Expr(:unreachable))
││││     5 ── %13 = (Base.Math.sqrt_llvm)(%8)::Float64
││││     └───       goto #6
│││      6 ──       goto #7
││       7 ──       goto #8
  8 ┄─ %17 = φ (#2 => %6, #7 => %13)::Union{Complex{Float64}, Float64}
  │    %18 = (isa)(%17, Complex{Float64})::Bool
  └───       goto #10 if not %18
  9 ── %20 = π (%17, Complex{Float64})
││╻╷     real  │    %21 = (Base.getfield)(%20, :re)::Float64
│││╻╷╷╷   promote  │    %22 = (Base.sitofp)(Float64, 1)::Float64
│││╻      +  │    %23 = (Base.add_float)(%22, %21)::Float64
│││╻      getproperty  │    %24 = (Base.getfield)(%20, :im)::Float64
│││╻      Type  │    %25 = %new(Complex{Float64}, %23, %24)::Complex{Float64}
  └───       goto #13
  10 ─ %27 = (isa)(%17, Float64)::Bool
  └───       goto #12 if not %27
  11 ─ %29 = π (%17, Float64)
││╻╷╷╷   promote  │    %30 = (Base.sitofp)(Float64, 1)::Float64
││╻      +  │    %31 = (Base.add_float)(%29, %30)::Float64
  └───       goto #13
  12 ─       (Core.throw)(ErrorException("fatal error in type inference (type bound)"))
  └───       $(Expr(:unreachable))
  13 ┄ %35 = φ (#9 => %25, #11 => %31)::Union{Complex{Float64}, Float64}
  └───       return %35

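Because the compiler doesn't know at compile time that x is positive (at compile time it uses only types, not values), it doesn't know whether the result is real (Float64) or complex (Complex{Float64}): the inferred type is Union{Complex{Float64}, Float64}. For a small Union like this, the compiler can generate a separate branch for each possibility (note the isa checks above), but the uncertainty spreads to everything that uses the result, and in less favorable cases the value must be stored in a "box". This kills performance.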
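For contrast, here is a sketch of a type-stable alternative, using a hypothetical fastfun that always computes in complex arithmetic, so that its result type depends only on the argument type. As before, Base.return_types is assumed as an (unexported) way to see the inferred type.

```julia
mysqrt(x::Complex) = sqrt(x)
mysqrt(x::Real) = x < 0 ? sqrt(complex(x)) : sqrt(x)
slowfun(x) = mysqrt(x) + 1

# Hypothetical type-stable alternative: the result is Complex{Float64} for any Int x.
fastfun(x) = sqrt(complex(x)) + 1

Base.return_types(slowfun, (Int,))   # a Union of Float64 and Complex{Float64}
Base.return_types(fastfun, (Int,))   # [Complex{Float64}]
fastfun(4)                           # 3.0 + 0.0im
```

The trade-off is that fastfun returns a complex number even for nonnegative x; Base's sqrt instead keeps real inputs real and throws a DomainError for negative ones.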

Defining our own types

Let's define our own type to represent a "point" in two dimensions. Each point will have an $(x,y)$ location. So that we can use the points with our sum functions above, we'll also define + and zero functions to do the obvious vector addition.

One such definition in Julia is:

In [50]:
mutable struct Point1
    x
    y
end
Base.:+(p::Point1, q::Point1) = Point1(p.x + q.x, p.y + q.y)
Base.zero(::Type{Point1}) = Point1(0,0)

Point1(3,4)
Out[50]:
Point1(3, 4)
In [51]:
Point1(3,4) + Point1(5,6)
Out[51]:
Point1(8, 10)

Our type is very generic, and can hold any type of x and y values:

In [52]:
Point1(3.7, 4+5im)
Out[52]:
Point1(3.7, 4 + 5im)

Perhaps too generic:

In [53]:
Point1("x", [3,4,5])
Out[53]:
Point1("x", [3, 4, 5])

Since x and y can be anything, they must be pointers to "boxes". This is bad news for performance.

A mutable struct is mutable, which means we can create a Point1 object and then change x or y:

In [54]:
p = Point1(3,4)
p.x = 7
p
Out[54]:
Point1(7, 4)

This means that every reference to a Point1 object must be a pointer to an object stored elsewhere in memory, because how else would we "know" when an object changes? Furthermore, an array of Point1 objects must be an array of pointers (which is bad news for performance again):

In [55]:
P = [p,p,p]
Out[55]:
3-element Array{Point1,1}:
 Point1(7, 4)
 Point1(7, 4)
 Point1(7, 4)
In [56]:
p.y = 8
P
Out[56]:
3-element Array{Point1,1}:
 Point1(7, 8)
 Point1(7, 8)
 Point1(7, 8)

Let's test this out by creating an array of Point1 objects and summing it. Ideally, this would be about twice as slow as summing an equal-length array of numbers, since there are twice as many numbers to sum. But because of all of the boxes and pointer-chasing, it should be far slower.

To create the array, we'll call the Point1(x,y) constructor with our array a, using Julia's "dot-call" syntax that applies a function "element-wise" to arrays:

In [57]:
a1 = Point1.(a, a)
Out[57]:
10000000-element Array{Point1,1}:
 Point1(0.41611080115182997, 0.41611080115182997)  
 Point1(0.09121781540916163, 0.09121781540916163)  
 Point1(0.5821689193788118, 0.5821689193788118)    
 Point1(0.4450648623203812, 0.4450648623203812)    
 Point1(0.31025284561131117, 0.31025284561131117)  
 Point1(0.13470267547220938, 0.13470267547220938)  
 Point1(0.9686840088959152, 0.9686840088959152)    
 Point1(0.017240806807339526, 0.017240806807339526)
 Point1(0.03285701827401222, 0.03285701827401222)  
 Point1(0.8007999312473584, 0.8007999312473584)    
 Point1(0.6967238705169183, 0.6967238705169183)    
 Point1(0.6365055896997436, 0.6365055896997436)    
 Point1(0.8960119603293422, 0.8960119603293422)    
 ⋮                                                 
 Point1(0.33070791321634574, 0.33070791321634574)  
 Point1(0.6843509271585413, 0.6843509271585413)    
 Point1(0.5179481631937215, 0.5179481631937215)    
 Point1(0.26483888958030555, 0.26483888958030555)  
 Point1(0.7830762795414743, 0.7830762795414743)    
 Point1(0.9102152866041469, 0.9102152866041469)    
 Point1(0.5857308463542181, 0.5857308463542181)    
 Point1(0.192522005018664, 0.192522005018664)      
 Point1(0.7633399781271266, 0.7633399781271266)    
 Point1(0.7675359710514917, 0.7675359710514917)    
 Point1(0.6697973896081715, 0.6697973896081715)    
 Point1(0.9035572561313561, 0.9035572561313561)    
In [58]:
@btime sum($a1)
  503.527 ms (29999997 allocations: 610.35 MiB)
Out[58]:
Point1(5.000882508762066e6, 5.000882508762066e6)
In [59]:
@btime mysum($a1)
  733.217 ms (30000001 allocations: 610.35 MiB)
Out[59]:
Point1(5.0008825087619815e6, 5.0008825087619815e6)

The time is at least 50× slower than we would like, but consistent with our other timing results on "boxed" values from last lecture.

An imperfect solution: A concrete immutable type

We can avoid these two problems by:

  • Declaring the types of x and y to be concrete types, so that they don't need to be pointers to boxes.
  • Declaring our Point to be an immutable type (x and y cannot change), so that Julia is not forced to make every reference to a Point into a pointer: just struct, not mutable struct:
In [60]:
struct Point2
    x::Float64
    y::Float64
end
Base.:+(p::Point2, q::Point2) = Point2(p.x + q.x, p.y + q.y)
Base.zero(::Type{Point2}) = Point2(0.0,0.0)

Point2(3,4)
Out[60]:
Point2(3.0, 4.0)
In [61]:
Point2(3,4) + Point2(5,6)
Out[61]:
Point2(8.0, 10.0)
In [62]:
p = Point2(3,4)
P = [p,p,p]
Out[62]:
3-element Array{Point2,1}:
 Point2(3.0, 4.0)
 Point2(3.0, 4.0)
 Point2(3.0, 4.0)
In [63]:
p.x = 6 # gives an error since p is immutable
type Point2 is immutable

Stacktrace:
 [1] setproperty!(::Point2, ::Symbol, ::Int64) at ./sysimg.jl:19
 [2] top-level scope at In[63]:1

If this is working as we hope, then summation should be much faster:

In [64]:
a2 = Point2.(a,a)
@btime sum($a2)
  11.253 ms (0 allocations: 0 bytes)
Out[64]:
Point2(5.000882508762066e6, 5.000882508762066e6)

Now the time is only about 11 ms, less than three times the cost of the built-in sum of a Float64 array of the same length (and about 45× faster than Point1)!

Unfortunately, we paid a big price for this performance: our Point2 type only works with a single numeric type (Float64), much like a C implementation.

The best of both worlds: Parameterized immutable types

How do we get a Point type that works for any type of x and y, but at the same time allows us to have an array of points that is concrete and homogeneous (every point in the array is forced to be the same type)? At first glance, this seems like a contradiction in terms.

The answer is not to define a single type, but rather to define a whole family of types that are parameterized by the type T of x and y. In computer science, this is known as parametric polymorphism. (An example of this can be found in C++ templates.)

In Julia, we will define such a family of types as follows:

In [69]:
struct Point3{T<:Real}
    x::T
    y::T
end
Base.:+(p::Point3, q::Point3) = Point3(p.x + q.x, p.y + q.y)
Base.zero(::Type{Point3{T}}) where {T} = Point3(zero(T),zero(T))

Point3(3,4)
Out[69]:
Point3{Int64}(3, 4)

Here, Point3 is actually a family of types Point3{T} for different types T. The notation <: in Julia means "is a subtype of", and hence T<:Real means that we are constraining T to be a subtype of Real (a built-in abstract type in Julia that includes, e.g., integer and floating-point types).
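A few quick checks make the "family of types" idea concrete. This sketch repeats the Point3 definition; the commented-out lines show calls that would fail, because T must be a subtype of Real and x and y must share the same T.

```julia
struct Point3{T<:Real}
    x::T
    y::T
end

Point3{Float64} <: Point3         # true: every Point3{T} is a subtype of Point3
isconcretetype(Point3{Float64})   # true: a fully specified type with a fixed memory layout
isconcretetype(Point3)            # false: Point3 by itself stands for the whole family
Point3(3, 4) isa Point3{Int}      # true: T is determined from the argument types
# Point3{String}                  # would throw a TypeError, since String is not <: Real
# Point3(3, 4.5)                  # would throw a MethodError, since x and y need the same T
```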

In [66]:
Point3(3,4) + Point3(5.6, 7.8)
Out[66]:
Point3{Float64}(8.6, 11.8)

Now, let's make an array:

In [67]:
a3 = Point3.(a,a)
Out[67]:
10000000-element Array{Point3{Float64},1}:
 Point3{Float64}(0.41611080115182997, 0.41611080115182997)  
 Point3{Float64}(0.09121781540916163, 0.09121781540916163)  
 Point3{Float64}(0.5821689193788118, 0.5821689193788118)    
 Point3{Float64}(0.4450648623203812, 0.4450648623203812)    
 Point3{Float64}(0.31025284561131117, 0.31025284561131117)  
 Point3{Float64}(0.13470267547220938, 0.13470267547220938)  
 Point3{Float64}(0.9686840088959152, 0.9686840088959152)    
 Point3{Float64}(0.017240806807339526, 0.017240806807339526)
 Point3{Float64}(0.03285701827401222, 0.03285701827401222)  
 Point3{Float64}(0.8007999312473584, 0.8007999312473584)    
 Point3{Float64}(0.6967238705169183, 0.6967238705169183)    
 Point3{Float64}(0.6365055896997436, 0.6365055896997436)    
 Point3{Float64}(0.8960119603293422, 0.8960119603293422)    
 ⋮                                                          
 Point3{Float64}(0.33070791321634574, 0.33070791321634574)  
 Point3{Float64}(0.6843509271585413, 0.6843509271585413)    
 Point3{Float64}(0.5179481631937215, 0.5179481631937215)    
 Point3{Float64}(0.26483888958030555, 0.26483888958030555)  
 Point3{Float64}(0.7830762795414743, 0.7830762795414743)    
 Point3{Float64}(0.9102152866041469, 0.9102152866041469)    
 Point3{Float64}(0.5857308463542181, 0.5857308463542181)    
 Point3{Float64}(0.192522005018664, 0.192522005018664)      
 Point3{Float64}(0.7633399781271266, 0.7633399781271266)    
 Point3{Float64}(0.7675359710514917, 0.7675359710514917)    
 Point3{Float64}(0.6697973896081715, 0.6697973896081715)    
 Point3{Float64}(0.9035572561313561, 0.9035572561313561)    

Note that the type of this array is Array{Point3{Float64},1} (we could equivalently write this as Vector{Point3{Float64}}, since Vector{T} is a synonym for Array{T,1}). You should learn a few things from this:

  • An Array{T,N} in Julia is itself a parameterized type, parameterized by the element type T and the dimensionality N.

  • Since the element type T is encoded in the Array{T,N} type, the element type does not need to be stored in each element. That means that the Array is free to store "inlined" elements, rather than an array of pointers to boxes. (This is why an Array{Float64,1} earlier could be stored in memory like a C double* array.)

  • It is still important that the element type be immutable, since an array of mutable elements would still need to be an array of pointers (so that it could "notice" if another reference to an element mutates it).
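The points above can be checked directly. This sketch repeats Point3 and adds a hypothetical mutable counterpart, MPoint, for comparison; isbitstype reports whether a type is immutable with plain-data fields (and can therefore be stored inline), and sizeof on an array reports the bytes used by its element storage. The 8-byte pointer size assumes a 64-bit machine.

```julia
struct Point3{T<:Real}
    x::T
    y::T
end

mutable struct MPoint{T<:Real}   # hypothetical mutable counterpart, for comparison only
    x::T
    y::T
end

isbitstype(Point3{Float64})   # true: immutable with concrete fields, so it can be stored inline
isbitstype(MPoint{Float64})   # false: mutable, so arrays must hold references to it
sizeof(Point3{Float64})       # 16: just the two Float64 fields, with no per-element type tag

v = [Point3(1.0, 2.0), Point3(3.0, 4.0), Point3(5.0, 6.0)]
sizeof(v)                     # 48 = 3 elements × 16 bytes, stored contiguously
sizeof([MPoint(1.0, 2.0)])    # 8 on a 64-bit machine: the array stores one pointer per element
```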

In [70]:
@btime sum($a3)
@btime mysum($a3)
  11.314 ms (0 allocations: 0 bytes)
  11.629 ms (0 allocations: 0 bytes)
Out[70]:
Point3{Float64}(5.0008825087619815e6, 5.0008825087619815e6)

Hooray! It is again only about 11 ms, the same time as our completely concrete and inflexible Point2.