Imperative vs functional programming style

In this notebook, the same task is implemented in three programming languages using two programming styles. The task is to find, from a dataset of letters, all the authors who wrote more than 10 letters, and to list them in decreasing order of letter count.

Imperative

Imperative programming is a style where you go through items, gathering new information bit by bit into new data structures.

In [1]:

# Python
import pandas
dt = pandas.read_csv("letters.csv").to_dict(orient='records')
# First, create an author name to letter count dictionary
m = dict()
# Now, go through all the letters, incrementing the count of the relevant author each time
for row in dt:
  m[row['Author']] = m.get(row['Author'],0) + 1
# Then, create a second dictionary, and copy into that only the authors and counts where the counts are over 10
n = dict()
for key,value in m.items():
  if value > 10:
    n[key] = value
# Finally, sort the authors by decreasing counts
sorted(n.items(), key=lambda p: (p[1],p[0]), reverse=True)

Out[1]:

[('Maria Celeste Galilei', 49),
 ('Geri Bocchineri', 49),
 ('Mario Guiducci', 30),
 ('Francesco Niccolini', 16)]

In [2]:

# R
library(tidyverse)
dt <- read_csv("letters.csv",col_types = cols())
# First, create an author name to letter count dictionary
m <- numeric(0)
# Now, go through all the letters, incrementing the count of the relevant author each time
for (i in 1:nrow(dt)) { 
    row <- dt[i,]
    tmp <- m[row$Author]
    if (is.na(tmp)) tmp <- 0
    m[row$Author]=tmp+1
}
# Then, create a second dictionary, and copy into that only the authors and counts where the counts are over 10
n <- numeric(0)
for (i in names(m))
    if (m[i]>10) n[i] <- m[i]
# Finally, sort the authors by decreasing counts
sort(n, decreasing = TRUE)

Geri Bocchineri: 49
Maria Celeste Galilei: 49
Mario Guiducci: 30
Francesco Niccolini: 16

In [3]:

// Scala
import $ivy.`com.github.tototoshi::scala-csv:1.3.5`
import com.github.tototoshi.csv._
import java.io.File
import scala.collection.mutable.HashMap

// First, create an author name to letter count dictionary
val m = HashMap[String,Int]() // the map itself is mutable, so the reference can be a val
// Now, go through all the letters, incrementing the count of the relevant author each time
for (entry <- CSVReader.open(new File("letters.csv")).allWithHeaders)
  m(entry("Author")) = m.getOrElse(entry("Author"),0) + 1
// Then, create a second dictionary, and copy into that only the authors and counts where the counts are over 10
val n = HashMap[String,Int]()
for ((name,count) <- m)
  if (count>10) n(name) = count
// Finally, sort the authors by decreasing counts
println(n.toSeq.sortBy(_._2)(Ordering[Int].reverse))

ArrayBuffer((Geri Bocchineri,49), (Maria Celeste Galilei,49), (Mario Guiducci,30), (Francesco Niccolini,16))

Out[3]:

m: HashMap[String, Int] = Map(
  "Gabriello Riccardi" -> 2,
  "Antonio Nardi" -> 5,
  "Raffaello Magiotti" -> 6,
  "Orazio Cavalcanti" -> 2,
  "Giovanni Ciampoli" -> 4,
  "Francesco Stelluti" -> 1,
  "Sebastiano Venier" -> 1,
  "Gio. Battista Doni" -> 1,
  "Bonaventura Calvalcanti" -> 1,
  "Raffaello Visconti" -> 1,
  "Mario Guiducci" -> 30,
  "Baldassarre Nardi" -> 1,
  "Francesco Niccolini" -> 16,
  "Gian Giacomo Bouchard" -> 1,
  "Benedetto Millini" -> 2,
  "Ascanio Piccolomini" -> 5,
  "Gio. Camillo Gloriosi" -> 1,
  "Carlo Rinuccini" -> 3,
  "Francesco Maria Fiorentini" -> 2,
  "Cassiano dal Pozzo" -> 1,
  "Filippo Magalotti" -> 1,
  "Bernardo Conti" -> 3,
  "Giovanfrancesco Buonamici" -> 1,
  "Luca Degli Albizzi" -> 1,
  "Pier Francesco Rinuccini" -> 2,
  "Matthias Bernegger" -> 1,
  "Giulio Ninci" -> 1,
  "Antonio de Ville" -> 1,
  "Geri Bocchineri" -> 49,
  "Gio. Francesco Tolomei" -> 8,
  "Giovanni Vannuccini" -> 2,
  "Vincenzo Galilei, Jr." -> 4,
  "Maria Tedaldi" -> 6,
  "Benedetto Castelli" -> 10,
  "Marcantonio Pieralli" -> 1,
  "Girolamo da Sommaia" -> 1,
  "Antonio Quaratesi" -> 2,
  "Pietro Mazzei" -> 1,
...
n: HashMap[String, Int] = Map(
  "Francesco Niccolini" -> 16,
  "Mario Guiducci" -> 30,
  "Geri Bocchineri" -> 49,
  "Maria Celeste Galilei" -> 49
)

Functional

In functional programming, you chain together applications of functions, each of which takes in some data, transforms it in some way, and passes the transformed data on to the next function. By chaining well-chosen transformation functions, you arrive at the desired result. Here, the transformations to be done are as follows:

First, transform the list of dataset rows into a dictionary in which each author is associated with the list of rows where they appear. For example, you go from ("Bocchineri,Rome","Niccolini,Rome","Bocchineri,Florence") to Bocchineri->("Bocchineri,Rome","Bocchineri,Florence");Niccolini->("Niccolini,Rome").
Then, transform that dictionary into one where each grouped list of rows is replaced by its length. So, from Bocchineri->("Bocchineri,Rome","Bocchineri,Florence");Niccolini->("Niccolini,Rome") to Bocchineri->2;Niccolini->1.
Then, transform this dictionary into one that retains only the entries with a value greater than 10. In the example, you'd be left with no entries, but if the limit were more than 1, you'd go from Bocchineri->2;Niccolini->1 to Bocchineri->2.
Finally, transform the dictionary into a sequence where the entries are ordered by decreasing value.
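The four steps can be traced one at a time on the toy rows from the example. The following is a minimal sketch in plain Python rather than the notebook's own implementation; the `rows` list, the `Place` field name and the threshold of 1 (used instead of 10 so that the toy data yields a result) are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Toy rows mirroring the example in the text (the 'Place' key is an assumed name)
rows = [
    {"Author": "Bocchineri", "Place": "Rome"},
    {"Author": "Niccolini", "Place": "Rome"},
    {"Author": "Bocchineri", "Place": "Florence"},
]

by_author = itemgetter("Author")

# Step 1: group the rows by author (itertools.groupby needs sorted input)
grouped = {
    author: list(group)
    for author, group in groupby(sorted(rows, key=by_author), key=by_author)
}

# Step 2: replace each list of rows with its length
counts = {author: len(group) for author, group in grouped.items()}

# Step 3: keep only the entries above the threshold (1 here, 10 in the notebook)
frequent = {author: n for author, n in counts.items() if n > 1}

# Step 4: order by count, largest first
ranked = sorted(frequent.items(), key=lambda pair: pair[1], reverse=True)

print(counts)
print(ranked)
```

Each intermediate value is a new data structure derived from the previous one; none of them is modified after it has been created.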

Often, the transformation functions provided by ready-made libraries are so-called "higher order functions", meaning that what they do depends on code blocks passed to them as parameters (frequently defined inline using language support for anonymous or lambda functions). For example, there is a general function to filter a collection, but you have to tell it yourself, using code, that the criterion here is to retain only people who've written more than 10 letters.
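To make the idea of a higher order function concrete, the sketch below passes two different predicates to Python's built-in `filter`. The four author counts are copied from the counts shown earlier in the notebook; `more_than` is a hypothetical helper introduced only for this illustration.

```python
# A few author counts copied from the output earlier in the notebook
counts = {
    "Mario Guiducci": 30,
    "Francesco Niccolini": 16,
    "Benedetto Castelli": 10,
    "Antonio Nardi": 5,
}

def more_than(limit):
    """Build a predicate on (author, count) pairs (hypothetical helper)."""
    return lambda pair: pair[1] > limit

# filter() is generic: the function passed in decides what is retained
over_10 = list(filter(more_than(10), counts.items()))
over_4 = list(filter(lambda pair: pair[1] > 4, counts.items()))

print(over_10)
print(over_4)
```

The same `filter` call serves both cases; only the function handed to it changes.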

How well functional programming is supported differs a lot between languages, as seen below:

In [4]:

// Scala
import $ivy.`com.github.tototoshi::scala-csv:1.3.5`
import com.github.tototoshi.csv._
import java.io.File

/* Scala has very good support for functional programming on all levels, 
   with most collection classes defining all the sensible higher order transformation functions, 
   as well as robust support for using (anonymous, shorthand) functions as parameters. */

println(CSVReader.open(new File("letters.csv")).allWithHeaders.groupBy(_("Author"))
  .mapValues(_.length)
  .filter(_._2>10)
  .toSeq
  .sortBy(_._2)(Ordering[Int].reverse))

Vector((Maria Celeste Galilei,49), (Geri Bocchineri,49), (Mario Guiducci,30), (Francesco Niccolini,16))

In [5]:

# Python
import pandas

dt = pandas.read_csv("letters.csv")

# With regard to data wrangling, the Pandas library (https://pandas.pydata.org/) defines commonly 
# needed functions for grouping and filtering in an object oriented way suitable for chaining.
# However, these do not consistently use function parameters for the conditions (which is why
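# the below uses where() followed by dropna() instead of filter()). Note that where() replaces the
# non-matching entries with NaN, which is why the counts are printed as floats.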
print((dt.groupby('Author')
  .size()
  .where(lambda x: x>10).dropna()
  .sort_values(ascending=False)))

# Plain Python also has some general machinery for functional programming. However, because 
# 1) this is kept separate from the core collection classes and 2) the functions involved are 
# free-standing functions rather than methods of the collections, it doesn't support method 
# chaining, resulting in functional code often being difficult to use and understand. For example, 
# note how here the order in which the functions appear is inverted from the order of application, 
# which isn't as easy to follow as the method chaining approach used above with Pandas.
from itertools import groupby
dt2 = dt.to_dict(orient='records')
print(sorted(
  filter(
    lambda k: k[1]>10,
    map(
      lambda k: (k[0],len(list(k[1]))),
      groupby(sorted(dt2,key=lambda p: p['Author']),lambda p: p['Author']) # Core Python groupby only works on already sorted collections, necessitating the additional sort here
    )
  ),
  key=lambda k: k[1],
  reverse = True
))

# The above conceptual ordering problem can be sidestepped using temporary variables, but it does 
# result in more typing:
tmp = groupby(sorted(dt2,key=lambda p: p['Author']),lambda p: p['Author'])
tmp = map(lambda k: (k[0],len(list(k[1]))),tmp)
tmp = filter(lambda k: k[1]>10,tmp)
print(sorted(tmp,key=lambda k: k[1],reverse = True))

# Finally, note that Python has language support for writing certain types of list and dictionary 
# transformations as so-called comprehension expressions (https://en.wikipedia.org/wiki/Set-builder_notation#Parallels_in_programming_languages),
# which can be used in parts of the process:
tmp = groupby(sorted(dt2,key=lambda p: p['Author']),lambda p: p['Author'])
tmp = {k:len(list(v)) for (k,v) in tmp} # dictionary comprehension
tmp = [(k,v) for (k,v) in tmp.items() if v>10 ] # list comprehension
print(sorted(tmp,key=lambda k: k[1],reverse = True))

Author
Maria Celeste Galilei    49.0
Geri Bocchineri          49.0
Mario Guiducci           30.0
Francesco Niccolini      16.0
dtype: float64
[('Geri Bocchineri', 49), ('Maria Celeste Galilei', 49), ('Mario Guiducci', 30), ('Francesco Niccolini', 16)]
[('Geri Bocchineri', 49), ('Maria Celeste Galilei', 49), ('Mario Guiducci', 30), ('Francesco Niccolini', 16)]
[('Geri Bocchineri', 49), ('Maria Celeste Galilei', 49), ('Mario Guiducci', 30), ('Francesco Niccolini', 16)]
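
The ordering problem described in the comments above can also be worked around without temporary variables, by folding a sequence of functions over the data. This is only a sketch: `pipe` is a hypothetical helper defined here, not a standard library function, and the `authors` list is a toy stand-in for the `Author` column.

```python
from collections import Counter
from functools import reduce

def pipe(value, *steps):
    """Apply each step to the result of the previous one, left to right (hypothetical helper)."""
    return reduce(lambda acc, step: step(acc), steps, value)

# Toy stand-in for the Author column (assumed data)
authors = ["Bocchineri", "Niccolini", "Bocchineri", "Guiducci", "Bocchineri", "Guiducci"]

result = pipe(
    authors,
    Counter,                                           # count letters per author
    lambda counts: counts.items(),                     # view as (author, count) pairs
    lambda pairs: filter(lambda p: p[1] > 1, pairs),   # keep authors with more than 1 letter
    lambda pairs: sorted(pairs, key=lambda p: p[1], reverse=True),  # largest count first
)
print(result)
```

The steps now read top to bottom in the order they are applied, at the cost of wrapping each one in a function.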

In [6]:

# R
library(tidyverse)

dt <- read_csv("letters.csv",col_types = cols())

# Base R has historically lacked shorthand lambda expressions and consistent definitions for the 
# common higher order functions, and its functions are not methods attached to the data objects, 
# so it cannot benefit from method chaining.
#
# However, inside the tidyverse group of libraries (https://www.tidyverse.org), both of these 
# issues are addressed. First, chaining is solved through the definition of the %>% operator, which 
# alters the following function call by inserting its left hand argument as the first parameter. 
# Second, because R as a language allows functions to access their arguments before evaluation, the 
# library is able to extract and evaluate code parameters without needing them to be explicitly defined
# as functions. Therefore, the following:

print(dt %>% 
  group_by(Author) %>% 
  tally() %>% 
  filter(n>10) %>% 
  arrange(desc(n)))

# is equivalent to the following:

print(
  arrange_at(
    filter_at(
      tally(
        group_by_at(dt,"Author")
      ),
      "n",
      ~.>10 # a formula, something akin to a (lambda) function
    ),
    "n",
    ~desc(.) # a formula, something akin to a (lambda) function
  )
)

# A tibble: 4 x 2
  Author                    n
  <chr>                 <int>
1 Geri Bocchineri          49
2 Maria Celeste Galilei    49
3 Mario Guiducci           30
4 Francesco Niccolini      16
# A tibble: 4 x 2
  Author                    n
  <chr>                 <int>
1 Geri Bocchineri          49
2 Maria Celeste Galilei    49
3 Mario Guiducci           30
4 Francesco Niccolini      16