Boolean Scans and Reductions

The content of this notebook was presented in Dyalog Webinars: Boolean Scans and Reductions.

Logical values are the integers 0 and 1, in contrast to distinct true and false values you might find in other programming languages.

Utility

Boolean data is widely used in APL.

Results of comparison functions:

In [1]:
_  ,   ⍝ Stranding function
¯2 ¯1 0 1 2 ( < _  _ = _  _  _ >) 0
Out[1]:
1 1 0 0 0 1 1 1 0 0 0 0 1 0 0 1 1 0 1 1 0 0 1 1 1 0 0 0 1 1

Data-driven conditionals. For example, "increase the salaries in group A by 5%":

In [2]:
groups    'ABACC'
salaries  20000 25750 21000 32350 32400
salaries × 1.05 * groups = 'A'
Out[2]:
21000 25750 22050 32350 32400

Selecting from arrays:

In [3]:
values  4 10 6 8 16 24
names'Anne' 'Ben' 'Charlie' 'Demi' 'Ella' 'Fiona'
(values10)/names
Out[3]:
┌───┬────┬─────┐ │Ben│Ella│Fiona│ └───┴────┴─────┘

Simple statistics:

In [4]:
(+÷≢)'the alphabet''aeiou'
Out[4]:
0.333333

Sixteen logical functions

With modern APL, all sixteen logical functions for 2-bit inputs can be represented by either primitives, atops or constant functions.

The descriptions given below are either common names for digital logic gates or mnemonic descriptions of the functionality with regards to left and right Boolean arguments.

Binary Decimal 2⊥ Function f Description
0 0 0 0 0 0⍨ FALSE
1 0 0 0 1 AND
1 0 0 0 2 > Left but not right
1 1 0 0 3 Left
1 0 0 0 4 < Right but not left
1 0 1 0 5 Right
1 1 0 0 6 Exlusive OR
1 1 1 0 7 OR
1 0 0 0 8 NOR
1 0 0 1 9 = Exclusive NOR
1 0 1 0 10 ~⍤⊢ Not right
1 0 1 1 11 Left OR not right
1 1 0 0 12 ~⍤⊣ Not left
1 1 0 1 13 Right OR not left
1 1 1 0 14 NAND
1 1 1 1 15 1⍨ TRUE

Note
Phil Last's article gives scalar dfn definitions, whereas some definitions in the previous table are not scalar functions.

Scans and reductions

Reductions summarise some property:

In [5]:
+/4 3 1 5   ⍝ Sum
Out[5]:
13

Scans indicate a progression of that property along the array:

In [6]:
+\4 3 1 5   ⍝ Cumulative sum
Out[6]:
4 7 8 13

Some reductions, and their related scans, are well known to APLers:

In [7]:
/1 1 1 1 1 1 1 1   ⍝ Are all true?
\1 1 0 1 1 1 0 1   ⍝ Were all true so far?
/0 0 1 0 0 0 0 0   ⍝ Are any true?
\0 0 1 0 0 0 1 0   ⍝ Were any true so far?
Out[7]:
1
Out[7]:
1 1 0 0 0 0 0 0
Out[7]:
1
Out[7]:
0 0 1 1 1 1 1 1

The last value

The value of a vector reduction f/⍵ is always the last value in the related scan f\⍵.

In [8]:
+/3 4 1 5
⊃⌽+\3 4 1 5
Out[8]:
13
Out[8]:
13

In general, properties of boolean vectors for which 1≡f/⍵ for a function f reflect properties in the related scans f\⍵.

From here we will take a quick look at a couple of particular scans and reductions which have some interesting applications. At the end of this notebook is a table of reductions for the sixteen logical functions, as well as some references for further reading.

Less-than

Less-than-scan leaves only the first true bit on, and all others are turned off:

In [9]:
<\0 0 1 0 1 1 1 1 1 1 0 1 0 1 1
Out[9]:
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

This is because the reduction only returns true when the last bit is the one and only true bit:

In [10]:
</0 0 0 1
Out[10]:
1

Too see the relationship, view the scan as a progression of reductions:

In [11]:
⍝ <\0 1 0 1 1
0
0<1
0<1<0
0<1<0<1
0<1<0<1<1
Out[11]:
0
Out[11]:
1
Out[11]:
0
Out[11]:
0
Out[11]:
0

An example use is to mark the start of end-of-line comments:

In [12]:
code  '+/⍳10   ⍝ sum⍝of⍝first⍝10⍝ints'
c  code = '⍝'
 code c (<\c)
Out[12]:
+ / ⍳ 1 0 ⍝ s u m ⍝ o f ⍝ f i r s t ⍝ 1 0 ⍝ i n t s 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The pair-wise reduction can be used to mark the start of groups of similar cells:

In [13]:
v  '⍟⎕⍟⍟⍟⎕⎕⍟⎕⎕⍟⍟⍟⍟⎕⎕⎕⎕⍟⍟⎕⎕⎕⎕⍟⍟⎕⍟'
q  '⎕'=v
v q (2</0,q)
Out[13]:
⍟ ⎕ ⍟ ⍟ ⍟ ⎕ ⎕ ⍟ ⎕ ⎕ ⍟ ⍟ ⍟ ⍟ ⎕ ⎕ ⎕ ⎕ ⍟ ⍟ ⎕ ⎕ ⎕ ⎕ ⍟ ⍟ ⎕ ⍟ 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0

Not-equal

The not-equal reduction ≠/ only returns true if there are an odd number of 1s

In [14]:
/1 1 1 1
2|+/1 1 1 1

/1 1 1 0
2|+/1 1 1 0
Out[14]:
0
Out[14]:
0
Out[14]:
1
Out[14]:
1

The related scan ≠\ then has the property that odd and even instances of 1s are joined by a series of 1s

In [15]:
b  0 0 1 0 0 0 1 0 0 0 0 1 0 0 1
 b (\b)
Out[15]:
0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 1 1 1 0 0 0 0 0 1 1 1 0

Again, looking at the progression can help us to see how this works:

In [16]:
⍝ ≠\0 1 0 0 1
0
01
010
0100
01001
Out[16]:
0
Out[16]:
1
Out[16]:
1
Out[16]:
1
Out[16]:
0

An example use of this is to mark quoted parts of text:

In [17]:
quoted  'extract the "quoted" parts from this "text"'
q  '"'=quoted
inc  (~∧≠\)'"'=quoted   ⍝ Including quote marks
exc  (⊢∨≠\)'"'=quoted   ⍝ Exclusing quote marks
 quoted q inc exc
Out[17]:
e x t r a c t t h e " q u o t e d " p a r t s f r o m t h i s " t e x t " 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1

These masks can easily be used to isolate the quoted parts:

In [18]:
inc/quoted
excquoted
Out[18]:
quotedtext
Out[18]:
┌────────┬──────┐ │"quoted"│"text"│ └────────┴──────┘

With a bit more effort, we can mark C-style (Java, JavaScript, CSS etc.) comments:

In [19]:
css  '* {color: #000; /* text */ background: #fff; /* bg */}'
cpos  '/*'(\⍷∨¯1⌽⍷)css
 css cpos
Out[19]:
* { c o l o r : # 0 0 0 ; / * t e x t * / b a c k g r o u n d : # f f f ; / * b g * / } 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0

Some of these examples, and more, can be found on APLcart.

For one last example, let's use something like a data table with successive fields of different lengths:

In [20]:
data  (?9100),(4 2 33 33/'ABC'),⍪?90

In data, there is a field of length 4 containing the 'A' markers, followed by a field of length 2 with the 'B' markers and finally a field of length 3 with the 'C' markers.

In [21]:
data
Out[21]:
50 AAA 0.127932 26 AAA 0.528921 10 AAA 0.116772 50 AAA 0.320547 64 BBB 0.377289 90 BBB 0.54566 68 CCC 0.339044 98 CCC 0.145094 66 CCC 0.63628

We would like to present the data above in a more human-readable format, so we are going to insert two blank lines between each field.

To start with, let's create some partition vectors. The function MTake creates a boolean vector where 1 indicates the start of a field:

MTake has a neat, nested-array style definition:

In [22]:
MTake  ∊↑1¨
MTake 4 2 3
Out[22]:
1 0 0 0 1 0 1 0 0

A more traditional definition has performance benefits for larger data:

In [23]:
MTake  {¯1(⍳+/)∊+\}
MTake 4 2 3
Out[23]:
1 0 0 0 1 0 1 0 0

If f indicates a field, and b indicates a series of blank lines,

1 0 0 0 1 0 1 0 1 0 1 0 0 1 0
←--f--→ ←b→ ←f→ ←b→ ←-f-→ ←b→

we can use catenate-rank-zero to interleave the blank lines:

In [24]:
MTake,4 2 3,02
Out[24]:
1 0 0 0 1 0 1 0 1 0 1 0 0 1 0

And use not-equal-scan to switch sections on and off, creating an expansion vector:

In [25]:
e\MTake,4 2 3,02
Out[25]:
1 1 1 1 0 0 1 1 0 0 1 1 1 0 0

Which we then use to expand a character matrix format of our data:

In [26]:
edata
Out[26]:
50 AAA 0.127932 26 AAA 0.528921 10 AAA 0.116772 50 AAA 0.320547 64 BBB 0.377289 90 BBB 0.54566 68 CCC 0.339044 98 CCC 0.145094 66 CCC 0.63628

A similar technique can be used to also label our fields:

In [27]:
labels  (MTake 4 2 3)('Field ',¨'ABC')
elabels,data
Out[27]:
Field A 50 AAA 0.127932 26 AAA 0.528921 10 AAA 0.116772 50 AAA 0.320547 Field B 64 BBB 0.377289 90 BBB 0.54566 Field C 68 CCC 0.339044 98 CCC 0.145094 66 CCC 0.63628

Flat partition

The STSC publication Boolean Functions and Techniques contains a wealth of information on Boolean functions and techniques (duh).

In particular, there is a description of a general procedure for applying some function to parts of an array independently, where parts is defined similarly to the fields in the previous example.

The general procedure involves a loop. With nested arrays, the pattern is quite easy to show:

In [28]:
1 0 0 0 1 0 0 {∊⌽¨} 'ABCDXYZ'   ⍝ Reverse two parts independently
Out[28]:
DCBAZYX
In [29]:
1 0 0 0 1 0 0 {∊+} 7   ⍝ Sum two parts independently
Out[29]:
10 18

This pattern is easily turned into a dop:

In [30]:
_P  {⍺⍺¨}   ⍝ A partitioned-function-application operator
p  1 0 0 0 1 0 0
p  _P 'ABCDXYZ'
p +/_P 7
Out[30]:
DCBAZYX
Out[30]:
10 18

However, this looping approach can become inefficient for large arrays with many parts. Page 10 of Boolean Functions and Techniques gives some definitions for more array-oriented solutions for use with specific functions. Here we will look at a couple of these, starting with partitioned-reverse PREVERSE.

The plus-scan is useful with these types of Boolean partitioned vectors, as it causes partitions to be marked by groups of successive integers:

In [31]:
a  'ABCDXYZ'
 a p (+\p)
Out[31]:
A B C D X Y Z 1 0 0 0 1 0 0 1 1 1 1 2 2 2

If we downgrade the plus-scan, the relative positions of indices for each partition relative to the whole array are reversed, but the positions of indices within each partition stay in ascending order:

In [32]:
⍒+\p
Out[32]:
5 6 7 1 2 3 4

Reversing the plus-scan then restores the relative positions of partitions within the whole array, but the indices within each partition are reversed:

In [33]:
⌽⍒+\p
Out[33]:
4 3 2 1 7 6 5

Indexing with this vector creates the desired effect:

In [34]:
a[⌽⍒+\p]
Out[34]:
DCBAZYX

These kinds of techniques often give performance improvements compared to their general nested-array counterparts:

      p ← 1,1↓1=?1e4⍴10
      t ← ⎕A[?1e4⍴26]

      ]runtime -c "p ⌽_P t" "p {⍵[⌽⍒+\⍺]} t"

  p ⌽_P t        → 1.4E¯4 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  p {⍵[⌽⍒+\⍺]} t → 3.5E¯5 | -75% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The partitioned sum starts with the plus-scan of the whole vector:

In [35]:
+\7
Out[35]:
1 3 6 10 15 21 28

We then extract the progressive sums up to the end of each partition:

In [36]:
(1p)/+\7
Out[36]:
10 28

Finally subtract each partition's sum from the sum of its preceding partition:

In [37]:
¯2-/0,(1p)/+\7
Out[37]:
10 18

Once again, there is a speedup relative to the general case:

      p ← 1,1↓1=?1e4⍴10
      i ← ?1e4⍴1000

      ]runtime -c "p +/_P i" "p {¯2-/0,(1⌽⍺)/+\⍵} i"

  p +/_P i              → 6.8E¯5 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  p {¯2-/0,(1⌽⍺)/+\⍵} i → 8.9E¯6 | -87% ⎕⎕⎕⎕⎕

Efficient implementation

Some languages implement boolean types as bytes (8 bits). The decision to use bit representations was made by Larry Breed for the first implementation of APL.

Not only do boolean arrays use the smallest amount of memory possible, they are also subject to SIMD parallelisations - even on hardware without special SIMD capabilities.

More on this topic can be found on the APL Wiki article on Booleans.

Sixteen boolean reductions

Finally let's return to the table of Boolean functions from earlier. In the Vector article, Phil Last gives plain English descriptions for properties of boolean vectors when their reduction using a Boolean function f gives 1.

Function f f/⍵ is true if satisfies:
0⍨ Never
All ones
> Odd leading ones
First is one
< Last is the only one
Last is one
Odd ones
At least one one
Odd leading zeros else the last is the only one
= Even zeroes
~⍤⊢ Last is parity of the length
Even leading zeros
~⍤⊣ First is zero
Last is not the only zero
Even leading ones else last is the only zero
1⍨ Always

For those not described in this notebook, you might want to experiment and show yourself the relationships between the descriptions given in the table, and the properties of the scans f\⍵. APL expressions offering similar insights are given on page 23 of the Boolean Functions and Techniques publication.