In [4]:

```
LoadPackage("meataxe64");; Read("../gap/bench.g");
Read("../gap/mtx64utils.g"); LoadPackage("jupyterviz");;
```

The Classification of Finite Simple Groups was one of the highlights of 20th Century mathematics.

It included the discovery of the sporadic simple Fischer-Griess Monster $\mathbb{M}$ of order
$$808{,}017{,}424{,}794{,}512{,}875{,}886{,}459{,}904{,}961{,}710{,}757{,}005{,}754{,}368{,}000{,}000{,}000 \approx 8\times 10^{53}.$$

$\mathbb{M}$ has many intriguing links to other areas of mathematics and even to theoretical physics, so we'd like to compute in it.

**But**, the *most* tractable representation of this group is as $196882\times 196882$ matrices of integers mod 2; each matrix occupies about 5GB.
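That size is easy to check: one bit per mod-2 entry gives $196882^2/8$ bytes. A quick Python sanity check (illustrative only; meataxe64 itself stores matrices in packed C arrays):

```python
# One GF(2) entry needs one bit, so an n x n matrix packs into n*n/8 bytes.
n = 196882
size_bytes = n * n // 8
print(size_bytes / 10**9)  # just under 5 GB
```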

For many years this was clearly beyond any practical computation. The first actual multiplication of two general elements was done in 1998, using most of the computing resources of a large maths department for **45 hours**.

Before OpenDreamKit this computation would still have taken days. We have contributed to the development of a massively improved C and assembler library and a GAP interface for it.

A Monster matrix multiplication now takes about **8 minutes** on a laptop, scaling to 2-3 minutes on a multicore server.

We start with matrices half that size for speed.

In [9]:

```
size := 196882/2;; f := MTX64_FiniteField(2);;
m := MTX64_RandomMat(f,size,size);
```

Out[9]:

We can multiply this by itself in about 1 minute:

In [11]:

```
ShowBench(ParMult,m,m);
```

We'll start the full-sized computation and come back to it after the rest of Clement's talk.

In [13]:

```
m := MTX64_RandomMat(f,2*size,2*size);; ShowBench(ParMult, m, m);
```

This is not just for the Monster: high-performance linear algebra over these small fields is an essential component of a VRE for users in many areas. As well as multiplication, we need Gaussian elimination, our other key primitive operation, which is much more difficult to parallelize in general.
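Part of why GF(2) linear algebra can be made so fast: a matrix row packs into machine words, and adding two rows is a single XOR per word. A toy Python sketch of this bit-packed representation (Python ints as bit vectors; the real library does this in vectorised C and assembler):

```python
import random

def gf2_mul(A, B, n):
    """Multiply two n x n GF(2) matrices stored one Python int per row,
    bit j of row i holding entry (i, j)."""
    C = []
    for i in range(n):
        acc = 0
        for k in range(n):
            if (A[i] >> k) & 1:   # entry A[i][k] is 1 ...
                acc ^= B[k]       # ... so add row k of B; GF(2) addition is XOR
        C.append(acc)
    return C

# quick sanity check: multiplying by the identity changes nothing
n = 8
I = [1 << i for i in range(n)]                  # identity matrix
M = [random.getrandbits(n) for _ in range(n)]   # random 8 x 8 over GF(2)
assert gf2_mul(M, I, n) == M and gf2_mul(I, M, n) == M
```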

Our basic Gaussian elimination operation applied to a matrix $A$, computes $M$, $K$, $R$, $\gamma$ and $\rho$ satisfying:

$$\begin{pmatrix}M&0\\ K&1\end{pmatrix} \rho A \gamma = \begin{pmatrix}-1&R\\ 0&0\end{pmatrix}$$
where $\gamma$ and $\rho$ are permutations that effectively select the pivot columns and pivot rows of $A$. This is full reduced row echelon form, with transforming matrices.

Using this, we can compute inverses, solve systems of equations, determine nullspaces, etc. efficiently.
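To see the identity in action, here is a hedged pure-Python sketch over a small prime field (the helper names `pivots`, `inv_mod` and so on are ours, not the meataxe64 API): it selects the pivot rows and columns of a rank-deficient $A$, builds $M$, $K$ and $R$, and verifies the block equation above.

```python
import random

p = 5  # a small prime field, GF(5)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def neg(A):
    return [[-x % p for x in row] for row in A]

def inv_mod(A):
    """Invert a square matrix over GF(p) by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c])
        M[c], M[piv] = M[piv], M[c]
        s = pow(M[c][c], p - 2, p)               # Fermat inverse of the pivot
        M[c] = [x * s % p for x in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def pivots(A):
    """Gaussian elimination; return the pivot rows and pivot columns of A."""
    M = [row[:] for row in A]
    rows, prow, pcol, r = list(range(len(A))), [], [], 0
    for c in range(len(A[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        rows[r], rows[piv] = rows[piv], rows[r]
        f0 = pow(M[r][c], p - 2, p)
        for i in range(r + 1, len(M)):
            f = M[i][c] * f0 % p
            M[i] = [(x - f * y) % p for x, y in zip(M[i], M[r])]
        prow.append(rows[r]); pcol.append(c); r += 1
    return prow, pcol

random.seed(1)
n, m = 6, 7
# a rank-deficient test matrix: product of 6x3 and 3x7, so rank <= 3
A = matmul([[random.randrange(p) for _ in range(3)] for _ in range(n)],
           [[random.randrange(p) for _ in range(m)] for _ in range(3)])
prow, pcol = pivots(A)
r = len(prow)
rho = prow + [i for i in range(n) if i not in prow]    # pivot rows first
gamma = pcol + [j for j in range(m) if j not in pcol]  # pivot columns first
PAQ = [[A[i][j] for j in gamma] for i in rho]          # rho A gamma
A11inv = inv_mod([row[:r] for row in PAQ[:r]])
M = neg(A11inv)                                        # M A11 = -1
K = neg(matmul([row[:r] for row in PAQ[r:]], A11inv))  # K A11 + A21 = 0
R = matmul(M, [row[r:] for row in PAQ[:r]])            # R = -A11^-1 A12
# assemble ((M, 0), (K, 1)) and check it maps rho A gamma to ((-1, R), (0, 0))
T = [row + [0] * (n - r) for row in M] + \
    [row + [int(i == j) for j in range(n - r)] for i, row in enumerate(K)]
want = [[-int(j == i) % p for j in range(r)] + R[i] for i in range(r)] + \
       [[0] * m for _ in range(n - r)]
assert matmul(T, PAQ) == want
```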

To see what it does properly we need a singular matrix. We take the Kronecker (tensor) product of two rectangular matrices. (If $A$ is $n\times m$ and $B$ is $m\times n$ with $m < n$ then $A\otimes B$ will be $nm\times nm$ of rank at most $m^2$.)
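The rank claim is easy to check in miniature. A hedged Python sketch over GF(7), with small stand-ins for 200 and 99 (`rank` and `kron` are our own helpers, not meataxe64 calls):

```python
import random
from itertools import product

p = 7  # any field works; GF(7) keeps the arithmetic exact

def rank(M):
    """Rank over GF(p) by forward Gaussian elimination."""
    M, r = [row[:] for row in M], 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] % p), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        s = pow(M[r][c], p - 2, p)
        for i in range(r + 1, len(M)):
            f = M[i][c] * s % p
            M[i] = [(x - f * y) % p for x, y in zip(M[i], M[r])]
        r += 1
    return r

def kron(A, B):
    """Kronecker product over GF(p)."""
    return [[A[i][j] * B[k][l] % p
             for j, l in product(range(len(A[0])), range(len(B[0])))]
            for i, k in product(range(len(A)), range(len(B)))]

random.seed(0)
n, m = 6, 3  # small stand-ins for 200 and 99
A = [[random.randrange(p) for _ in range(m)] for _ in range(n)]  # n x m
B = [[random.randrange(p) for _ in range(n)] for _ in range(m)]  # m x n
assert rank(kron(A, B)) == rank(A) * rank(B)  # rank is multiplicative
assert rank(kron(A, B)) <= m * m              # so at most m^2 here
```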

We'll try a different small finite field.

In [42]:

```
f := MTX64_FiniteField(9);;
m1 := MTX64_RandomMat(f, 200,99);;
m2 := MTX64_RandomMat(f, 99,200);;
m := MTX64_KroneckerProduct(m1,m2);
```

Out[42]:

In [47]:

```
ech := fail;; # suppress a warning.
ShowBench(function() ech := MTX64_Echelize(m);end);
ech.multiplier; ech.cleaner; ech.remnant; # M, K and R in the above formula
```

Out[47]:

Out[47]:

Out[47]:

We can also use the multi-threaded version of the Gaussian elimination (although this problem is a little small).

In [35]:

```
MTX64_WriteMatrix(m, "a");
ShowBench(MTX64_fEchelize, ".", "a", "gamma", "rho", "m", "k", "r");
```

Out[35]:

We set the field and maximum dimension and make a set of random matrices of different sizes:

In [18]:

```
q := 5;; maxdim := 12000;; steps := 10;;
sizes := List([1..steps], i-> i*QuoInt(maxdim, steps));;
mats := List(sizes, i-> MTX64_RandomMat(MTX64_FiniteField(q), i, i));
```

Out[18]:

And look at the timing for squaring them:

In [21]:

```
marks1 := List(mats, x-> BenchMark(\*,x,x));;
marksm := List(mats, x-> BenchMark(ParMult,x,x));;
Plot(
[sizes,List(marks1, x-> x.cpu), rec(name := "single-threaded",
title := "Meataxe64 runtimes for matrix multiply", xaxis := "Dimension", yaxis := "ms")],
[sizes,List(marksm, x-> QuoInt(x.wall,10^6)), rec(name := "multi-threaded wall time")]
);
```

Out[21]:
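The same experiment can be sketched outside GAP. A toy Python analogue of the benchmark loop above (naive single-threaded multiplication, so only tiny dimensions are feasible):

```python
import random, time

def bench_square(n, p=5):
    """Time squaring one random n x n matrix over GF(p), naive O(n^3)."""
    A = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
    cols = list(zip(*A))
    t0 = time.perf_counter()
    [[sum(a * b for a, b in zip(row, col)) % p for col in cols] for row in A]
    return (time.perf_counter() - t0) * 1000   # wall time in milliseconds

maxdim, steps = 60, 5   # tiny stand-ins for 12000 and 10
sizes = [i * (maxdim // steps) for i in range(1, steps + 1)]
times = [bench_square(n) for n in sizes]
print(list(zip(sizes, times)))
```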
