Introductory tutorial

This is the introduction to a four-part tutorial that demonstrates how to de-duplicate a small dataset using simple settings.

The aim of the tutorial is to demonstrate core Splink functionality succinctly, rather than to document every configuration option comprehensively.

The four parts are:

1. Exploratory analysis
2. Estimating model parameters
3. Predicting results
4. Quality assurance

Throughout the tutorial, we use the DuckDB backend, which is the recommended option for smaller datasets of up to around 1 million records on a normal laptop.