This is the introduction to a four-part tutorial that demonstrates how to de-duplicate a small dataset using simple settings.
The aim of the tutorial is to demonstrate core Splink functionality succinctly, rather than to comprehensively document all configuration options.
The four parts are:
Throughout the tutorial, we use the DuckDB backend, which is the recommended option for smaller datasets of up to around 1 million records on a normal laptop.