#!/usr/bin/env python
# coding: utf-8

# # Introductory tutorial
#
# This is the introduction to a six-part tutorial that demonstrates how to de-duplicate a small dataset using simple settings.
#
# The aim of the tutorial is to demonstrate core Splink functionality succinctly, rather than to document all configuration options comprehensively.
#
# The six parts are:
#
# - [1. Exploratory analysis](https://moj-analytical-services.github.io/splink/demos/01_Exploratory_analysis.html)
#
# - [2. Choosing blocking rules to optimise runtimes](https://moj-analytical-services.github.io/splink/demos/02_Blocking.html)
#
# - [3. Estimating model parameters](https://moj-analytical-services.github.io/splink/demos/03_Estimating_model_parameters.html)
#
# - [4. Predicting results](https://moj-analytical-services.github.io/splink/demos/04_Predicting_results.html)
#
# - [5. Visualising predictions](https://moj-analytical-services.github.io/splink/demos/05_Visualising_predictions.html)
#
# - [6. Quality assurance](https://moj-analytical-services.github.io/splink/demos/06_Quality_assurance.html)
#
# Throughout the tutorial, we use the DuckDB backend, which is the recommended option for smaller datasets of up to around 1 million records on a normal laptop.
#
# You can find these tutorial notebooks in the `splink_demos` repo, and you can run them live in your web browser by clicking the following link:
#
# [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/moj-analytical-services/splink_demos/splink3_demos?urlpath=lab)