Summarise index details

This notebook counts the number of rows in each index and calculates the total for the whole repository. It then formats the results as HTML and Markdown tables for easy browsing.

In [54]:
from IPython.display import display, HTML
import os
from urllib.parse import urljoin
import pandas as pd
from tabulate import tabulate
from slugify import slugify
In [34]:
# Load the index data
df = pd.read_csv('indexes.csv').sort_values(by='title')
In [58]:
def make_download_link(title):
    '''
    Create a link to download the CSV file from GitHub
    '''
    filename = '{}.csv'.format(slugify(title))
    url = urljoin('https://raw.githubusercontent.com/wragge/srnsw-indexes/master/data/', filename)
    link = '<a href="{}">CSV file</a>'.format(url)
    return link

# Create a HTML link to more info about the index
df['more_info'] = df['more_info_url'].apply(lambda x: '<a href="{}">More info</a>'.format(x))

# Create a HTML link to the index data on the NSWSA site
df['web'] = df['url'].apply(lambda x: '<a href="{}">Browse index</a>'.format(x))

# Create a HTML link to download the CSV file from GitHub
df['download'] = df['title'].apply(make_download_link)
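The trailing slash on the base URL in `make_download_link` matters: `urljoin` resolves the filename relative to the base, so without the slash the last path segment (`data`) would be replaced rather than extended. A minimal sketch using only the standard library; the filename `deceased-estates.csv` is written by hand here as an assumed stand-in for a slugified title.

```python
from urllib.parse import urljoin

BASE = 'https://raw.githubusercontent.com/wragge/srnsw-indexes/master/data/'
# Hand-written stand-in for a slugified title (an assumption for illustration)
filename = 'deceased-estates.csv'

# With the trailing slash, the filename is appended to the 'data/' folder
with_slash = urljoin(BASE, filename)

# Without it, 'data' is treated as a file name and replaced by the new one
without_slash = urljoin(BASE.rstrip('/'), filename)

print(with_slash)
print(without_slash)
```

The second URL loses the `data` folder, which would produce a broken download link.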
In [59]:
def count_rows(title):
    '''
    Count the number of rows in a CSV file.
    '''
    # Read every column as text so pandas does not have to infer column types
    index_df = pd.read_csv(os.path.join('csv', '{}.csv'.format(slugify(title))), dtype=object)
    return index_df.shape[0]

# Add number of rows in the CSV
df['rows'] = df['title'].apply(count_rows)
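Loading each file into a DataFrame just to count its rows works, but it holds the whole file in memory. As a possible alternative, the sketch below counts records with the standard library's `csv` module. The helper name `count_csv_rows` and the sample data are hypothetical, and the file is assumed to be UTF-8 with a single header row.

```python
import csv
import os
import tempfile

def count_csv_rows(path):
    '''
    Count data rows in a CSV file, excluding the header row.
    (Hypothetical helper, not part of the original notebook.)
    '''
    with open(path, newline='', encoding='utf-8') as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # Skip the header row
        return sum(1 for _ in reader)

# Small demonstration file; the first record has a newline inside a quoted field
sample = 'name,notes\nSmith,"line one\nline two"\nJones,ok\n'
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.csv')
    with open(sample_path, 'w', newline='', encoding='utf-8') as sample_file:
        sample_file.write(sample)
    row_count = count_csv_rows(sample_path)

print(row_count)
```

Using a CSV parser rather than counting lines means a quoted field containing a line break is still counted as one record. Results may differ from pandas for files with blank lines, which `pd.read_csv` skips by default.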
In [60]:
# How many rows in the whole repository?
df['rows'].sum()
Out[60]:
1499259
In [61]:
# Which index has the most rows?
df.loc[df['rows'].idxmax()]
Out[61]:
id                                                              15
more_info_url    https://www.records.nsw.gov.au/archives/collec...
status                                               Not digitised
title                                             Deceased Estates
url              https://www.records.nsw.gov.au/searchhits_noco...
more_info        <a href="https://www.records.nsw.gov.au/archiv...
web              <a href="https://www.records.nsw.gov.au/search...
rows                                                        257524
download         <a href="https://raw.githubusercontent.com/wra...
Name: 29, dtype: object

Summarise the results of the harvest

In [62]:
'Currently: {} indexes harvested with {:,} rows of data.'.format(df.shape[0], df['rows'].sum())
Out[62]:
'Currently: 64 indexes harvested with 1,499,259 rows of data.'
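The `{:,}` format specifier in the cell above inserts thousands separators. As a small illustration, the same summary can be written with an f-string; the two values are copied from the outputs above rather than recomputed from the data.

```python
# Values copied from the notebook outputs above
index_count = 64
total_rows = 1499259

# ':,' groups the digits of the integer in threes, separated by commas
summary = f'Currently: {index_count} indexes harvested with {total_rows:,} rows of data.'
print(summary)
```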

Make a nicely formatted table in both HTML and Markdown.

In [63]:
# Select the columns that we want
columns = df[['title', 'status', 'rows', 'download', 'web', 'more_info']]

# Create a list of headers
headers = ['Title', 'Status', 'Number of rows', 'Download data', 'View at NSWSA', 'More info']

# Use Tabulate to generate a HTML table
display(HTML(tabulate(columns, headers=headers, showindex=False, tablefmt='html')))

# Write a GitHub Markdown formatted version of the table to a file
with open('indexes.md', 'w', encoding='utf-8') as md_file:
    md_file.write(tabulate(columns, headers=headers, showindex=False, tablefmt='github'))
| Title | Status | Number of rows | Download data | View at NSWSA | More info |
|---|---|---:|---|---|---|
| Assisted Immigrants | Fully digitised | 191688 | CSV file | Browse index | More info |
| Australian Railway Supply Detachment | Fully digitised | 65 | CSV file | Browse index | More info |
| Bankruptcy Index | Not digitised | 28880 | CSV file | Browse index | More info |
| Bench of Magistrates cases, 1788-1820 | Not digitised | 4442 | CSV file | Browse index | More info |
| Botanic Gardens and Government Domains Employees Index | Not digitised | 916 | CSV file | Browse index | More info |
| Bubonic Plague Index | Fully digitised | 592 | CSV file | Browse index | More info |
| CSreLand | Not digitised | 10849 | CSV file | Browse index | More info |
| Child Care and Protection | Not digitised | 21980 | CSV file | Browse index | More info |
| Closer Settlement Transfer Registers, NRS 8082 | Not digitised | 4957 | CSV file | Browse index | More info |
| Closer and Soldier Settlement Transfer Files | Not digitised | 9656 | CSV file | Browse index | More info |
| Colonial Secretary Main series of letters received,1826-1982 | Not digitised | 7638 | CSV file | Browse index | More info |
| Convict Index | Not digitised | 141854 | CSV file | Browse index | More info |
| Convicts Applications to Marry 1825-51 | Not digitised | 8456 | CSV file | Browse index | More info |
| Coroners Inquests 1796-1824 | Not digitised | 808 | CSV file | Browse index | More info |
| Court of Civil Jurisdiction index | Not digitised | 2876 | CSV file | Browse index | More info |
| Crew (and Passenger) Lists, 1828-1841 | Fully digitised | 2560 | CSV file | Browse index | More info |
| Criminal Court Records index 1788-1833 | Not digitised | 5028 | CSV file | Browse index | More info |
| Criminal Indictments, 1863-1919 | Not digitised | 15701 | CSV file | Browse index | More info |
| Deceased Estates | Not digitised | 257524 | CSV file | Browse index | More info |
| Depasturing Licenses | Not digitised | 7449 | CSV file | Browse index | More info |
| Devonshire Street Cemetery Reinterment Index, 1901 | Not digitised | 9559 | CSV file | Browse index | More info |
| Divorce Index | Not digitised | 21239 | CSV file | Browse index | More info |
| Early Convict Index | Fully digitised | 12933 | CSV file | Browse index | More info |
| FieldBooks | Not digitised | 813 | CSV file | Browse index | More info |
| Government Architect | Not digitised | 2373 | CSV file | Browse index | More info |
| Government Asylums for the Infirm and Destitute | Not digitised | 10264 | CSV file | Browse index | More info |
| Governor’s Court Case Papers, 1815-1824 | Not digitised | 3789 | CSV file | Browse index | More info |
| Index on Occupants on Aboriginal Reserves, 1875 to 1904 | Not digitised | 80 | CSV file | Browse index | More info |
| Index to 1841 Census | Not digitised | 9355 | CSV file | Browse index | More info |
| Index to Closer Settlement Promotion | Not digitised | 4354 | CSV file | Browse index | More info |
| Index to Court of Claims | Not digitised | 1051 | CSV file | Browse index | More info |
| Index to Deposition Registers | Not digitised | 65790 | CSV file | Browse index | More info |
| Index to Early Probate Records | Not digitised | 1627 | CSV file | Browse index | More info |
| Index to Gaol Photographs | Fully digitised | 48171 | CSV file | Browse index | More info |
| Index to Intestate Estate Case Papers | Not digitised | 22520 | CSV file | Browse index | More info |
| Index to Miscellaneous Immigrants | Not digitised | 8821 | CSV file | Browse index | More info |
| Index to Quarter Sessions cases, 1824-37 | Not digitised | 6232 | CSV file | Browse index | More info |
| Index to Registers of Firms | Not digitised | 45683 | CSV file | Browse index | More info |
| Index to Squatters and Graziers | Not digitised | 9003 | CSV file | Browse index | More info |
| Index to Vessels Arrived, 1837 - 1925 | Not digitised | 120083 | CSV file | Browse index | More info |
| Index to convict exiles, 1846-50 | Not digitised | 3036 | CSV file | Browse index | More info |
| Index to the Unassisted Arrivals NSW 1842-1855 | Not digitised | 135792 | CSV file | Browse index | More info |
| Indigenous Colonial Court Cases 1788-1838 | Not digitised | 66 | CSV file | Browse index | More info |
| Insolvency Index | Not digitised | 23108 | CSV file | Browse index | More info |
| King’s and Queen’s Counsel Appointments | Fully digitised | 2083 | CSV file | Browse index | More info |
| LandGrants | Not digitised | 5627 | CSV file | Browse index | More info |
| List of Maps and Plans (and Supplement) | Not digitised | 5455 | CSV file | Browse index | More info |
| NSW Chemists and Druggists | Not digitised | 2967 | CSV file | Browse index | More info |
| NSW Government Employees Granted Military Leave, 1914-1918 | Not digitised | 13735 | CSV file | Browse index | More info |
| NSW Govt Railways and Tramways - Roll of Honour - 1914-1919 | Not digitised | 1214 | CSV file | Browse index | More info |
| Naturalisation | Not digitised | 9860 | CSV file | Browse index | More info |
| Nominal Roll of the First Railway Section (AIF) | Not digitised | 417 | CSV file | Browse index | More info |
| Publicans Licenses | Not digitised | 18457 | CSV file | Browse index | More info |
| Railway Employment Records | Not digitised | 763 | CSV file | Browse index | More info |
| Register of Auriferous Leases | Not digitised | 53076 | CSV file | Browse index | More info |
| Registers of Nurses | Not digitised | 26665 | CSV file | Browse index | More info |
| Registers of Police | Not digitised | 11319 | CSV file | Browse index | More info |
| Registers of Settlement Purchases | Not digitised | 9776 | CSV file | Browse index | More info |
| Returned Soldier Settlement Loan Files | Not digitised | 7642 | CSV file | Browse index | More info |
| Returned Soldiers Settlement Misc files 1916-25 | Not digitised | 1050 | CSV file | Browse index | More info |
| Schools | Not digitised | 21246 | CSV file | Browse index | More info |
| Surveyor General - Letters received 1822-55 | Not digitised | 157 | CSV file | Browse index | More info |
| Teachers Rolls | Not digitised | 14867 | CSV file | Browse index | More info |
| Unemployed in Sydney 1866 | Fully digitised | 3222 | CSV file | Browse index | More info |

Created by Tim Sherratt.

Part of the GLAM Workbench project.