This notebook counts the number of rows in each index and calculates the total for the whole repository. It then formats the results as HTML and Markdown tables for easy browsing.
from urllib.parse import urljoin
import pandas as pd
from IPython.display import HTML, display
from tabulate import tabulate
# Load the index data
df = pd.read_csv("indexes.csv").sort_values(by="title")
def make_download_link(url):
    """
    Create a link to download the CSV file from GitHub.
    """
    slug = url.strip("/").split("/")[-1]
    filename = f"{slug}.csv"
    url = urljoin(
        "https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/",
        filename,
    )
    link = '<a href="{}">CSV file</a>'.format(url)
    return link
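Both `make_download_link` and `count_rows` rely on the same assumption: the last path segment of an index URL matches the CSV filename in the repository's `data` directory. A minimal standalone sketch of that mapping, using only the standard library, could look like this. The helper name `csv_url_for` and the example index URL are hypothetical, for illustration only.

```python
from urllib.parse import urljoin

# Base URL for the CSV files, as used in the notebook
DATA_BASE = "https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/"


def csv_url_for(index_url: str) -> str:
    """Map an index page URL to its CSV download URL (hypothetical helper)."""
    # Treat the last non-empty path segment as the slug
    slug = index_url.strip("/").split("/")[-1]
    # DATA_BASE ends with "/", so urljoin appends the filename to it
    return urljoin(DATA_BASE, f"{slug}.csv")


# Hypothetical index URL, not a real NSW State Archives page
print(csv_url_for("https://example.org/indexes/some-category/some-index/"))
# → https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/some-index.csv
```

Sharing one helper like this would keep the download links and the row counts pointing at the same files if the repository layout ever changes.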
# Create a HTML link to the index data on the NSWSA site
df["web"] = df["url"].apply(lambda x: '<a href="{}">Browse index</a>'.format(x))
# Create a HTML link to download the CSV file from GitHub
df["download"] = df["url"].apply(make_download_link)
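The link columns insert each URL directly into an `href` attribute. That works for the URLs in this dataset, but a URL containing `&` or `"` would produce invalid or broken HTML. A small sketch that escapes both the attribute value and the link text with the standard library's `html.escape`; the `make_link` name is hypothetical.

```python
import html


def make_link(url: str, text: str) -> str:
    """Return an HTML anchor with the URL and link text escaped (hypothetical helper)."""
    # quote=True also escapes double quotes, which matters inside href="..."
    return f'<a href="{html.escape(url, quote=True)}">{html.escape(text)}</a>'


print(make_link("https://example.org/search?a=1&b=2", "Browse index"))
# → <a href="https://example.org/search?a=1&amp;b=2">Browse index</a>
```

With a helper like this, the `web` column could be built with `df["url"].apply(lambda x: make_link(x, "Browse index"))`.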
def count_rows(url):
    """
    Download the CSV file for an index and count its data rows (excluding the header).
    """
    slug = url.strip("/").split("/")[-1]
    url = urljoin(
        "https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/",
        f"{slug}.csv",
    )
    # Use a local name so the main df isn't shadowed inside the function
    index_df = pd.read_csv(url, dtype=object)
    return index_df.shape[0]
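`count_rows` loads each CSV completely into a DataFrame just to read its length, which is slow and memory-hungry for the larger indexes (the deceased estates index has over half a million rows). A lighter-weight sketch, assuming each file is a UTF-8 CSV with a single header row, streams the records through the standard `csv` module instead. Counting parsed records rather than raw lines means quoted fields containing newlines are not double-counted. The `count_csv_records` name is hypothetical.

```python
import csv
import io
from typing import TextIO


def count_csv_records(stream: TextIO) -> int:
    """Count data records in a CSV stream, excluding the header row."""
    reader = csv.reader(stream)
    next(reader, None)  # skip the header row, if there is one
    # Ignore completely blank lines, as pandas.read_csv does by default
    return sum(1 for row in reader if row)


# The quoted "notes" field spans two lines but is still one record
sample = 'name,notes\nSmith,"line one\nline two"\nJones,plain\n'
print(count_csv_records(io.StringIO(sample)))  # → 2
```

For a remote file, the stream could come from `urllib.request.urlopen(url)` wrapped in `io.TextIOWrapper(..., encoding="utf-8", newline="")`, so only one record is held in memory at a time.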
# Add number of rows in the CSV
df["rows"] = df["url"].apply(count_rows)
# How many rows in the whole repository?
df["rows"].sum()
2481881
# Which index has the most rows?
df.loc[df["rows"].idxmax()]
title          Deceased estates index 1880-1958
url            https://mhnsw.au/indexes/deceased-estates/dece...
description    Researching deceased estates files is a comple...
category       Deceased estates
web            <a href="https://mhnsw.au/indexes/deceased-est...
download       <a href="https://media.githubusercontent.com/m...
rows           577891
Name: 26, dtype: object
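`idxmax` returns only the single largest index. To see the top few, pandas' `DataFrame.nlargest` keeps the *n* rows with the largest values in a column, ordered from largest down. A sketch on a small DataFrame built from four of the titles and row counts in the results table:

```python
import pandas as pd

# A few titles and row counts copied from the results table
sample = pd.DataFrame(
    {
        "title": [
            "Assisted Immigrants Index 1839-1896",
            "Colonial Secretary Letters Received, 1826-1896",
            "Convicts index 1791-1873",
            "Deceased estates index 1880-1958",
        ],
        "rows": [200000, 205863, 150000, 577891],
    }
)

# Keep the three indexes with the most rows, largest first
top = sample.nlargest(3, "rows")[["title", "rows"]]
print(top.to_string(index=False))
```

On the full notebook data this would be `df.nlargest(5, "rows")[["title", "rows"]]`.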
"Currently: {} indexes harvested with {:,} rows of data.".format(
df.shape[0], df["rows"].sum()
)
'Currently: 75 indexes harvested with 2,481,881 rows of data.'
Make a nicely formatted table in both HTML and Markdown.
# Select the columns that we want
columns = df[["title", "rows", "download", "web"]]
# Create a list of headers
headers = ["Title", "Number of rows", "Download data", "View at State Archives"]
# Use Tabulate to generate a HTML table
display(
HTML(
tabulate(
columns, headers=headers, showindex=False, tablefmt="unsafehtml", intfmt=","
)
)
)
# Write a GitHub Markdown formatted version of the table to a file
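# Use explicit UTF-8 so titles with curly quotes are written consistently on any platform
with open("indexes.md", "w", encoding="utf-8") as md_file:
    md_file.write(
        tabulate(
            columns, headers=headers, showindex=False, tablefmt="github", intfmt=","
        )
    )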
Title | Number of rows | Download data | View at State Archives |
---|---|---|---|
Aboriginal People in the Register of Aboriginal Reserves 1875-1904 | 78 | CSV file | Browse index |
Assisted Immigrants Index 1839-1896 | 200,000 | CSV file | Browse index |
Australian Railway Supply Detachment 1914 | 65 | CSV file | Browse index |
Bankruptcy index 1888-1929 | 30,000 | CSV file | Browse index |
Bench of Magistrates Index 1788-1820 | 4,442 | CSV file | Browse index |
Botanic Gardens and government domains employees | 916 | CSV file | Browse index |
Bubonic plague index 1900-1908 | 567 | CSV file | Browse index |
Census - 1841 | 9,355 | CSV file | Browse index |
Chemists, druggists and pharmacists index 1876-1920 | 2,967 | CSV file | Browse index |
Child care and protection index 1817-1942 | 21,292 | CSV file | Browse index |
Colonial (Government) Architect index 1837-1970 | 2,373 | CSV file | Browse index |
Colonial Secretary Letters Received, 1826-1896 | 205,863 | CSV file | Browse index |
Colonial Secretary's Papers 1788-1825 | 144,572 | CSV file | Browse index |
Colonial Secretary's letters relating to land 1826-1856 | 20,000 | CSV file | Browse index |
Colonial Secretary's main series of letters received | 7,638 | CSV file | Browse index |
Convict assignments index 1821-1825 | 6,156 | CSV file | Browse index |
Convict exiles index 1849-1850 | 3,004 | CSV file | Browse index |
Convict indents (digitised) index 1788-1801 | 20,000 | CSV file | Browse index |
Convicts applications to marry 1825-1851 | 14,327 | CSV file | Browse index |
Convicts index 1791-1873 | 150,000 | CSV file | Browse index |
Coroners' inquests index 1796-1824 | 808 | CSV file | Browse index |
Court of Civil Jurisdiction index 1799-1814 | 2,876 | CSV file | Browse index |
Court of Claims (Land) index 1833-1922 | 2,966 | CSV file | Browse index |
Crew and passengers 1828-1841 | 2,560 | CSV file | Browse index |
Criminal court records index 1788-1833 | 5,028 | CSV file | Browse index |
Criminal depositions (Deposition Books) index 1849-1949 | 117,508 | CSV file | Browse index |
Criminal indictments index 1863-1919 | 20,000 | CSV file | Browse index |
Deceased estates index 1880-1958 | 577,891 | CSV file | Browse index |
Depasturing licenses index 1837-1851 | 7,449 | CSV file | Browse index |
Dependent children registers 1883-1923 | 28,910 | CSV file | Browse index |
Devonshire Street Cemetery reinterment index | 9,559 | CSV file | Browse index |
Divorce records index 1873-1923 | 21,239 | CSV file | Browse index |
Fire Commissioners Personnel | 3,767 | CSV file | Browse index |
Gaol inmates & prisoners photos index 1870-1930 | 52,055 | CSV file | Browse index |
Gold (auriferous) lease registers 1874-1953 | 60,000 | CSV file | Browse index |
Indigenous colonial court cases 1788-1838 | 65 | CSV file | Browse index |
Infirm & destitute (Government) asylums index 1880-1896 | 20,000 | CSV file | Browse index |
Inquest index 1942-1963 | 45,547 | CSV file | Browse index |
Insolvency index 1842-1887 | 23,108 | CSV file | Browse index |
Intestate estates index 1821-1913 | 30,000 | CSV file | Browse index |
Land grants and leases (registers) 1792-1865 | 5,627 | CSV file | Browse index |
Letters re migration to NSW 1838-1857 | 22,771 | CSV file | Browse index |
Maintenance registers - Metropolitan Children's Court 1915-1917 | 1,372 | CSV file | Browse index |
Miscellaneous immigrants index 1828-1843 | 8,821 | CSV file | Browse index |
NSW Government employees granted military leave | 20,000 | CSV file | Browse index |
NSW King’s / Queen’s Counsel appointment correspondence | 2,083 | CSV file | Browse index |
Naturalization index 1834-1903 | 9,860 | CSV file | Browse index |
Nominal Roll of the First Railway Section (AIF) | 416 | CSV file | Browse index |
Norfolk Island special bundles index 1794-1813 | 216 | CSV file | Browse index |
Nurses index 1926-1954 | 46,499 | CSV file | Browse index |
Police service registers 1852-1913 | 20,000 | CSV file | Browse index |
Port Macquarie Small Debts Register, 1845-1887 | 2,036 | CSV file | Browse index |
Probate records - supplementary index 1790-1875 | 1,626 | CSV file | Browse index |
Public Works Salary Registers | 523 | CSV file | Browse index |
Publicans' licenses index 1830-1861 | 20,000 | CSV file | Browse index |
Quarter sessions cases 1824-1837 | 6,232 | CSV file | Browse index |
Railway employment records 1856-1917 | 763 | CSV file | Browse index |
Railways and Tramways Roll of Honour | 1,214 | CSV file | Browse index |
Register of Firms index 1903-1922 | 50,000 | CSV file | Browse index |
School teachers' rolls 1869-1908 | 20,000 | CSV file | Browse index |
Schools and related records 1876-1979 | 30,181 | CSV file | Browse index |
Soldier (Closer) Settlement - Returned Soldiers Transfer files 1907-1951 | 9,656 | CSV file | Browse index |
Soldier (Closer) Settlement transfer registers 1919-1925 | 4,957 | CSV file | Browse index |
Soldier (Closer) settlement promotion files index 1913-1958 | 4,354 | CSV file | Browse index |
Soldier Settlement loan files index 1906-1960 | 7,642 | CSV file | Browse index |
Soldier Settlement miscellaneous files index 1916 | 1,050 | CSV file | Browse index |
Soldier Settlement purchases index 1905-1937 | 9,776 | CSV file | Browse index |
Squatters and graziers index 1837-1849 | 9,003 | CSV file | Browse index |
Surveyor General's crown plans 1792-1886 | 5,455 | CSV file | Browse index |
Surveyors' field books 1794-1860 | 813 | CSV file | Browse index |
Surveyors’ letters 1822-1855 | 157 | CSV file | Browse index |
Tramway employees 1879-1911 | 10,606 | CSV file | Browse index |
Unassisted immigrants index 1842-1855 | 140,000 | CSV file | Browse index |
Unemployed in Sydney 1866 | 3,222 | CSV file | Browse index |
Vessels arrived in Sydney 1837-1925 | 129,999 | CSV file | Browse index |
Created by Tim Sherratt for the GLAM Workbench project.