An exploration of making nbdev v2 work with GitLab.
fast.ai recently released version 2 of their popular nbdev toolkit. nbdev is a set of tools that enable you to write, test, document, and distribute software packages and technical articles all in one place: the Jupyter notebook. Version two, like version one, is designed around integration with the GitHub ecosystem, which makes it a little more difficult to use in other, similar ecosystems. Where I work we run an internal GitLab system and are unable to use the GitHub features of nbdev. So, for the last year or so I have been using a modified nbdev v1 system that I hacked together to work in the GitLab ecosystem, and now with the recent update to version 2 I have set out to do the same again. This time I thought I'd share what I did and learnt along the way.
The image below from nbdev.fast.ai shows the basic layout of the nbdev ecosystem:
From this image we can see that the key things that deal with GitHub are the CI for testing and the CD for building the docs with Quarto and deploying them on GitHub Pages. Luckily for us GitLab has equivalent functionality, so we will be able to replicate most of the nbdev v2 functionality in GitLab.
Therefore our todo list is quite short:
TODO list:
GitLab CI/CD is configured in a similar way to GitHub Actions, in that there is a config file that lives in your repo to tell the runner what to do and when to do it. The config file details what OS/environment to run your code in. In the official nbdev case they use the latest build of Ubuntu, and in my case I use a continuumio/miniconda3 build as my base. nbdev uses this base image to install the latest versions of everything on the fly during every Actions run, but as I have a shared and quite slow runner at work I tend to use a pre-built container with the latest nbdev and Quarto already installed and ready to go.
Feel free to use my container yourself if you like: https://hub.docker.com/r/robtheoceanographer/nbdev2
Now that we have the base image ready to use, we need to write our config file. GitLab CI/CD is configured in YAML, and my basic setup is shown below:
image: robtheoceanographer/nbdev2:latest

# Cache modules in between jobs
cache:
  key: $CI_COMMIT_REF_SLUG

clean:
  stage: build
  script:
    - echo "Check we are starting with clean git checkout"
    - if [ -n "$(git status -uno -s)" ]; then echo "git status is not clean"; false; fi
    - echo "Trying to strip out notebooks"
    - nbdev_clean
    - echo "Check that strip out was unnecessary"
    - git status -s # display the status to see which nbs need cleaning up
    - if [ -n "$(git status -uno -s)" ]; then echo -e "!!! Detected unstripped out notebooks\n!!!Remember to run nbdev_install_hooks"; false; fi

test:
  stage: test
  script:
    - echo "Doing the testing here..."
    - nbdev_install
    - nbdev_test

pages:
  stage: deploy
  script:
    - pwd
    - nbdev_install
    - nbdev_docs
    - cp -r _docs/ public/
    - ls
  artifacts:
    paths:
      - public
Now this is a first cut .gitlab-ci.yml and there's definitely more to do and change here... for example, I haven't set up good caching or sharing of builds, so things get built multiple times... feel free to contribute if you like.
This config will replicate most of what is being done in the GitHub Actions of nbdev on GitHub.
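If you want to catch unstripped notebooks before the runner does, the logic of the clean job can be mirrored locally. The snippet below is only a rough sketch of that idea in Python, not something nbdev or my template ships: the function names (`dirty_tracked_files`, `notebooks_are_clean`, `run_cmd`) are made up for illustration, and it assumes `git` and `nbdev_clean` are on your PATH and that you run it from the repo root.

```python
import subprocess
from typing import Callable, List


def run_cmd(cmd: List[str]) -> str:
    """Run a command and return its stdout (raises if the command fails)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def dirty_tracked_files(run: Callable[[List[str]], str]) -> List[str]:
    """Paths that `git status -uno -s` reports as modified (untracked files ignored)."""
    out = run(["git", "status", "-uno", "-s"])
    # Short-format lines look like "XY path": two status characters, a space, the path.
    return [line[3:] for line in out.splitlines() if line.strip()]


def notebooks_are_clean(run: Callable[[List[str]], str] = run_cmd) -> bool:
    """Mirror the CI `clean` job: nbdev_clean must not change any tracked file."""
    if dirty_tracked_files(run):
        raise RuntimeError("git status is not clean; commit or stash first")
    run(["nbdev_clean"])
    # Anything listed now was modified by nbdev_clean, i.e. it was unstripped.
    return not dirty_tracked_files(run)
```

Calling `notebooks_are_clean()` in a clean checkout returns `False` when `nbdev_clean` had to strip something, which is the same situation that fails the CI job above. Passing the runner in as an argument is just a design choice to make the check easy to try out without touching a real repo.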
Because the nbdev ecosystem is expecting you to push your code to GitHub, it has a few pre-filled GitHub things in the settings.ini that we need to tweak.
The main things to change are the host and git_url settings.
Here is an example of my template settings.ini file for you to see the difference:
[DEFAULT]
# All sections below are required unless otherwise specified
# see https://github.com/fastai/nbdev/blob/master/settings.ini for original
host = gitlab
lib_name = nbdev2_template
description = a clone of nbdev v2 to gitlab.
copyright = this is all thanks to fastai
keywords = fastai nbdev
user = robtheoceanographer
author = Robert Johnson
author_email = example@gmail.com
repo = nbdev2_template
branch = main
version = 0.0.1
min_python = 3.9
audience = Developers
language = English
custom_sidebar = False
license = None
status = 2
nbs_path = nbs
doc_path = _docs
recursive = False
tst_flags = notest
### Inferred From Other Values but tailored for gitlab ###
doc_host = /
doc_baseurl = /%(lib_name)s/
git_url = https://gitlab.com/%(user)s/%(lib_name)s/tree/%(branch)s/
lib_path = %(lib_name)s
title = %(lib_name)s
### OPTIONAL ### see https://github.com/fastai/nbdev/blob/master/settings.ini for examples
# requirements = fastcore pandas
# dev_requirements =
# console_scripts =
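The `%(lib_name)s` style placeholders in the file above are standard INI interpolation, so it is easy to check what the GitLab URLs resolve to before you push. Here is a minimal sketch using Python's built-in `configparser`; it only copies the handful of keys that matter for the URLs, the `resolve` helper is a name I made up for illustration, and it is not meant to show how nbdev itself loads the file.

```python
from configparser import ConfigParser

# A trimmed copy of the settings.ini above: only the keys the URLs depend on.
SETTINGS = """
[DEFAULT]
host = gitlab
lib_name = nbdev2_template
user = robtheoceanographer
branch = main
doc_host = /
doc_baseurl = /%(lib_name)s/
git_url = https://gitlab.com/%(user)s/%(lib_name)s/tree/%(branch)s/
"""


def resolve(text: str) -> dict:
    """Parse INI text and return the DEFAULT section with %(name)s values expanded."""
    parser = ConfigParser()  # the default BasicInterpolation handles %(name)s
    parser.read_string(text)
    return dict(parser["DEFAULT"].items())


if __name__ == "__main__":
    resolved = resolve(SETTINGS)
    print(resolved["git_url"])
    print(resolved["doc_baseurl"])
```

If the printed git_url does not point at your project on your GitLab instance, then user, lib_name, or branch is the first place to look.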
So, to bring all of this together I made an example repo on gitlab.com that you can look at and copy if you find it helpful.
This is the repo: https://gitlab.com/robtheoceanographer/nbdev2_template
This is the rendered quarto docs: https://robtheoceanographer.gitlab.io/nbdev2_template/
Of course, there are a bunch of things that don't work when we move to GitLab. The two things that spring to mind are:
doc_path from settings.ini to GitHub Pages
I recommend using the official GitHub version and benefiting from all the functionality there rather than rolling your own on GitLab or elsewhere. This whole post was just to show that it can be done.
Well, that is my brain dump of what I've done so far, and I'm sure I haven't covered everything, so feel free to reach out and ask me questions if you like: https://twitter.com/JohnsonRob