An exploration of making nbdev v2 work with GitLab.
fast.ai recently released version 2 of their popular nbdev toolkit. nbdev is a set of tools that lets you write, test, document, and distribute software packages and technical articles all in one place: the Jupyter notebook. Version 2, like version 1, is designed to integrate with the GitHub ecosystem, which makes it a little harder to use in similar ecosystems such as GitLab. Where I work we run an internal GitLab instance and can't use the GitHub features of nbdev. So for the last year or so I have been using a modified nbdev v1 setup that I hacked together to work with GitLab. Now that version 2 is out, I have set out to do the same again. This time I thought I'd share what I did and what I learnt along the way.
Looking at how nbdev v2 is put together, the key pieces that deal with GitHub are the CI for testing and the CD that builds the documentation with Quarto and deploys it to GitHub Pages. Luckily for us, GitLab has equivalent functionality in GitLab CI/CD and GitLab Pages, so we can replicate most of the nbdev v2 functionality in GitLab.
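Therefore our to-do list is quite short. We need to set up GitLab CI/CD to do the testing and the docs deployment. We also need to tweak settings.ini so that it points at GitLab rather than GitHub.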
GitLab CI/CD is configured in a similar way to GitHub Actions: a config file lives in your repo and tells the runner what to do and when to do it. The config file also specifies what OS/environment to run your code in. The official nbdev setup uses the latest Ubuntu image, whereas I use a continuumio/miniconda3 build as my base. The official workflow installs the latest versions of everything on top of its base image on the fly during every Actions run. I have a shared and quite slow runner at work, so I tend to use a pre-built container with the latest nbdev and Quarto already installed and ready to go.
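If you go down the pre-built container route, it is worth a quick sanity check that the image really contains the tools your jobs call. Below is a minimal sketch of a hypothetical helper; it is not part of nbdev. It uses Python's standard shutil.which to report which command-line tools are missing from PATH. The tool list is an assumption based on the commands used in this post, so edit it to match your own config.

```python
"""Hypothetical sanity check for a pre-built nbdev image (not part of nbdev)."""
import shutil

# Assumed list: the commands the CI jobs in this post call. Edit to match your config.
REQUIRED_TOOLS = ["git", "nbdev_clean", "nbdev_install", "nbdev_test", "nbdev_docs", "quarto"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH, in their original order."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Inside the container you could call missing_tools(REQUIRED_TOOLS) as a first step and fail the job if the list is non-empty. That way a broken image shows up as one clear error rather than a confusing failure halfway through the docs build.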
Feel free to use my container yourself if you like: https://hub.docker.com/r/robtheoceanographer/nbdev2
Now that we have the base image ready, we need to write our config file. GitLab CI/CD is configured in YAML, in a file called .gitlab-ci.yml at the root of the repo. My basic setup is shown below:
image: robtheoceanographer/nbdev2:latest

# Cache modules in between jobs
cache:
  key: $CI_COMMIT_REF_SLUG

clean:
  stage: build
  script:
    - echo "Check we are starting with clean git checkout"
    - if [ -n "$(git status -uno -s)" ]; then echo "git status is not clean"; false; fi
    - echo "Trying to strip out notebooks"
    - nbdev_clean
    - echo "Check that strip out was unnecessary"
    - git status -s # display the status to see which nbs need cleaning up
    - if [ -n "$(git status -uno -s)" ]; then echo -e "!!! Detected unstripped out notebooks\n!!!Remember to run nbdev_install_hooks"; false; fi

test:
  stage: test
  script:
    - echo "Doing the testing here..."
    - nbdev_install
    - nbdev_test

pages:
  stage: deploy
  script:
    - pwd
    - nbdev_install
    - nbdev_docs
    - cp -r _docs/ public/
    - ls
  artifacts:
    paths:
      - public
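The clean job relies on one idea: after nbdev_clean runs, `git status -uno -s` should print nothing. Any line it does print is a tracked file that the cleaning step had to modify. You may want the same check outside CI, for example before pushing. Here is a minimal sketch of a Python equivalent of the second half of that job. It assumes git and nbdev_clean are on PATH, and the function names are hypothetical, not part of nbdev.

```python
"""Hypothetical Python version of the `clean` job's git-status check."""
import subprocess


def dirty_files(status_output):
    """Parse `git status -uno -s` output into a list of changed paths.

    Each short-format line is a two-character status code, a space, then the path.
    Renames and quoted paths are not handled specially in this sketch.
    """
    return [line[3:] for line in status_output.splitlines() if line.strip()]


def check_clean(repo_dir="."):
    """Run nbdev_clean, then return the tracked files it had to modify (empty if clean)."""
    subprocess.run(["nbdev_clean"], cwd=repo_dir, check=True)
    result = subprocess.run(
        ["git", "status", "-uno", "-s"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return dirty_files(result.stdout)
```

A non-empty result from check_clean() corresponds to the job failing with "Detected unstripped out notebooks". Installing the hooks with nbdev_install_hooks stops that from happening in the first place.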
Now, this is a first cut of .gitlab-ci.yml and there's definitely more to do and change here. For example, I haven't set up proper caching or sharing of builds between jobs, so things get built multiple times. Feel free to contribute if you like.
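This config replicates most of what nbdev's GitHub Actions do on GitHub. The clean job checks that the notebooks have been stripped and the test job runs nbdev_test. The pages job builds the docs with nbdev_docs and copies them into public/, the directory that GitLab Pages publishes from.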
Because the nbdev ecosystem expects you to push your code to GitHub, it has a few pre-filled GitHub values in settings.ini that we need to tweak.
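The main things to change are host and the values under "Inferred From Other Values" (doc_host, doc_baseurl and git_url), so that they point at GitLab. My settings.ini looks like this: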
[DEFAULT]
# All sections below are required unless otherwise specified
# see https://github.com/fastai/nbdev/blob/master/settings.ini for original
host = gitlab
lib_name = nbdev2_template
description = a clone of nbdev v2 to gitlab.
copyright = this is all thanks to fastai
keywords = fastai nbdev
user = robtheoceanographer
author = Robert Johnson
author_email = email@example.com
repo = nbdev2_template
branch = main
version = 0.0.1
min_python = 3.9
audience = Developers
language = English
custom_sidebar = False
license = None
status = 2
nbs_path = nbs
doc_path = _docs
recursive = False
tst_flags = notest

### Inferred From Other Values but tailored for gitlab ###
doc_host = /
doc_baseurl = /%(lib_name)s/
git_url = https://gitlab.com/%(user)s/%(lib_name)s/tree/%(branch)s/
lib_path = %(lib_name)s
title = %(lib_name)s

### OPTIONAL ###
# see https://github.com/fastai/nbdev/blob/master/settings.ini for examples
# requirements = fastcore pandas
# dev_requirements =
# console_scripts =
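Values like %(lib_name)s are interpolation references to other keys in the [DEFAULT] section. Python's standard configparser resolves the same syntax, so a quick way to confirm that the GitLab-specific URLs come out as intended is to load the settings and look at them. This is a minimal sketch using a trimmed copy of the settings above. The pages_url it computes assumes the classic gitlab.com Pages address, https://<user>.gitlab.io/<project>/. Newer GitLab versions may give a project a unique Pages domain instead, and an internal instance uses its own Pages domain, so treat that line as an assumption.

```python
"""Sketch: resolve the interpolated GitLab values in an nbdev settings.ini."""
import configparser

# Trimmed copy of the relevant keys from the settings.ini above.
SAMPLE = """\
[DEFAULT]
user = robtheoceanographer
lib_name = nbdev2_template
branch = main
doc_baseurl = /%(lib_name)s/
git_url = https://gitlab.com/%(user)s/%(lib_name)s/tree/%(branch)s/
"""


def gitlab_settings(text):
    """Parse settings.ini text and return the resolved GitLab-related values."""
    # ConfigParser's default BasicInterpolation expands %(key)s from the [DEFAULT] section.
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cfg = parser["DEFAULT"]
    return {
        "doc_baseurl": cfg["doc_baseurl"],
        "git_url": cfg["git_url"],
        # Assumption: classic gitlab.com Pages domain of the form <user>.gitlab.io.
        "pages_url": f"https://{cfg['user']}.gitlab.io{cfg['doc_baseurl']}",
    }


for key, value in gitlab_settings(SAMPLE).items():
    print(f"{key} = {value}")
# doc_baseurl = /nbdev2_template/
# git_url = https://gitlab.com/robtheoceanographer/nbdev2_template/tree/main/
# pages_url = https://robtheoceanographer.gitlab.io/nbdev2_template/
```

To check your own project, pass the contents of your settings.ini to gitlab_settings instead of SAMPLE. The doc_baseurl of /<lib_name>/ matters because the project's site is served under that sub-path rather than at the root of the domain.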
So, to bring all of this together I made an example repo on gitlab.com that you can look at and copy if you find it helpful.
Of course, there are a bunch of things that don't work when we move to GitLab. The two things that spring to mind are:
Deploying docs in doc_path from settings.ini to GitHub Pages
I recommend using the official GitHub version and benefiting from all the functionality there rather than rolling your own on GitLab or elsewhere. This whole post was just to show that it can be done. Well, that is my brain dump of what I've done so far. I'm sure I haven't covered everything, so feel free to reach out and ask me questions if you like: https://twitter.com/JohnsonRob