#!/usr/bin/env python
# coding: utf-8

# # Python Packages and Environments
#
# ## What to use Where, and Why
#
# MOAD Group Software **Discussion**
#
# 16 & 17 Dec 2020

# This notebook can be viewed as a slideshow by using the
# [RISE](https://rise.readthedocs.io/en/stable/index.html)
# slide show extension for Jupyter.
#
# *Note: RISE only works with `jupyter notebook`, not with `jupyter lab` :-(*
#
# If you are working in an up-to-date clone of the
# [UBC-MOAD/PythonNotes repo](https://github.com/UBC-MOAD/PythonNotes),
# you can run the slideshow locally.
# To do so:
# * create a conda environment containing `jupyter` and `rise` with:
#   ```bash
#   conda env create -f PythonNotes/pkgs-envs/environment.yaml
#   ```
# * start `jupyter notebook`
# * open `PythonNotes/pkgs-envs/PythonPkgsEnvsSlides.ipynb`
# * use `Alt+r` or the `Enter/Exit RISE Slideshow` toolbar button to start/stop the slideshow mode
# * use `Space` and `Shift+Space` to navigate forward and backward through the slide cells

# * What is a Python package?
# * What is a Python environment?
# * 2 Python package managers: `conda` and `pip`
# * 2 Python environment managers: `conda env` and `virtualenv`
# * 5 ways to install Python packages:
#   * `conda env create ...`
#   * `conda env update ...`
#   * `pip install -r ...`
#   * `pip install -e ...`
#   * `pip install --user ...`
# * What to use Where, and Why

# # Outline
#
# * Python Ecosystem
# * Python Modules
# * Python Packages
# * Package Managers
# * Python Environments
# * Environment Description Files
# * Special Ways to Install Packages

# # Python Ecosystem
#
# * Interpreter (https://docs.python.org/3/reference/index.html)
#   * Written in C
# * Standard Library (Included with the language)
#   (https://docs.python.org/3/library/index.html)
#   * Built-in Functions, Built-in Constants, Built-in Exceptions
#     * Also written in C
#   * Python modules that you `import`
# * Community Developed Packages
#   * Collections of Python modules that we install so that we can `import` things from them

# # Python Modules
#
# * Python code in a `.py` file
# * Usual contents:
#   * `import` statements
#   * maybe constant definitions
#   * maybe class definitions (`class`)
#   * function definitions (`def`)
# * Everything in a module is executed when it is imported
#   * Execution of function and class definitions is what makes them available
#     to be called
#
# Example: `nowcast.workers.collect_river_data.py`

# # Python Packages
#
# * Collection of Python modules with some metadata
# * Mechanism for distributing Python code between users
# * Distribution via the Internet:
#   * `conda` uses package channels; e.g. `conda-forge`
#   * `pip` uses package indices; e.g. PyPI
#   * code repository clones from GitHub, Bitbucket, GitLab, ...
#
# ### Aside
#
# [Docs about how MOAD Python packages are structured, and why](https://ubc-moad-docs.readthedocs.io/en/latest/python_packaging/pkg_structure.html)

# ## Packages Give Us 2 Features/Challenges
#
# 1. Can include Python extensions written in C
#    * NumPy, SciPy, netCDF4, ...
#    * Compiler(s), libraries, build tools, etc. are required to install from source code
# 2. Packages can depend on other packages; i.e. PackageA requires that PackageB is also installed in order to work
#    * Code sharing and re-use; "standing on the shoulders of giants"
#    * Leads to a web of dependencies
#    * Need to construct and solve a graph to satisfy package version constraints

# # Package Managers
#
# Because `sys.path.append(...)` doesn't scale 😱
#
# * Download packages we want to install,
#   and their dependencies,
#   from the Internet
# * Store the package files where our Python code can find them
# * Additional feature:
#   * Isolate package files in places that don't require special permissions,
#     and that don't break the operating system by overwriting other versions; avoiding:
#     * `pip install` ... permission denied
#     * ~`sudo pip install`~ 😱

# # Package Managers
#
# ## pip
#
# ## conda
#
# #### Others past, and present

# ## pip
#
# * **p**ip **i**nstalls **p**ackages
# * But only from source code
#   * Until recently, when "wheels" were introduced
# * Naive dependency resolver
#   * Until very recently: "new resolver" in pip=20.2.4 released 2020-10-16
# * Package isolation is not part of `pip`; it is separate, but highly recommended

# ## conda
#
# Scientific Python community,
# led by Travis Oliphant in 2012,
# couldn't wait for the Python Packaging Authority (PyPA)'s plans for pip,
# wheels,
# and a sophisticated dependency resolver to come to fruition

# ## conda
#
# * Packages are built by maintainers, not users
#   * Solves the build problem for C extensions
#   * Allows installation of binary packages that aren't even Python; e.g. gfortran
#   * Allows installation of different versions of Python itself
# * Meta-packages; e.g. anaconda
# * Dependency resolver looks at packages being installed *and* packages already installed
#   * But dependency resolution is still a hard problem...
# * Implicitly uses environments to isolate collections of packages
# * `pip` can be used inside `conda`-managed environments

# # Progress Check
#
# * *What is a Python package?*
# * What is a Python environment?
# * *2 Python package managers: `conda` and `pip`*
# * 2 Python environment managers: `conda env` and `virtualenv`
# * 5 ways to install Python packages:
#   * `conda env create ...`
#   * `conda env update ...`
#   * `pip install -r ...`
#   * `pip install -e ...`
#   * `pip install --user ...`
# * What to use Where, and Why

# # Python Environments
#
# 1. Directory trees where Python package files are stored
# 2. Manipulation of environment variables, especially `PATH`, when we enter/leave environments

# # Python Environments
#
# ### Directory trees where Python package files are installed
#
# * Each environment has its own directory structure
#   * In user file space
# * Isolates those packages from other environments and from the system Python packages, avoiding:
#   * `pip install` ... permission denied
#   * ~`sudo pip install`~ 😱
#   * Breaking your operating system by overwriting Python packages it installed

# # Python Environments
#
# ### Environment variable manipulation on activate/deactivate
#
# * Set `PATH` so that the operating system finds the environment's Python and command-line tools (and through them its packages) before it looks in the usual operating system places
# * Maybe other environment variables too
# * Change the command-line prompt to remind you what environment is activated

# # `conda env` and `virtualenv` Environments

# # `conda env`
#
# The first choice!
#
# Use them everywhere (except on Compute Canada HPC clusters):
# * on your laptop
# * on Waterhole workstations and `salish`
# * on cloud VMs
#
# If you have `anaconda` installed,
# you have `conda env`.
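
# A minimal sketch for checking from Python whether `conda` is already available,
# and for seeing that whichever directory comes first on `PATH` wins.
# It uses only the standard library's `shutil.which`.
# The helper names `find_tool` and `make_fake_tool`, and the throw-away `env-bin` and `system-bin` directories,
# are made up for illustration, and the demonstration assumes a POSIX system.

```python
import os
import shutil
import tempfile
from pathlib import Path


def find_tool(name, path=None):
    """Return the full path of the first executable called `name` found on
    `path` (defaults to the current PATH), or None if there isn't one."""
    return shutil.which(name, path=path)


def make_fake_tool(directory, name):
    """Create a tiny executable file that stands in for a real program."""
    tool = Path(directory) / name
    tool.write_text("#!/bin/sh\n")
    tool.chmod(0o755)
    return str(tool)


# Is conda installed? Which python3 would the shell run?
for tool in ("conda", "python3"):
    print(f"{tool}: {find_tool(tool) or 'not found on PATH'}")

# Demonstrate PATH ordering with two throw-away directories
with tempfile.TemporaryDirectory() as tmp:
    env_bin = Path(tmp) / "env-bin"
    system_bin = Path(tmp) / "system-bin"
    env_bin.mkdir()
    system_bin.mkdir()
    env_python = make_fake_tool(env_bin, "python3")
    system_python = make_fake_tool(system_bin, "python3")

    # "Activating" an environment puts its bin directory first on PATH
    activated = os.pathsep.join([str(env_bin), str(system_bin)])
    deactivated = str(system_bin)
    found_when_activated = find_tool("python3", path=activated)
    found_when_deactivated = find_tool("python3", path=deactivated)

print("activated finds the environment's python3:", found_when_activated == env_python)
print("deactivated finds the system python3:", found_when_deactivated == system_python)
```

# If `conda` is reported as not found, the options described next apply.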
#
# If you are starting from scratch,
# use [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
# to get the `conda` package manager and `conda env` environment manager
# without the hundreds of packages in the `anaconda` meta-package.

# # `python3 -m virtualenv`
#
# Use them on Compute Canada HPC clusters.
#
# * Compute Canada have built wheels for many (but not all)
#   of the scientific Python packages.
#   In some cases those take advantage of specific features of
#   the HPC architecture.
# * Their docs explicitly request us not to install `anaconda` on clusters.
#
# `module load python/3.8.2` (or whatever is the latest version)
# includes `virtualenv` and `pip`.

# # Environments are Rutabagas!!!
#
# ![image.png](attachment:image.png)
# By Picasa user Seedambassadors - http://picasaweb.google.com/seedambassadors/SscVarieties#5296490359767135106&, CC BY-SA 3.0

# # Environments are Rutabagas!!!
#
# They are not pets.
#
# They are not unique snowflakes.
#
# They are not precious, artisanal, heritage artefacts.
#
# Just as rutabagas are planted, harvested, eaten, and composted...
#
# Environments should be created, used, then destroyed and replaced when they get old or rotten.

# # Environment Descriptions are Precious
#
# Create them carefully for specific pieces of work.
#
# Track them using Git.
#
# Use them to update your environments.
#
# Commit changes.

# # 2 Kinds of Environment Descriptions
#
# 1. List of packages
# 2.
#    List of specific package versions (version pinning)

# # `conda env`
#
# * Create an environment description file: often `environment.yaml`
# * Create the environment:
#   ```bash
#   conda env create -f environment.yaml
#   ```
#
# * Activate the environment:
#   ```bash
#   conda activate env-name
#   ```
#
# * To install a new package, edit `environment.yaml`, and update the environment:
#   ```bash
#   conda env update -f environment.yaml
#   ```

# ## A Small Example
#
# The `environment.yaml` for this notebook/slideshow:

# In[4]:


get_ipython().system('cat environment.yaml')


# # Let's Build an Environment!!

# # `conda env`
#
# Recall that environments are (mainly) about 2 things: a directory tree, and managing `PATH`
#
# * `conda env create` sets up the directory tree and installs the isolated packages in it
#   * That directory tree also includes an isolated copy of Python itself
# * `conda activate` does the `PATH` environment variable manipulation
#   * It also sets some other envvars, and puts the env name in the shell prompt

# # `conda` Environment Description Files
#
# * Store your `environment.yaml` files with your code or notebooks
# * Create environments for specific pieces of work
# * Commit your `environment.yaml` files in Git when you create them,
#   and whenever you change them

# ## More Complicated Environment
#
# * More comments
# * List of package channels, in order of priority
# * More version pinning
# * `pip` as a dependency
# * Using `pip` to install packages from PyPI because there is no `conda` package

# In[2]:


get_ipython().system('cat /media/doug/warehouse/MEOPAR/NEMO-Cmd/envs/environment-dev.yaml')


# # Progress Check
#
# * *What is a Python package?*
# * *What is a Python environment?*
# * *2 Python package managers: `conda` and `pip`*
# * 2 Python environment managers: *`conda env`* and `virtualenv`
# * 5 ways to install Python packages:
#   * *`conda env create ...`*
#   * *`conda env update ...`*
#   * `pip install -r ...`
#   * `pip install -e ...`
#   * `pip install
#     --user ...`
# * What to use Where, and Why

# # `python3 -m virtualenv`
#
# * Load a Python module:
#   ```bash
#   module load python/3.8.2
#   ```
#
# * Create the virtual environment:
#   ```bash
#   python3 -m virtualenv --no-download ~/venvs/env-name
#   ```
#
# * Activate the environment:
#   ```bash
#   source ~/venvs/env-name/bin/activate
#   ```

# # `python3 -m virtualenv`
#
# * Create an environment description file: often `requirements.txt`
# * Install the packages:
#   ```bash
#   python3 -m pip install -r requirements.txt
#   ```
#
# * To install a new package, edit `requirements.txt`, and do an install from it:
#   ```bash
#   python3 -m pip install -r requirements.txt
#   ```

# ## A Small Example
#
# The `requirements.txt` for a useful `jupyterlab` environment on `graham`:

# In[3]:


get_ipython().system('cat requirements-graham-jupyter.txt')


# # `python3 -m virtualenv`
#
# Recall that environments are (mainly) about 2 things: a directory tree, and managing `PATH`
#
# * `python3 -m virtualenv` sets up the directory tree
#   * That directory tree includes a symbolic link to the Python used to create it
# * `source ~/venvs/env-name/bin/activate` does the `PATH` environment variable manipulation
#   * It also sets some other envvars, and puts the env name in the shell prompt
# * `python3 -m pip install -r requirements.txt` installs the packages into the environment

# # `virtualenv` Environment Description Files
#
# * Store your `requirements.txt` files with your code or notebooks
# * Create environments for specific pieces of work
# * Commit your `requirements.txt` files in Git when you create them,
#   and whenever you change them

# # Aside: `python3 -m virtualenv` ???
#
# The `-m` option on `python3` means:
#
# > Search sys.path for the named module and execute
# > its contents
#
# This ensures that the `virtualenv` (or other package module) that you run is the one associated with the presently active Python environment.
# If there is no environment active,
# it uses the system Python 3,
# or the one from the HPC module you loaded.
#
# Getting things installed in the wrong environment is one of the biggest pain-points of using environments.
# `python3 -m` avoids that.
#
# ### Docs
# https://docs.python.org/3/using/cmdline.html#cmdoption-m

# # 2 Kinds of Environment Descriptions
#
# 1. List of packages you decided to install
#    * Everyday use of package descriptions
#    * Easily updatable or re-creatable to get new releases
#
# 2. List of all packages with exact version details (version pinning)
#    * Useful when you want to be able to get back to exact package versions;
#      e.g. record the environment you used to do analysis for a paper when you submit the paper

# # `conda env export`
#
# ```bash
# conda env export > environment-pinned.yaml
# ```
# * List of *all* packages in env with version numbers and build specifications
# * Includes `pip` installed packages from original environment description
#
# Docs: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

# # `pip list`
#
# ```bash
# python3 -m pip list --format=freeze > requirements-pinned.txt
# ```
# * List of *all* packages in env with version numbers
# * Use it for `python3 -m virtualenv` environments on Compute Canada clusters

# # Progress Check
#
# * *What is a Python package?*
# * *What is a Python environment?*
# * *2 Python package managers: `conda` and `pip`*
# * *2 Python environment managers: `conda env` and `virtualenv`*
# * 5 ways to install Python packages:
#   * *`conda env create ...`*
#   * *`conda env update ...`*
#   * *`python3 -m pip install -r ...`*
#   * `python3 -m pip install -e ...`
#   * `python3 -m pip install --user ...`
# * What to use Where, and Why

# # `python3 -m pip install -e` ???
#
# The `-e` option (short for `--editable`) on `pip install` means:
#
# > Install the package using symbolic links,
# > such that it’s available on sys.path,
# > yet can still be edited directly from its source files.
#
# We use this for our group-developed packages.
# It makes the workflow for getting updates into our installed packages (usually) a simple `git pull` in the package repository clone directory.
#
# Example:
# ```bash
# python3 -m pip install -e moad_tools
# ```

# # `python3 -m pip install -e`
#
# Editable installs facilitate:
#
# * you writing and testing modules without a package build step

# # `python3 -m pip install -e`
#
# Editable installs avoid:
#
# * me having to build releases of our packages and upload them to a package channel
# * me having to decide when the changes that have happened warrant building a release
# * you having to wait for me to do those things
# * you having to install the new release into your environment to get access to the changes (mostly)
#   * exceptions:
#     * version changes
#     * command-line interface changes in scripts/tools that use `Click` or `Cliff` for their CLI

# # Installing to the User Site
#
# `python3 -m pip install --user package`
#
# * Limited usefulness
# * Really only for packages that provide a command-line interface
#   (i.e. not about being able to `import` from the package)
# * Installs files into a hidden tree in your home directory
#   * Typically `~/.local/`, but `python3 -m site --user-base` will say for sure
#   * Also need to ensure that `~/.local/bin` is near the front of your `PATH`

# # Installing to the User Site
#
# We use this on HPC machines to install packages like `NEMO-Cmd`, `SalishSeaCmd`, and `MOHID-Cmd` from our Git clones:
# ```bash
# python3 -m pip install --user -e $PROJECT/$USER/MEOPAR/NEMO-Cmd/
# ```
#
# It lets you run `nemo run`, `salishsea run`, or `mohid run` without worrying about activating a Python environment.
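
# The user-site location and the `PATH` requirement mentioned in these slides can also be checked from Python.
# A minimal sketch, assuming a POSIX layout in which `pip install --user` puts command-line scripts in `<user-base>/bin`.
# `site.getuserbase()` is the standard library call behind `python3 -m site --user-base`;
# the helper name `user_bin_position` is hypothetical.

```python
import os
import site
from pathlib import Path


def user_bin_position(path_value=None):
    """Return the 0-based position of the user-site bin directory on PATH,
    or None if it is not there. Lower numbers are searched first."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    user_bin = Path(site.getuserbase()) / "bin"
    for position, entry in enumerate(path_value.split(os.pathsep)):
        # Expand a literal ~ and compare as paths so that a trailing
        # slash on the PATH entry doesn't matter
        if entry and Path(os.path.expanduser(entry)) == user_bin:
            return position
    return None


print("user base:", site.getuserbase())
position = user_bin_position()
if position is None:
    print("user-site bin directory is not on PATH")
else:
    print(f"user-site bin directory is entry {position} on PATH")
```

# A position of 0 or 1 is what "near the front" means in practice;
# `None` means that commands installed with `--user` will not be found by the shell.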
#
# The user-site install relies on:
#
# * Already having done `module load python/3.8.2`
# * Having `$HOME/.local/bin` in your `PATH`

# # Summary
#
# * What is a Python package?
# * What is a Python environment?
# * 2 Python package managers: `conda` and `pip`
# * 2 Python environment managers: `conda env` and `virtualenv`
# * 5 ways to install Python packages:
#   * `conda env create ...`
#   * `conda env update ...`
#   * `pip install -r ...`
#   * `pip install -e ...`
#   * `pip install --user ...`
# * What to use Where, and Why

# # Take-aways
#
# * Use `conda env` to create and update "project-specific" environments
# * Use `python3 -m virtualenv` on Compute Canada
# * Track your `environment.yaml` or `requirements.txt` environment descriptions with Git
#   * Keep environment descriptions with the notebooks/code that use them
# * Don't hesitate to remove and re-create environments to keep them fresh
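
# Since installing into the wrong environment is a common pain-point,
# it can help to confirm which environment the running Python belongs to before installing anything or removing an environment.
# A minimal sketch, assuming the `CONDA_DEFAULT_ENV` and `VIRTUAL_ENV` variables set by `conda activate`
# and by the `virtualenv` `activate` script, and assuming virtualenv 20+ or the standard library `venv`,
# where `sys.prefix` differs from `sys.base_prefix` inside an environment.
# The function name `describe_environment` is hypothetical.

```python
import os
import sys


def describe_environment(environ=None, prefix=None, base_prefix=None):
    """Return a short description of the active Python environment.

    The arguments default to the real process state; they are parameters
    only so that the logic can be tried with made-up values.
    """
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    if prefix != base_prefix:
        # Inside a virtualenv the interpreter's prefix differs from the
        # prefix of the Python that was used to create the environment
        return f"virtualenv at {environ.get('VIRTUAL_ENV', prefix)}"
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda env '{conda_env}' (python in {prefix})"
    return f"no environment active (python in {prefix})"


print(describe_environment())
```

# Running it with `python3` from the shell reports on whichever Python `python3 -m pip` would install into.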