#!/usr/bin/env python
# coding: utf-8
# # Python Packages and Environments
#
# ## What to use Where, and Why
#
# MOAD Group Software **Discussion**
#
# 16 & 17 Dec 2020
#
# This notebook can be viewed as a slideshow by using the
# [RISE](https://rise.readthedocs.io/en/stable/index.html)
# slide show extension for Jupyter.
#
# *Note: RISE only works with `jupyter notebook`, not with `jupyter lab` :-(*
#
# If you are working in an up to date clone of the
# [UBC-MOAD/PythonNotes repo](https://github.com/UBC-MOAD/PythonNotes),
# you can run the slideshow locally.
# To do so:
# * create a conda environment containing `jupyter` and `rise` with:
# ```bash
# conda env create -f PythonNotes/pkgs-envs/environment.yaml
# ```
# * start `jupyter notebook`
# * open `PythonNotes/pkgs-envs/PythonPkgsEnvsSlides.ipynb`
# * use `Alt+r` or the `Enter/Exit RISE Slideshow` toolbar button to start/stop the slideshow mode
# * use `Space` and `Shift+Space` to navigate forward and backward through the slide cells
# * What is a Python package?
# * What is a Python environment?
# * 2 Python package managers: `conda` and `pip`
# * 2 Python environment managers: `conda env` and `virtualenv`
# * 5 ways to install Python packages:
#   * `conda env create ...`
#   * `conda env update ...`
#   * `pip install -r ...`
#   * `pip install -e ...`
#   * `pip install --user ...`
# * What to use Where, and Why
# # Outline
#
# * Python Ecosystem
# * Python Modules
# * Python Packages
# * Package Managers
# * Python Environments
# * Environment Description Files
# * Special Ways to Install Packages
# # Python Ecosystem
#
# * Interpreter (https://docs.python.org/3/reference/index.html)
#   * Written in C
# * Standard Library (Included with the language)
#   (https://docs.python.org/3/library/index.html)
#   * Built-in Functions, Built-in Constants, Built-in Exceptions
#     * Also written in C
#
#   * Python modules that you `import`
# * Community Developed Packages
#   * Collections of Python modules that we install so that we can `import` things from them
# # Python Modules
#
# * Python code in a `.py` file
# * Usual contents:
#   * `import` statements
#   * maybe constant definitions
#   * maybe class definitions (`class`)
#   * function definitions (`def`)
# * Everything in a module is executed when it is imported
#   * Execution of function and class definitions is what makes them available
#     to be called
#
# Example: `nowcast.workers.collect_river_data.py`
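#
# A minimal sketch of the "everything is executed on import" point.
# It assumes a throwaway module called `demo_module`
# (a hypothetical name, not one of our packages)
# that is written to a temporary directory just for this illustration:
```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source, written to a temporary directory for this demo
MODULE_SOURCE = '''
print("top-level code in demo_module is running")

ANSWER = 6 * 7  # constant definition, evaluated at import time


def double(x):
    """Executing this `def` statement binds the name `double`."""
    return 2 * x
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "demo_module.py").write_text(MODULE_SOURCE)
    # Crude way of making the module importable, only for this demo
    sys.path.insert(0, tmp_dir)
    try:
        # The print() call at the top of the module runs here
        demo_module = importlib.import_module("demo_module")
    finally:
        sys.path.remove(tmp_dir)

print(demo_module.ANSWER)
print(demo_module.double(21))
```
# The message is printed at import time,
# and `ANSWER` and `double` exist afterwards only because the statements
# that define them were executed during the import.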
# # Python Packages
#
# * Collection of Python modules with some metadata
# * Mechanism for distributing Python code between users
# * Distribution via the Internet:
# * `conda` uses package channels; e.g. `conda-forge`
# * `pip` uses package indices; e.g. PyPI
# * code repository clones from GitHub, Bitbucket, GitLab, ...
#
# ### Aside
#
# [Docs about how MOAD Python packages are structured, and why](https://ubc-moad-docs.readthedocs.io/en/latest/python_packaging/pkg_structure.html)
# ## Packages Give Us 2 Features/Challenges
#
# 1. Can include Python extensions written in C
#    * NumPy, SciPy, netCDF4, ...
#    * Compiler(s), libraries, build tools, etc. are required to install from source code
# 2. Packages can depend on other packages; i.e. PackageA requires that PackageB is also installed in order to work
#    * Code sharing and re-use; "standing on the shoulders of giants"
#    * Leads to a web of dependencies
#    * Need to construct and solve a graph to satisfy package version constraints
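#
# A toy sketch of the dependency graph idea,
# using `graphlib` from the standard library (Python 3.9 or later).
# The package names and their dependencies are hypothetical,
# and this only works out an install order;
# real resolvers in `conda` and `pip` also have to satisfy version constraints,
# which is the hard part:
```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical packages; each key depends on the packages in its set
dependencies = {
    "PackageA": {"PackageB", "PackageC"},
    "PackageB": {"PackageC"},
    "PackageC": set(),
}

# static_order() yields each package only after all of its dependencies
install_order = list(TopologicalSorter(dependencies).static_order())
print(install_order)
```
# `PackageC` comes first because nothing can be installed before it,
# and `PackageA` comes last because it needs both of the others.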
# # Package Managers
#
# Because `sys.path.append(...)` doesn't scale 😱
#
# * Download packages we want to install,
# and their dependencies,
# from the Internet
# * Store the package files where our Python code can find them
# * Additional feature:
#   * Isolate package files in places that don't require special permissions,
#     and that don't break the operating system by over-writing other versions; avoiding:
#     * `pip install` ... permission denied
#     * ~`sudo pip install`~ 😱
# # Package Managers
#
# ## pip
#
# ## conda
#
# #### Others past, and present
# ## pip
#
# * **p**ip **i**nstalls **p**ackages
# * But only from source code
# * Until recently, when "wheels" were introduced
# * Naive dependency resolver
# * Until very recently: "new resolver" in pip=20.2.4 released 2020-10-16
# * Package isolation is not part of `pip`; it is separate, but highly recommended
# ## conda
#
# Scientific Python community,
# led by Travis Oliphant in 2012,
# couldn't wait for the Python Packaging Authority (PyPA)'s plans for pip,
# wheels,
# and a sophisticated dependency resolver to come to fruition
# ## conda
#
# * Packages are built by maintainers, not users
# * Solves the build problem for C extensions
# * Allows installation of binary packages that aren't even Python; e.g. gfortran
# * Allows installation of different versions of Python itself
# * Meta-packages; e.g. anaconda
# * Dependency resolver looks at packages being installed *and* packages already installed
# * But dependency resolution is still a hard problem...
# * Implicitly uses environments to isolate collections of packages
# * `pip` can be used inside `conda`-managed environments
# # Progress Check
#
# * *What is a Python package?*
# * What is a Python environment?
# * *2 Python package managers: `conda` and `pip`*
# * 2 Python environment managers: `conda env` and `virtualenv`
# * 5 ways to install Python packages:
#   * `conda env create ...`
#   * `conda env update ...`
#   * `pip install -r ...`
#   * `pip install -e ...`
#   * `pip install --user ...`
# * What to use Where, and Why
# # Python Environments
#
# 1. Directory trees where Python package files are stored
# 2. Manipulation of environment variables, especially PATH, when we enter/leave environments
# # Python Environments
#
#
# ### Directory trees where Python package files are installed
#
# * Each environment has its own directory structure
# * In user file space
# * Isolates those packages from other environments and from the system Python packages, avoiding:
#   * `pip install` ... permission denied
#   * ~`sudo pip install`~ 😱
#   * Breaking your operating system by overwriting Python packages it installed
# # Python Environments
#
#
# ### Environment variable manipulation on activate/deactivate
#
# * Set PATH to ensure that the operating system finds the Python packages in the environment before it looks in the usual operating system places
# * Maybe other environment variables too
# * Change the command-line prompt to remind you what environment is activated
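#
# A small simulation of why putting an environment at the front of `PATH` matters.
# The directory names, the `activate()` helper, and the `first_match()` helper
# are all hypothetical, made up for this sketch;
# real activation scripts do more than this:
```python
import os


def activate(path_value, env_bin):
    """Return a PATH value with the environment's bin directory at the front."""
    return os.pathsep.join([env_bin, *path_value.split(os.pathsep)])


def first_match(path_value, command, available):
    """Return the first directory on PATH that provides `command`, or None."""
    for directory in path_value.split(os.pathsep):
        if command in available.get(directory, ()):
            return directory
    return None


# Hypothetical directories, and the commands found in each of them
system_path = os.pathsep.join(["/usr/local/bin", "/usr/bin"])
env_bin = "/home/user/envs/demo/bin"
available = {"/usr/bin": {"python3"}, env_bin: {"python3"}}

before = first_match(system_path, "python3", available)
after = first_match(activate(system_path, env_bin), "python3", available)
print(before)
print(after)
```
# Before activation the search stops at the operating system's `python3`;
# after activation it stops at the one in the environment,
# because directories on `PATH` are searched from front to back.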
# # `conda env` and `virtualenv` Environments
# # `conda env`
#
# The first choice!
#
# Use them everywhere (except on Compute Canada HPC clusters):
# * on your laptop
# * on Waterhole workstations and `salish`
# * on cloud VMs
#
# If you have `anaconda` installed,
# you have `conda env`.
#
# If you are starting from scratch,
# use [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
# to get the `conda` package manager and `conda env` environment manager
# without the hundreds of packages in the `anaconda` meta-package.
# # `python3 -m virtualenv`
#
# Use them on Compute Canada HPC clusters.
#
# * Compute Canada have built wheels for many (but not all)
# of the scientific Python packages.
# In some cases those take advantage of specific features of
# the HPC architecture.
# * Their docs explicitly request us not to install `anaconda` on clusters.
#
# `module load python/3.8.2` (or whatever is the latest version)
# includes `virtualenv` and `pip`.
# # Environments are Rutabagas!!!
#
# ![image.png](attachment:image.png)
# By Picasa user Seedambassadors - http://picasaweb.google.com/seedambassadors/SscVarieties#5296490359767135106&, CC BY-SA 3.0
# # Environments are Rutabagas!!!
#
# They are not pets.
#
# They are not unique snowflakes.
#
# They are not precious, artisanal, heritage artefacts.
#
# Just as rutabagas are planted, harvested, eaten, and composted...
#
# Environments should be created, used, and destroyed and replaced when they get old or rotten.
# # Environment Descriptions are Precious
#
# Create them carefully for specific pieces of work.
#
# Track them using Git.
#
# Use them to update your environments.
#
# Commit changes.
# # 2 Kinds of Environment Descriptions
#
# 1. List of packages
# 2. List of specific package versions (version pinning)
# # `conda env`
#
# * Create an environment description file: often `environment.yaml`
# * Create the environment:
# ```bash
# conda env create -f environment.yaml
# ```
#
# * Activate the environment:
# ```bash
# conda activate env-name
# ```
#
# * To install a new package, edit `environment.yaml`, and update the environment:
# ```bash
# conda env update -f environment.yaml
# ```
# ## A Small Example
#
# The `environment.yaml` for this notebook/slideshow:
# In[4]:
get_ipython().system('cat environment.yaml')
# # Let's Build an Environment!!
# # `conda env`
#
# Recall that environments are (mainly) about 2 things: a directory tree, and managing `PATH`
#
# * `conda env create` sets up the directory tree and installs the isolated packages in it
# * That directory tree also includes an isolated copy of Python itself
# * `conda activate` does the `PATH` environment variable manipulation
# * It also sets some other envvars, and puts the env name in the shell prompt
# # `conda` Environment Description Files
#
# * Store your `environment.yaml` files with your code or notebooks
# * Create environments for specific pieces of work
# * Commit your `environment.yaml` files in Git when you create them,
# and whenever you change them
# ## More Complicated Environment
#
# * More comments
# * List of package channels, in order of priority
# * More version pinning
# * `pip` as a dependency
# * Using `pip` to install packages from PyPI because there is no `conda` package
# In[2]:
get_ipython().system('cat /media/doug/warehouse/MEOPAR/NEMO-Cmd/envs/environment-dev.yaml')
# # Progress Check
#
# * *What is a Python package?*
# * *What is a Python environment?*
# * *2 Python package managers: `conda` and `pip`*
# * 2 Python environment managers: *`conda env`* and `virtualenv`
# * 5 ways to install Python packages:
#   * *`conda env create ...`*
#   * *`conda env update ...`*
#   * `pip install -r ...`
#   * `pip install -e ...`
#   * `pip install --user ...`
# * What to use Where, and Why
# # `python3 -m virtualenv`
#
# * Load a Python module:
# ```bash
# module load python/3.8.2
# ```
#
# * Create the virtual environment:
# ```bash
# python3 -m virtualenv --no-download ~/venvs/env-name
# ```
#
# * Activate the environment:
# ```bash
# source ~/venvs/env-name/bin/activate
# ```
# # `python3 -m virtualenv`
#
# * Create an environment description file: often `requirements.txt`
# * Install the packages:
# ```bash
# python3 -m pip install -r requirements.txt
# ```
#
# * To install a new package, edit `requirements.txt`, and do an install from it:
# ```bash
# python3 -m pip install -r requirements.txt
# ```
# ## A Small Example
#
# The `requirements.txt` for a useful `jupyterlab` environment on `graham`:
# In[3]:
get_ipython().system('cat requirements-graham-jupyter.txt')
# # `python3 -m virtualenv`
#
# Recall that environments are (mainly) about 2 things: a directory tree, and managing `PATH`
#
# * `python3 -m virtualenv` sets up the directory tree
# * That directory tree includes a symbolic link to the Python used to create it
# * `source ~/venvs/env-name/bin/activate` does the `PATH` environment variable manipulation
# * It also sets some other envvars, and puts the env name in the shell prompt
# * `python3 -m pip install -r requirements.txt` installs the packages into the environment
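#
# A rough sketch for checking which environment a Python process is running in.
# The `describe_environment()` helper is hypothetical.
# It assumes a `virtualenv` recent enough (version 20 or later) to make
# `sys.prefix` differ from `sys.base_prefix`,
# and it relies on `conda activate` setting the `CONDA_PREFIX` environment variable:
```python
import os
import sys


def describe_environment():
    """Return a short description of the environment running this code."""
    if sys.prefix != sys.base_prefix:
        # Inside a virtual environment, sys.prefix points at the environment
        # and sys.base_prefix at the Python that was used to create it
        return f"virtual environment at {sys.prefix}"
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix and os.path.realpath(conda_prefix) == os.path.realpath(sys.prefix):
        return f"conda environment at {sys.prefix}"
    return f"no environment detected; Python lives at {sys.prefix}"


print(describe_environment())
print(sys.executable)  # the interpreter that was found via PATH
```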
# # `virtualenv` Environment Description Files
#
# * Store your `requirements.txt` files with your code or notebooks
# * Create environments for specific pieces of work
# * Commit your `requirements.txt` files in Git when you create them,
# and whenever you change them
# # Aside: `python3 -m virtualenv` ???
#
# The `-m` option on `python3` means:
#
# Search sys.path for the named module and execute
# its contents
#
# This ensures that the `virtualenv` (or other package module) that you run is the one associated with the currently active Python environment.
# If there is no environment active,
# it uses the system Python 3,
# or the one from the HPC module you loaded.
#
# Getting things installed in the wrong environment is one of the biggest pain-points of using environments.
# `python3 -m` avoids that.
#
#
# ### Docs
# https://docs.python.org/3/using/cmdline.html#cmdoption-m
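#
# A minimal sketch of the same idea from inside Python.
# It uses the standard library `json.tool` module as a stand-in for `virtualenv` or `pip`
# (an assumption made so that the example runs without installing anything):
```python
import json
import subprocess
import sys

# sys.executable is the interpreter running this code, so `-m` looks the
# module up on that interpreter's sys.path rather than on the shell's PATH
result = subprocess.run(
    [sys.executable, "-m", "json.tool"],
    input=json.dumps({"env": "demo"}),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```
# Swapping `"json.tool"` for `"pip"` gives the `pip` that belongs to
# the same Python that is running the code.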
# # 2 Kinds of Environment Descriptions
#
# 1. List of packages you decided to install
# * Everyday use of package descriptions
# * Easily updatable or re-creatable to get new releases
#
# 2. List of all packages with exact version details (version pinning)
# * Useful when you want to be able to get back to exact package versions;
# e.g. record the environment you used to do analysis for a paper when you submit the paper
# # `conda env export`
#
# ```bash
# conda env export > environment-pinned.yaml
# ```
# * List of *all* packages in env with version numbers and build specifications
# * Includes `pip` installed packages from original environment description
#
# Docs: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
# # `pip list`
#
# ```bash
# python3 -m pip list --format=freeze > requirements-pinned.txt
# ```
# * List of *all* packages in env with version numbers
# * Use it for `python3 -m virtualenv` environments on Compute Canada clusters
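#
# A rough Python equivalent of that listing, as a sketch only.
# It uses `importlib.metadata` from the standard library (Python 3.8 or later)
# and is not how `pip` itself is implemented;
# `pinned_requirements()` is a hypothetical helper name:
```python
from importlib.metadata import distributions  # standard library, Python 3.8+


def pinned_requirements():
    """Return sorted `name==version` lines for packages visible to this Python."""
    pins = {
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in distributions()
        if dist.metadata["Name"]  # skip entries that have no usable name
    }
    return sorted(pins, key=str.lower)


for line in pinned_requirements():
    print(line)
```
# The output depends entirely on which environment is active when the code runs,
# which is the reason for recording it alongside the analysis it was used for.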
# # Progress Check
#
# * *What is a Python package?*
# * *What is a Python environment?*
# * *2 Python package managers: `conda` and `pip`*
# * *2 Python environment managers: `conda env` and `virtualenv`*
# * 5 ways to install Python packages:
#   * *`conda env create ...`*
#   * *`conda env update ...`*
#   * *`python3 -m pip install -r ...`*
#   * `python3 -m pip install -e ...`
#   * `python3 -m pip install --user ...`
# * What to use Where, and Why
# # `python3 -m pip install -e` ???
#
# The `-e` option (short for `--editable`) on `pip install` means:
#
# Install the package using symbolic links,
# such that it's available on sys.path,
# yet can still be edited directly from its source files.
#
# We use this for our group-developed packages.
# It makes the workflow for getting updates into our installed packages (usually) a simple `git pull` in the package repository clone directory.
#
# Example:
# ```bash
# python3 -m pip install -e moad_tools
# ```
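#
# One way to check that an editable install is doing what you expect
# is to ask Python where it would import a package from.
# This is a sketch; `import_location()` is a hypothetical helper,
# and `json` is used only because it is always available:
```python
import importlib.util


def import_location(module_name):
    """Return the file that `import module_name` would load, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print(import_location("json"))
```
# For a package installed with `-e`,
# the location reported should be inside your Git clone of the package repository
# rather than inside the environment's `site-packages` directory.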
# # `python3 -m pip install -e`
#
# Editable installs facilitate:
#
# * writing and testing modules easily, without a package build step
# # `python3 -m pip install -e`
#
# Editable installs avoid:
#
# * me having to build releases of our packages and upload them to a package channel
# * me having to decide when the changes that have happened warrant building a release
# * you having to wait for me to do those things
# * you having to install the new release into your environment to get access to the changes (mostly)
#   * exceptions:
#     * version changes
#     * command-line interface changes in scripts/tools that use `Click` or `Cliff` for their CLI
# # Installing to the User Site
#
# `python3 -m pip install --user package`
#
# * Limited usefulness
# * Really only for packages that provide a command-line interface
# (i.e. not about being able to `import` from the package)
# * Installs files into a hidden tree in your home directory
# * Typically `~/.local/`, but `python3 -m site --user-base` will say for sure
# * Also need to ensure that `~/.local/bin` is near the front of your `PATH`
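#
# A short sketch for checking those locations from Python with the standard library `site` module.
# The `bin` sub-directory name is an assumption that holds on Linux;
# other platforms use different names for the scripts directory:
```python
import os
import site

user_base = site.getuserbase()  # same value as `python3 -m site --user-base`
user_site = site.getusersitepackages()  # where `--user` installs importable modules
user_bin = os.path.join(user_base, "bin")  # scripts directory on Linux

# Command-line tools installed with `--user` are only found if this is True
on_path = user_bin in os.environ.get("PATH", "").split(os.pathsep)
print(user_base)
print(user_site)
print(user_bin, on_path)
```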
# # Installing to the User Site
#
# We use this on HPC machines to install packages like `NEMO-Cmd`, `SalishSeaCmd`, and `MOHID-Cmd` from our Git clones:
# ```bash
# python3 -m pip install --user -e $PROJECT/$USER/MEOPAR/NEMO-Cmd/
# ```
#
# It makes it so that you can do `nemo run`, `salishsea run`, or `mohid run` without worrying about activating a Python environment.
#
# Relies on:
#
# * Already having done `module load python/3.8.2`
# * Having `$HOME/.local/bin` in your `PATH`
# # Summary
#
# * What is a Python package?
# * What is a Python environment?
# * 2 Python package managers: `conda` and `pip`
# * 2 Python environment managers: `conda env` and `virtualenv`
# * 5 ways to install Python packages:
#   * `conda env create ...`
#   * `conda env update ...`
#   * `pip install -r ...`
#   * `pip install -e ...`
#   * `pip install --user ...`
# * What to use Where, and Why
# # Take-aways
#
# * Use `conda env` to create and update "project-specific" environments
# * Use `python3 -m virtualenv` on Compute Canada
# * Track your `environment.yaml` or `requirements.txt` environment descriptions with Git
# * Keep environment descriptions with the notebooks/code that use them
# * Don't hesitate to remove and re-create environments to keep them fresh