#!/usr/bin/env python
# coding: utf-8

# # Pip vs. Pipenv vs. Poetry
# **_A comparative study_**

# ## 1. Introduction
# Python applications are prolific, and their use across platforms is commonplace. The bedrock of any of these applications is the use of Python libraries and packages.
#
# Python packages may be standalone or may depend on other packages. These dependencies may themselves be built using other packages and libraries, which makes the original package transitively dependent on those sub-packages.
# While these dependencies allow for the development of powerful applications, they can also be a cause for great concern if not addressed.
#
# Packages often require updates to patch security issues or improve performance. However, an updated sub-package may not be compatible with the top-level packages that depend on it, causing a failure when all the related packages are run together. This makes package management an essential component of Python development.
#
# pip is the default package manager of Python. It allows the installation of packages using the command _pip install_. However, pip on its own has a number of issues.

# ## 2. The pip Problem
# Before understanding the problems caused by pip, it is essential to understand how Python manages packages, keeps packages isolated and handles different stages of the code such as development and production.

# ### 2.1 Environment Friendly
# pip is the default package manager associated with Python. Once Python is installed, pip is used to install the required packages and libraries. With pip, a developer can install the latest version of a package by default or choose a specific version.
#
# In general, when Python is installed on a system, that version of Python applies globally to the system. The same holds for any packages installed thereafter: the installation is global.
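#
# As a minimal sketch of what "global" means in practice, the snippet below lists every distribution visible to the interpreter that runs it, using the standard-library _importlib.metadata_ module. The helper name _visible_distributions_ is illustrative and not part of pip; when run with the system interpreter, the result is the system-wide set of packages.

```python
import sys
from importlib import metadata


def visible_distributions():
    """Return sorted (name, version) pairs for the distributions this interpreter can see."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            pairs.add((name, dist.metadata.get("Version", "")))
    return sorted(pairs)


if __name__ == "__main__":
    # sys.prefix shows which installation the packages below belong to.
    print(f"Interpreter prefix: {sys.prefix}")
    for name, version in visible_distributions():
        print(f"{name}=={version}")
```

# Running the same snippet with a different Python installation will generally print a different list, since each installation keeps its own set of packages.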
#
# The downside of a global installation appears when a new project requires an older version of Python or of an already installed package. The developer would have to uninstall the existing version and install the older one, which breaks any project built against the uninstalled version.
#
# To prevent this scenario, Python supports **environments**. An environment lets a developer use a particular version of Python and its related packages while keeping them isolated from the mix of packages used by another project. This allows a developer to run multiple projects with different package versions without uninstalling packages and rendering older projects obsolete. In Python 2, an environment is created with the third-party _virtualenv_ tool; Python 3 (3.3 and later) ships the built-in _venv_ module.

# ### 2.2 Freeze... the requirements!
# pip also allows a developer to write all of the packages installed in an environment, and therefore required by an application to run smoothly, to a separate file. This file is usually named **_requirements.txt_**.
#
# The command _pip freeze > requirements.txt_ writes all packages in an environment, along with their sub-packages, to the named file. This file can be used to install the same packages and sub-packages on the host system, keeping it identical to the development system and keeping the build **deterministic**.
# The use of the _pip freeze_ command ensures that the dependencies are **_pinned_**, i.e. the _requirements.txt_ file contains the exact version of each package and its sub-packages.
#
# The solution, though, is not without flaws. A scenario could arise in which the maintainers of a sub-package of an installed package release an update that fixes an issue which would otherwise cause future problems, such as a memory leak or decreased performance.
# In this case, a developer would have to update the _requirements_ file manually and make the same update on the production build, which defeats the intended deterministic approach.

# ### 2.3 The Problems
# The main problem with pip, therefore, is that the developer is left to maintain the _requirements_ file. In addition, the developer may have to maintain slightly different _requirements_ files depending on the environment (_not the same as the Python environments discussed earlier_) into which the application is being deployed.
#
# The development environment may have additional packages or libraries that are needed to ensure that what is pushed to production is free of bugs and performance issues. Libraries such as pytest are unnecessary in the production environment. However, with a single _requirements_ file, pip cannot keep them out of the production deployment.
#
# Until pip v20.3, it was not possible to [resolve dependencies](https://github.com/pypa/pip/issues/988) when two or more packages depended on different versions of an underlying sub-package. For example, assume that package 1 depends on Matplotlib >=3.1 and package 2 depends on Matplotlib <=5. A dependency resolver automatically identifies a version of Matplotlib that satisfies both packages and installs it. While the dependency resolver released with pip v20.3 does address the issue, there is an expectation that it may [lack in performance](https://github.com/pypa/pip/issues/988#issuecomment-738928473) and will require fixes over time.

# ## 3. Pipenv aka the envy of pip
# **Understanding the Pipenv package**
# Pipenv is intended as a replacement for working with pip directly. Besides replacing the _requirements_ file with a _Pipfile_, Pipenv uses a _Pipfile.lock_ file to ensure deterministic builds.
# The other advantage of Pipenv is that new environments can be created with Pipenv itself instead of the _venv_ or _virtualenv_ commands.
#
# In effect, Pipenv combines the best of pip and virtual environments.

# ### 3.1 How it works
# When a package is first installed in an environment generated using Pipenv, two files are created: _Pipfile_ and _Pipfile.lock_.
#
# The _Pipfile_ replaces the _requirements_ file of pip. Its syntax is [TOML (_Tom's Obvious Minimal Language_)](https://realpython.com/python-toml/). Its contents reflect the packages, and any additional arguments, that the developer supplies when installing.
#
# The file itself has 4 sections, namely:
# 1. source: The package index, or other source, from which packages are installed.
# 2. dev-packages: Packages needed only in development environments. This section appears for packages like pytest, which the developer installs with the option _--dev_, e.g. _pipenv install pytest --dev_.
# 3. packages: The packages the application depends on, with only the minimal version constraints required.
# 4. requires: Requirements the application cannot do without, e.g. a specific version of Python.

# ![image.png](attachment:image.png)

# While installing a package, the developer can opt to install it from a version control system or choose to install a specific version. The former option helps with resolving the sub-dependency issues that we saw earlier with pip.
#
# In the example image above, the _requests_ package is installed from a version control system. Because its entry sets _editable = true_, Pipenv can resolve dependency issues by selecting, from the source, a version that resolves a conflict created by a dependency.
#
# The _flask_ package in the image is an example of a dependency pinned to a specific version.
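#
# The four sections described above can be sketched in text form. The _Pipfile_ below is hypothetical: the package names follow the discussion, but the versions, URLs and the helper name _summarize_pipfile_ are assumptions made for illustration. The snippet reads it with _tomllib_, which is in the standard library from Python 3.11; on older interpreters it assumes the third-party _tomli_ backport is installed.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumed third-party backport for older interpreters

# Hypothetical Pipfile: versions and URLs are illustrative only.
EXAMPLE_PIPFILE = """
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
pytest = "*"

[packages]
flask = "==2.2.2"
requests = {git = "https://github.com/psf/requests.git", editable = true}

[requires]
python_version = "3.10"
"""


def summarize_pipfile(text):
    """Return a small summary of the four Pipfile sections."""
    data = tomllib.loads(text)
    return {
        "sources": [source["name"] for source in data.get("source", [])],
        "packages": sorted(data.get("packages", {})),
        "dev_packages": sorted(data.get("dev-packages", {})),
        "python_version": data.get("requires", {}).get("python_version"),
    }


if __name__ == "__main__":
    for section, value in summarize_pipfile(EXAMPLE_PIPFILE).items():
        print(f"{section}: {value}")
```

# Keeping development-only packages under their own table is what later allows a production install to leave them out.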
#
# Another point to note is that while the _pytest_ package may have sub-dependencies of its own, they are not all listed; only packages with specific requirements are added to the _Pipfile_. All other sub-packages are installed by Pipenv automatically.
# The _Pipfile.lock_ file is written once the developer has installed the necessary packages and wants to freeze the packages that must be available in the production environment. The file lists every package, with those identified as _dev-packages_ in the _Pipfile_ kept in a separate _develop_ section so that they can be left out of production.

# ![image.png](attachment:image.png)

# The file is in JSON format and contains all the packages and sub-packages along with their version numbers and hashes. Ideally this file is never edited by hand, unlike the _requirements_ file used with _pip_. Notably, unlike the _Pipfile_, every package has all of its sub-packages listed with their exact versions and hashes, ensuring that the environment in production is exactly the same as in development, minus the packages specific to development.

# ### 3.2 Pipenv vs. Pip
# In an earlier section, we identified three issues with pip. In this section, we revisit those issues and look at how Pipenv resolves them.
#
# The three issues we identified earlier with pip were:
# 1. Dependence on the developer to maintain the _requirements_ file in both the development and production environments while keeping watch for updates to packages and sub-packages.
# 2. Carry-over of packages that are relevant only to the development environment into the production environment.
# 3. Reliance on developers to resolve sub-package dependency conflicts.
#
# Issues 1 and 3 are addressed by Pipenv's dependency resolution, including editable installations of a package from the version control system.
# This feature gives Pipenv the control to watch for updates to any package and to upgrade or downgrade a package based on the needs of the installed packages.
#
# An additional feature of Pipenv is that when a new package is installed later, Pipenv allows other dependencies and sub-dependencies to update their versions as long as this does not break the existing dependencies.
# If some sub-dependencies may break because of the required installations, Pipenv warns the developer and provides the command _pipenv graph_ to view the dependencies. This gives developers the opportunity to resolve what Pipenv is unable to resolve. An example of the command's output is given below:

# ![image.png](attachment:image.png)

# Issue 2 is resolved by the _dev-packages_ section in the _Pipfile_, which identifies all packages that are not required in production, thus preventing unnecessary packages from going into the production environment.

# Based on the advantages seen above, it is easy to understand why Pipenv is the [package management tool recommended by Python](https://packaging.python.org/en/latest/tutorials/managing-dependencies/#managing-dependencies).
#
# While Pipenv does resolve many of the issues seen in pip and is recommended, it has a performance issue, especially when installing packages. Moreover, Pipenv does not have a versioning feature for its _Pipfile.lock_. These issues are addressed by another package manager called Poetry.

# ## 4. Poetry... slightly better Pipenv
# Like Pipenv, Poetry is a package management tool. It has all the features provided by Pipenv and shares the same advantages over pip.
#
# Like Pipenv, it maintains files similar to _Pipfile_ and _Pipfile.lock_, called _pyproject.toml_ and _poetry.lock_.
# Similar to Pipenv, the developer can specify packages that are meant only for the development environment, and, like Pipenv, Poetry can display a tree of all the dependencies and their sub-dependencies.
#
# In effect, Poetry is very similar to Pipenv in functionality. However, performance-wise Pipenv does seem to lag.

# ### 4.1 So what makes Poetry better?

# #### Performance
# While Poetry and Pipenv are relatively similar, Poetry excels in performance. As more packages are installed using Pipenv, it starts to slow down. This defect, however, is not seen in Poetry.
#
# An excellent example is shown in this [YouTube video](https://www.youtube.com/watch?v=aZTmnCkCa3M), in which the poster installs multiple packages with both Poetry and Pipenv. As the number of packages increases, Pipenv starts slowing down whereas Poetry remains steady.
#
# This issue was further highlighted in an in-person seminar conducted by Eddie Antonio S., the first part of which can be seen in [this video](https://www.youtube.com/watch?v=1GIIaGbL9qQ). Eddie also points out that creating the lock file in Pipenv takes a considerable amount of time. Pipenv users have raised this issue repeatedly on GitHub.
#
# Research into the matter suggests that Pipenv runs the existing lock file first and then modifies it, whereas Poetry modifies the existing lock file and then runs it.
#
# However, it should be noted that a user named matteius, who is assumed to be a collaborator, has [pushed a fix for the issue](https://github.com/pypa/pipenv/issues/356#issuecomment-1234096228), and based on the post, the performance of the current version seems to be better than Poetry's.
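#
# Since reports on performance differ between versions, it can help to measure on one's own project. The following is only a rough sketch: the helper _time_command_ is a made-up name, and the stand-in command would need to be replaced with, for example, _pipenv lock_ or _poetry lock_, assuming those tools are installed and the script is run inside a project directory.

```python
import subprocess
import sys
import time


def time_command(args, repeats=1):
    """Run a command `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(args, check=True, capture_output=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in
    # ["pipenv", "lock"] or ["poetry", "lock"] to compare the two tools.
    elapsed = time_command([sys.executable, "-c", "pass"], repeats=3)
    print(f"fastest run: {elapsed:.3f} s")
```

# Timings of this kind depend heavily on caches and network speed, so a single measurement should be read as indicative only.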
# #### Correctness
# While both Pipenv and Poetry can find and install the appropriate sub-packages, Pipenv has a drawback: it installs an incompatible sub-package first, before checking whether more suitable sub-packages exist. This renders all related sub-packages redundant.
#
# Poetry, however, does not install any sub-package until it has identified one that is appropriate for all dependent root packages. This ensures that the most appropriate packages are found first, before the job is handed to a developer who would have to do the same manually.

# #### Automatic Versioning
# Another point on which Pipenv lags behind Poetry is that Poetry allows automatic versioning of its lock file, a feature not available in Pipenv. Pipenv requires a user to commit the file to version control manually.

# While Poetry has a couple more advantages over Pipenv, I have not been able to comprehend them fully enough to detail them here.

# ## 5. Conclusion
# In this article we compared pip, Pipenv and Poetry. While all three are package managers, we found that pip has many drawbacks in comparison to Pipenv, which attempts to resolve many of the deficiencies seen in pip.
#
# Besides this, Pipenv introduces improvements that enhance a developer's ability to work with Python packages. However, Pipenv has a couple of faults of its own with regard to performance and to correctness when finding the appropriate sub-packages. Poetry thus presents itself as a significant improvement, addressing Pipenv's drawbacks while providing all of its features.
#
# Overall, Poetry seems to be an excellent option for a package manager. However, as it is relatively new in comparison to Pipenv and pip, only time will reveal more of its drawbacks.
# Moreover, many other package managers are currently being worked on that seem to deliver similar outcomes and, in the case of performance, better outcomes than Poetry, such as [PDM](https://pdm.fming.dev/latest/).

# ## 6. References
# - [Pipenv: A Guide to the New Python Packaging Tool](https://realpython.com/pipenv-guide/#pipenv-introduction)
# - [pip needs a dependency resolver](https://github.com/pypa/pip/issues/988)
# - [Poetry, A Better Version of Python Pipenv](https://plainenglish.io/blog/poetry-a-better-version-of-python-pipenv)
# - [5 Reasons Why Poetry Beats Pip Python Setup](https://betterprogramming.pub/5-reasons-why-poetry-beats-pip-python-setup-6f6bd3488a04)
# - [Why Is Poetry Essential to the Modern Python Stack?](https://andrewbrookins.com/python/why-poetry/)
# - [A Poetic Apology: Or Why Should You Use Poetry to Manage Python Dependencies](https://news.ycombinator.com/item?id=26093926)
# - [Git - Should Pipfile.lock be committed to version control?](https://stackoverflow.com/questions/46278288/git-should-pipfile-lock-be-committed-to-version-control)
# - [Feature Request: add lock file versioning for Poetry #1410](https://github.com/python-poetry/poetry/issues/1410)
# - [A Review: Pipenv vs. Poetry vs. PDM](https://dev.to/frostming/a-review-pipenv-vs-poetry-vs-pdm-39b4)
# - [PDM](https://pdm.fming.dev/latest/)