Python's New Package Landscape

May 11th, 2018 | 15 min. reading time

Dizzy Snake covered in colorful boxes.

Introduction

On April 16, 2018, the Python Packaging Authority (PyPA) deployed a new version of PyPI (pronounced pie-pea-eye), the official online repository of Python projects. The domain https://pypi.org has hosted the new version in its alpha and beta forms; now, with the update, the original URL (https://pypi.python.org/pypi) redirects to the new, simpler URL.

The original, known informally as the cheese shop after the Monty Python skit, has not had a full refresh since its inception in 2003. PyPI 2.0, code-named Warehouse, features a more modern architecture using tools not available at the time the first version was built.

PyPI is not the only part of the packaging ecosystem to evolve: the methods for structuring Python projects, building Python distributions, and installing these distributions have improved in the last two to four years. In light of the new version of PyPI, here is a high-level look at the modifications to keep you up to speed.

Dependency Managers: Simplifying Isolation and Adding Resolution

The pip tool—created in 2008 and released in 2011—has acted as Python's de facto installer for quite some time. It is a great tool, but using pip on its own comes with two key difficulties:

  1. Project isolation: if two different projects require two different versions of the same library, how does a developer ensure that a project is using the correct library version?
  2. Dependency synchronization: if a developer on a project adds a new dependency or upgrades an existing dependency, how does the developer ensure that other developers on the project sync their dependency graphs deterministically?

To solve the first pain point, Python developers have historically relied on virtual environments. Originally, this consisted of installing and configuring virtualenv or virtualenvwrapper. Starting with Python 3.3, Python ships the built-in venv module as well, providing developers with another option.
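As a minimal sketch of what the built-in venv module offers, the snippet below creates an isolated environment from Python itself; running python3 -m venv on the command line achieves the same thing. The directory name project-a-env and the helper create_isolated_env are hypothetical, chosen only for illustration.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return the path to its pyvenv.cfg."""
    # with_pip=False keeps the sketch fast; pass with_pip=True to bootstrap pip too.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_isolated_env(Path(tmp) / "project-a-env")
    env_created = cfg.exists()
    print("environment created:", env_created)
```

Each project gets its own directory of this kind, so two projects can hold two different versions of the same library without interfering with each other.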

The second pain point has been largely unsolved in the Python world. Developers have relied on setup.py (discussed later in this article) and the convention of specifying a requirements.txt with a list of dependencies. Depending on the developer's intentions, it's generally recommended that the versions of the dependencies be specified exactly (pinned) or limited (e.g., Django>=2.0). The goal of pinning is to ensure consistent installations regardless of who installs, and when. However, properly pinning or limiting versions requires tricky manual work on the developer's part. One of the central difficulties comes from managing the dependencies of dependencies (and so on). Ensuring repeatable installation of the same versions of dependencies using only pip is thus very difficult.
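To make the difference between pinned and limited versions concrete, here is a rough sketch that classifies lines of the kind found in requirements.txt. The regular expression is a simplification written for this example: it ignores extras, environment markers, URLs, and wildcard versions, so treat it as an illustration rather than a real requirements parser.

```python
import re

# Matches a project name optionally followed by a comparison operator,
# e.g. "Django==2.0.5" or "Django>=2.0". Deliberately simplified.
REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(==|>=|<=|~=|!=|<|>)?")


def classify(line: str) -> str:
    """Return 'pinned', 'limited', or 'unconstrained' for one requirement line."""
    match = REQUIREMENT.match(line)
    if match is None:
        raise ValueError(f"not a simple requirement: {line!r}")
    operator = match.group(2)
    if operator is None:
        return "unconstrained"
    return "pinned" if operator == "==" else "limited"


requirements = ["Django==2.0.5", "Django>=2.0", "Django"]
labels = {line: classify(line) for line in requirements}
for line, label in labels.items():
    print(f"{line:15} {label}")
```

Only the first form guarantees that everyone installs the same version of Django, and even then it says nothing about the versions of Django's own dependencies.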

Pipenv—first announced in January 2017—mitigates both pain points. Pipenv acts as a wrapper around pip and virtual environments and provides a seamless experience for working with the two tools to address the first pain point. Pipenv lessens the second pain point by implementing dependency resolution and by automating behavior. For instance, Pipenv saves the names and versions of dependencies being installed so that developers may forgo manually updating requirements.txt. Rather than relying on a list of dependencies in requirements.txt, Pipenv defines and creates Pipfile and Pipfile.lock files to manage dependencies. The first file defines the direct dependencies for the project, while the second saves all of the dependencies installed, ensuring consistent installations over time. Developers on teams will still need to remember to sync their dependencies when switching branches or pulling from remotes, but Pipenv reduces that work to a single command: pipenv sync.
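Because Pipfile.lock is a JSON document, the pinned versions it records are easy to inspect. The sketch below reads a small, made-up excerpt: the package names, versions, and elided hashes are illustrative only, and the helper pinned_versions is hypothetical rather than part of Pipenv.

```python
import json

# Illustrative excerpt of a lock file; the entries and hashes are placeholders.
SAMPLE_LOCK = """
{
    "_meta": {"requires": {"python_version": "3.6"}},
    "default": {
        "django": {"version": "==2.0.5", "hashes": ["sha256:<elided>"]},
        "pytz": {"version": "==2018.4", "hashes": ["sha256:<elided>"]}
    },
    "develop": {
        "pytest": {"version": "==3.5.1", "hashes": ["sha256:<elided>"]}
    }
}
"""


def pinned_versions(lock_text: str, section: str = "default") -> dict:
    """Map each package in one section of the lock file to its pinned version."""
    lock = json.loads(lock_text)
    return {
        name: entry["version"]
        for name, entry in lock[section].items()
        if "version" in entry
    }


runtime_pins = pinned_versions(SAMPLE_LOCK)
dev_pins = pinned_versions(SAMPLE_LOCK, section="develop")
print(runtime_pins)
print(dev_pins)
```

Note that the lock file lists pytz even though a Pipfile for this project would likely name only Django: the lock records the whole dependency tree, not just the direct dependencies.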

The benefits of using Pipenv in application development led PyPA to recommend it for dependency management of applications. PyPA first added a tutorial for managing application dependencies with Pipenv in November 2017 and then listed Pipenv as a formal recommendation in February 2018.

To be clear, PyPA recommends Pipenv for applications, but not for libraries. Library dependencies need to be defined flexibly, so Pipenv is not suited to the task, given the strict pinning in Pipfile.lock.

Pipenv has garnered much attention because of PyPA's recommendation, but it is not the only new dependency manager. For instance, Poetry and Hatch both offer functionality that overlaps with Pipenv. All three tools wrap pip and virtualenv to handle the first pain point. However, this is where the tools begin to diverge in their feature sets.

Poetry and Pipenv both seek to resolve dependencies deterministically to solve the second pain point. Notably, Poetry seeks to make dependency resolution more reliable than Pipenv's implementation. What's more, Poetry is intended to manage dependencies for both applications and libraries. We'll discuss the reasons for this later in the article.

Hatch, which is still only version 0.20.0, does not currently focus on the second pain point for installing Python distributions; instead, it focuses on creating, managing, and testing libraries and applications with the aim of simplifying regular development tasks. Poetry and Hatch have some overlap here, but Hatch offers more features.

Another dependency manager, pip-tools, predates Pipenv, Poetry, and Hatch and focuses on the second pain point by ensuring consistent installations. It generates a requirements.txt based on the contents of another file: either requirements.in (a file format specified by pip-tools) or setup.py (discussed later). Much like Pipenv, it defines a single command for syncing environments, making it easy for developers on teams to stay on the same page.
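The idea of turning a short list of direct dependencies into a fully pinned list can be sketched with a toy example. Everything here is hypothetical: the in-memory index, the package names, and the compile_pins helper are invented for illustration. This is not how pip-tools works internally, and it sidesteps real resolution entirely by assuming each package has exactly one available version.

```python
# Hypothetical index: package -> (only available version, direct dependencies).
TOY_INDEX = {
    "webframework": ("2.0.5", ["timezones", "templating"]),
    "templating": ("1.3.0", ["markup"]),
    "timezones": ("2018.4", []),
    "markup": ("0.9.1", []),
}


def compile_pins(direct, index):
    """Expand direct dependencies into a sorted, fully pinned list of lines."""
    pins = {}
    pending = list(direct)
    while pending:
        name = pending.pop()
        if name in pins:
            continue  # already visited; this also guards against cycles
        version, dependencies = index[name]
        pins[name] = version
        pending.extend(dependencies)
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


lines = compile_pins(["webframework"], TOY_INDEX)
print("\n".join(lines))
```

A single direct dependency expands to four pinned lines, which is exactly the bookkeeping that is tedious and error-prone to maintain by hand.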

Not all tools focus on dependency management of codebases during development; some tools handle dependencies outside of development. For instance, pipsi allows Python command line applications to be installed in separate virtual environments, making them appear global while isolating them from each other. If two command line tools require two different versions of Click, pipsi makes it possible to install both. Jacob Kaplan-Moss, one of Django's original core contributors, uses pipsi in his setup to install Pipenv.

In conclusion, although pip remains the key tool for installing distributions, and virtual environments are still necessary for isolation, a host of new tools make installation and isolation a more seamless experience. Some of these tools introduce dependency resolution, ensuring consistent installations of dependency trees for different developers over time. Pipenv is the official new tool for managing application dependencies, but it is not your only choice, and the alternatives may better suit your needs.

New Recommendations for a Robust Project Structure

Previously, how a developer organized modules and packages in a source code repository was up to her and was largely viewed as a matter of preference. However, a consensus has been growing on how Python libraries (code intended to be shared) should be organized, based on known pitfalls in Python.

Ionel Cristian Mărieș first discussed using a src/ directory to protect Python code from specific pitfalls in 2014, but the reception was mixed. Hynek Schlawack noted in 2016 that he had at first disregarded this method, only to be bitten later by the problems Ionel had described. Finally, tools like pytest now recommend using this structure, and (as Hynek notes in his article) it makes getting testing right with tox much easier.

I call it a growing consensus because PyPA's instructions for creating a sample project notably do not follow the src/ structure. With that said, the project also states in its README that it "does not aim to cover best practices for Python project development as a whole."

Regardless, if you're starting a new Python project or running into the problems described in the aforementioned articles, it may be in your interest to switch the directory structure of your project for all of the reasons Ionel laid out.
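One commonly cited pitfall is that, with a flat layout, code run from the repository root imports the package straight from the working directory, so tests can pass even when the installed distribution is broken. The sketch below demonstrates this under stated assumptions: layout_demo_pkg is a hypothetical package name, and the two helper functions exist only for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

PACKAGE = "layout_demo_pkg"  # hypothetical package name used only in this sketch


def make_project(root: Path, use_src: bool) -> Path:
    """Create a project holding one empty package, optionally under src/."""
    package_parent = root / "src" if use_src else root
    package_dir = package_parent / PACKAGE
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")
    return root


def importable_from_root(project_root: Path) -> bool:
    """Return True if importing PACKAGE succeeds when run from project_root."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {PACKAGE}"],
        cwd=project_root,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


with tempfile.TemporaryDirectory() as tmp:
    flat = make_project(Path(tmp) / "flat", use_src=False)
    nested = make_project(Path(tmp) / "nested", use_src=True)
    flat_importable = importable_from_root(flat)
    src_importable = importable_from_root(nested)
    print("flat layout importable without installing:", flat_importable)
    print("src/ layout importable without installing:", src_importable)
```

Neither package has been installed, yet the flat one imports anyway. With src/, the import fails until the project is actually installed, which forces tests to exercise the same code that users will receive.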

New Tools for Building Python Libraries

The evolution of build tools for Python source code is perhaps the most important set of changes discussed in this article. These changes necessitate a closer look than the other topics covered (refill your tea/coffee!).

Packages or bundles of Python source code meant to be shared and installed by others are formally called distributions. They are so named to avoid confusion with the concept of a Python package, which is a collection of Python modules.

Distributions can be source distributions or built distributions. If someone is installing a source distribution—for example, installing directly from GitHub—their installer must perform a build step during the process. In contrast, an installer can simply place a built distribution in the right location. The term installation is used casually to refer to either the placement of a built distribution or the combined build process and placement.

Built distributions come in many formats, but we will focus on the two Python-only formats: eggs and wheels. Eggs were first available in Python 2.3 but have been effectively replaced by wheels, which were first proposed in PEP 427 in 2012. You can read more about their differences in PyPA's packaging guide or in the documentation for the wheel package.
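PEP 427 also fixes the naming convention for wheel files: distribution, version, an optional build tag, then Python, ABI, and platform tags, separated by hyphens. As a rough sketch, the hypothetical helper below splits a filename into those parts; the filename is an example, and the function does no validation beyond counting fields.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into the fields described in PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


wheel = parse_wheel_filename("Django-2.0.5-py3-none-any.whl")
print(wheel)
```

The tags tell an installer whether a wheel fits the current interpreter and platform without unpacking it; the combination none and any marks a wheel with no ABI or platform restrictions.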

Today, two tools are used to build and distribute Python code in these formats: distutils and setuptools.

Python's distutils has been used to bundle Python code since Python 1.6, which was released in parallel with Python 2.0 in late 2000. PEP 229, written in November 2000, first outlined the intent to use distutils to distribute Python code with Python itself.

Originating in 2004 and built on top of distutils, the setuptools project exists to overcome limitations in distutils and includes a tool called easy_install; setuptools was the tool that introduced the eggs format.

Python distributions follow basic rules imposed by distutils. In particular, all Python packaging and distribution tools—including pip and Pipenv—expect the existence of a file named setup.py in the root of a source distribution. This file is how distutils and setuptools create built distributions from source code.

PEP 517 and PEP 518—accepted in September 2017 and May 2016, respectively—changed this status quo by enabling package authors to select different build systems. Said differently, for the first time in Python, developers may opt to use a distribution build tool other than distutils or setuptools. The ubiquitous setup.py file is no longer mandatory in Python libraries.

As PEP 518 describes, developers may now include a TOML file named pyproject.toml in their codebases to specify what tools they want to use to build a distribution. The TOML file may additionally configure these build tools, and can provide settings for other tools. You can use these files today: pip has looked for them in repositories and source distributions since PR 4144 was merged in May 2017.

The GitHub repository for pip provides an example of how to write a pyproject.toml file. In this case, pip defines the use of setuptools and wheel for building distributions and further configures Amber Brown's towncrier project for generating news. However, Python package authors could eventually opt to build distributions with tools like Flit or the aforementioned Poetry. That's right: Poetry is not only a dependency manager but also a distribution builder and publisher that uses pyproject.toml. Although Python core contributor Brett Cannon recommends using Flit, and core contributor Mariatta Wijaya appears to agree, Poetry has also begun to draw attention from people like Jannis Leidel (one of the original authors of pip), likely because of its scope.

I suspect that we still have some time before the dust settles and the tools find their groove. Notably, Flit does not support the src/ project structure discussed earlier in this article, and Poetry is still pre-1.0, having had its first commit in February 2018.

All in all, making distribution building more modular and allowing for new tools are enormous changes.

Conclusion

XKCD Comic Mocking Python Environment

The ecosystem of dependency managers is improving rapidly, but with change comes confusion. Pipenv, Poetry, and Hatch wrap pip and virtual environments, but none of them replaces pip or virtual environments. Each one offers different features, solving different problems. As Thea Flowers, PyPA member, has noted, no single tool fulfills all requirements.

Build tools are also evolving rapidly, and I expect we will see more in the future, while the existing tools mature.

If you are wondering which tools to use, first ask yourself what you are trying to achieve. Do you need a dependency manager or a distribution builder? Are you programming a library or an application? Do you need to support older setups, which may still require the use of setup.py, or can your new distribution be geared toward the future? The answers to questions like these will inform the choices you make. The only place I find myself being prescriptive is in the use of the src/ project structure, as it avoids implicit errors and makes life easier for developers new to your project—but even then, not everyone agrees.

Packaging Python has long been a pain point for the language and the community (see Nick Coghlan's 2013 notes about packaging in Python for fun). The aforementioned changes are very positive, and we owe the individuals involved an enormous thank-you for all their hard work. Take a moment to thank them on Twitter, on GitHub, or in person!