Are you developing a backend service in Python? I have two pieces of advice for you:
- Do not use pip and requirements.txt to manage Python dependencies. They lack crucial features that should be built-in.
- Use Poetry instead.
To me, the first one is a no-brainer. The second one is more tentative: Poetry is a great option, but it’s hardly the only option worth considering. I’ll explain below.
pip’s missing features
pip is a tool that you can use to install packages from the Python Package Index (PyPI). It comes with Python, and if you’re a Python developer, you have probably used it many times.
The traditional way to manage dependencies for a Python project was to list them in a file called requirements.txt and use pip install -r requirements.txt to install them. However, pip was designed to be a package installer and not a full-fledged project workflow tool. pip lacks two essential features: dependency lockfiles and automatic management of virtualenvs.
Dependency lockfiles
If you want to get the same behavior in all environments - your laptop, CI, production - you need to pin the versions of your dependencies and their transitive dependencies. You can pin the versions of your direct dependencies in a requirements.txt by specifying, for example, requests==2.31.0 instead of requests.
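To make the difference between a pinned and an unpinned entry concrete, here is a minimal sketch that scans requirements text and reports the lines that are not pinned with ==. The helper name unpinned_requirements and the sample file content are made up for illustration, and the parsing is deliberately simplified (it ignores URLs, environment markers and line continuations), so treat it as a sketch rather than a replacement for a real requirements parser.

```python
import re

# A requirement counts as pinned here if it looks like "name==version"
# (optionally with extras such as "name[extra]==version").
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical requirements.txt content.
example = """\
requests==2.31.0
flask
urllib3>=1.26  # a range, not a pin
"""

print(unpinned_requirements(example))  # ['flask', 'urllib3>=1.26']
```

Even when this check reports nothing, only the packages listed in the file are pinned. Their own dependencies are still free to change.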
However, pip won’t pin the versions of the transitive dependencies. This can be solved by using pip-tools to expand requirements.txt into a file that lists the full dependency graph with exact versions and checksums for the artifacts. pip-tools is great, but you need to set it up yourself and figure out how it fits your workflow.
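The value of checksums is easier to see with a small example. The sketch below shows the general idea of comparing a downloaded artifact against a pinned sha256:<digest> value. It is not pip's or pip-tools' actual implementation: the function name matches_pinned_hash is hypothetical, and the artifact bytes are a stand-in for a real wheel or sdist.

```python
import hashlib


def matches_pinned_hash(artifact: bytes, pinned: str) -> bool:
    """Check artifact bytes against a pin of the form 'sha256:<hex digest>'."""
    algorithm, _, expected = pinned.partition(":")
    digest = hashlib.new(algorithm, artifact).hexdigest()
    return digest == expected.lower()


# Stand-in bytes for a downloaded package file (hypothetical content).
artifact = b"example package contents"
pin = "sha256:" + hashlib.sha256(artifact).hexdigest()

print(matches_pinned_hash(artifact, pin))                # True
print(matches_pinned_hash(artifact + b"tampered", pin))  # False
```

With a check like this in place, an artifact that changes after the lockfile was written fails the comparison instead of being installed silently.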
This feature is table stakes in other languages - for example, npm has had package-lock.json for years now and Cargo has Cargo.lock. This really should be a built-in feature in a project workflow tool.
Automatic management of virtualenvs
The way to create isolated environments in Python is to use virtualenvs. Traditionally you manage them manually: you create one with a shell command (python -m venv example to create a virtualenv called example) and when you want to use it, you need to activate it with another shell command.
This is error-prone: forgetting to activate the virtualenv or activating the wrong virtualenv are common mistakes. There are a bunch of workarounds. For example, you can use pyenv-virtualenv to make your shell auto-activate a virtualenv when you enter a project directory. direnv can do it, too.
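Another cheap safeguard against the "forgot to activate" mistake is to have a script check for itself whether it is running inside a virtualenv. The sketch below relies on the fact that, in an environment created with the standard library venv module, sys.prefix differs from sys.base_prefix. The helper names in_virtualenv and require_virtualenv are made up for this example, and the check cannot tell you whether you activated the right virtualenv, only that one is in use.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def require_virtualenv() -> None:
    """Exit early instead of running against the wrong Python installation."""
    if not in_virtualenv():
        raise SystemExit("No virtualenv is active; activate one and retry.")


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Calling require_virtualenv() at the top of a project script turns a silent mistake into an immediate, readable error.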
Again, this too should be a built-in feature in your workflow tool. You should not need to glue multiple tools together. You won’t hear about npm or Cargo users having problems with virtualenvs.
Poetry and other options
Fortunately, a lot of people have identified these problems and worked to solve them. Less fortunately, this has resulted in an explosion of Python project workflow tools. So how to pick one?
My recommendation is: go with Poetry. It has lockfiles, it has virtualenv management, it’s popular and actively developed. In my experience, it’s not perfect but it works.
You could also consider Hatch or PDM. They’re similar to Poetry. I haven’t used them myself, but I’ve heard other people use them with success. Hatch seems to be especially popular with library authors.
If you’re looking for a more powerful option that can deal with e.g. multiple subprojects, the Pants build system has great Python support. It has a significantly steeper learning curve, however.
Finally, if you’re looking for a rustup-style solution that can install Python for you, there’s rye. It’s new and experimental, but maybe it’s the right choice for you?
Where is the canonical workflow tool?
It would be great if Python came with a canonical project workflow tool. A lot of people wish that pip would become one. Node.js comes with npm and Rust comes with Cargo, so why can’t Python come with one? Why are there so many competing options?
The biggest obstacle, to my knowledge, is that since Python is used so widely and for so many different use cases, coming up with a universal official solution is difficult and slow (and underfunded) work. It’s not clear if pip is the right home for these features, either.
If you want to learn more, read and listen to these people who are, unlike me, deeply involved in the Python community:
- Stargirl (Thea Flowers) on Fediverse: So You Want to Solve Python Packaging: A Practical Guide
- Pradyun Gedam: Thoughts on the Python packaging ecosystem
- Talk Python to Me (podcast): Reimagining Python’s Packaging Workflows
An aside on Clojure
Clojurists reading my blog may ask: hey, what about Clojure, how come we do not have lockfiles? That’s a great question!
The Clojure community has solved this by always using explicit versions instead of version ranges for dependencies, even in libraries. The version descriptors would actually support ranges, but nobody ever uses them. This way, as long as the version resolution algorithm is stable, you always get the same versions.
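To see why exact versions give reproducible results without a lockfile, consider this toy model written in Python. It is not how Clojure's tooling (or any real resolver) works: the functions resolve_range and resolve_exact and the version lists are invented for illustration, and versions are assumed to be plain dotted integers.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_range(minimum: str, available: list[str]) -> str:
    """Pick the newest published version that satisfies '>= minimum'."""
    candidates = [v for v in available if parse(v) >= parse(minimum)]
    return max(candidates, key=parse)


def resolve_exact(version: str, available: list[str]) -> str:
    """An exact version resolves to itself, whatever else is published."""
    if version not in available:
        raise LookupError(f"{version} is not published")
    return version


published_monday = ["1.0.0", "1.1.0"]
published_friday = ["1.0.0", "1.1.0", "1.2.0"]  # a new release appeared

# The range resolves differently once 1.2.0 is published.
print(resolve_range("1.0.0", published_monday))  # 1.1.0
print(resolve_range("1.0.0", published_friday))  # 1.2.0

# The exact version resolves the same way on both days.
print(resolve_exact("1.1.0", published_monday))  # 1.1.0
print(resolve_exact("1.1.0", published_friday))  # 1.1.0
```

With ranges, the result depends on what happens to be published at install time, which is the gap a lockfile closes. With exact versions everywhere, there is nothing left for a lockfile to record.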
In theory, transitive dependency version mismatches could be a problem, but Clojure is amenable to a coding style where they rarely cause issues.
In contrast, in the Python and Node.js communities it is expected that libraries list version ranges for their dependencies, and the package management tools complain about version mismatches.