A Smarter Package Manager

6 September 2017

Software evolves. When incompatible changes are made to the libraries you depend on, finding a set of compatible versions (resolving dependencies) can be very difficult.

That’s why we use package managers.

To resolve dependencies, a package manager needs a list of available versions and a method for determining compatibility. It’s become standard practice to determine compatibility with semantic versioning.

Semantic versioning codifies rules about version numbers so that they describe compatibility. Each version is tagged major.minor.patch. Breaking (incompatible) changes increment the major version, compatible additions increment minor, and bug fixes increment patch. The version numbers carry meaning, hence semantic versioning.

If libraries list compatible versions for their dependencies, then a package manager can find a compatible set of all dependencies automatically. (Or not find, if no compatible set exists for those dependencies and versions.)

This is a dramatic improvement over manual dependency resolution, but it’s far from perfect:

Semantic versioning doesn’t specify whether you’re declaring source compatibility or binary compatibility. For many languages, that’s okay. But it’s a real problem for languages with type inference and overloading like Swift’s.¹
It can be hard to know what changes are actually breaking. (This is true regardless of whether you’re aiming for source compatibility or binary compatibility.)
Even the smallest breaking change requires a new major version—even if it’s an API that no one uses. And when you release a new major version, then any libraries that depend on yours need to explicitly upgrade to the new version.
The pain of new major versions encourages large releases. Large releases require more management from maintainers and make upgrading harder for users.
It’s easy for a library to accidentally depend on a newer version of a dependency. If A is declared to be compatible with B 1.0, but developed against B 1.1, it’s easy to pick up an addition from 1.1 without updating the version requirement.

These are fallout from using version numbers as a proxy for compatibility. What really matters is whether the compiler considers two versions compatible.

Knowing what the compiler will find to be compatible can be very nuanced, requiring intimate knowledge of the compiler. If you want to always get the right answer, the compiler needs to be used to make the decision.² Otherwise maintainers will make mistakes and the whole scheme falls down.³

Elm’s elm-package does this. Version numbers are determined automatically based on changes to the API. This eliminates the potential for mistakes, but doesn’t improve granularity: a new major release still isn’t compatible if the portions of the API that you’re using don’t change.

The reality is that version numbers don’t communicate much about compatibility. If we want to do better, we need to actually calculate compatibility instead of choosing such a conservative approximation. But we can do that! Compilers already know how.

A smarter package manager would require:

A list of the public APIs in a library at any version.
The actual APIs that were used from each dependency.
A minimum version for each dependency. This would be used to build and calculate the APIs that were used from each dependency. But it would also serve as a way to guarantee the availability of bugfixes in dependencies.

By capturing the information that the compiler uses when building, compatibility can be determined. If one of the used APIs is no longer there, the version is incompatible. But a breaking change to an unused API would be compatible! ⁴

This scheme would improve on the problems of semantic versioning:

Maintainers could release breaking changes as needed without a major release.

Only clients who depended on that API would need to update. This should let breaking changes to be released more frequently with less disruption.
Updating could be done in smaller steps.

Instead of updating to the next major version, which had potentially grouped together a large number of breaking changes, a package manager could update one breaking (for you) release at a time.
The package manager could guarantee compatibility.

Maintainers wouldn’t need to worry about mistakes. The package manager could generate—and thus verify—all the compatibility requirements.
Version numbers could be repurposed.

They could be designed for humans instead of machines. We might stick with a familiar major.minor.patch template, or maybe we’d just a year.month.date.patch to communicate the passage of time, or something else. But we could choose whatever was most helpful for us.

Semantic versioning knows nothing about the content it versions. This leaves humans to do the machine’s work when computers should be doing ours. We’re not good at this sort of work, which leads to mistakes and painful workflows.

Let’s build tools that are designed for humans—that are easy to use and prevent mistakes. Smart tools make developers more productive and development more accessible. Our compilers and IDEs are getting smarter. I think it’s time for our package managers to get smarter too.

e.g., changing class A { } to typealias A = B; class B { } is source-compatible but binary-incompatible in Swift. ↩
As the Perl community says, “Only perl can parse Perl”. ↩
I’ve made a few of these mistakes myself over the past couple years. Even if you’re looking for them, they can be hard to catch. ↩
This is most likely an oversimplification. But I’m pretty sure the idea itself is valid. ↩