rfcs/text/2906-cargo-workspace-dedupl...

658 lines
25 KiB
Markdown

- Feature Name: `workspace-deduplicate`
- Start Date: 2020-04-13
- RFC PR: [rust-lang/rfcs#2906](https://github.com/rust-lang/rfcs/pull/2906)
- Rust Issue: [rust-lang/cargo#8415](https://github.com/rust-lang/cargo/issues/8415)
> **Note**: This feature was stabilized in Rust 1.64. Several design changes were made in the course of the implementation. Please see the documentation for [`[workspace.package]`](https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-package-table) and [`[workspace.dependencies]`](https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-dependencies-table) for details on how to use this feature.
# Summary
[summary]: #summary
Deduplicate common dependency and metadata directives amongst a set of workspace
crates in Cargo with extensions to the `[workspace]` section in `Cargo.toml`.
# Motivation
[motivation]: #motivation
Cargo has supported workspaces for quite some time now but when managing a large
workspace there is often a good deal of redundancy between member crates in a
workspace. Currently this proposal attempts to tackle a few major areas of
duplication. Many of these areas of duplication are managed either manually or
with scripts, and the goal of this proposal is to largely eliminate the need for
scripts and also the need to manually manage so much.
## Duplication of `[dependencies]` sections
Often when managing a workspace you'll have a lot of crates that all depend on
the same crate. For example many of your crates may depend on `log`. Today you
must write down the same `log` directive in all your manifests:
```toml
[dependencies]
log = "0.3.1"
```
Depending on how many crates you're working on, that's a lot of times to
remember `0.3.1`! Additionally if you'd like to update this dependency, say if a
`1.0.0` release is made, you need to edit every single `Cargo.toml` to make sure
they all stay in sync. This is a lot of duplicated work!
This duplication gets even worse when you start modifying the features of each
crate. For example:
```toml
[dependencies]
log = { version = "0.3.1", features = ['release_max_level_warn'] }
```
If you wanted to consistently write this across many crates it can get quite
cumbersome quite quickly.
## Duplication in inter-dependent crates
When managing a workspace you'll often have a lot workspace members that all
depend on each other. The "blessed" way to do this is actually quite verbose:
```toml
[dependencies]
other-workspace-member = { path = "../other-member", version = "0.2.3" }
```
Here you need to specify *both* `path` and `version`. Using `path` means that
you're depending on exactly that copy on the local filesystem. This also means
that if you depend on any workspace member via a `git` dependency later on it'll
correctly pull in the other workspace members from the git repo. (note that some
projects use `[patch]` to only write down `other-workspace-member = "0.2.3"` but
this causes issues when crates later use git dependencies)
If you never publish to crates.io, `path` is all you need. If crates eventually
get published, though, they also need a `version` directive to know what version
from crates.io you'll be depending on after the publication.
Naturally, with a highly-interconnected workspace which may be relatively large,
this leads to a lot of duplication very quickly. This is a lot of `path` and
`version` directives that you've got to manage.
## Duplication in crate versions
A frequent pattern in Cargo workspaces which publish to crates.io is to have all
the crate at the same semver version. These crates all move in lockstep during
publication and get bumped at the same time.
While a minor papercut this basically means that anyone and everyone who has a
workspace of a lot of crates makes their own homebrew script for updating
versions and managing updates/publications. It'd be quite convenient if we could
standardize across the Rust ecosystem how to manage this information!
## Duplication in crate metadata
The last primary area of duplication that this proposal attempts to tackle is in
crate metadata in the `[package]` section. This includes items such as:
```toml
[package]
authors = []
license = "..."
repository = "..."
documentation = "..."
```
These metadata directives are often duplicated amongst all crates, especially
author/license/repository information. This is a pretty poor experience if you'e
got to keep writing down the information in so many places!
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation
Cargo's manifest parsing will be updated with new features to support
deduplicating each of the areas above. While all of these new features are
pretty small in their own right, they all add up to greatly reducing the
overhead of managing a workspace of many crates. The list of new features in
Cargo will look like the following:
## Workspace-level Dependencies
The `[workspace]` section can now have a `dependencies` section which works the
same way as the `[dependencies]` section in `Cargo.toml`:
```toml
# in workspace's Cargo.toml
[workspace.dependencies]
log = "0.3.1"
log2 = { version = "2.0.0", package = "log" }
serde = { git = 'https://github.com/serde-rs/serde' }
wasm-bindgen-cli = { path = "crates/cli" }
```
Each workspace member can then reference this section in the workspace with a
new dependency directive:
```toml
# in a workspace member's Cargo.toml
[dependencies]
log = { workspace = true }
```
This directive indicates that the `log` dependency should be looked up from
`workspace.dependencies` in the workspace root. You can reference any name
defined in `[workspace.dependencies]` too:
```toml
[dependencies]
log2 = { workspace = true }
```
## No longer need both `version` and `path` to publish to Crates.io
When you have a `path` dependency, Cargo's current behavior on publication looks
like this:
* If you have a `version` specifier as well, then the `path` key is deleted and
the crate is uploaded with the specified `version` as a dependency
requirement.
* If you don't have a `version` specifier, then the dependency directive is
deleted and crates.io will not learn about this dependency. This is only
really useful for `dev-dependencies`.
Cargo's behavior will change in this second case, instead following new logic
for a missing `version` specifier. For dev-dependencies where the referenced
package is `publish = false`, then the dependency will be dropped. Otherwise
Cargo will assume that `version = "$dependency_version"` was specified, meaning
that it requires at least the current version and otherwise any
semver-compatible version.
This behavior should mean that you no longer need to write `version = "..."`
with `path` dependencies if you publish to crates.io. Coupled with the
workspace-level dependencies above this means you never have to write the
version of a path dependency anywhere!
## Package metadata can reference other workspace members
To deduplicate `[package]` directives in `Cargo.toml` workspace members, Cargo
will now support declaring that metadata directives should be inherited from the
workspace. For example to version every package the same within a workspace you
can specify:
```toml
[package]
name = "foo"
version = { workspace = true }
```
This directive tells Cargo that the version of `foo` is the same as the
`workspace.version` directive found in the workspace manifest. This means that
in addition to a new `[workspace.dependencies]` section, package metadata keys
can now also be defined inside of a `[workspace]` section:
```toml
[workspace]
version = "0.25.2"
```
Many other package metadata attributes are supported as well
```toml
[package]
authors = { workspace = true }
license = { workspace = true }
```
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation
Cargo's `[workspace]` section will first be extended with a few new attributes.
Like before the `[workspace]` table can only appear in a workspace root, not in
any other manifests. Additionally the `[workspace]` table doesn't have to be
associated with a package, it could be part of a virtual manifest.
## Updates to `[workspace]`
The first addition to the `[workspace]` table is a `dependencies` sub-table,
like so:
```toml
[workspace.dependencies]
foo = "0.1"
```
The `dependencies` sub-table has the same form as the `[dependencies]` table in
manifests with a few exceptions:
* Dependencies cannot be declared as `optional`. The `optional` key must be
omitted or, if present, must be `false`.
* The `workspace` key (defined later in this proposal) is not allowed.
The `[workspace]` table will not support other kinds of dependencies like
`dev-dependencies`, `build-dependencies`, or `target."...".dependencies`. Only
`[workspace.dependencies]` will be supported.
To review, the `[workspace.dependencies]` table will be key/value pairs. Each
key is the name of a dependency while the dependency is a dependency directive.
This could be a string meaning a crates.io dependency or a table which further
configures the dependency.
Dependencies declared in `[workspace.dependencies]` have no meaning as-is. They
do not affect the build nor do they force packages to depend on those
dependencies. This part comes later below.
The `[workspace]` section will also allow the definition of a number of keys
also defined in `[package]` today, namely:
```toml
[workspace]
version = "1.2.3"
authors = ["Nice Folks"]
description = "..."
documentation = "https://example.github.io/example"
readme = "README.md"
homepage = "https://example.com"
repository = "https://github.com/example/example"
license = "MIT"
license-file = "./LICENSE"
keywords = ["cli"]
categories = ["development-tools"]
publish = false
edition = "2018"
[workspace.badges]
# ...
```
Each of these keys have no meaning in a `[workspace]` table yet, but will have
meaning when they're assigned to crates internally. That part comes later though
in this design! Note that the format and accepted values for these keys are the
same as the `[package]` section of `Cargo.toml`.
For now the `metadata` key is explicitly left out (due to complications around
merging table values), but it can always be added in the future if necessary.
## Updates to a package `Cargo.toml`
The interpretation of a `Cargo.toml` manifest within Cargo will now require a
`Workspace` object to be created. This `Workspace` will be used to elaborate and
expand each member's `Cargo.toml` directive. Additionally `Cargo.toml` will
syntactically accept some more forms.
### Placeholder Values
Previously package metadata values must be declared explicitly in each
`Cargo.toml`:
```toml
[package]
version = "1.2.3"
```
Cargo will now accept a table definition of `package.$key` which defines the
`package.$key.workspace` key as a boolean. For example you can specify:
```toml
[package]
name = "foo"
license = { workspace = true }
```
This directive indicates that the license of `foo` is the same as
`workspace.license`. If `workspace.license` isn't defined then this generates an
error.
The following keys in `[package]` can be inherited from `[workspace]` with the
new `workspace = true` directive.
```toml
[package]
version = { workspace = true }
authors = { workspace = true }
description = { workspace = true }
documentation = { workspace = true }
readme = { workspace = true }
homepage = { workspace = true }
repository = { workspace = true }
license = { workspace = true }
license-file = { workspace = true }
keywords = { workspace = true }
categories = { workspace = true }
publish = { workspace = true }
```
Note that directives like `license-file` are resolved relative to their
definition, so `license-file` is relative to the `[workspace]` section that
defined it.
### New dependency directives
Dependencies in the `[dependencies]`, `[dev-dependencies]`,
`[build-dependencies]`, and `[target."...".dependencies]` sections will support
the ability to reference the `[workspace.dependencies]` definition of
dependencies. This is done with a new `workspace` key in the dependency
directive. An example of this looks like:
```toml
[dependencies]
log = { workspace = true }
```
The `workspace` key cannot be defined with other keys that configure the source
of the dependency. This means you cannot define `workspace` with keys like
`version`, `registry`, `registry-index`, `path`, `git`, `branch`, `tag`, `rev`,
or `package`. The `workspace` key can be combined with other keys, however:
* `optional` - this introduces an optional dependency as usual, as well as a
feature named after the key (left hand side) of the dependency directive).
Note that the `[workspace.dependencies]` table is not allowed to specify
`optional`.
* `features` - this indicates, as usual, that extra features are being enabled
over the already-enabled features in the directive found in
`[workspace.dependencies]`. The result set of enabled features is the union of
the features specified inline with the features specified in the directive in
the workspace table.
For now if a `workspace = true` dependency is specified then also specifying the
`default-features` value is disallowed. The `default-features` value for a
directive is inherited from the `[workspace.dependencies]` declaration, which
defaults to `true` if nothing else is specified.
### Path dependencies infer `version` directive
As a final change to `Cargo.toml`, dependencies using the `path` directive and
not specifying a `version` directive will have the `version` directive inferred.
For example if we have:
```toml
# foo/Cargo.toml
[dependencies]
bar = { path = "../bar" }
```
as well as
```toml
# bar/Cargo.toml
[package]
name = "bar"
version = "1.0.1"
```
this is equivalent in `foo/Cargo.toml` to as if this were written:
```toml
# foo/Cargo.toml
[dependencies]
bar = { path = "../bar", version = "1.0.1" }
```
The `version` key for `path` dependencies, if not specified, will be inferred to
the version of the path dependency itself. Note that this is a version
requirement not an actual semver version, and the version requirement will be
interpreted as "at least the current version, and anything semver compatible
with it".
This logic of inferring, however, will also respect the `publish` key. For
example if we had this instead:
```toml
# bar/Cargo.toml
[package]
name = "bar"
version = "1.0.1"
publish = false
```
then Cargo would not alter this dependency directive:
```toml
# foo/Cargo.toml
[dependencies]
bar = { path = "../bar" }
```
## Effect on `cargo publish`
Cargo currently already "elaborates" the manifest during publication. For
example it removes `path` keys in dependency lists to only have the version
requirement pointing to crates.io. During publication Cargo will also elaborate
any substituted information from the `[workspace]`, because `[workspace]` is
also removed during publication!
This means that `workspace = true` will never be present in `Cargo.toml` files
published to crates.io, and additionally no information about `workspace = true`
will make its way to the registry index. Furthermore metadata fields like
`package.repository` will be filled in and will be present on crates.io's UI.
Put another way, `Cargo.toml` files published to crates.io, or metadata found
through crates.io, won't change from what they are today.
## Effect on `Cargo.lock`
When creating a `Cargo.lock` file Cargo will perform crate resolution as-if all
dependencies in `[workspace.dependencies]` are depended on by some crate, even
if no crate actually references an entry in `[workspace.dependencies]`. This
means that if a crate uses an entry in `[workspace.dependencies]` it's
guaranteed to have an entry in the lock file indicating what its dependencies
should be.
Note that for entries in `[workspace.dependencies]` which aren't used by any
crates in the workspace will likely trigger a warning, however, so users can
continue to prune accidentally unused entries.
## Effect on `cargo metadata`
Executing `cargo metadata` to learn about a crate graph will implicitly perform
all subsitution defined in this proposal. Consumers of `cargo metadata` will
continue to get the same output they got before this proposal, meaning that
implicit substitutions, if any, will be invisible to users of `cargo metadata`.
## Effect on `cargo read-manifest`
Similar to `cargo metadata`, the `cargo read-manifest` command will perform all
necessary subsitutions when presenting the output as JSON.
## Effect resolution for relative `path` dependencies
Like today, `path` dependencies will be resolved relative to the file that
defines them. This means that references to dependencies defined in the
workspace means paths are still relative to the workspace root itself.
For example if you write down a `[workspace.depencencies]` directive with a
relative path:
```toml
# Cargo.toml
[workspace.dependencies]
my-crate = { path = "crates/my-crate" }
```
And then you reference this in another crate:
```toml
# crates/other-crate/Cargo.toml
[dependencies]
my-crate = { workspace = true }
```
then the `my-crate` dependency references the crate located at `crates/my-crate`
relative to the workspace root, not located at
`crates/other-crate/crates/my-crate`.
# Drawbacks
[drawbacks]: #drawbacks
This proposal significantly complicates the process of interpreting a
`Cargo.toml`. One of the major purposes of using TOML to specify a crate
manifest was to make it easy for other tools to parse Cargo manifests and work
with them. This not only includes Rust-based tools but also tools in other
languages if necessary. Previously a TOML parser for your language was all you
really needed, but this proposal is adding a layer of indirection on top of TOML
where you have to interpret multiple manifests to figure out what one means. For
example you can no longer quickly and easily be guaranteed to parse the version
of a package, but you might have to go find the workspace root or other crates
to figure that out. Workspace discovery and membership is pretty nontrivial so
non-Cargo based tools will have a difficult time *not* using Cargo to figure out
a full elaborated form of a manifest 100% of the time.
This proposal also extends `Cargo.toml` with changes that will break any
existing tools which assume a particular format of `Cargo.toml`. For example if
a tool expects `package.version` to be a `String` that runs a risk of being
broken in the future due to the ability to specify a table there instead.
Additionally this proposal complicates a reader's understanding of `Cargo.toml`.
While verbose for maintainers having duplication of information is actually
quite nice for readers of `Cargo.toml` because you don't have to chase anything
else down to figure out what a dependency is. If this proposal is implemented
then whenever you see `foo = { workspace = true }` you've got to go consult
something else to figure out what the dependency actually is. This layer of
indirection can cause surprise for readers or otherwise add a speed-bump to
understanding the contents of a manifest.
# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives
Cargo's manifests have been a pretty carefully curated part of Cargo's design to
ensure that they're consistently readable and concise where possible. For
example many of Cargo's manifest idioms gently nudge users towards the same
standards across the community by supporting many zero-configuration situations
such as where to put and how to name tests.
This proposal is an extension of these design principles to provide a gentle
nudge to consistently, across the Rust community, manage workspaces,
dependencies, and metadata. A goal here is to increase consistency in how this
is all managed across projects in a way that still preserves Cargo's existing
flexibility for users.
Note that flexibility is a key part of this proposal where it's possible to
intermingle shorthands with longer versions. For example if the workspace
declares:
```toml
[workspace.dependencies]
log = "0.3"
```
But you really want to try out a new version of `log` in one workspace member,
you can easily do so by changing
```toml
[dependencies]
log = { workspace = true }
```
to
```toml
[dependencies]
log = "0.4"
```
Additionally you can always custom-version your packages, you've just got the
option to reference another package as well. Overall this proposal should
empower more power users of Cargo to manage workspaces easily without taking
away any of the existing configurations that Cargo already supports.
## Alternative Syntax
This proposal is largely a syntactic proposal for `Cargo.toml` and changing how
we can specify a few directives. Naturally that lends itself to quite a lot of
possible bikeshedding! Virtually all of the aspects of the proposal that modify
`Cargo.toml` can be tweaked in various ways such as names used or where they're
placed. In any case discussion about compelling alternatives is always
encouraged!
Some alternative syntaxes:
```toml
[dependencies]
# Instead of `foo = { workspace = true }`
foo = {}
foo = "ws"
foo = "workspace"
foo.workspace = true # technically the same, but idiomatically different
```
## Including metadata by default
This proposal indicates that package metadata is not inherited by default from
the workspace. This may be desired in some scenarios instead of repeating
`license = { workspace = true }` everywhere, and there's likely two possible
ways this could happen.
* Workspace directives could be implicitly and automatically inherited to
members. In the future, however, Cargo will want to support nested
workspaces, and it's unclear how these features will interact. In order to
strik a reasonable middle-ground for now a simple solution which should
address many use cases is proposed and we can continue to refine this over
time as necessary.
* Directives could be flagged to be explicitly inherited to workspace members as
an optional way of specifying this. For now though to keep this proposal
simple this is left out as a possible future extension of Cargo.
## Inheriting metadata from other packages
One possible extension of this RFC is for metadata to not only be inheritable
from the `[workspace]` table but also from other packages. For example a
scenario seen in the wild is that some repositories have multiple "cliques" of
crates which are all versioned as a unit. In this scenario one "clique" can have
its version directives deduplicated with this proposal, but not multiple ones.
It's hoped though that an eventual feature of nested workspaces would solve this
issue in Cargo. That way each "clique" could correspond to one workspace, and
that way we wouldn't need extra support to inherit directives from anywhere.
## Motivating issues
Duplication throughout workspaces has been a thorn in Cargo's since practically
since the inception of workspaces. Naturally there's quite a few bugs filed on
Cargo's issue tracker about this which provide some context for why make a
proposal at all as well as how to design this proposal.
* [#3931] - updating the version of a crate in a workspace means lots of edits
* [#7552] - crates may differ slightly in versions required from crates.io
* [#7964] - current idioms push users towards usage of `[patch]` which breaks
git dependencies
* [#5471] - an issue about shared dependencies in a workspace
* [#6126] - an issue where `[patch]` tables are used seemingly to make it easier
to specify dependencies in a workspace, but having everything in
`[workspace.dependencies]` makes it smaller to specify.
* [#6828] - an issue about inheriting workspace attributes
[#3931]: https://github.com/rust-lang/cargo/issues/3931
[#7552]: https://github.com/rust-lang/cargo/issues/7552
[#7964]: https://github.com/rust-lang/cargo/issues/7964
[#5471]: https://github.com/rust-lang/cargo/issues/5471
[#6126]: https://github.com/rust-lang/cargo/issues/6126
[#6828]: https://github.com/rust-lang/cargo/issues/6828
## Full templating language
One sort of far-out-there alternative we could go for is to be far more
ambitious and make our own sort of "templating language" on top of TOML. This
would arguably be much more flexible than the limited amount of deduplication
proposed here, but you could imagine things like:
```toml
[package]
name = "foo"
version = "1.{workspace.vars.minor}.0"
[dependencies]
bar = "{workspace.dependencies.bar}"
baz = { version = "1", features = "{workspace.vars.baz_features}" }
```
or "insert your own idea for how we can go all out" here. In general though I
think there's a lot to be gained from the simplicity of TOML and prioritizing
other tools reading Cargo manifests, so we may not want to go full-blown
templating language just yet.
# Unresolved questions
[unresolved-questions]: #unresolved-questions
* One thing we'll want to resolve for sure is nailing down all the syntactical
decision here, which is expected to evolve through consensus.
* It's not clear how complex an implementation of this proposal will be in
Cargo. It could be prohibitively complex, but it's hoped that it's a
relatively simple refactoring to implement this in Cargo.