# Rust Language Server (RLS) - Architecture
## Preface
In addition to the document below, an architecture overview can be found in @nrc's blog post [How the RLS works](https://www.ncameron.org/blog/how-the-rls-works/) (2017). While some bits have changed, the gist of it stays the same.

Here we aim to explain in depth how the RLS obtains the underlying data that drives its indexing features, as context for the upcoming IDE planning and discussion at the 2019 Rust All-Hands.

The [rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/blob/1d53f695f0408f47c5cce5cefa471eb0e86b0db7/docs/dev/guide.md) guide is also a great resource, as it covers a lot of common ground.

## High-level overview
At the time of writing, at the highest level the RLS compiles your package/workspace (similar to `cargo check`) and reuses the resulting internal data structures of the Rust compiler to power its indexing features.

When initialized (unless overridden by a custom build command), the RLS runs `cargo check` on the current project and collects the inter-crate [1] dependency graph along with the exact crate compilation invocations, which it later uses to run the compiler again itself (but in-process).

In-process compilation runs return populated internal data structures (`rls_data::Analysis`), which are further lowered and cross-referenced to expose a low-level indexing API (`rls_analysis::Analysis`) that is finally consumed by the Rust Language Server to answer relevant LSP [2] queries.

The main reason we execute the compilation in-process is to optimize the
latency - we pass the resulting data structures in-memory. For dependencies that
don't change often (non-primary/path dependencies) we perform the compilation out of process
once, where we dump and cache the resulting data into a JSON file, which only needs to be read once at the start of indexing.
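
The two data paths described above can be illustrated with a minimal sketch. Everything here is hypothetical: `Analysis` is a one-field stand-in for `rls_data::Analysis`, and `Source`, `Loader` and the `parse_json` callback are names invented for this example rather than RLS APIs. It only shows that in-process results are handed over in memory, while cached dependency data is parsed from disk at most once.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical one-field stand-in for `rls_data::Analysis`.
#[derive(Debug, Clone, PartialEq)]
pub struct Analysis {
    pub crate_name: String,
}

/// Where the analysis data for a crate comes from (names are assumptions).
pub enum Source {
    /// Primary/path crate: compiled in-process, data passed in memory.
    InProcess(Analysis),
    /// Dependency: compiled out of process once, data cached as a JSON file.
    CachedJson(PathBuf),
}

/// Loads analysis data, parsing each cached JSON file at most once.
pub struct Loader<F: Fn(&Path) -> Analysis> {
    /// Caller-supplied JSON parser (kept abstract to stay dependency-free).
    parse_json: F,
    cache: HashMap<PathBuf, Analysis>,
    /// Number of times a JSON file actually had to be parsed.
    pub disk_reads: usize,
}

impl<F: Fn(&Path) -> Analysis> Loader<F> {
    pub fn new(parse_json: F) -> Self {
        Loader { parse_json, cache: HashMap::new(), disk_reads: 0 }
    }

    pub fn load(&mut self, source: Source) -> Analysis {
        match source {
            // No serialization round-trip: the data is already in memory.
            Source::InProcess(analysis) => analysis,
            Source::CachedJson(path) => {
                if let Some(cached) = self.cache.get(&path) {
                    return cached.clone();
                }
                self.disk_reads += 1;
                let analysis = (self.parse_json)(path.as_path());
                self.cache.insert(path, analysis.clone());
                analysis
            }
        }
    }
}
```

The parser is passed in as a closure so that the sketch stays self-contained; a real loader would deserialize the save-analysis JSON instead.
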

[1] A *crate* is a single unit of compilation as compiled by `rustc`. For example, a Cargo package with bin+lib has *two* crates (sometimes called *targets* by Cargo).

[2] The [*Language Server Protocol*](https://microsoft.github.io/language-server-protocol/specification) is a language-agnostic JSON-RPC protocol that exposes common language "smartness" operations via a standardized interface, regardless of the IDE/editor used.

## Information flow (in-depth)
The current flow is as follows:
```
rustc -> rustc_save_analysis -> rls_data -> rls_analysis -> rls
```
### [rustc_save_analysis](https://github.com/rust-lang/rust/tree/master/compiler/rustc_save_analysis)
The Rust compiler includes the [`rustc_save_analysis`](https://github.com/rust-lang/rust/tree/master/compiler/rustc_save_analysis) crate, which can dump what the compiler knows about the crate currently being compiled. The main entry point is [`process_crate`](https://github.com/rust-lang/rust/blob/e08d5693609a659e45025b8ea4dbd9efa342fa68/compiler/rustc_save_analysis/src/lib.rs#L978), which walks the post-macro-expansion AST and [saves](https://github.com/rust-lang/rust/blob/e08d5693609a659e45025b8ea4dbd9efa342fa68/compiler/rustc_save_analysis/src/lib.rs#L1011) the collected knowledge either by [dumping to a JSON file](https://github.com/rust-lang/rust/blob/e08d5693609a659e45025b8ea4dbd9efa342fa68/compiler/rustc_save_analysis/src/lib.rs#L953-L965) or by [calling back with the resulting data structure](https://github.com/rust-lang/rust/blob/e08d5693609a659e45025b8ea4dbd9efa342fa68/compiler/rustc_save_analysis/src/lib.rs#L967-L976).
### [rls_data](https://github.com/rust-lang/rls/tree/master/rls-data)
As mentioned previously, the returned data structure is [`rls_data::Analysis`](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls-data/src/lib.rs#L11-L26) inside the [`rls_data`](https://github.com/rust-lang/rls/tree/master/rls-data) crate:
```rust
/// Basically a type alias, we refer to nodes with HIR ids.
/// All of the below nodes are either identified by or refer to those IDs.
type Id = rustc::hir::def_id::DefId;

pub struct Analysis {
    // ...

    /// Contains the rustc invocation producing this save-analysis data.
    /// Currently used to support RLS in a custom build system context
    /// (namely Buck).
    pub compilation: Option<CompilationOptions>,
    /// Path to current crate root file and current crate `GlobalCrateId`. Also
    /// includes externally referred crates' `GlobalCrateId` and their local
    /// `CrateNum` index as stored by rustc from this crate compilation PoV.
    pub prelude: Option<CratePreludeData>,
    /// Nodes of use tree forests, incl. relation, span, optional alias value
    /// and kind (extern crate, simple use or glob).
    pub imports: Vec<Import>,
    /// Main data nodes. Roughly correspond to post-expansion AST nodes, incl.
    /// span, qualified name, relations, optional signature (if a function),
    /// docs and attributes.
    pub defs: Vec<Def>,
    /// Nodes for `impl` items, incl. kind (inherent, trait impl, ...), span,
    /// children ids, docs, attributes and signature.
    pub impls: Vec<Impl>,
    /// Span which refers to an `Id` of function/module/type/variable kind.
    pub refs: Vec<Ref>,
    /// Contains callsite and callee span along with macro qualified name.
    pub macro_refs: Vec<MacroRef>,
    /// Impl/SuperTrait relation between `Id`s with an associated span.
    pub relations: Vec<Relation>,
}
```
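
To make the relationship between these nodes more concrete, here is a minimal sketch using heavily simplified, hypothetical stand-ins for `Id`, `Def` and `Ref`. The real types carry spans and many more fields, and the `krate`/`index` layout of `Id` as well as the `CrateData` and `resolve` names are assumptions made for illustration. It shows how a reference is tied to its definition purely through an `Id`.

```rust
/// Simplified stand-in for a node id: crate number plus index (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Id {
    pub krate: u32,
    pub index: u32,
}

/// Simplified definition node.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Def {
    pub id: Id,
    pub qualname: String,
    pub line: u32,
}

/// Simplified reference node: a location that refers to some `Id`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ref {
    pub ref_id: Id,
    pub line: u32,
}

/// Data dumped for a single crate (hypothetical subset of the fields above).
#[derive(Debug, Default)]
pub struct CrateData {
    pub defs: Vec<Def>,
    pub refs: Vec<Ref>,
}

impl CrateData {
    /// Finds the definition a reference points at, if it lives in this crate's data.
    pub fn resolve(&self, r: &Ref) -> Option<&Def> {
        self.defs.iter().find(|d| d.id == r.ref_id)
    }
}
```

In this sketch a reference to an item from another crate resolves to `None`, because a single crate's data only knows its own definitions.
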
### [rls_analysis](https://github.com/rust-lang/rls/tree/master/rls-analysis)
This [crate](https://github.com/rust-lang/rls/tree/master/rls-analysis) is responsible for loading and stitching multiple `rls_data::Analysis` data structures into a single, coherent interface. Whereas the `rls_data` format can be considered an implementation detail that might change, this crate aims to provide a 'stable' API.

Another reason is that each of those structures contains data centered on the crate that was being compiled. This [lowering](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls-analysis/src/lowering.rs#L161) cross-references the data and indexes it, resulting in a database spanning multiple crates that can answer queries such as 'across known crates, find all references to a definition at a given span'.
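
As a rough illustration of such a query, the sketch below builds a tiny cross-crate index. It is not the `rls_analysis` API: `Index`, `add_def`, `add_ref` and `find_all_refs`, as well as the simplified `Id` and `Span` types, are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

/// Simplified database-wide id (assumption: crate number plus index).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Id {
    pub krate: u32,
    pub index: u32,
}

/// Simplified source location.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Span {
    pub file: String,
    pub line: u32,
    pub column: u32,
}

/// Hypothetical index spanning all lowered crates.
#[derive(Debug, Default)]
pub struct Index {
    /// Which definition lives at a given span.
    def_at: HashMap<Span, Id>,
    /// All reference spans pointing at a definition, from any crate.
    refs_to: HashMap<Id, Vec<Span>>,
}

impl Index {
    pub fn add_def(&mut self, id: Id, span: Span) {
        self.def_at.insert(span, id);
    }

    pub fn add_ref(&mut self, target: Id, span: Span) {
        self.refs_to.entry(target).or_default().push(span);
    }

    /// "Across known crates, find all references to a definition at a given span."
    pub fn find_all_refs(&self, def_span: &Span) -> Vec<Span> {
        self.def_at
            .get(def_span)
            .and_then(|id| self.refs_to.get(id))
            .cloned()
            .unwrap_or_default()
    }
}
```

Keeping a span-to-id map next to an id-to-references map turns the query into two hash lookups, which is one plausible reason to index eagerly while lowering.
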

We are capable of updating the index with new crate data. Whenever we encounter a new crate, we [record and translate](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls-analysis/src/lowering.rs#L122-L144) the crate id into our database-wide crate id mapping. However, if data for an already lowered crate is loaded again, we simply replace the definitions for the given crate and re-index.
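
The id translation and the replace-on-reload behaviour can be sketched as follows. This is a simplified model under stated assumptions: `GlobalCrateId` is reduced to a name plus disambiguator, definitions are plain strings, and `Database`, `load` and `defs_of` are hypothetical names rather than the crate's real interface.

```rust
use std::collections::HashMap;

/// Simplified global identity of a crate (assumption: name plus disambiguator).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GlobalCrateId {
    pub name: String,
    pub disambiguator: (u64, u64),
}

/// Hypothetical multi-crate database.
#[derive(Debug, Default)]
pub struct Database {
    /// Mapping from a crate's global identity to a database-wide crate id.
    crate_ids: HashMap<GlobalCrateId, u32>,
    /// Definitions per database-wide crate id (plain names, for brevity).
    defs: HashMap<u32, Vec<String>>,
}

impl Database {
    /// Returns the database-wide id for a crate, recording it on first sight.
    fn crate_id(&mut self, krate: &GlobalCrateId) -> u32 {
        let next = self.crate_ids.len() as u32;
        *self.crate_ids.entry(krate.clone()).or_insert(next)
    }

    /// Loads data for a crate; reloading a known crate replaces its definitions.
    pub fn load(&mut self, krate: &GlobalCrateId, defs: Vec<String>) -> u32 {
        let id = self.crate_id(krate);
        self.defs.insert(id, defs);
        id
    }

    pub fn defs_of(&self, id: u32) -> &[String] {
        self.defs.get(&id).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Because the database-wide id stays stable across reloads, data that other crates hold about this crate does not need to be renumbered when it is rebuilt.
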

One interesting edge case is when we lower data for crates having the same name, such as a binary and the `#[cfg(test)]`-compiled version of it. We need to ensure we lower a given definition [only once](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls-analysis/src/lowering.rs#L271-L276), even if it technically is repeated across multiple crates.
### rls
With all the data lowering logic in place, all we have to do is actually fetch the data, which happens inside the RLS.

In general, apart from being an LSP server, the RLS is also concerned with build orchestration and the coordination of other components, such as:

* Racer for autocompletion
* Cargo for project layout detection and initial build coordination
* an internal virtual file system (VFS) for handling in-memory text buffers
* rls-analysis serving as our knowledge database
* Clippy performing additional lints
* Rustfmt driving our formatting capabilities

After doing the initial compilation with Cargo, we cache a subgraph of the inter-crate dependency graph along with the compilation invocations and input files for the crates we're interested in (inside the [primary or path-based packages](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/build/cargo.rs#L358-L363)), which we later rerun manually.

In our case Cargo is configured to use a separate target directory (`$target-dir/rls`) so the analysis does not interfere with regular builds.

We hijack the Cargo build process not only to record the build orchestration data mentioned above but also to inject additional compiler flags forcing the compiler to dump the JSON save-analysis files for each dependency. This serves as an optimization so that we don't have to re-check dependencies in a fresh run.

Because the RLS aims to provide a truthful view of the compilation and to maintain parity with the regular `cargo check` flow, our Cargo runs also run `build.rs` and proc macros, both initially and when needed (e.g. when a modified file causes `build.rs` to be rerun).

## Build scheduling
On every relevant file change we [mark files as dirty](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/actions/notifications.rs#L113) and schedule a normal build.
We currently discern two [build priorities](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/build/mod.rs#L114-L124):
* Normal
* Cargo

The latter is scheduled whenever a change happens that can impact the entire project. This includes:

* [initial build](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/actions/mod.rs#L328)
* [configuration change](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/actions/notifications.rs#L116) (can potentially build a different set of packages)
* [Cargo.toml change](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/actions/notifications.rs#L264) (ditto)
* [build directory](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/build/mod.rs#L468-L472) changed
* [modified file](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/build/cargo_plan.rs#L350-L354) in a package we didn't build
* [build.rs](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/build/cargo_plan.rs#L359-L360) modification
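
The mapping from a change to a build priority listed above can be summarized in a small sketch. The `Change` enum and the `priority_for` function are hypothetical and only mirror the cases named in this document; the actual decision logic is spread over the linked source locations.

```rust
/// The two build priorities discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildPriority {
    Normal,
    Cargo,
}

/// Hypothetical classification of events that trigger a build.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Change {
    InitialBuild,
    ConfigurationChanged,
    CargoTomlChanged,
    BuildDirectoryChanged,
    FileInUnbuiltPackageModified,
    BuildScriptModified,
    /// An ordinary edit to a source file of a crate we already built.
    SourceFileModified,
}

/// Picks the priority for a change: anything that can affect the whole
/// project needs a full Cargo run, everything else a normal build.
pub fn priority_for(change: Change) -> BuildPriority {
    match change {
        Change::InitialBuild
        | Change::ConfigurationChanged
        | Change::CargoTomlChanged
        | Change::BuildDirectoryChanged
        | Change::FileInUnbuiltPackageModified
        | Change::BuildScriptModified => BuildPriority::Cargo,
        Change::SourceFileModified => BuildPriority::Normal,
    }
}
```

Listing every variant explicitly, rather than using a wildcard arm, means a newly added kind of change has to be classified deliberately.
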

On a normal build, we map from dirty files to dirty crates, sort those topologically and run rustc in-process for each crate ourselves. With each in-process compilation we directly [receive `rls_data::Analysis`](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/build/rustc.rs#L293-L305) in a callback, [mark corresponding files as built](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/build/mod.rs#L478-L490) and finally [update our analysis database](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/actions/post_build.rs#L180-L184) with the freshly built data for each rebuilt crate.
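
The first two steps of a normal build, mapping dirty files to dirty crates and ordering them topologically, might look roughly like the sketch below. `CrateGraph`, its fields and `rebuild_order` are hypothetical names, crates are identified by plain strings, and whether dependents of a dirty crate are also rebuilt is deliberately not modeled here.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical cached build plan: file ownership plus dependency edges.
pub struct CrateGraph {
    /// Which crate each input file belongs to.
    pub file_owner: HashMap<String, String>,
    /// For each crate, the crates it depends on.
    pub deps: BTreeMap<String, Vec<String>>,
}

impl CrateGraph {
    /// Maps dirty files to dirty crates and returns them dependencies-first.
    pub fn rebuild_order(&self, dirty_files: &[&str]) -> Vec<String> {
        let dirty: BTreeSet<String> = dirty_files
            .iter()
            .filter_map(|file| self.file_owner.get(*file).cloned())
            .collect();

        let mut order = Vec::new();
        let mut visited = BTreeSet::new();
        for krate in &dirty {
            self.visit(krate, &dirty, &mut visited, &mut order);
        }
        order
    }

    /// Depth-first post-order walk: a crate is emitted after all of its
    /// dependencies. Clean crates are walked through but not emitted, so
    /// ordering also holds across clean intermediate crates.
    fn visit(
        &self,
        krate: &str,
        dirty: &BTreeSet<String>,
        visited: &mut BTreeSet<String>,
        order: &mut Vec<String>,
    ) {
        if !visited.insert(krate.to_string()) {
            return;
        }
        for dep in self.deps.get(krate).into_iter().flatten() {
            self.visit(dep, dirty, visited, order);
        }
        if dirty.contains(krate) {
            order.push(krate.to_string());
        }
    }
}
```

Each crate in the returned order would then be compiled in-process, with its analysis data fed into the database as described above.
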

If there are still files that were modified after we scheduled a build (the user kept typing), we don't mark it as done yet and schedule a regular build again.

It's worth noting that we squash builds whenever the user happens to type before a build kicks off (we buffer build requests so as not to waste work on something we might potentially invalidate).
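
A minimal sketch of this squashing and re-scheduling behaviour follows. The `Scheduler` type and its methods are hypothetical, and tracking edits with a per-file version counter is an assumption made for the example, not necessarily how the RLS does it.

```rust
use std::collections::HashMap;

/// Hypothetical build scheduler that squashes requests and tracks dirty files.
#[derive(Debug, Default)]
pub struct Scheduler {
    /// Latest edit version per dirty file.
    dirty: HashMap<String, u64>,
    next_version: u64,
    /// At most one build is ever pending, no matter how many edits arrive.
    pending: bool,
    pub builds_started: usize,
}

impl Scheduler {
    /// Records an edit; repeated edits before a build starts are squashed.
    pub fn file_changed(&mut self, file: &str) {
        self.next_version += 1;
        self.dirty.insert(file.to_string(), self.next_version);
        self.pending = true;
    }

    /// Starts the pending build, if any, returning a snapshot of what it covers.
    pub fn start_build(&mut self) -> Option<HashMap<String, u64>> {
        if !self.pending {
            return None;
        }
        self.pending = false;
        self.builds_started += 1;
        Some(self.dirty.clone())
    }

    /// Finishes a build: files edited since the snapshot stay dirty and
    /// cause another build to be scheduled.
    pub fn finish_build(&mut self, snapshot: &HashMap<String, u64>) {
        self.dirty
            .retain(|file, version| snapshot.get(file) != Some(&*version));
        if !self.dirty.is_empty() {
            self.pending = true;
        }
    }

    pub fn is_dirty(&self, file: &str) -> bool {
        self.dirty.contains_key(file)
    }
}
```

Comparing versions against the snapshot is what keeps a file dirty when the user typed while the build was running.
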
## I/O
As mentioned previously, we run Cargo using a separate target directory, so we do the same kind of work that Cargo does, in addition to saving JSON save-analysis files for our non-path dependencies.
### VFS
To allow running analysis on unsaved in-memory text buffers, we use the [`rls-vfs`](https://github.com/rust-lang/rls/tree/master/rls-vfs) crate to act as our virtual file system.

The Rust compiler supports using custom file providers via the [`FileLoader`](https://github.com/rust-lang/rust/blame/f19851069efd6ee1fe899a469f08ad2d66e76050/compiler/rustc_span/src/source_map.rs#L98-L105) trait, which [we use](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/build/rustc.rs#L355-L367). Our implementation delegates to the real file system whenever there are no buffered changes to a file but serves the unsaved buffers [from the VFS](https://github.com/rust-lang/rls/blob/3df74381f37617ec800537c11fb0c3130f5f3616/rls/src/build/rustc.rs#L54) otherwise.
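
The overlay idea can be sketched with the standard library alone. The `FileProvider` trait below is a hypothetical stand-in loosely modelled on the idea of a file loader, not rustc's actual `FileLoader` definition, and `OverlayVfs` is likewise an illustrative name rather than the `rls-vfs` API.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical file-provider interface (not rustc's real `FileLoader`).
pub trait FileProvider {
    fn file_exists(&self, path: &Path) -> bool;
    fn read_file(&self, path: &Path) -> io::Result<String>;
}

/// Serves unsaved editor buffers first and falls back to the real file system.
#[derive(Debug, Default)]
pub struct OverlayVfs {
    buffers: HashMap<PathBuf, String>,
}

impl OverlayVfs {
    /// Records the current in-memory contents of an unsaved file.
    pub fn set_buffer(&mut self, path: impl Into<PathBuf>, text: impl Into<String>) {
        self.buffers.insert(path.into(), text.into());
    }

    /// Drops the buffer once the file has been saved to disk.
    pub fn saved(&mut self, path: &Path) {
        self.buffers.remove(path);
    }
}

impl FileProvider for OverlayVfs {
    fn file_exists(&self, path: &Path) -> bool {
        self.buffers.contains_key(path) || path.exists()
    }

    fn read_file(&self, path: &Path) -> io::Result<String> {
        match self.buffers.get(path) {
            // Buffered changes win over whatever is on disk.
            Some(text) => Ok(text.clone()),
            // No buffered changes: delegate to the real file system.
            None => fs::read_to_string(path),
        }
    }
}
```

With a provider like this handed to the compiler, a build sees the text the user is currently editing even if it was never written to disk.
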