Merge 3cd8b7b123 into 51817951d0

Merge pull request #1468 from petrochenkov/debmac
Add docs for `#[collapse_debuginfo]` attribute
2024-04-27 12:17:53 -07:00 · 2024-04-27 17:54:45 +00:00 · 2024-04-27 10:53:08 -07:00 · 2024-04-21 13:47:07 +00:00 · 2024-04-20 14:05:08 +00:00 · 2024-04-17 15:06:26 +00:00
15 changed files with 297 additions and 97 deletions
--- a/src/attributes.md
+++ b/src/attributes.md
@@ -196,7 +196,7 @@ struct S {
 pub fn f() {}
 ```

-> Note: `rustc` currently recognizes the tools "clippy" and "rustfmt".
+> Note: `rustc` currently recognizes the tools "clippy", "rustfmt" and "diagnostic".

 ## Built-in attributes index

@@ -224,6 +224,8 @@ The following is an index of all built-in attributes.
  - [`allow`], [`warn`], [`deny`], [`forbid`] — Alters the default lint level.
  - [`deprecated`] — Generates deprecation notices.
  - [`must_use`] — Generates a lint for unused values.
+  - [`diagnostic::on_unimplemented`] — Hints the compiler to emit a certain error
+    message if a trait is not implemented.
 - ABI, linking, symbols, and FFI
  - [`link`] — Specifies a native library to link with an `extern` block.
  - [`link_name`] — Specifies the name of the symbol for functions or statics
@@ -273,6 +275,7 @@ The following is an index of all built-in attributes.
    added in future.
 - Debugger
  - [`debugger_visualizer`] — Embeds a file that specifies debugger output for a type.
+  - [`collapse_debuginfo`] — Controls how macro invocations are encoded in debuginfo.

 [Doc comments]: comments.md#doc-comments
 [ECMA-334]: https://www.ecma-international.org/publications-and-standards/standards/ecma-334/
@@ -291,6 +294,7 @@ The following is an index of all built-in attributes.
 [`cfg_attr`]: conditional-compilation.md#the-cfg_attr-attribute
 [`cfg`]: conditional-compilation.md#the-cfg-attribute
 [`cold`]: attributes/codegen.md#the-cold-attribute
+[`collapse_debuginfo`]: attributes/debugger.md#the-collapse_debuginfo-attribute
 [`crate_name`]: crates-and-source-files.md#the-crate_name-attribute
 [`crate_type`]: linkage.md
 [`debugger_visualizer`]: attributes/debugger.md#the-debugger_visualizer-attribute
@@ -352,3 +356,4 @@ The following is an index of all built-in attributes.
 [closure]: expressions/closure-expr.md
 [function pointer]: types/function-pointer.md
 [variadic functions]: items/external-blocks.html#variadic-functions
+[`diagnostic::on_unimplemented`]: attributes/diagnostics.md#the-diagnosticon_unimplemented-attribute
--- a/src/attributes/codegen.md
+++ b/src/attributes/codegen.md
@@ -262,7 +262,7 @@ Feature     | Implicitly Enables  | Description
 [rv-zks]: https://github.com/riscv/riscv-crypto/blob/e2dd7d98b7f34d477e38cb5fd7a3af4379525189/doc/scalar/riscv-crypto-scalar-zks.adoc
 [rv-zksed]: https://github.com/riscv/riscv-crypto/blob/e2dd7d98b7f34d477e38cb5fd7a3af4379525189/doc/scalar/riscv-crypto-scalar-zksed.adoc
 [rv-zksh]: https://github.com/riscv/riscv-crypto/blob/e2dd7d98b7f34d477e38cb5fd7a3af4379525189/doc/scalar/riscv-crypto-scalar-zksh.adoc
-[rv-zkt]: https://github.com/riscv/riscv-crypto/blob/e2dd7d98b7f34d477e38cb5fd7a3af4379525189/doc/scalar/riscv-crypto-scalar-zkr.adoc
+[rv-zkt]: https://github.com/riscv/riscv-crypto/blob/e2dd7d98b7f34d477e38cb5fd7a3af4379525189/doc/scalar/riscv-crypto-scalar-zkt.adoc

 #### `wasm32` or `wasm64`

@@ -273,10 +273,20 @@ attempting to use instructions unsupported by the Wasm engine will fail at load
 time without the risk of being interpreted in a way different from what the
 compiler expected.

-Feature     | Description
------------|-------------------
-`simd128`   | [WebAssembly simd proposal][simd128]
+Feature               | Description
+----------------------|-------------------
+`bulk-memory`         | [WebAssembly bulk memory operations proposal][bulk-memory]
+`extended-const`      | [WebAssembly extended const expressions proposal][extended-const]
+`mutable-globals`     | [WebAssembly mutable global proposal][mutable-globals]
+`nontrapping-fptoint` | [WebAssembly non-trapping float-to-int conversion proposal][nontrapping-fptoint]
+`sign-ext`            | [WebAssembly sign extension operators proposal][sign-ext]
+`simd128`             | [WebAssembly simd proposal][simd128]

+[bulk-memory]: https://github.com/WebAssembly/bulk-memory-operations
+[extended-const]: https://github.com/WebAssembly/extended-const
+[mutable-globals]: https://github.com/WebAssembly/mutable-global
+[nontrapping-fptoint]: https://github.com/WebAssembly/nontrapping-float-to-int-conversions
+[sign-ext]: https://github.com/WebAssembly/sign-extension-ops
 [simd128]: https://github.com/webassembly/simd

 ### Additional information
--- a/src/attributes/debugger.md
+++ b/src/attributes/debugger.md
@@ -139,3 +139,32 @@ When the crate's debug executable is passed into GDB[^rust-gdb], `print bob` wil
 [Natvis documentation]: https://docs.microsoft.com/en-us/visualstudio/debugger/create-custom-views-of-native-objects
 [pretty printing documentation]: https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html
 [_MetaListNameValueStr_]: ../attributes.md#meta-item-attribute-syntax
+
+## The `collapse_debuginfo` attribute
+
+The *`collapse_debuginfo` [attribute]* controls whether code locations from a macro definition are collapsed into a single location associated with the macro's call site,
+when generating debuginfo for code calling this macro.
+
+The attribute uses the [_MetaListIdents_] syntax to specify its inputs, and can only be applied to macro definitions.
+
+Accepted options:
+- `#[collapse_debuginfo(yes)]` — code locations in debuginfo are collapsed.
+- `#[collapse_debuginfo(no)]` — code locations in debuginfo are not collapsed.
+- `#[collapse_debuginfo(external)]` — code locations in debuginfo are collapsed only if the macro comes from a different crate.
+
+The `external` behavior is the default for macros that don't have this attribute, unless they are built-in macros.
+For built-in macros the default is `yes`.
+
+> **Note**: `rustc` has a `-C collapse-macro-debuginfo` CLI option to override both the default collapsing behavior and `#[collapse_debuginfo]` attributes.
+
+```rust
+#[collapse_debuginfo(yes)]
+macro_rules! example {
+    () => {
+        println!("hello!");
+    };
+}
+```
+
+[attribute]: ../attributes.md
+[_MetaListIdents_]: ../attributes.md#meta-item-attribute-syntax
--- a/src/attributes/diagnostics.md
+++ b/src/attributes/diagnostics.md
@@ -301,6 +301,76 @@ When used on a function in a trait implementation, the attribute does nothing.
 > let _ = five();
 > ```

+## The `diagnostic` tool attribute namespace
+
+The `#[diagnostic]` attribute namespace is a home for attributes to influence compile-time error messages.
+The hints provided by these attributes are not guaranteed to be used.
+Unknown attributes in this namespace are accepted, though they may emit warnings for unused attributes.
+Additionally, invalid inputs to known attributes will typically be a warning (see the attribute definitions for details).
+This is meant to allow attributes to be added or discarded, and their inputs to be changed, in the future without the need to keep non-meaningful attributes or options working.
+
+### The `diagnostic::on_unimplemented` attribute
+
+The `#[diagnostic::on_unimplemented]` attribute is a hint to the compiler to supplement the error message that would normally be generated in scenarios where a trait is required but not implemented on a type.
+The attribute should be placed on a [trait declaration], though it is not an error for it to be located in other positions.
+The attribute uses the [_MetaListNameValueStr_] syntax to specify its inputs, though malformed input to the attribute is not considered an error, in order to provide both forwards and backwards compatibility.
+The following keys have the given meaning:
+
+* `message` — The text for the top level error message.
+* `label` — The text for the label shown inline in the broken code in the error message.
+* `note` — Provides additional notes.
+
+The `note` option can appear several times, which results in several note messages being emitted.
+If any of the other options appears several times, the first occurrence specifies the value that is actually used.
+Any later occurrence generates a lint warning.
+For any other non-existing option a lint warning is generated.
+
+All three options accept a string as an argument, interpreted using the same formatting as a [`std::fmt`] string.
+Format parameters with the given named parameter will be replaced with the following text:
+
+* `{Self}` — The name of the type implementing the trait.
+* `{` *GenericParameterName* `}` — The name of the generic argument's type for the given generic parameter.
+
+Any other format parameter will generate a warning, but will otherwise be included in the string as-is.
+
+Invalid format strings may generate a warning but are otherwise allowed, though they may not display as intended.
+Format specifiers may generate a warning, but are otherwise ignored.
+
+In this example:
+
+```rust,compile_fail,E0277
+#[diagnostic::on_unimplemented(
+    message = "My Message for `ImportantTrait<{A}>` implemented for `{Self}`",
+    label = "My Label",
+    note = "Note 1",
+    note = "Note 2"
+)]
+trait ImportantTrait<A> {}
+
+fn use_my_trait(_: impl ImportantTrait<i32>) {}
+
+fn main() {
+    use_my_trait(String::new());
+}
+```
+
+the compiler may generate an error message which looks like this:
+
+```text
+error[E0277]: My Message for `ImportantTrait<i32>` implemented for `String`
+  --> src/main.rs:14:18
+   |
+14 |     use_my_trait(String::new());
+   |     ------------ ^^^^^^^^^^^^^ My Label
+   |     |
+   |     required by a bound introduced by this call
+   |
+   = help: the trait `ImportantTrait<i32>` is not implemented for `String`
+   = note: Note 1
+   = note: Note 2
+```
+
+[`std::fmt`]: ../../std/fmt/index.html
 [Clippy]: https://github.com/rust-lang/rust-clippy
 [_MetaListNameValueStr_]: ../attributes.md#meta-item-attribute-syntax
 [_MetaListPaths_]: ../attributes.md#meta-item-attribute-syntax
--- a/src/attributes/type_system.md
+++ b/src/attributes/type_system.md
@@ -20,6 +20,12 @@ pub struct Config {
    pub window_height: u16,
 }

+#[non_exhaustive]
+pub struct Token;
+
+#[non_exhaustive]
+pub struct Id(pub u64);
+
 #[non_exhaustive]
 pub enum Error {
    Message(String),
@@ -34,11 +40,13 @@ pub enum Message {

 // Non-exhaustive structs can be constructed as normal within the defining crate.
 let config = Config { window_width: 640, window_height: 480 };
+let token = Token;
+let id = Id(4);

 // Non-exhaustive structs can be matched on exhaustively within the defining crate.
-if let Config { window_width, window_height } = config {
-    // ...
-}
+let Config { window_width, window_height } = config;
+let Token = token;
+let Id(id_number) = id;

 let error = Error::Other;
 let message = Message::Reaction(3);
@@ -64,30 +72,49 @@ Non-exhaustive types cannot be constructed outside of the defining crate:

 - Non-exhaustive variants ([`struct`][struct] or [`enum` variant][enum]) cannot be constructed
  with a [_StructExpression_] \(including with [functional update syntax]).
+- The implicitly defined same-named constant of a [unit-like struct][struct],
+  or the same-named constructor function of a [tuple struct][struct],
+  has a [visibility] no greater than `pub(crate)`.
+  That is, if the struct’s visibility is `pub`, then the constant or constructor’s visibility
+  is `pub(crate)`, and otherwise the visibility of the two items is the same
+  (as is the case without `#[non_exhaustive]`).
 - [`enum`][enum] instances can be constructed.

+The following examples of construction do not compile when outside the defining crate:
+
 <!-- ignore: requires external crates -->
 ```rust,ignore
-// `Config`, `Error`, and `Message` are types defined in an upstream crate that have been
-// annotated as `#[non_exhaustive]`.
-use upstream::{Config, Error, Message};
+// These are types defined in an upstream crate that have been annotated as
+// `#[non_exhaustive]`.
+use upstream::{Config, Token, Id, Error, Message};

-// Cannot construct an instance of `Config`, if new fields were added in
+// Cannot construct an instance of `Config`; if new fields were added in
 // a new version of `upstream` then this would fail to compile, so it is
 // disallowed.
 let config = Config { window_width: 640, window_height: 480 };

-// Can construct an instance of `Error`, new variants being introduced would
+// Cannot construct an instance of `Token`; if new fields were added, then
+// it would not be a unit-like struct any more, so the same-named constant
+// created by it being a unit-like struct is not public outside the crate;
+// this code fails to compile.
+let token = Token;
+
+// Cannot construct an instance of `Id`; if new fields were added, then
+// its constructor function signature would change, so its constructor
+// function is not public outside the crate; this code fails to compile.
+let id = Id(5);
+
+// Can construct an instance of `Error`; new variants being introduced would
 // not result in this failing to compile.
 let error = Error::Message("foo".to_string());

-// Cannot construct an instance of `Message::Send` or `Message::Reaction`,
+// Cannot construct an instance of `Message::Send` or `Message::Reaction`;
 // if new fields were added in a new version of `upstream` then this would
 // fail to compile, so it is disallowed.
 let message = Message::Send { from: 0, to: 1, contents: "foo".to_string(), };
 let message = Message::Reaction(0);

-// Cannot construct an instance of `Message::Quit`, if this were converted to
+// Cannot construct an instance of `Message::Quit`; if this were converted to
 // a tuple-variant `upstream` then this would fail to compile.
 let message = Message::Quit;
 ```
@@ -95,16 +122,18 @@ let message = Message::Quit;
 There are limitations when matching on non-exhaustive types outside of the defining crate:

 - When pattern matching on a non-exhaustive variant ([`struct`][struct] or [`enum` variant][enum]),
-  a [_StructPattern_] must be used which must include a `..`. Tuple variant constructor visibility
-  is lowered to `min($vis, pub(crate))`.
+  a [_StructPattern_] must be used which must include a `..`. A tuple variant's constructor's
+  [visibility] is reduced to be no greater than `pub(crate)`.
 - When pattern matching on a non-exhaustive [`enum`][enum], matching on a variant does not
  contribute towards the exhaustiveness of the arms.

+The following examples of matching do not compile when outside the defining crate:
+
 <!-- ignore: requires external crates -->
 ```rust, ignore
-// `Config`, `Error`, and `Message` are types defined in an upstream crate that have been
-// annotated as `#[non_exhaustive]`.
-use upstream::{Config, Error, Message};
+// These are types defined in an upstream crate that have been annotated as
+// `#[non_exhaustive]`.
+use upstream::{Config, Token, Id, Error, Message};

 // Cannot match on a non-exhaustive enum without including a wildcard arm.
 match error {
@@ -118,6 +147,13 @@ if let Ok(Config { window_width, window_height }) = config {
    // would compile with: `..`
 }

+// Cannot match a non-exhaustive unit-like or tuple struct except by using
+// braced struct syntax with a wildcard.
+// This would compile as `let Token { .. } = token;`
+let Token = token;
+// This would compile as `let Id { 0: id_number, .. } = id;`
+let Id(id_number) = id;
+
 match message {
  // Cannot match on a non-exhaustive struct enum variant without including a wildcard.
  Message::Send { from, to, contents } => { },
@@ -147,3 +183,4 @@ Non-exhaustive types are always considered inhabited in downstream crates.
 [enum]: ../items/enumerations.md
 [functional update syntax]: ../expressions/struct-expr.md#functional-update-syntax
 [struct]: ../items/structs.md
+[visibility]: ../visibility-and-privacy.md
--- a/src/comments.md
+++ b/src/comments.md
@@ -30,7 +30,7 @@
 > &nbsp;&nbsp; | INNER_BLOCK_DOC
 >
 > _IsolatedCR_ :\
-> &nbsp;&nbsp; _A `\r` not followed by a `\n`_
+> &nbsp;&nbsp; \\r

 ## Non-doc comments

@@ -53,8 +53,9 @@ that follows.  That is, they are equivalent to writing `#![doc="..."]` around
 the body of the comment. `//!` comments are usually used to document
 modules that occupy a source file.

-Isolated CRs (`\r`), i.e. not followed by LF (`\n`), are not allowed in doc
-comments.
+The character `U+000D` (CR) is not allowed in doc comments.
+
+> **Note**: The sequence `U+000D` (CR) immediately followed by `U+000A` (LF) would have been previously transformed into a single `U+000A` (LF).

 ## Examples

--- a/src/crates-and-source-files.md
+++ b/src/crates-and-source-files.md
@@ -2,16 +2,9 @@

 > **<sup>Syntax</sup>**\
 > _Crate_ :\
-> &nbsp;&nbsp; UTF8BOM<sup>?</sup>\
-> &nbsp;&nbsp; SHEBANG<sup>?</sup>\
 > &nbsp;&nbsp; [_InnerAttribute_]<sup>\*</sup>\
 > &nbsp;&nbsp; [_Item_]<sup>\*</sup>

-> **<sup>Lexer</sup>**\
-> UTF8BOM : `\uFEFF`\
-> SHEBANG : `#!` \~`\n`<sup>\+</sup>[†](#shebang)
-
-
 > Note: Although Rust, like any other language, can be implemented by an
 > interpreter as well as a compiler, the only existing implementation is a
 > compiler, and the language has always been designed to be compiled. For these
@@ -53,6 +46,8 @@ that apply to the containing module, most of which influence the behavior of
 the compiler. The anonymous crate module can have additional attributes that
 apply to the crate as a whole.

+> **Note**: The file's contents may be preceded by a [shebang].
+
 ```rust
 // Specify the crate name.
 #![crate_name = "projx"]
@@ -65,34 +60,6 @@ apply to the crate as a whole.
 #![warn(non_camel_case_types)]
 ```

-## Byte order mark
-
-The optional [_UTF8 byte order mark_] (UTF8BOM production) indicates that the
-file is encoded in UTF8. It can only occur at the beginning of the file and
-is ignored by the compiler.
-
-## Shebang
-
-A source file can have a [_shebang_] (SHEBANG production), which indicates
-to the operating system what program to use to execute this file. It serves
-essentially to treat the source file as an executable script. The shebang
-can only occur at the beginning of the file (but after the optional
-_UTF8BOM_). It is ignored by the compiler. For example:
-
-<!-- ignore: tests don't like shebang -->
-```rust,ignore
-#!/usr/bin/env rustx
-
-fn main() {
-    println!("Hello!");
-}
-```
-
-A restriction is imposed on the shebang syntax to avoid confusion with an
-[attribute]. The `#!` characters must not be followed by a `[` token, ignoring
-intervening [comments] or [whitespace]. If this restriction fails, then it is
-not treated as a shebang, but instead as the start of an attribute.
-
 ## Preludes and `no_std`

 This section has been moved to the [Preludes chapter](names/preludes.md).
@@ -119,6 +86,17 @@ fn main() -> impl std::process::Termination {
 }
 ```

+The `main` function may be an import, e.g. from an external crate or from the current one.
+
+```rust
+mod foo {
+    pub fn bar() {
+        println!("Hello, world!");
+    }
+}
+use foo::bar as main;
+```
+
 > **Note**: Types with implementations of [`Termination`] in the standard library include:
 >
 > * `()`
@@ -161,20 +139,17 @@ or `_` (U+005F) characters.
 [_InnerAttribute_]: attributes.md
 [_Item_]: items.md
 [_MetaNameValueStr_]: attributes.md#meta-item-attribute-syntax
-[_shebang_]: https://en.wikipedia.org/wiki/Shebang_(Unix)
-[_utf8 byte order mark_]: https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
 [`ExitCode`]: ../std/process/struct.ExitCode.html
 [`Infallible`]: ../std/convert/enum.Infallible.html
 [`Termination`]: ../std/process/trait.Termination.html
 [attribute]: attributes.md
 [attributes]: attributes.md
-[comments]: comments.md
 [function]: items/functions.md
 [module]: items/modules.md
 [module path]: paths.md
+[shebang]: input-format.md#shebang-removal
 [trait or lifetime bounds]: trait-bounds.md
 [where clauses]: items/generics.md#where-clauses
-[whitespace]: whitespace.md

 <script>
 (function() {
--- a/src/expressions/literal-expr.md
+++ b/src/expressions/literal-expr.md
@@ -76,7 +76,7 @@ The escaped value is the character whose [Unicode scalar value] is the result of

 The escape sequence consists of `\u{`, followed by a sequence of characters each of which is a hexadecimal digit or `_`, followed by `}`.

-The escaped value is the character whose [Unicode scalar value] is the result of interpreting the hexadecimal digits contained in the escape sequence as a hexadecimal integer, as if by [`u8::from_str_radix`] with radix 16.
+The escaped value is the character whose [Unicode scalar value] is the result of interpreting the hexadecimal digits contained in the escape sequence as a hexadecimal integer, as if by [`u32::from_str_radix`] with radix 16.

 > **Note**: the permitted forms of a [CHAR_LITERAL] or [STRING_LITERAL] token ensure that there is such a character.

@@ -438,6 +438,7 @@ The expression's type is the primitive [boolean type], and its value is:
 [`f64::INFINITY`]: ../../core/primitive.f64.md#associatedconstant.INFINITY
 [`f64::NAN`]: ../../core/primitive.f64.md#associatedconstant.NAN
 [`u8::from_str_radix`]: ../../core/primitive.u8.md#method.from_str_radix
+[`u32::from_str_radix`]: ../../core/primitive.u32.md#method.from_str_radix
 [`u128::from_str_radix`]: ../../core/primitive.u128.md#method.from_str_radix
 [CHAR_LITERAL]: ../tokens.md#character-literals
 [STRING_LITERAL]: ../tokens.md#string-literals
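
The switch from `u8::from_str_radix` to `u32::from_str_radix` in the `\u{...}` description above can be illustrated with a minimal sketch. The helper name `decode_unicode_escape` is hypothetical and not part of any library; the underscores are stripped by hand because `from_str_radix` itself does not accept `_`.

```rust
/// Hypothetical helper: decodes the text found between `\u{` and `}`.
fn decode_unicode_escape(digits: &str) -> Option<char> {
    // Underscores may appear inside the escape but carry no value.
    let hex: String = digits.chars().filter(|&c| c != '_').collect();
    // Interpret the remaining digits as a hexadecimal integer.
    let value = u32::from_str_radix(&hex, 16).ok()?;
    // Only Unicode scalar values correspond to a `char`.
    char::from_u32(value)
}

fn main() {
    assert_eq!(decode_unicode_escape("E6"), Some('\u{E6}'));
    // This value does not fit in a `u8`, which is why `u32` is the right width.
    assert_eq!(decode_unicode_escape("1F_600"), Some('\u{1F600}'));
    // Surrogate code points are not Unicode scalar values.
    assert_eq!(decode_unicode_escape("D800"), None);
}
```

The sketch accepts more inputs than the lexer does (for example, it does not limit the number of digits); it only shows the numeric interpretation step.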
--- a/src/expressions/method-call-expr.md
+++ b/src/expressions/method-call-expr.md
@@ -18,14 +18,14 @@ This requires a more complex lookup process than for other functions, since ther
 The following procedure is used:

 The first step is to build a list of candidate receiver types.
-Obtain these by repeatedly [dereferencing][dereference] the receiver expression's type, adding each type encountered to the list, then finally attempting an [unsized coercion] at the end, and adding the result type if that is successful.
+Obtain these by repeatedly following the [`Receiver::Target`] of the receiver expression's type, adding each type encountered to the list, then finally attempting an [unsized coercion] at the end, and adding the result type if that is successful.
 Then, for each candidate `T`, add `&T` and `&mut T` to the list immediately after `T`.

 For instance, if the receiver has type `Box<[i32;2]>`, then the candidate types will be `Box<[i32;2]>`, `&Box<[i32;2]>`, `&mut Box<[i32;2]>`, `[i32; 2]` (by dereferencing), `&[i32; 2]`, `&mut [i32; 2]`, `[i32]` (by unsized coercion), `&[i32]`, and finally `&mut [i32]`.

 Then, for each candidate type `T`, search for a [visible] method with a receiver of that type in the following places:

-1. `T`'s inherent methods (methods implemented directly on `T`).
+1. `T`'s inherent methods, or receivers to `T`'s inherent methods (methods implemented directly on `T`, or on receivers to `T`).
 1. Any of the methods provided by a [visible] trait implemented by `T`.
   If `T` is a type parameter, methods provided by trait bounds on `T` are looked up first.
   Then all remaining methods in scope are looked up.
@@ -94,3 +94,4 @@ Just don't define inherent methods on trait objects with the same name as a trai
 [methods]: ../items/associated-items.md#methods
 [unsized coercion]: ../type-coercions.md#unsized-coercions
 [`IntoIterator`]: ../../std/iter/trait.IntoIterator.html
+[`Receiver::Target`]: ../../std/ops/trait.Receiver.html#associatedtype.Target
--- a/src/input-format.md
+++ b/src/input-format.md
@@ -1,3 +1,55 @@
 # Input format

-Rust input is interpreted as a sequence of Unicode code points encoded in UTF-8.
+This chapter describes how a source file is interpreted as a sequence of tokens.
+
+See [Crates and source files] for a description of how programs are organised into files.
+
+## Source encoding
+
+Each source file is interpreted as a sequence of Unicode characters encoded in UTF-8.
+It is an error if the file is not valid UTF-8.
+
+## Byte order mark removal
+
+If the first character in the sequence is `U+FEFF` ([BYTE ORDER MARK]), it is removed.
+
+## CRLF normalization
+
+Each pair of characters `U+000D` (CR) immediately followed by `U+000A` (LF) is replaced by a single `U+000A` (LF).
+
+Other occurrences of the character `U+000D` (CR) are left in place (they are treated as [whitespace]).
+
+## Shebang removal
+
+If the remaining sequence begins with the characters `#!`, the characters up to and including the first `U+000A` (LF) are removed from the sequence.
+
+For example, the first line of the following file would be ignored:
+
+<!-- ignore: tests don't like shebang -->
+```rust,ignore
+#!/usr/bin/env rustx
+
+fn main() {
+    println!("Hello!");
+}
+```
+
+As an exception, if the `#!` characters are followed (ignoring intervening [comments] or [whitespace]) by a `[` token, nothing is removed.
+This prevents an [inner attribute] at the start of a source file being removed.
+
+> **Note**: The standard library [`include!`] macro applies byte order mark removal, CRLF normalization, and shebang removal to the file it reads. The [`include_str!`] and [`include_bytes!`] macros do not.
+
+## Tokenization
+
+The resulting sequence of characters is then converted into tokens as described in the remainder of this chapter.
+
+
+[`include!`]: ../std/macro.include.md
+[`include_bytes!`]: ../std/macro.include_bytes.md
+[`include_str!`]: ../std/macro.include_str.md
+[inner attribute]: attributes.md
+[BYTE ORDER MARK]: https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
+[comments]: comments.md
+[Crates and source files]: crates-and-source-files.md
+[_shebang_]: https://en.wikipedia.org/wiki/Shebang_(Unix)
+[whitespace]: whitespace.md
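
The three preprocessing steps described in the new `input-format.md` text (byte order mark removal, CRLF normalization, shebang removal) can be sketched as one function. This is an illustration under stated assumptions, not how `rustc` implements it: the name `preprocess` is hypothetical, the `[` check skips whitespace only (not comments), and a shebang line with no LF is assumed to remove the whole input.

```rust
/// Hypothetical sketch of the preprocessing order described above.
fn preprocess(source: &str) -> String {
    // 1. Byte order mark removal: drop a leading U+FEFF if present.
    let without_bom = source.strip_prefix('\u{FEFF}').unwrap_or(source);

    // 2. CRLF normalization: each CR LF pair becomes LF; a lone CR stays.
    let normalized = without_bom.replace("\r\n", "\n");

    // 3. Shebang removal, unless `#!` is followed by `[` (an inner attribute).
    //    Simplification: only whitespace is skipped here, not comments.
    let has_shebang = match normalized.strip_prefix("#!") {
        Some(rest) => !rest.trim_start().starts_with('['),
        None => false,
    };
    if !has_shebang {
        return normalized;
    }
    // Remove everything up to and including the first LF.
    match normalized.find('\n') {
        Some(lf) => normalized[lf + 1..].to_string(),
        None => String::new(), // assumption: no LF means nothing is left
    }
}

fn main() {
    let script = "\u{FEFF}#!/usr/bin/env rustx\r\nfn main() {}\r\n";
    assert_eq!(preprocess(script), "fn main() {}\n");

    // An inner attribute at the start of the file is left alone.
    let attr = "#![allow(unused)]\nfn main() {}\n";
    assert_eq!(preprocess(attr), attr);
}
```

Running the steps in this order matters: the shebang check looks at what remains after the byte order mark is gone, and the first line ends at an LF that may have started out as CR LF.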
--- a/src/paths.md
+++ b/src/paths.md
@@ -53,7 +53,7 @@ mod m {
 > &nbsp;&nbsp; | `<` ( _GenericArg_ `,` )<sup>\*</sup> _GenericArg_ `,`<sup>?</sup> `>`
 >
 > _GenericArg_ :\
-> &nbsp;&nbsp; [_Lifetime_] | [_Type_] | _GenericArgsConst_ | _GenericArgsBinding_
+> &nbsp;&nbsp; [_Lifetime_] | [_Type_] | _GenericArgsConst_ | _GenericArgsBinding_ | _GenericArgsBounds_
 >
 > _GenericArgsConst_ :\
 > &nbsp;&nbsp; &nbsp;&nbsp; [_BlockExpression_]\
@@ -62,7 +62,10 @@ mod m {
 > &nbsp;&nbsp; | [_SimplePathSegment_]
 >
 > _GenericArgsBinding_ :\
-> &nbsp;&nbsp; [IDENTIFIER] `=` [_Type_]
+> &nbsp;&nbsp; [IDENTIFIER] _GenericArgs_<sup>?</sup> `=` [_Type_]
+>
+> _GenericArgsBounds_ :\
+> &nbsp;&nbsp; [IDENTIFIER] _GenericArgs_<sup>?</sup> `:` [_TypeParamBounds_]

 Paths in expressions allow for paths with generic arguments to be specified. They are
 used in various places in [expressions] and [patterns].
@@ -396,6 +399,7 @@ mod without { // crate::without
 [_SimplePathSegment_]: #simple-paths
 [_Type_]: types.md#type-expressions
 [_TypeNoBounds_]: types.md#type-expressions
+[_TypeParamBounds_]: trait-bounds.md
 [literal]: expressions/literal-expr.md
 [item]: items.md
 [variable]: variables.md
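
As an illustration of the _GenericArgsBounds_ production added to `paths.md` above, the following sketch bounds an associated type inside the angle brackets instead of binding it to a concrete type. The function name `describe` is hypothetical, and the example assumes a compiler version that accepts bounds in this position.

```rust
use std::fmt::Debug;

// `Item: Debug` is a bound on the associated type (GenericArgsBounds),
// in contrast to a binding such as `Item = i32` (GenericArgsBinding).
fn describe(items: impl Iterator<Item: Debug>) -> Vec<String> {
    items.map(|item| format!("{item:?}")).collect()
}

fn main() {
    let out = describe([1, 2].into_iter());
    assert_eq!(out, ["1", "2"]);
}
```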
--- a/src/procedural-macros.md
+++ b/src/procedural-macros.md
@@ -234,8 +234,8 @@ shown in the comments after the function prefixed with "out:".

 #[proc_macro_attribute]
 pub fn show_streams(attr: TokenStream, item: TokenStream) -> TokenStream {
-    println!("attr: \"{}\"", attr.to_string());
-    println!("item: \"{}\"", item.to_string());
+    println!("attr: \"{attr}\"");
+    println!("item: \"{item}\"");
    item
 }
 ```
--- a/src/special-types-and-traits.md
+++ b/src/special-types-and-traits.md
@@ -80,6 +80,7 @@ types:

 * Types with a built-in `Copy` implementation (see above)
 * [Tuples] of `Clone` types
+* [Closures] that only capture values of `Clone` types or capture no values from the environment
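
A minimal sketch of the closure rule added to the list above: a closure that captures only a `String` (a `Clone` type) by value can itself be cloned. The function name `make_greeter` is hypothetical.

```rust
// The returned closure owns `greeting`; because `String` is `Clone`,
// the closure is `Clone` too, so `+ Clone` in the return type is satisfied.
fn make_greeter(greeting: String) -> impl Fn(&str) -> String + Clone {
    move |name| format!("{greeting}, {name}")
}

fn main() {
    let greet = make_greeter(String::from("hello"));
    let greet_again = greet.clone();
    assert_eq!(greet("a"), "hello, a");
    assert_eq!(greet_again("b"), "hello, b");
}
```

If the captured value were of a type that does not implement `Clone`, the `+ Clone` bound on the return type would be rejected at compile time.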

 ## `Send`

--- a/src/tokens.md
+++ b/src/tokens.md
@@ -37,6 +37,8 @@ Literals are tokens used in [literal expressions].

 [^nsets]: The number of `#`s on each side of the same literal must be equivalent.

+> **Note**: Character and string literal tokens never include the sequence of `U+000D` (CR) immediately followed by `U+000A` (LF): this pair would have been previously transformed into a single `U+000A` (LF).
+
 #### ASCII escapes

 |   | Name |
@@ -156,13 +158,10 @@ A _string literal_ is a sequence of any Unicode characters enclosed within two
 `U+0022` (double-quote) characters, with the exception of `U+0022` itself,
 which must be _escaped_ by a preceding `U+005C` character (`\`).

-Line-breaks are allowed in string literals.
-A line-break is either a newline (`U+000A`) or a pair of carriage return and newline (`U+000D`, `U+000A`).
-Both byte sequences are translated to `U+000A`.
-
+Line-breaks, represented by the character `U+000A` (LF), are allowed in string literals.
 When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token.
 See [String continuation escapes] for details.
-
+The character `U+000D` (CR) may not appear in a string literal other than as part of such a string continuation escape.

 #### Character escapes

@@ -198,10 +197,10 @@ following forms:

 Raw string literals do not process any escapes. They start with the character
 `U+0072` (`r`), followed by fewer than 256 of the character `U+0023` (`#`) and a
-`U+0022` (double-quote) character. The _raw string body_ can contain any sequence
-of Unicode characters and is terminated only by another `U+0022` (double-quote)
-character, followed by the same number of `U+0023` (`#`) characters that preceded
-the opening `U+0022` (double-quote) character.
+`U+0022` (double-quote) character.
+
+The _raw string body_ can contain any sequence of Unicode characters other than `U+000D` (CR).
+It is terminated only by another `U+0022` (double-quote) character, followed by the same number of `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) character.

 All Unicode characters contained in the raw string body represent themselves,
 the characters `U+0022` (double-quote) (except when followed by at least as
@@ -259,6 +258,11 @@ the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
 Alternatively, a byte string literal can be a _raw byte string literal_, defined
 below.

+Line-breaks, represented by the character `U+000A` (LF), are allowed in byte string literals.
+When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token.
+See [String continuation escapes] for details.
+The character `U+000D` (CR) may not appear in a byte string literal other than as part of such a string continuation escape.
+
 Some additional _escapes_ are available in either byte or non-raw byte string
 literals. An escape starts with a `U+005C` (`\`) and continues with one of the
 following forms:
@@ -281,19 +285,19 @@ following forms:
 > &nbsp;&nbsp; `br` RAW_BYTE_STRING_CONTENT SUFFIX<sup>?</sup>
 >
 > RAW_BYTE_STRING_CONTENT :\
-> &nbsp;&nbsp; &nbsp;&nbsp; `"` ASCII<sup>* (non-greedy)</sup> `"`\
+> &nbsp;&nbsp; &nbsp;&nbsp; `"` ASCII_FOR_RAW<sup>* (non-greedy)</sup> `"`\
 > &nbsp;&nbsp; | `#` RAW_BYTE_STRING_CONTENT `#`
 >
-> ASCII :\
-> &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F)_
+> ASCII_FOR_RAW :\
+> &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F) except IsolatedCR_

 Raw byte string literals do not process any escapes. They start with the
 character `U+0062` (`b`), followed by `U+0072` (`r`), followed by fewer than 256
-of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw string body_ can contain any sequence of ASCII characters and is terminated
-only by another `U+0022` (double-quote) character, followed by the same number of
-`U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote)
-character. A raw byte string literal can not contain any non-ASCII byte.
+of the character `U+0023` (`#`), and a `U+0022` (double-quote) character.
+
+The _raw string body_ can contain any sequence of ASCII characters other than `U+000D` (CR).
+It is terminated only by another `U+0022` (double-quote) character, followed by the same number of `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) character.
+A raw byte string literal cannot contain any non-ASCII byte.

 All characters contained in the raw string body represent their ASCII encoding,
 the characters `U+0022` (double-quote) (except when followed by at least as
@ -339,6 +343,11 @@ C strings are implicitly terminated by byte `0x00`, so the C string literal
 literal `b"\x00"`. Other than the implicit terminator, byte `0x00` is not
 permitted within a C string.

+Line-breaks, represented by the character `U+000A` (LF), are allowed in C string literals.
+When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token.
+See [String continuation escapes] for details.
+The character `U+000D` (CR) may not appear in a C string literal other than as part of such a string continuation escape.
+
 Some additional _escapes_ are available in non-raw C string literals. An escape
 starts with a `U+005C` (`\`) and continues with one of the following forms:

@ -381,11 +390,10 @@ c"\xC3\xA6";

 Raw C string literals do not process any escapes. They start with the
 character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
-of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw C string body_ can contain any sequence of Unicode characters (other than
-`U+0000`) and is terminated only by another `U+0022` (double-quote) character,
-followed by the same number of `U+0023` (`#`) characters that preceded the
-opening `U+0022` (double-quote) character.
+of the character `U+0023` (`#`), and a `U+0022` (double-quote) character.
+
+The _raw C string body_ can contain any sequence of Unicode characters other than `U+0000` (NUL) and `U+000D` (CR).
+It is terminated only by another `U+0022` (double-quote) character, followed by the same number of `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) character.

 All characters contained in the raw C string body represent themselves in UTF-8
 encoding. The characters `U+0022` (double-quote) (except when followed by at
@ -630,11 +638,14 @@ Examples of reserved forms:

 > **<sup>Lexer</sup>**\
 > LIFETIME_TOKEN :\
-> &nbsp;&nbsp; &nbsp;&nbsp; `'` [IDENTIFIER_OR_KEYWORD][identifier]\
+> &nbsp;&nbsp; &nbsp;&nbsp; `'` [IDENTIFIER_OR_KEYWORD][identifier]
+>   _(not immediately followed by `'`)_\
 > &nbsp;&nbsp; | `'_`
+>   _(not immediately followed by `'`)_
 >
 > LIFETIME_OR_LABEL :\
 > &nbsp;&nbsp; &nbsp;&nbsp; `'` [NON_KEYWORD_IDENTIFIER][identifier]
+>   _(not immediately followed by `'`)_

 Lifetime parameters and [loop labels] use LIFETIME_OR_LABEL tokens. Any
 LIFETIME_TOKEN will be accepted by the lexer, and for example, can be used in
--- a/src/type-layout.md
+++ b/src/type-layout.md
@ -44,17 +44,20 @@ The size of most primitives is given in this table.
 | `u32` / `i32`     | 4                  |
 | `u64` / `i64`     | 8                  |
 | `u128` / `i128`   | 16                 |
+| `usize` / `isize` | See below          |
 | `f32`             | 4                  |
 | `f64`             | 8                  |
 | `char`            | 4                  |

 `usize` and `isize` have a size big enough to contain every address on the
-target platform. For example, on a 32 bit target, this is 4 bytes and on a 64
+target platform. For example, on a 32 bit target, this is 4 bytes, and on a 64
 bit target, this is 8 bytes.

-Most primitives are generally aligned to their size, although this is
-platform-specific behavior. In particular, on x86 u64 and f64 are only
-aligned to 32 bits.
+The alignment of primitives is platform-specific.
+In most cases, their alignment is equal to their size, but it may be less.
+In particular, `i128` and `u128` are often aligned to 4 or 8 bytes even though
+their size is 16, and on many 32-bit platforms, `i64`, `u64`, and `f64` are only
+aligned to 4 bytes, not 8.

 ## Pointers and References Layout
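
A minimal sketch for checking the size table and the alignment wording in the `type-layout.md` hunk above on a given target. `layout_of` and `print_primitive_layouts` are hypothetical helper names introduced here and are not part of the reference; only `std::mem::size_of` and `std::mem::align_of` come from the standard library. The sizes of the fixed-width types are the same on every target, while the alignments are platform-specific, so no expected output is shown.

```rust
use std::mem::{align_of, size_of};

/// Returns `(size, alignment)` in bytes for `T` on the current target.
/// (Hypothetical helper, introduced only for this illustration.)
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// Prints one row per primitive. The size column is fixed for the
/// fixed-width types; the alignment column depends on the target.
fn print_primitive_layouts() {
    let rows = [
        ("u64", layout_of::<u64>()),
        ("f64", layout_of::<f64>()),
        ("u128", layout_of::<u128>()),
        ("usize", layout_of::<usize>()),
    ];
    for (name, (size, align)) in rows.iter() {
        println!("{:<6} size = {:>2}, align = {:>2}", name, size, align);
    }
}
```

Calling `print_primitive_layouts()` from `main` on different targets is one way to observe the cases the new text describes, such as a `u128` whose alignment is smaller than its 16-byte size.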
Author	SHA1	Message	Date
Mads Marquart	19a40fb09b	Merge `3cd8b7b123` into `51817951d0`	2024-04-27 12:17:53 -07:00
Eric Huss	51817951d0	Merge pull request #1468 from petrochenkov/debmac Add docs for `#[collapse_debuginfo]` attribute	2024-04-27 17:54:45 +00:00
Eric Huss	2d51a2aec4	Add an example of collapse_debuginfo	2024-04-27 10:53:08 -07:00
Eric Huss	5854fcc286	Merge pull request #1420 from daxpedda/wasm-target-feature-phase-4-5 Stabilize Wasm target features that are in phase 4 and 5	2024-04-21 13:47:07 +00:00
Eric Huss	5e68de3dc2	Merge pull request #1493 from kpreid/patch-1 Expand and clarify primitive alignment	2024-04-20 14:05:08 +00:00
Eric Huss	735c5dbf05	Merge pull request #1492 from conradludgate/patch-1 Update clone reference to include closures	2024-04-17 15:06:26 +00:00
Eric Huss	330ef95694	Clone: Also mention closures that don't capture anything	2024-04-17 08:04:47 -07:00
Kevin Reid	a432cf4afd	Expand and clarify primitive alignment These changes are intended to make the section more informative and readable, without making any new normative claims. * Specify that the alignment might be _less_ than the size, rather than just that it might be different. This is mandatory and stated in the previous section, but I think it's useful to reiterate here. * Mention `u128`/`i128` as another example of alignment less than size, so that this doesn't sound like a mainly 32-bit thing. * Add `usize`/`isize` to the size table, so it can be spotted at a glance.	2024-04-16 09:35:51 -07:00
Conrad Ludgate	4f47e3ffe7	Update clone reference to include closures	2024-04-16 06:55:59 +01:00
Eric Huss	585b9bcb72	Merge pull request #1491 from kpreid/neunit Document how `non_exhaustive` interacts with tuple and unit-like structs.	2024-04-15 19:59:53 +00:00
Kevin Reid	076a798583	Replace “min()” visibility notation with English.	2024-04-15 11:13:52 -07:00
Eric Huss	a60221ad9c	Merge pull request #1490 from jlokier/patch-1 Fix link to RISC-V Zkt spec; it was pointing to Zkr	2024-04-15 16:12:40 +00:00
Kevin Reid	ec0065fd92	Document how `non_exhaustive` interacts with tuple and unit-like structs.	2024-04-14 10:48:46 -07:00
Jamie Lokier	b4311de691	Fix link to RISC-V Zkt spec; it was pointing to Zkr	2024-04-14 14:07:07 +01:00
Eric Huss	55694913b1	Merge pull request #1449 from weiznich/diagnostic_namespace Add the `#[diagnostic]` attribute namespace and the `#[diagnostic::on_unimplemented]` feature to the reference	2024-04-03 21:31:14 +00:00
Eric Huss	52874b8312	Update on_unimplemented for format string changes. Updated in https://github.com/rust-lang/rust/pull/122402	2024-04-03 14:29:34 -07:00
Eric Huss	1c03c9d3b8	Merge pull request #1393 from dvdhrm/pr/align32 type-layout: be more specific about 32-bit alignments	2024-04-03 02:21:31 +00:00
Eric Huss	1e1fec30f1	Merge pull request #1488 from yotamofek/patch-1 Fix clippy warning in procedural macro example	2024-04-01 19:56:13 +00:00
Yotam Ofek	a7a86824fa	Fix clippy warning in procedural macro example I copy+pasted this example into my code and the `clippy::to_string_in_format_args` lint fired.	2024-03-30 23:14:03 +03:00
Eric Huss	984b36eca4	Merge pull request #1486 from aoyama-val/patch-1 Fix typo of shebang	2024-03-25 14:05:19 +00:00
Shotaro Aoyama	0b153cb607	fix typo of shebang	2024-03-24 14:02:49 +09:00
Eric Huss	824b9156b2	Merge pull request #1461 from clubby789/imported-main Document importing `main`	2024-03-20 15:54:19 +00:00
Eric Huss	b6779f40a1	Merge pull request #1481 from compiler-errors/atb add grammar for `associated_type_bounds` in reference	2024-03-20 15:52:40 +00:00
Eric Huss	be4f7be926	Merge pull request #1483 from mattheww/2024-03_unicode_escape_fix Literal expressions: fix mistake in the definition of unicode escapes	2024-03-19 20:01:14 +00:00
Matthew Woodcraft	659915cc11	Literal expressions: fix mistake in the definition of unicode escapes	2024-03-19 19:36:32 +00:00
Eric Huss	5e29b0135e	Various fixes and editing.	2024-03-12 09:21:17 -07:00
Georg Semmler	99b19d92c1	Apply more review suggestions manually Co-authored-by: Eric Huss <eric@huss.org>	2024-03-12 08:23:37 -07:00
Georg Semmler	5baf87cdd9	Apply suggestions from code review Co-authored-by: Eric Huss <eric@huss.org>	2024-03-12 08:23:37 -07:00
Georg Semmler	81fe01a111	Add the `#[diagnostic]` attribute namespace and the `#[diagnostic::on_unimplemented]` feature to the reference	2024-03-12 08:23:37 -07:00
Michael Goulet	6c77f499ea	Update src/paths.md	2024-03-08 11:42:50 -05:00
Michael Goulet	9ad55f00b1	Fix copy/paste error	2024-03-08 11:42:32 -05:00
Michael Goulet	684b549fc7	add support for ATB in reference	2024-03-07 19:47:28 +00:00
Eric Huss	5afb503a4c	Merge pull request #1459 from mattheww/2024-01_input_format Input format	2024-03-06 21:29:54 +00:00
Eric Huss	54400709b0	Merge pull request #1479 from mattheww/2024-03_lifetime_tokens Lexer: say that lifetime-like tokens can't be immediately followed by '	2024-03-06 19:01:19 +00:00
Matthew Woodcraft	7bd81a6a03	tokens.md: say that lifetime-like tokens can't be immediately followed by ' Forms like 'ab'c are rejected, so we need some way to explain why they don't tokenise as two consecutive LIFETIME_OR_LABEL tokens. Address this by adding "not immediately followed by `'`" to each of the lexer rules for the lifetime-like tokens. This also means there can be no ambiguity between CHAR_LITERAL and these tokens (at present we don't say how such ambiguities are resolved).	2024-03-04 21:32:01 +00:00
Eric Huss	c495b9660f	Link `collapse_debuginfo` in the index of built-in attributes.	2024-02-14 10:21:12 -08:00
Eric Huss	860fe4acc1	Use semantic line wrapping.	2024-02-14 10:18:40 -08:00
Eric Huss	4e9c91f0ec	Place `rustc` behavior in a side note. Generally the reference tries to stay focused on the language, and only provide implementation notes as side-information.	2024-02-14 10:17:29 -08:00
Eric Huss	224b6c5306	Use em-dash separator	2024-02-14 10:16:40 -08:00
Eric Huss	bb166095d1	Use standard template introducing an attribute.	2024-02-14 10:16:23 -08:00
Vadim Petrochenkov	0bf5d4e44c	Add docs for `#[collapse_debuginfo]` attribute	2024-02-13 16:26:42 +03:00
clubby789	50a2c87f82	Document importing `main`	2024-01-31 16:22:22 +00:00
Matthew Woodcraft	8ba3c49114	Input format: note about include! macros	2024-01-28 18:44:28 +00:00
Matthew Woodcraft	e364b6c6f9	lexical structure: move the description of shebang-removal This takes place after CRLF normalization. It's better not to list the shebang in a Lexer block, as it isn't a token that can be fed to a macro.	2024-01-28 18:42:40 +00:00
Matthew Woodcraft	5f512692d3	lexical structure: move the description of BOM-removal This takes place at the same time as CRLF normalisation. It's better not to list it in a Lexer block, as it isn't a token that can be fed to a macro.	2024-01-28 18:42:40 +00:00
Matthew Woodcraft	fa56fdba0e	Lexical structure: move the description of CRLF normalization We now say that CRLF normalization happens as a separate pass before tokenization.	2024-01-28 18:42:40 +00:00
Mads Marquart	3cd8b7b123	Proposal for update after RFC 3519	2023-11-21 15:11:08 +01:00
daxpedda	d035af92a1	Stabilize Wasm target features that are in phase 4 and 5	2023-11-01 00:19:00 +01:00
David Rheinsberg	fdee1043ca	type-layout: be more specific about 32-bit alignments The rust-reference implies that 64-bit types are aligned to 32-bit for platforms with 32-bit addresses. This is not necessarily correct. Fix the wording. Note that there is no general rule how data-types greater than the native address size are aligned. On most Unix'y systems, they use the native alignment of the platform. However, the Windows ABI aligns them to their size (up to at least 64-bit). There are advantages for either of those decisions. But we should at least make clear that there is no fixed rule for 32-bit platforms. Signed-off-by: David Rheinsberg <david@readahead.eu>	2023-08-11 10:11:43 +02:00