mirror of https://github.com/rust-lang/reference
lexical structure: move the description of BOM-removal
This takes place at the same time as CRLF normalisation. It's better not to list it in a Lexer block, as it isn't a token that can be fed to a macro.
parent fa56fdba0e
commit 5f512692d3
@@ -2,13 +2,11 @@

 > **<sup>Syntax</sup>**\
 > _Crate_ :\
-> UTF8BOM<sup>?</sup>\
 > SHEBANG<sup>?</sup>\
 > [_InnerAttribute_]<sup>\*</sup>\
 > [_Item_]<sup>\*</sup>

 > **<sup>Lexer</sup>**\
-> UTF8BOM : `\uFEFF`\
 > SHEBANG : `#!` \~`\n`<sup>\+</sup>[†](#shebang)


@@ -65,19 +63,13 @@ apply to the crate as a whole.
 #![warn(non_camel_case_types)]
 ```

-## Byte order mark
-
-The optional [_UTF8 byte order mark_] (UTF8BOM production) indicates that the
-file is encoded in UTF8. It can only occur at the beginning of the file and
-is ignored by the compiler.
-
 ## Shebang

 A source file can have a [_shebang_] (SHEBANG production), which indicates
 to the operating system what program to use to execute this file. It serves
 essentially to treat the source file as an executable script. The shebang
-can only occur at the beginning of the file (but after the optional
-_UTF8BOM_). It is ignored by the compiler. For example:
+can only occur at the beginning of the file.
+It is ignored by the compiler. For example:

 <!-- ignore: tests don't like shebang -->
 ```rust,ignore
@@ -162,7 +154,6 @@ or `_` (U+005F) characters.
 [_Item_]: items.md
 [_MetaNameValueStr_]: attributes.md#meta-item-attribute-syntax
 [_shebang_]: https://en.wikipedia.org/wiki/Shebang_(Unix)
-[_utf8 byte order mark_]: https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
 [`ExitCode`]: ../std/process/struct.ExitCode.html
 [`Infallible`]: ../std/convert/enum.Infallible.html
 [`Termination`]: ../std/process/trait.Termination.html

@@ -9,6 +9,10 @@ See [Crates and source files] for a description of how programs are organised in
 Each source file is interpreted as a sequence of Unicode characters encoded in UTF-8.
 It is an error if the file is not valid UTF-8.

+## Byte order mark removal
+
+If the first character in the sequence is `U+FEFF` ([BYTE ORDER MARK]), it is removed.
+
 ## CRLF normalization

 Each pair of characters `U+000D` (CR) immediately followed by `U+000A` (LF) is replaced by a single `U+000A` (LF).
@@ -19,4 +23,5 @@ Other occurrences of the character `U+000D` (CR) are left in place (they are tre

 The resulting sequence of characters is then converted into tokens as described in the remainder of this chapter.

+[BYTE ORDER MARK]: https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
 [Crates and source files]: crates-and-source-files.md
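
For readers of this change, the two preprocessing steps described in the second file (byte order mark removal and CRLF normalization) can be sketched in Rust as below. This is only an illustration of the wording in the diff, not the compiler's actual implementation. The function name `normalize_source` is hypothetical, and the sketch assumes the input has already been validated as UTF-8.

```rust
/// Hypothetical helper illustrating the two steps described in the diff.
/// Not taken from rustc; assumes `src` is already valid UTF-8 text.
pub fn normalize_source(src: &str) -> String {
    // Byte order mark removal: if the first character is U+FEFF, drop it.
    // Only a leading occurrence is removed; later U+FEFF characters stay.
    let src = src.strip_prefix('\u{FEFF}').unwrap_or(src);

    // CRLF normalization: each CR immediately followed by LF becomes a
    // single LF. Other occurrences of CR are left in place.
    src.replace("\r\n", "\n")
}
```

Calling `normalize_source` on text that starts with `U+FEFF` and uses CRLF line endings yields the same text without the leading mark and with LF line endings, while a lone CR passes through unchanged. Because the mark is a single leading character and contains neither CR nor LF, the order in which the two steps run does not affect the result, which fits the commit message's remark that they take place at the same time.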