mirror of https://github.com/rust-lang/reference
Update reference for https://github.com/rust-lang/rust/pull/119172,
which moved the check for NUL characters in C string literals earlier.
parent 8c77e8be9d
commit a393aaf4a1
@@ -337,9 +337,9 @@ b"\\x52"; br"\x52"; // \x52
 > **<sup>Lexer</sup>**\
 > C_STRING_LITERAL :\
 > `c"` (\
-> ~\[`"` `\` _IsolatedCR_]\
-> | BYTE_ESCAPE\
-> | UNICODE_ESCAPE\
+> ~\[`"` `\` _IsolatedCR_ _NUL_]\
+> | BYTE_ESCAPE _except `\0` or `\x00`_\
+> | UNICODE_ESCAPE _except `\u{0}`, `\u{00}`, …, `\u{000000}`_\
 > | STRING_CONTINUE\
 > )<sup>\*</sup> `"` SUFFIX<sup>?</sup>

@@ -372,10 +372,6 @@ starts with a `U+005C` (`\`) and continues with one of the following forms:
 * The _backslash escape_ is the character `U+005C` (`\`) which must be
 escaped in order to denote its ASCII encoding `0x5C`.

-The escape sequences `\0`, `\x00`, and `\u{0000}` are permitted within the token
-but will be rejected as invalid, as C strings may not contain byte `0x00` except
-as the implicit terminator.
-
 A C string represents bytes with no defined encoding, but a C string literal
 may contain Unicode characters above `U+007F`. Such characters will be replaced
 with the bytes of that character's UTF-8 representation.
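Note on the hunk above: the two properties it describes (no `0x00` byte except the implicit terminator, and UTF-8 bytes for characters above `U+007F`) can be observed at runtime. The following is a minimal sketch, not the lexer change itself. It assumes the runtime types `std::ffi::CString` and `std::ffi::CStr` as a stand-in for `c"..."` literals, and the helper name `c_string_payload` is hypothetical.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: the bytes a C string would hold for `s`,
/// excluding the implicit terminator. Returns `None` when `s`
/// contains U+0000, which a C string cannot represent.
fn c_string_payload(s: &str) -> Option<Vec<u8>> {
    CString::new(s).ok().map(|c| c.as_bytes().to_vec())
}

fn main() {
    // U+00E6 is stored as its UTF-8 encoding, the same two bytes
    // written as `c"\xC3\xA6"` in the next hunk's header.
    assert_eq!(c_string_payload("æ"), Some(vec![0xC3, 0xA6]));

    // An interior NUL is rejected.
    assert_eq!(c_string_payload("a\0b"), None);

    // The terminator is added implicitly.
    let c = CString::new("hi").unwrap();
    assert_eq!(c.as_bytes_with_nul(), &b"hi\0"[..]);

    // `CStr::from_bytes_with_nul` enforces the same rule on raw bytes.
    assert!(CStr::from_bytes_with_nul(b"hi\0").is_ok());
    assert!(CStr::from_bytes_with_nul(b"h\0i\0").is_err());
}
```

The literal forms differ in one respect: after this commit the grammar itself excludes NUL, whereas the runtime constructors report it as an error value.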
@@ -398,16 +394,16 @@ c"\xC3\xA6";
 > `cr` RAW_C_STRING_CONTENT SUFFIX<sup>?</sup>
 >
 > RAW_C_STRING_CONTENT :\
-> `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
+> `"` ( ~ _IsolatedCR_ _NUL_ )<sup>* (non-greedy)</sup> `"`\
 > | `#` RAW_C_STRING_CONTENT `#`

 Raw C string literals do not process any escapes. They start with the
 character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw C string body_ can contain any sequence of Unicode characters and is
-terminated only by another `U+0022` (double-quote) character, followed by the
-same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
-(double-quote) character.
+_raw C string body_ can contain any sequence of Unicode characters (other than
+`U+0000`) and is terminated only by another `U+0022` (double-quote) character,
+followed by the same number of `U+0023` (`#`) characters that preceded the
+opening `U+0022` (double-quote) character.

 All characters contained in the raw C string body represent themselves in UTF-8
 encoding. The characters `U+0022` (double-quote) (except when followed by at
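Note on the hunk above: the raw-literal rules (no escape processing, and termination by a double-quote followed by the same number of `#` as the opener) can be sketched as follows. This is an illustration under assumptions, not part of the commit. Raw byte string literals (`br"..."`) stand in for `cr"..."`, since the first hunk's header shows them treating `\x52` the same way, and the helper `min_hashes` is hypothetical. It does not enforce the "fewer than 256" limit.

```rust
/// Hypothetical helper: the smallest number of `#` characters needed so
/// that `body` cannot end a raw literal early. A `"` followed by a run of
/// n `#` characters inside the body requires at least n + 1 delimiters.
fn min_hashes(body: &str) -> usize {
    let mut needed: usize = 0;
    let mut chars = body.chars().peekable();
    while let Some(ch) = chars.next() {
        if ch == '"' {
            // Count the `#` characters directly after this quote.
            let mut run: usize = 0;
            while chars.peek() == Some(&'#') {
                chars.next();
                run += 1;
            }
            needed = needed.max(run + 1);
        }
    }
    needed
}

fn main() {
    // Raw form: the four bytes `\`, `x`, `5`, `2` are kept as written.
    assert_eq!(br"\x52", b"\\x52");
    // Non-raw form: the escape is processed into the single byte `R`.
    assert_eq!(b"\x52", b"R");

    assert_eq!(min_hashes("plain"), 0); // fits in r"plain"
    assert_eq!(min_hashes("say \"hi\""), 1); // needs r#"..."#
    assert_eq!(min_hashes("a\"#b"), 2); // needs r##"..."##
}
```

Because a raw literal has no escapes, a `U+0000` in the body could only appear literally in the source, which is what the added `_NUL_` exclusion in the grammar rules out.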