rfcs/text/0093-remove-format-intl.md

4.5 KiB

Summary

Remove localization features from format!, and change the set of escapes accepted by format strings. The plural and select methods would be removed, # would no longer need to be escaped, and {{/}} would become escapes for { and }, respectively.

Motivation

Localization is difficult to implement correctly, and doing so will likely not be done in the standard library, but rather in an external library. After talking with others much more familiar with localization, it has come to light that our ad-hoc "localization support" in our format strings are woefully inadequate for most real use cases of support for localization.

Instead of having a half-baked unused system adding complexity to the compiler and libraries, the support for this functionality would be removed from format strings.

Detailed design

The primary localization features that format! supports today are the plural and select methods inside of format strings. These methods are choices made at format-time based on the input arguments of how to format a string. This functionality would be removed from the compiler entirely.

As fallout of this change, the # special character, a back reference to the argument being formatted, would no longer be necessary. In that case, this character no longer needs an escape sequence.

The new grammar for format strings would be as follows:

format_string := <text> [ format <text> ] *
format := '{' [ argument ] [ ':' format_spec ] '}'
argument := integer | identifier

format_spec := [[fill]align][sign]['#'][0][width]['.' precision][type]
fill := character
align := '<' | '>'
sign := '+' | '-'
width := count
precision := count | '*'
type := identifier | ''
count := parameter | integer
parameter := integer '$'

The current syntax can be found at http://doc.rust-lang.org/std/fmt/#syntax to see the diff between the two

Choosing a new escape sequence

Upon landing, there was a significant amount of discussion about the escape sequence that would be used in format strings. Some context can be found on some old pull requests, and the current escape mechanism has been the source of much confusion. With the removal of localization methods, and namely nested format directives, it is possible to reconsider the choices of escaping again.

The only two characters that need escaping in format strings are { and }. One of the more appealing syntaxes for escaping was to double the character to represent the character itself. This would mean that {{ is an escape for a { character, while }} would be an escape for a } character.

Adopting this scheme would avoid clashing with Rust's string literal escapes. There would be no "double escape" problem. More details on this can be found in the comments of an old PR.

Drawbacks

The localization methods of select/plural are genuinely used for applications that do not involve localization. For example, the compiler and rustdoc often use plural to easily create plural messages. Removing this functionality from format strings would impose a burden of likely dynamically allocating a string at runtime or defining two separate format strings.

Additionally, changing the syntax of format strings is quite an invasive change. Raw string literals serve as a good use case for format strings that must escape the { and } characters. The current system is arguably good enough to pass with for today.

Alternatives

The major localization approach explored has been l20n, which has shown itself to be fairly incompatible with the way format strings work today. Different localization systems, however, have not been explored. Systems such as gettext would be able to leverage format strings quite well, but it was claimed that gettext for localization is inadequate for modern use-cases.

It is also an unexplored possibility whether the current format string syntax could be leveraged by l20n. It is unlikely that time will be allocated to polish off an localization library before 1.0, and it is currently seen as undesirable to have a half-baked system in the libraries rather than a first-class well designed system.

Unresolved questions

  • Should localization support be left in std::fmt as a "poor man's" implementation for those to use as they see fit?