coreutils 0.3.0

coreutils ~ GNU coreutils (updated); implemented as universal (cross-platform) utils, written in Rust
# Localization (L10n) in uutils coreutils

This guide explains how localization (L10n) is implemented in the **Rust-based coreutils project**, detailing the use of [Fluent](https://projectfluent.org/) files, runtime behavior, and developer integration.

## Architecture Overview

**English (US) locale files (`en-US.ftl`) are embedded directly in the binary**, ensuring that English always works regardless of how the software is installed. Other language locale files are loaded from the filesystem at runtime.

### Source Repository Structure

- **Main repository**: Contains the English (`en-US.ftl`) locale files that are embedded in the binaries
- **Translation repository**: [uutils/coreutils-l10n](https://github.com/uutils/coreutils-l10n) contains all other language translations
- **Online translation**: Strings are translated on Weblate at [weblate/rust-coreutils](https://hosted.weblate.org/projects/rust-coreutils/)


---

## Fluent File Layout

Each utility has its own set of translation files under:

```
    src/uu/<utility>/locales/<locale>.ftl
```

Examples:

```
    src/uu/ls/locales/en-US.ftl    # Embedded in binary
    src/uu/ls/locales/fr-FR.ftl    # Loaded from filesystem
```

These files follow Fluent syntax and contain localized message patterns.

Besides English, French is the only locale kept in the main source tree. It is there so that tests can run under a
non-English locale and verify that localization works.

---

## Initialization

Localization must be explicitly initialized at runtime using:

```
    setup_localization(path)
```

This is typically done:
- In `src/bin/coreutils.rs` for **multi-call binaries**
- In `src/uucore/src/lib.rs` for **single-call utilities**

The string parameter determines the lookup path for Fluent files. **English always works** because it's embedded, but other languages need their `.ftl` files to be available at runtime.

---

## Locale Detection

Locale selection is automatic and performed via:

```
    fn detect_system_locale() -> Result<LanguageIdentifier, LocalizationError>
```

It reads the `LANG` environment variable (e.g., `fr-FR.UTF-8`), strips the encoding suffix (`.UTF-8`), and parses the remainder as a language identifier.

If parsing fails or `LANG` is not set, it falls back to:

```
    const DEFAULT_LOCALE: &str = "en-US";
```

You can override the locale at runtime by running:

```
    LANG=ja-JP ./target/debug/ls
```
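The detection steps described in this section can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `locale_from_lang` is a hypothetical helper, it returns a plain `String` instead of a `LanguageIdentifier`, and its validity check is a loose approximation of real language-tag parsing.

```rust
/// Fallback used when `LANG` is unset or cannot be parsed.
const DEFAULT_LOCALE: &str = "en-US";

/// Hypothetical helper: turn a raw `LANG` value into a locale tag.
/// Assumption: a tag is accepted when it starts with a 2-3 letter language
/// code followed by alphanumeric subtags; real parsing is stricter.
fn locale_from_lang(lang: Option<&str>) -> String {
    let raw = match lang {
        Some(value) if !value.trim().is_empty() => value.trim(),
        _ => return DEFAULT_LOCALE.to_string(),
    };
    // Drop the encoding suffix and any modifier: "fr-FR.UTF-8" -> "fr-FR".
    let tag = raw.split(['.', '@']).next().unwrap_or("");

    let mut parts = tag.split(['-', '_']);
    let language = parts.next().unwrap_or("");
    let language_ok = (2..=3).contains(&language.len())
        && language.chars().all(|c| c.is_ascii_alphabetic());
    let rest_ok = parts.all(|p| !p.is_empty() && p.chars().all(|c| c.is_ascii_alphanumeric()));

    if language_ok && rest_ok {
        // Normalize POSIX-style "ja_JP" to "ja-JP".
        tag.replace('_', "-")
    } else {
        DEFAULT_LOCALE.to_string()
    }
}
```

With this sketch, values such as `C` or an empty `LANG` end up on the `en-US` fallback, which mirrors the behavior described above.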

---

## Retrieving Messages

We have a single macro to handle translations.
It can be used in two ways:

### `translate!(id: &str) -> String`

Returns the message from the current locale bundle.

```
    let msg = translate!("id-greeting");
```

If the message is not found in the current locale, the lookup falls back to `en-US`. If it is still missing, the ID itself is returned.
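The three-step fallback can be illustrated with plain hash maps standing in for Fluent bundles. The `Bundle` alias and the `lookup` function are hypothetical names for this sketch; the real macro works on Fluent bundles, not on `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a loaded bundle: message ID -> message text.
type Bundle = HashMap<&'static str, &'static str>;

/// Look up `id` in the current locale, then in the embedded English bundle,
/// and finally return the ID itself so the caller always gets a string.
fn lookup(current: &Bundle, english: &Bundle, id: &str) -> String {
    current
        .get(id)
        .or_else(|| english.get(id))
        .map(|text| (*text).to_string())
        .unwrap_or_else(|| id.to_string())
}
```

Returning the ID as the last resort keeps output usable (and searchable in the `.ftl` files) even when a message is missing everywhere.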

---

### `translate!(id: &str, args: key-value pairs) -> String`

Supports variable interpolation and pluralization.

```
    let msg = translate!(
        "error-io",
        "error" => std::io::Error::last_os_error()
    );
```

Fluent message example:

```
    error-io = I/O error occurred: { $error }
```

Variables must match the Fluent placeholder keys (`$error`, `$name`, `$count`, etc.).

---

## Fluent Syntax Example

```
    id-greeting = Hello, world!
    welcome = Welcome, { $name }!
    count-files = You have { $count ->
        [one] { $count } file
       *[other] { $count } files
    }
```

Use plural rules and inline variables to adapt messages dynamically.
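To make the select expression in `count-files` concrete, here is a hand-written Rust equivalent for English only. It is a sketch under the assumption that counts are non-negative integers, for which English uses the `one` category only for 1; both function names are hypothetical, and Fluent performs this selection itself at runtime.

```rust
/// Plural category for English cardinal integers: 1 is "one", everything else "other".
fn english_plural_category(count: u64) -> &'static str {
    if count == 1 {
        "one"
    } else {
        "other"
    }
}

/// Hand-written equivalent of the `count-files` message for English.
fn count_files(count: u64) -> String {
    match english_plural_category(count) {
        // Matches the `[one]` variant.
        "one" => format!("You have {count} file"),
        // Matches the default `*[other]` variant.
        _ => format!("You have {count} files"),
    }
}
```

Other languages have different category sets, which is why the choice of variant belongs in the `.ftl` file rather than in Rust code.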

---

## Testing Localization

Run all localization-related unit tests with:

```
    cargo test --lib -p uucore
```

Tests include:
- Loading bundles
- Plural logic
- Locale fallback
- Fluent parse errors
- Thread-local behavior
- ...

---

## Thread-local Storage

Localization is stored per thread using a `OnceLock`.
Each thread must call `setup_localization()` individually.
Initialization is **one-time-only** per thread; re-initialization results in an error.
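The per-thread, set-once behavior can be reproduced with a `thread_local!` slot holding a `OnceLock`. This is a sketch only: `init_thread_locale` and `thread_locale` are hypothetical names, and a `String` stands in for whatever localizer state the real code stores.

```rust
use std::sync::OnceLock;

thread_local! {
    // One slot per thread; `String` stands in for the real localizer state.
    static LOCALE: OnceLock<String> = OnceLock::new();
}

/// Hypothetical stand-in for `setup_localization`: succeeds once per thread,
/// and returns an error if the calling thread is already initialized.
fn init_thread_locale(locale: &str) -> Result<(), String> {
    LOCALE.with(|slot| {
        slot.set(locale.to_string())
            .map_err(|_| "localization already initialized on this thread".to_string())
    })
}

/// Returns the locale set on the calling thread, if any.
fn thread_locale() -> Option<String> {
    LOCALE.with(|slot| slot.get().cloned())
}
```

A freshly spawned thread starts with an empty slot, so it has to run its own initialization before it can look up messages.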

---

## Development vs Release Mode

During development (`cfg(debug_assertions)`), paths are resolved relative to the crate source:

```
    $CARGO_MANIFEST_DIR/../uu/<utility>/locales/
```

In release mode, **paths are resolved relative to the executable**, with per-user and system-wide install locations as further candidates:

```
    <executable_dir>/locales/<utility>/
    <prefix>/share/locales/<utility>/
    ~/.local/share/coreutils/locales/<utility>/
    ~/.cargo/share/coreutils/locales/<utility>/
    /usr/share/coreutils/locales/<utility>/
```

If external locale files aren't found, the system falls back to embedded English locales.
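The release-mode lookup can be pictured as a list of candidate directories that is scanned for `<locale>.ftl`. The sketch below assumes the candidates are tried in the order listed above, which this guide does not state explicitly; both function names are hypothetical, and `prefix` and `home` are passed in as parameters to avoid guessing how the real code derives them.

```rust
use std::path::{Path, PathBuf};

/// Candidate directories for one utility, in the order listed above
/// (assumption: this is also the search order).
fn candidate_locale_dirs(exe_dir: &Path, prefix: &Path, home: &Path, utility: &str) -> Vec<PathBuf> {
    vec![
        exe_dir.join("locales").join(utility),
        prefix.join("share/locales").join(utility),
        home.join(".local/share/coreutils/locales").join(utility),
        home.join(".cargo/share/coreutils/locales").join(utility),
        Path::new("/usr/share/coreutils/locales").join(utility),
    ]
}

/// First candidate directory that contains `<locale>.ftl`.
/// `None` means no external file exists, so embedded English is used.
fn find_locale_file(dirs: &[PathBuf], locale: &str) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(format!("{locale}.ftl")))
        .find(|file| file.is_file())
}
```

A `None` result corresponds to the fallback case: the utility keeps working with the English messages compiled into the binary.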

---

## Unicode Isolation Handling

By default, the Fluent system wraps variables with Unicode directional isolate characters (`U+2068`, `U+2069`) to protect against visual reordering issues in bidirectional text (e.g., mixing Arabic and English).

In this implementation, isolation is **disabled** via:

```
    bundle.set_use_isolating(false);
```

This improves readability in CLI environments by preventing extraneous characters around interpolated values:

Correct (as rendered):

```
    "Welcome, Alice!"
```

Fluent default (disabled here):

```
    "Welcome, \u{2068}Alice\u{2069}!"
```
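The difference between the two renderings is easy to see when the isolate characters are written out by hand. The sketch below does not call Fluent; `render_welcome` is a hypothetical function that mimics how the `welcome` message would be rendered with isolation on or off.

```rust
/// U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE.
const FSI: char = '\u{2068}';
const PDI: char = '\u{2069}';

/// Hypothetical renderer for `welcome = Welcome, { $name }!`.
fn render_welcome(name: &str, use_isolating: bool) -> String {
    if use_isolating {
        // Fluent's default: the interpolated value is wrapped in isolate marks.
        format!("Welcome, {}{}{}!", FSI, name, PDI)
    } else {
        // Behavior with `set_use_isolating(false)`: the value is inserted as-is.
        format!("Welcome, {}!", name)
    }
}
```

The isolate marks are invisible in most terminals but still part of the string, so they can break exact-match comparisons in tests and scripts that parse the output.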

---

## Embedded English Locales

English locale files are always embedded directly in the binary during the build process. This ensures that:

- **English always works** regardless of installation method (e.g., `cargo install`)
- **No runtime dependency** on external `.ftl` files for English
- **Fallback behavior** when other language files are missing

The embedded English locales are generated at build time and included in the binary, providing a reliable fallback while still supporting full localization for other languages when their `.ftl` files are available.