Crate unicode_width
Determine displayed width of char and str types according to
Unicode Standard Annex #11,
other portions of the Unicode standard, and common implementations of
POSIX wcwidth().
See the Rules for determining width section
for the exact rules.
This crate is #![no_std].
use unicode_width::UnicodeWidthStr;
let teststr = "Hello, world!";
let width = UnicodeWidthStr::width(teststr);
println!("{}", teststr);
println!("The above string is {} columns wide.", width);
let width = teststr.width_cjk();
println!("The above string is {} columns wide (CJK).", width);
§Rules for determining width
This crate currently uses the following rules to determine the width of a character or string, in order of decreasing precedence. These may be tweaked in the future.
- Emoji presentation sequences have width 2. (The width of a string may therefore differ from the sum of the widths of its characters.)
- '\u{00AD}' SOFT HYPHEN has width 1.
- '\u{115F}' HANGUL CHOSEONG FILLER has width 2.
- The following have width 0:
  - Characters with the Default_Ignorable_Code_Point property.
  - Characters with the Grapheme_Extend property.
  - The following 8 characters, all of which have NFD decompositions consisting of two Grapheme_Extend characters: '\u{0CC0}' KANNADA VOWEL SIGN II, '\u{0CC7}' KANNADA VOWEL SIGN EE, '\u{0CC8}' KANNADA VOWEL SIGN AI, '\u{0CCA}' KANNADA VOWEL SIGN O, '\u{0CCB}' KANNADA VOWEL SIGN OO, '\u{1B3B}' BALINESE VOWEL SIGN RA REPA TEDUNG, '\u{1B3D}' BALINESE VOWEL SIGN LA LENGA TEDUNG, and '\u{1B43}' BALINESE VOWEL SIGN PEPET TEDUNG.
  - Characters with a Hangul_Syllable_Type of Vowel_Jamo (V) or Trailing_Jamo (T).
  - '\0' NUL.
- The control characters have no defined width, and are ignored when determining the width of a string.
- Characters with an East_Asian_Width of Fullwidth (F) or Wide (W) have width 2.
- Characters with an East_Asian_Width of Ambiguous (A) have width 2 in an East Asian context, and width 1 otherwise.
- All other characters have width 1.
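The precedence order above can be sketched as a single match whose arms are tried top to bottom. This is a simplified illustration and not the crate's implementation: the names sketch_char_width and sketch_str_width are hypothetical, and the code point ranges are a small hand-picked subset of the real Unicode property tables. The only emoji presentation sequence it recognizes is U+2764 followed by U+FE0F, chosen as an assumed example.

```rust
/// Width of one character under an illustrative subset of the rules.
/// Returns None for control characters, which have no defined width.
fn sketch_char_width(c: char, cjk: bool) -> Option<usize> {
    match c as u32 {
        // Single-character overrides take precedence over the width-0 rules,
        // even though both characters are Default_Ignorable_Code_Point.
        0x00AD => Some(1), // SOFT HYPHEN
        0x115F => Some(2), // HANGUL CHOSEONG FILLER
        // Width 0 (representative ranges only).
        0x0000 => Some(0),          // NUL
        0x0300..=0x036F => Some(0), // combining diacritics (Grapheme_Extend)
        0x200B..=0x200F => Some(0), // zero-width space and friends (Default_Ignorable)
        0xFE00..=0xFE0F => Some(0), // variation selectors
        0x1160..=0x11FF => Some(0), // Hangul Jamo vowels (V) and trailing consonants (T)
        0x0CC0 | 0x0CC7 | 0x0CC8 | 0x0CCA | 0x0CCB | 0x1B3B | 0x1B3D | 0x1B43 => Some(0),
        // Remaining control characters: no defined width.
        _ if c.is_control() => None,
        // East_Asian_Width Wide or Fullwidth (subset: Hiragana, CJK ideographs,
        // fullwidth ASCII variants).
        0x3041..=0x3096 | 0x4E00..=0x9FFF | 0xFF01..=0xFF60 => Some(2),
        // East_Asian_Width Ambiguous (subset).
        0x00A1 | 0x00B1 | 0x2460..=0x24E9 => Some(if cjk { 2 } else { 1 }),
        // Everything else.
        _ => Some(1),
    }
}

/// Width of a string: emoji presentation sequences are checked first,
/// then per-character widths are summed, ignoring control characters.
fn sketch_str_width(s: &str, cjk: bool) -> usize {
    let mut total = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        // Assumed one-entry table: HEAVY BLACK HEART + VARIATION SELECTOR-16.
        if c == '\u{2764}' && chars.peek() == Some(&'\u{FE0F}') {
            chars.next(); // consume the selector as part of the sequence
            total += 2;
            continue;
        }
        total += sketch_char_width(c, cjk).unwrap_or(0);
    }
    total
}
```

With this sketch, sketch_str_width("\u{2764}\u{FE0F}", false) is 2, while summing sketch_char_width over the same two characters gives 1, which is the string-versus-characters difference noted in the first rule.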
§Canonical equivalence
The non-CJK width methods guarantee that canonically equivalent strings are assigned the same width. However, this guarantee does not currently hold for the CJK width variants.
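As a rough illustration of what this guarantee means, consider "é" written as the single code point U+00E9 and as "e" followed by U+0301 COMBINING ACUTE ACCENT. The two strings are canonically equivalent, so they should receive the same width. The helper below is a hypothetical stand-in, not the crate's API: it assumes that combining diacritical marks in U+0300..=U+036F have width 0 and that every other character has width 1.

```rust
/// Simplified width for illustration only: combining diacritical marks
/// (U+0300..=U+036F) count as 0 columns, every other character as 1.
fn sketch_width(s: &str) -> usize {
    s.chars()
        .filter(|c| !('\u{0300}'..='\u{036F}').contains(c))
        .count()
}
```

Under these assumptions, sketch_width("\u{00E9}") and sketch_width("e\u{0301}") both evaluate to 1, because the combining mark in the decomposed form contributes no columns.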
Constants§
- UNICODE_VERSION: The version of Unicode that this version of unicode-width is based on.
Traits§
- UnicodeWidthChar: Methods for determining displayed width of Unicode characters.
- UnicodeWidthStr: Methods for determining displayed width of Unicode strings.