[go: up one dir, main page]

string-interner 0.10.0

Efficient string interner with minimal memory footprint and fast access to the underlying strings.
Documentation

String Interner

Continuous Integration Test Coverage Documentation Crates.io
travisCI codecov docs crates

A data structure to cache strings efficiently, with minimal memory footprint and the ability to assicate the interned strings with unique symbols. These symbols allow for constant time comparisons and look-ups to the underlying interned string contents. Also, iterating through the interned strings is cache efficient.

Internals

  • Internally a hashmap M and a vector V is used.
  • V stores the contents of interned strings while M has internal references into the string of V to avoid duplicates.
  • V stores the strings with an indirection to avoid iterator invalidation.
  • Returned symbols usually have a low memory footprint and are efficiently comparable.

Planned Features

  • Safe abstraction wrapper that protects the user from the following misusages:
    • Using symbols of a different string interner instance to resolve string in another.
    • Using symbols that are already no longer valid (i.e. the associated string interner is no longer available).

License

Licensed under either of

at your option.

Dual licence: badge badge

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Changelog

  • 0.10.0

    • Implement pluggable backends for StringInterner. Uses the new BucketBackend by default which results in significant performance boosts and lower memory consumption as well as fewer overall memory allocations.

      This makes it possible for dependencies to alter the behavior of internment. The string-interner crate comes with 2 predefined backends:

      1. SimpleBackend: Which is how the StringInterner of previous versions worked by default. It performs one allocation per interned string.
      2. BucketBackend: Tries to minimize memory allocations and packs interned strings densely. This is the new default behavior for this crate.
    • Due to the above introduction of backends some APIs have been removed:

      • reserve
      • capacity
      • the entire iter module
        • Note: Simple iteration through the StringInterer's interned strings and their symbols is still possible if the used backend supports iteration.
      • resolve_unchecked: Has no replacement, yet but might be reintroduced in future versions again.
      • shrink_to_fit: The API design was never really a good fit for interners.
  • 0.9.0

    • Remove Ord trait bound from Symbol trait
      • Also change Symbol::from_usize(usize) -> Self to Symbol::try_from_usize(usize) -> Option<Self>
    • Minor performance improvements for DefaultSymbol::try_from_usize
    • Put all iterator types into the iter sub module
    • Put all symbol types into the symbol sub module
    • Add new symbol types:
      • SymbolU16: 16-bit wide symbol
      • SymbolU32: 32-bit wide symbol (default)
      • SymbolUsize: same size as usize
    • Various internal improvements and reorganizations
  • 0.8.0

    • Make it possible to use this crate in no_std environments
      • Use the new hashbrown crate feature together with no_std
    • Rename Sym to DefaultSymbol
    • Add IntoIterator impl for &StringInterner
    • Add some #[inline] annotations which improve performance for queries
    • Various internal improvements (uses Pin self-referentials now)
  • 0.7.1

    • CRITICAL fix use after free bug in StringInterner::clone()
    • implement std::iter::Extend for StringInterner
    • Sym::from_usize now avoids using unsafe code
    • optimize FromIterator impl of StringInterner
    • move to Rust 2018 edition

    Thanks YOSHIOKA Takuma for implementing this release.

  • 0.7.0

    • changed license from MIT to MIT/APACHE2.0
    • removed generic impl of Symbol for types that are From<usize> and Into<usize>
    • removed StringInterner::clear API since its usage breaks invariants
    • added StringInterner::{capacity, reserve} APIs
    • introduced a new default symbol type Sym that is a thin wrapper around NonZeroU32 (idea by koute)
    • made DefaultStringInterner a type alias for the new StringInterner<Sym>
    • added convenient FromIterator impl to StringInterner<S: Sym>
    • dev
      • rewrote all unit tests (serde tests are still missing)
      • entirely refactored benchmark framework
      • added html_root_url to crate root

    Thanks matklad for suggestions and impulses

  • 0.6.3

    • fixed a bug that StringInterner's Send impl didn't respect its generic HashBuilder parameter. Fixes GitHub issue #4.
  • 0.6.2

    • added shrink_to_fit public method to StringInterner - (by artemshein)
  • 0.6.1

    • fixed a bug that inserting non-owning string types (e.g. str) was broken due to dangling pointers (Thanks to artemshein for fixing it!)
  • 0.6.0

    • added optional serde serialization and deserialization support
    • more efficient and generic PartialEq implementation for StringInterner
    • made StringInterner generic over BuildHasher to allow for custom hashers
  • 0.5.0

    • added IntoIterator trait implementation for StringInterner
    • greatly simplified iterator code
  • 0.4.0

    • removed restrictive constraint for Unsigned for Symbol
  • 0.3.3

    • added Send and Sync to InternalStrRef to make StringInterner itself Send and Sync