
content_inspector 0.2.1

Fast inspection of binary buffers to guess/determine the encoding

content_inspector


A simple library for fast inspection of binary buffers to guess/determine the type of content.

This is mainly intended to quickly determine whether a given buffer contains "binary" or "text" data. The analysis is based on a very simple heuristic: detection of special byte order marks (BOMs) and a search for NULL bytes. Note that this analysis can fail. For example, even if unlikely, UTF-8-encoded text can legally contain NULL bytes. Also, for performance reasons, only the first 1024 bytes are checked for NULL bytes (if no BOM is detected).
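To make the heuristic concrete, here is a minimal, self-contained sketch of the two steps described above: a BOM check followed by a NULL-byte scan of the first 1024 bytes. It is an illustration only and not the crate's actual implementation. The names Guess, MAX_SCAN_SIZE and guess_content are hypothetical, and the order of the BOM checks is an assumption made for this sketch.

```rust
/// Hypothetical result type for this sketch (not the crate's `ContentType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Guess {
    Utf8,
    Utf8Bom,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
    Binary,
}

/// Number of leading bytes scanned for NULL bytes when no BOM is found.
const MAX_SCAN_SIZE: usize = 1024;

/// Guess the content type of `buffer` from its BOM or, failing that,
/// from the presence of a NULL byte near the start.
fn guess_content(buffer: &[u8]) -> Guess {
    // Step 1: byte order marks. The 4-byte UTF-32 marks are tested before
    // the 2-byte UTF-16 marks because the UTF-32LE mark (FF FE 00 00)
    // begins with the UTF-16LE mark (FF FE).
    if buffer.starts_with(&[0xEF, 0xBB, 0xBF]) {
        return Guess::Utf8Bom;
    }
    if buffer.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        return Guess::Utf32Le;
    }
    if buffer.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        return Guess::Utf32Be;
    }
    if buffer.starts_with(&[0xFF, 0xFE]) {
        return Guess::Utf16Le;
    }
    if buffer.starts_with(&[0xFE, 0xFF]) {
        return Guess::Utf16Be;
    }

    // Step 2: no BOM, so look for a NULL byte in the first
    // MAX_SCAN_SIZE bytes only. Anything beyond that is ignored.
    let head = &buffer[..buffer.len().min(MAX_SCAN_SIZE)];
    if head.contains(&0) {
        Guess::Binary
    } else {
        Guess::Utf8
    }
}
```

In this sketch, a buffer whose first NULL byte sits after the first 1024 bytes is still reported as text, which mirrors the limitation mentioned above. A buffer without a BOM and without NULL bytes is simply assumed to be UTF-8. The sketch does not validate the encoding.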

Usage

use content_inspector::{ContentType, inspect};

assert_eq!(ContentType::UTF_8, inspect(b"Hello"));
assert_eq!(ContentType::BINARY, inspect(b"\xFF\xE0\x00\x10\x4A\x46\x49\x46\x00"));

assert!(inspect(b"Hello").is_text());

CLI example

This crate also comes with a small example command-line program (see examples/inspect.rs) that demonstrates the usage:

> inspect
USAGE: inspect FILE [FILE...]

> inspect testdata/*
testdata/create_text_files.py: UTF-8
testdata/file_sources.md: UTF-8
testdata/test.jpg: binary
testdata/test.pdf: binary
testdata/test.png: binary
testdata/text_UTF-16BE-BOM.txt: UTF-16BE
testdata/text_UTF-16LE-BOM.txt: UTF-16LE
testdata/text_UTF-32BE-BOM.txt: UTF-32BE
testdata/text_UTF-32LE-BOM.txt: UTF-32LE
testdata/text_UTF-8-BOM.txt: UTF-8-BOM
testdata/text_UTF-8.txt: UTF-8

If you only want to detect whether something is a binary or text file, this is about 250 times faster than file --mime ....