docs.rs failed to build extractous-0.1.1
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
Visit the last successful build:
extractous-0.3.0
Extractous
Extractous is a Rust crate that provides a unified approach for detecting and extracting metadata and text content from various document types, such as PDF, Word, HTML, and many other formats.
Features
- High-level Rust API for extracting text and metadata content for many file formats.
- Strives to be efficient and fast.
- Internally, it calls Apache Tika for any file format that is not natively supported by the Rust core.
- Comprehensive documentation and examples to help you get started quickly.
Installation
To use extractous in your Rust project, add the following lines to your Cargo.toml file:
[dependencies]
extractous = "0.1.1"
Supported file formats
| File Format | Native Rust | Through Tika |
|---|---|---|
| - | ✅ | |
| csv | ✅ | - |
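The feature list and table describe a split between formats parsed natively in Rust and formats handed off to Apache Tika. The sketch below illustrates that routing idea only. `Backend`, `choose_backend`, and `NATIVE_EXTENSIONS` are hypothetical names, not the extractous API. It also assumes detection by file extension, whereas the crate itself may detect types differently. Only `csv` is taken from the table.

```rust
use std::path::Path;

/// Which backend would handle a file (hypothetical; not the extractous API).
#[derive(Debug, PartialEq, Eq)]
enum Backend {
    NativeRust,
    Tika,
}

/// Extensions assumed to have a native Rust parser. Only "csv" comes from
/// the supported-formats table; the list is illustrative.
const NATIVE_EXTENSIONS: &[&str] = &["csv"];

/// Pick a backend from the (case-insensitive) file extension, falling back
/// to Tika for anything the Rust core does not handle natively.
fn choose_backend(path: &Path) -> Backend {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some(e) if NATIVE_EXTENSIONS.contains(&e) => Backend::NativeRust,
        _ => Backend::Tika,
    }
}

fn main() {
    for name in ["data.csv", "report.PDF", "notes"] {
        println!("{name} -> {:?}", choose_backend(Path::new(name)));
    }
}
```

Defaulting to Tika keeps the native path opt-in: a format only moves to the Rust core once it is explicitly listed.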
Building
- GraalVM is required to build tika_native. We recommend installing it with SDKMAN.
- To use AWT on macOS, use BellSoft Liberica NIK for Java 22:
sdk install java 24.0.1.r22-nik
- We use Gradle to perform the build. The Gradle wrapper is included in the project, so there is no need to install Gradle.
- Make sure JAVA_HOME points to the GraalVM JDK and not to any other JDK in your environment. Run java --version; you should see something like:
openjdk 22.0.1 2024-04-16
OpenJDK Runtime Environment Liberica-NIK-24.0.1-1 (build 22.0.1+10)
OpenJDK 64-Bit Server VM Liberica-NIK-24.0.1-1 (build 22.0.1+10, mixed mode, sharing)
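If you want to script the JAVA_HOME check above, the following is a minimal sketch, not part of the project's build. `java_major_version` and `is_liberica_nik` are hypothetical helpers whose heuristics are based only on the sample output shown. The sketch also assumes a Unix-style `$JAVA_HOME/bin/java` layout.

```rust
use std::env;
use std::path::Path;
use std::process::Command;

/// Major version from the first line of `java --version` output,
/// e.g. "openjdk 22.0.1 2024-04-16" gives Some(22). Heuristic only.
fn java_major_version(output: &str) -> Option<u32> {
    let first_line = output.lines().next()?;
    let version = first_line.split_whitespace().nth(1)?;
    version.split('.').next()?.parse().ok()
}

/// Heuristic: the runtime line of a Liberica NIK JDK mentions "Liberica-NIK",
/// as in the sample output above.
fn is_liberica_nik(output: &str) -> bool {
    output.contains("Liberica-NIK")
}

fn main() {
    let Ok(java_home) = env::var("JAVA_HOME") else {
        eprintln!("JAVA_HOME is not set");
        return;
    };
    // Assumes the Unix layout $JAVA_HOME/bin/java.
    let java = Path::new(&java_home).join("bin").join("java");
    match Command::new(&java).arg("--version").output() {
        Ok(out) => {
            // `java --version` writes its report to stdout.
            let text = String::from_utf8_lossy(&out.stdout);
            println!("major version: {:?}", java_major_version(&text));
            println!("Liberica NIK: {}", is_liberica_nik(&text));
        }
        Err(e) => eprintln!("could not run {}: {e}", java.display()),
    }
}
```

Invoking the binary under JAVA_HOME, rather than whichever `java` is first on PATH, checks the JDK the build will actually pick up.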