docconv is a Go wrapper library that converts PDF, DOC, DOCX, XML, HTML, RTF, ODT, and Pages documents, as well as images (see optional dependencies below), to plain text.

The project also provides a docd executable. Run go help install for details on where the installed docd executable is placed, and make sure that the full path to the executable is in your PATH environment variable.

Image support is optional. To add it to the docconv library, first install and build gosseract. You can then add -tags ocr to any go command when building, fetching, or testing docconv to include support for processing images.

Documents can be sent to docd as a multipart POST request. The plain text (body) and meta information are then returned as a JSON object.

Features

  • Optional image support for the docconv library (requires gosseract)
  • Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, and Pages documents to plain text
  • Add -tags ocr to any go command when building, fetching, or testing docconv to include support for processing images
  • The docd tool runs as a service on port 8888
  • Run locally
  • Request over the network

License

MIT License

Additional Project Details

Operating Systems

Windows

Programming Language

Go

Related Categories

Go HTML XHTML, Go PDF Editors

Registered

2023-04-27