[go: up one dir, main page]

Showing 493 open source projects for "pdf tool"

View related business solutions
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    The database for AI-powered applications.

    MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.
    Start Free
  • See what everyone is allocated to. Projects, clients, meetings - all in one tool. Icon
    See what everyone is allocated to. Projects, clients, meetings - all in one tool.

    The fast, simple way to schedule people, equipment and other resources online.

    Designed to replace clunky, old scheduling spreadsheets, Resource Guru helps managers get organized fast. The platform covers resource planning, resource scheduling, resource management, staff leave management, reporting, and more.
    Free Trial
  • 1
    Stirling-PDF

    Stirling-PDF

    Web application that allows you to perform operations on PDF files

    Stirling PDF is a powerful, locally hosted web-based PDF manipulation tool offering a wide range of editing, conversion, and utility features. It allows users to merge, split, compress, convert, OCR, and perform other operations on PDF files directly from a browser without uploading data to third-party servers. The tool is privacy-conscious, self-hostable via Docker, and built with modularity in mind to allow future expansion and integration.
    Downloads: 34 This Week
    Last Update:
    See Project
  • 2
    Markdown to PDF

    Markdown to PDF

    Hackable CLI tool for converting Markdown files to PDF using Node.js

    A simple and hackable CLI tool for converting markdown to pdf. It uses Marked to convert markdown to HTML and Puppeteer (headless Chromium) to further convert the HTML to PDF. It also uses highlight.js for code highlighting. The whole source code of this tool is only ~250 lines of JS ~500 lines of Typescript and ~100 lines of CSS, so it is easy to clone and customize.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Malicious PDF Generator

    Malicious PDF Generator

    Generate a bunch of malicious pdf files with phone-home functionality

    Generate ten different malicious PDF files with phone-home functionality. Can be used with Burp Collaborator or Interact.sh. Used for penetration testing and/or red-teaming etc. I created this tool because I needed a third-party tool to generate a bunch of PDF files with various links.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    py-pdf-parser

    py-pdf-parser

    A Python tool to help extracting information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents. ​
    Downloads: 8 This Week
    Last Update:
    See Project
  • Eurekos LMS - Build a Smarter Customer Icon
    Eurekos LMS - Build a Smarter Customer

    The Eurekos customer training LMS makes it easy to deliver product training that retains more customers and transforms partners into advocates.

    Eurekos is a purpose-built LMS that engages customers throughout the entire learning journey from pre-sales, to onboarding, and everything after.
    Learn More
  • 5
    PDF4QT

    PDF4QT

    Open source PDF editor

    PDF4QT is open source PDF editor based on Qt framework. It contains a C++ library, applications for viewing/editing PDF documents, and a command line tool. PDF4QT is an open-source PDF editor for Windows/Linux. It is a modern solution for viewing/editing/rendering PDF documents, for users and developers alike. For developers, there is a C++ library and a command line tool for use in scripts.
    Downloads: 60 This Week
    Last Update:
    See Project
  • 6
    PDFsam

    PDFsam

    PDFsam, a desktop application to split, merge, mix, rotate PDF files

    PDFsam Basic is our free and open-source desktop application to split, merge, extract pages, rotate and mix PDF files. PDFsam Visual is a powerful tool to visually compose PDF files, reorder pages, delete pages, split, merge, rotate, encrypt, decrypt, extract text, convert to grayscale, crop PDF files. PDFsam Basic is written using JavaFX. Since version 4 it is released as a self-contained application and bundles a jlinked JDK while version 3 requires a Java Runtime Environment 8 with JavaFx installed in order to run.
    Downloads: 124 This Week
    Last Update:
    See Project
  • 7
    pdfly

    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 8
    GROBID

    GROBID

    A machine learning software for extracting information

    GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents such as PDF into structured XML/TEI encoded documents with a particular focus on technical and scientific publications. First developments started in 2008 as a hobby. In 2011 the tool has been made available in open source. Work on GROBID has been steady as a side project since the beginning and is expected to continue as such. Header extraction and parsing from article in PDF format. ...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 9
    xhtml2pdf

    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily and with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The Command line tool is a stand-alone program that can be executed from the command line.
    Downloads: 5 This Week
    Last Update:
    See Project
  • Online Project Management Platform - Zoho Icon
    Online Project Management Platform - Zoho

    A plan put together with small businesses and startups in mind.

    Zoho Projects is a cloud-based project management solution that helps teams plan, track, collaborate, and achieve project goals.
    Learn More
  • 10
    MinerU

    MinerU

    A high-quality tool for convert PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 11
    Pandoc

    Pandoc

    The universal markup converter

    Pandoc is a universal document converter able to convert files from a multitude of markup formats into another. With Pandoc, you have a swiss-army knife of a converter, able to convert practically any markup format into any other. Pandoc contains a Haskell library for conversions as well as a command-line tool that uses this library. It can convert to and from just about anything-- lightweight markup formats, HTML formats, documentation formats, ebooks, TeX formats, word processor formats...
    Downloads: 265 This Week
    Last Update:
    See Project
  • 12
    Unredact

    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 39 This Week
    Last Update:
    See Project
  • 13
    Stirling-PDF

    Stirling-PDF

    #1 Locally hosted web application that allows you to work on PDFs

    This is a robust, locally hosted web-based PDF manipulation tool using Docker. It enables you to carry out various operations on PDF files, including splitting, merging, converting, reorganizing, adding images, rotating, compressing, and more. This locally hosted web application has evolved to encompass a comprehensive set of features, addressing all your PDF requirements.
    Downloads: 125 This Week
    Last Update:
    See Project
  • 14
    ExifTool

    ExifTool

    ExifTool meta information reader/writer

    ExifTool is a battle-tested Perl application for reading, writing, and batch-editing metadata in thousands of file types—images, videos, audio, documents, and more. It understands major standards like EXIF, IPTC, and XMP as well as an enormous range of camera maker notes and container formats (for example, QuickTime/MP4, PDF, TIFF). Typical workflows include extracting metadata to JSON/CSV/XML, renaming files from timestamps or tags, shifting capture times, copying tags between files, and cleaning or normalizing fields for archives. The tool is meticulous about character encodings, time zones, and tag groups, helping avoid silent corruption when moving metadata between ecosystems. ...
    Downloads: 87 This Week
    Last Update:
    See Project
  • 15
    Gotenberg

    Gotenberg

    A Docker-powered stateless API for PDF files

    Gotenberg provides a developer-friendly API to interact with powerful tools like Chromium and LibreOffice for converting numerous document formats (HTML, Markdown, Word, Excel, etc.) into PDF files, and more! Thanks to Docker, you don't have to install each tool in your environments; drop the Docker image in your stack, and you're good to go! The webhook feature allows you to upload the output file to the destination of your choice. There are many options to fit your requirements, from the custom HTTP headers sent to your webhook to the HTTP method used to call it. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Umi-OCR

    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. ...
    Downloads: 51 This Week
    Last Update:
    See Project
  • 17
    rga

    rga

    rga: ripgrep, but also search in PDFs, E-Books, Office documents, etc.

    rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome ripgrep and enables it to search in PDF, docx, sqlite, JPG, movie subtitles (mkv, mp4), etc.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 18
    DocuSeal

    DocuSeal

    Open source DocuSign alternative

    Open source document filling and signing. DocuSeal is an open-source platform that provides secure and efficient digital document signing and processing. Create PDF forms to have them filled and signed online on any device with an easy-to-use, mobile-optimized web tool. Use embeddable code snippets to seamlessly implement the document signing workflows directly on your website or app. Build fillable document forms using our pixel-perfect HTML API, reducing the time for creating personalized documents. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Percollate

    Percollate

    A command-line tool to turn web pages into beautiful, readable PDF

    Percollate is a command-line tool that turns web pages into beautifully formatted PDF, EPUB, or HTML files. By default, percollate processes URLs in parallel. Use the --wait option to process them sequentially instead, with a pause between items. The delay is specified in seconds, and can be zero. By default, percollate bundles all web pages in a single file. Use the --individual flag to export each source to a separate file.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 20
    File Converter

    File Converter

    Simple tool which allows you to convert and compress files

    File Converter is a minimalist open‑source tool (GPL‑3.0) that lets users convert and compress one or multiple files directly via the Windows Explorer context menu. It integrates with powerful back-end utilities—FFmpeg, ImageMagick, Ghostscript—to handle a broad range of media and document transformations. File Converter is a personal open source project started in 2014. I have put hundreds of hours adding, refining and tuning File Converter with the goal of making the conversion and...
    Downloads: 29 This Week
    Last Update:
    See Project
  • 21
    ripgrep

    ripgrep

    Regex pattern directory search tool that respects your .gitignore

    ripgrep is a line-oriented search tool that actively searches the directory you're currently in for a regex pattern. By default, ripgrep will ignore your .gitignore and skip hidden files or directories and binary files automatically. ripgrep has first class support on Windows, macOS and Linux, with binary downloads available for every release. ripgrep is similar to other popular search tools like The Silver Searcher, ack and grep. ripgrep supports arbitrary input preprocessing filters which could be PDF text extraction, less supported decompression, decrypting, automatic encoding detection and so on. ...
    Downloads: 75 This Week
    Last Update:
    See Project
  • 22
    PDF-utility

    PDF-utility

    PDF Utility is a tool designed to efficiently manipulate PDF files

    Digna PDF Utility is a tool designed to efficiently manipulate PDF documents. It offers a range of functionalities including adding page numbers, deleting unwanted pages, merging multiple PDFs into a single file, converting PDF to DOCX and vice versa, protect a PDF file with password and displaying PDF content.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Everything cURL

    Everything cURL

    The book documenting the curl project, the curl tool, libcurl

    Everything curl is an extensive, continuously maintained book that documents the entire curl ecosystem: the curl command-line tool, the libcurl library, the project’s history and development practices, and practical guidance for using and contributing to curl. The project is written as an open source book (CC-BY-4.0) and is available in multiple formats and locations, including an online website, PDF, and ePub so readers can pick the format that suits them. Content ranges from beginner-friendly tutorials and usage examples to deep dives into internals, protocols, bindings, build instructions, and advanced deployment scenarios, making the book useful for both casual users and experienced developers. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    dvisvgm

    dvisvgm

    A fast DVI, EPS, and PDF to SVG converter

    The command-line utility dvisvgm is a tool for TEX/LATEX users. It converts DVI, EPS, and PDF files to the XML-based vector graphics format SVG. In contrast to bitmap graphics, vector graphics are arbitrarily scalable without loss of quality. All modern web browsers support a large amount of the current SVG standard 1.1. Furthermore, SVG files can also be displayed with the Java-based Squiggle SVG browser which is part of the Apache Batik project, and the free vector graphics editor Inkscape.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    Easy DataSet

    Easy DataSet

    A powerful tool for creating datasets for LLM fine-tuning

    Easy DataSet is a comprehensive open-source tool designed to make creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats — including PDF, Markdown, DOCX, EPUB, and plain text — and can intelligently segment, clean, and structure content into rich datasets tailored for downstream LLM training needs. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next