Search Results for "pdf to text delphi"

Showing 938 open source projects for "pdf to text delphi"

1

PDF Editor

Offline PDF editor. Add images, signatures, text to PDF in the browser

Downloads: 24 This Week

Last Update: 2024-09-04
2

Text Editors

Sempare Template (scripting) Engine for Delphi

Sempare Template (scripting) Engine for Delphi allows for flexible dynamic text generation. It can be used for generating email, HTML, reports, source code, XML, configuration, etc.

Downloads: 2 This Week

Last Update: 2025-05-08
3

Asciidoctor PDF

Asciidoctor PDF: A native PDF converter for AsciiDoc

A fast text processor & publishing toolchain for converting AsciiDoc to HTML5, DocBook & more. Asciidoctor is a fast, open source, Ruby-based text processor for parsing AsciiDoc® into a document model and converting it to output formats such as HTML 5, DocBook 5, manual pages, PDF, EPUB 3, and other formats. Asciidoctor also has an ecosystem of extensions, converters, build plugins, and tools to help you author and publish content written in AsciiDoc.

Downloads: 3 This Week

Last Update: 2025-11-15
4

Nano PDF Editor

Edit PDF files with Nano Banana

Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms.

Downloads: 32 This Week

Last Update: 2026-02-05
5

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 134 This Week

Last Update: 3 days ago
6

py-pdf-parser

A Python tool to help extract information from structured PDFs

py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.

Downloads: 8 This Week

Last Update: 2025-04-28
7

TeXworks

A simple interface for working with TeX documents

TeXworks is a free and simple working environment for authoring TeX (LaTeX, ConTeXt and XeTeX) documents. Inspired by Dick Koch's award-winning TeXShop program for Mac OS X, it makes entry into the TeX world easier for those using desktop operating systems other than OS X. It provides an integrated, easy-to-use environment for users on other platforms, particularly GNU/Linux and Windows, and features a clean, simple interface accessible to casual and non-technical users.

1 Review

Downloads: 101 This Week

Last Update: 4 days ago
8

PdfPig

Read and extract text and other content from PDFs in C#

This project allows users to read and extract text and other content from PDF files. In addition, the library can be used to create simple PDF documents containing text and geometrical shapes.

Downloads: 12 This Week

Last Update: 2025-12-23
9

PDF4QT

Open source PDF editor

PDF4QT is an open source PDF editor for Windows and Linux, based on the Qt framework. It is a modern solution for viewing, editing, and rendering PDF documents, for users and developers alike. It contains a C++ library, applications for viewing and editing PDF documents, and a command line tool for use in scripts.

Downloads: 60 This Week

Last Update: 2026-01-22
10

pdfly

CLI tool to extract (meta)data from PDF and manipulate PDF files

A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.

Downloads: 7 This Week

Last Update: 2025-10-13
11

PDFsam

PDFsam, a desktop application to split, merge, mix, rotate PDF files

PDFsam Basic is our free and open-source desktop application to split, merge, extract pages, rotate and mix PDF files. PDFsam Visual is a powerful tool to visually compose PDF files, reorder pages, delete pages, split, merge, rotate, encrypt, decrypt, extract text, convert to grayscale, and crop PDF files. PDFsam Basic is written using JavaFX. Since version 4, it is released as a self-contained application and bundles a jlinked JDK, while version 3 requires a Java Runtime Environment 8 with JavaFX installed in order to run.

Downloads: 124 This Week

Last Update: 2026-01-26
12

PDFCraft

PDFCraft is a free, privacy-focused PDF toolkit

PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.

Downloads: 23 This Week

Last Update: 17 hours ago
13

borb

borb is a library for reading, creating and manipulating PDF files

borb is a pure Python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, strings, booleans, etc.). This is currently a one-man project, so the focus will always be on supporting the more common use cases over the rare ones.

Downloads: 4 This Week

Last Update: 2026-01-24
14

Scribe.js

JavaScript OCR and text extraction for images and PDFs

Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. ...

Downloads: 1 This Week

Last Update: 1 day ago
15

unipdf

Golang PDF library for creating and processing PDF files (pure Go)

UniDoc UniPDF is a PDF library for Go (golang) with capabilities for creating, reading, and processing PDF files. The library is written and supported by FoxyUtils.com, where the library is used to power many of its services. Every release of our libraries is automatically tested against known vulnerabilities and does not pass unless everything is remediated. All changes are carefully reviewed by our team.

Downloads: 2 This Week

Last Update: 3 days ago
16

PyMuPDF

Python bindings for MuPDF's rendering library.

MuPDF is a lightweight PDF, XPS, and E-book viewer. MuPDF consists of a software library, command line tools, and viewers for various platforms. The renderer in MuPDF is tailored for high-quality anti-aliased graphics. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen.

Downloads: 11 This Week

Last Update: 3 days ago
17

tinypdf

Minimal PDF creation library

...It also supports clickable links so generated documents can include interactive URLs, and it can create multi-page documents with custom page sizes. A notable convenience is built-in markdown-to-PDF conversion for common structures like headers and lists, letting you go from formatted text to a PDF layout quickly.

Downloads: 2 This Week

Last Update: 2026-02-01
18

npm-pdfreader

Parse text and tables from PDF files.

npm-pdfreader is a Node.js library for reading text and parsing tables from PDF files. It supports tabular data with automatic column detection and rule-based parsing, making it useful for extracting structured data from PDFs.

Downloads: 0 This Week

Last Update: 2025-11-01
19

zpdf

Zero-copy PDF text extraction library written in Zig

...It implements multiple PDF decompression filters and handles common font encoding pathways, which are essential for turning raw PDF content streams into readable text. It also understands both classic cross-reference tables and newer cross-reference streams, including PDF 1.5+ features, and it offers configurable strict vs permissive error handling depending on whether you prioritize correctness or robustness.

Downloads: 2 This Week

Last Update: 2026-02-01
20

PyPDF

A pure-Python PDF library capable of splitting, merging, cropping

pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.

Downloads: 5 This Week

Last Update: 6 days ago
21

Crowbook

Converts books written in Markdown to HTML, LaTeX/PDF and EPUB

Crowbook's aim is to allow you to write a book in Markdown without worrying about formatting or typography and let the program generate HTML, PDF and EPUB output for you. Its focus is novels and fiction, and the default settings should (hopefully) generate readable books with correct typography without requiring you to worry about it. To see what Crowbook's output looks like, you can read the Crowbook guide rendered in HTML, PDF or EPUB. Crowbook will parse this file and generate HTML, EPUB,...

Downloads: 0 This Week

Last Update: 2025-06-07
22

Unredact

A simple tool for reading in poorly redacted documents

Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...

Downloads: 39 This Week

Last Update: 2026-02-03
23

Tesseract OCR

Open Source OCR Engine

Tesseract is an open source OCR (optical character recognition) engine and command line program. OCR is a technology that allows for the recognition of text characters within a digital image. The latest version of Tesseract puts a greater focus on line recognition; however, it still supports the legacy Tesseract OCR engine, which recognizes character patterns. Tesseract can recognize over 100 languages out-of-the-box, and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF and more. ...

Downloads: 2,233 This Week

Last Update: 2025-12-26
24

GROBID

Machine learning software for extracting information

GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents such as PDF into structured XML/TEI encoded documents, with a particular focus on technical and scientific publications. Development started in 2008 as a hobby. In 2011 the tool was made available as open source. Work on GROBID has been steady as a side project since the beginning and is expected to continue as such. Header extraction and parsing from article in PDF format. The...

Downloads: 18 This Week

Last Update: 2025-05-11
25

xhtml2pdf

A library for converting HTML into PDFs using ReportLab

xhtml2pdf enables users to generate PDF documents from HTML content easily, with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The command line tool is a stand-alone program that can be executed from the command line.

Downloads: 5 This Week

Last Update: 2025-02-23