Showing 98 open source projects for "pdf data mining"

  • 1
    PDF.js

    A PDF Reader in JavaScript

    PDF.js is a web standards-based platform for parsing and rendering Portable Document Format (PDF) files. Open source and built with HTML5, this PDF viewer is supported by a great community and Mozilla Labs. PDF.js can be used on both modern and older browsers, and is built into version 19+ of Firefox.
    Downloads: 134 This Week
  • 2
    BentoPDF

    A Privacy First PDF Toolkit

    BentoPDF is a self-hosted, open-source PDF toolkit that provides a suite of local PDF manipulation features for users who want full control over their documents without relying on cloud PDF services. It offers functionality to merge, split, compress, rotate, and convert PDFs through an easy-to-deploy container or local installation, making it ideal for individuals and teams that handle large volumes of PDF files regularly.
    Downloads: 50 This Week
  • 3
    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.
    Downloads: 23 This Week
  • 4
    jsPDF

    HTML5 client solution for generating PDFs

    The leading HTML5 client solution for generating PDFs. Perfect for event tickets, reports, certificates, you name it! PDFs are ubiquitous across the web, with virtually every enterprise relying on them to share documents. We created jsPDF to solve a major problem with how PDF files were being generated. We decided to make it open-source to allow a community of developers to expand on it.
    Downloads: 45 This Week
  • 5
    npm-pdfreader

    Parse text and tables from PDF files.

    npm-pdfreader is a Node.js library for reading text and parsing tables from PDF files. It supports tabular data with automatic column detection and rule-based parsing, making it useful for extracting structured data from PDFs.
    Downloads: 0 This Week
  • 6
    pdfmake

    Client/server side PDF printing in pure JavaScript

    Print PDFs directly in the browser or delegate it to your NodeJS backend. Use the same document definition in both cases. Forget about manual x, y calculations. Declare document structure and let pdfmake do the rest. Use paragraphs, columns, lists, tables, canvas, etc. Declare your own styles, use custom fonts, build a DSL and extend the framework. Provides a set of options to disable font layout cache and to control when pages are flushed to the output file. Pdfmake is runnable in browser...
    Downloads: 11 This Week
  • 7
    DeckTape

    PDF exporter for HTML presentations

    DeckTape is a high-quality PDF exporter for HTML presentation frameworks. DeckTape is built on top of Puppeteer, which relies on Google Chrome for laying out and rendering web pages and provides a headless Chrome instance scriptable with a JavaScript API. DeckTape supports several presentation frameworks out of the box. DeckTape also provides a generic command that works by emulating the end-user interaction, allowing it to be used to convert presentations from virtually any...
    Downloads: 4 This Week
  • 8
    WebViewer UI

    WebViewer UI built in React

    WebViewer UI sits on top of WebViewer, a powerful JavaScript-based PDF Library that's part of the PDFTron PDF SDK. Built in React, WebViewer UI provides a slick out-of-the-box responsive UI that interacts with the core library to view, annotate and manipulate PDFs that can be embedded into any web project. This repo is specifically designed for any users interested in advanced customizations. With the source code access, it gives developers full control to customize & style the UI, build...
    Downloads: 2 This Week
  • 9
    tableExport.jquery.plugin

    jQuery plugin to export an HTML table to JSON, XML, CSV, TSV, TXT, SQL

    jQuery plugin to export an HTML table to JSON, XML, CSV, TSV, TXT, SQL, Word, Excel, PNG, and PDF.
    Downloads: 1 This Week
  • 10
    carbone

    Fast and simple report generator, from JSON to PDF, XLSX, DOCX, ODT

    Turn your JSON into PDF, DOCX, XLSX, PPTX, ODS, and many more. A fast, simple, and powerful report generator that produces PDF, DOCX, XLSX, ODT, PPTX, ODS, XML, and CSV output from templates, with your JSON data as input.
    Downloads: 0 This Week
  • 11
    Fidus Writer

    Fidus Writer is an online collaborative editor for academics

    Fidus Writer is an online collaborative editor especially made for academics who need to use citations and/or formulas. The editor focuses on the content rather than the layout, so that with the same text, you can later on publish it in multiple ways: On a website, as a printed book, or as an ebook. In each case, you can choose from a number of layouts that are adequate for the medium of choice.
    Downloads: 3 This Week
  • 12
    OrgChart

    It's a simple and direct organization chart plugin

    It's a simple and direct organization chart plugin. Anytime you want a tree-like chart, you can turn to OrgChart.
    Downloads: 1 This Week
  • 13
    Percollate

    A command-line tool to turn web pages into beautiful, readable PDF

    Percollate is a command-line tool that turns web pages into beautifully formatted PDF, EPUB, or HTML files. By default, percollate processes URLs in parallel. Use the --wait option to process them sequentially instead, with a pause between items. The delay is specified in seconds, and can be zero. By default, percollate bundles all web pages in a single file. Use the --individual flag to export each source to a separate file. Additional CSS styles you can pass from the command line to...
    Downloads: 4 This Week
  • 14
    Element

    A glossy Matrix collaboration client for the web

    Element, formerly known as Vector and Riot, is a glossy Matrix collaboration client built using the Matrix React SDK. It offers teams, friends, and organizations a secure, all-in-one chat app that is protected from pesky ads and data mining methods. All communications are done through the open global Matrix network, secured with end-to-end encryption. Element gives you all the services you need from a chat app: group chat, video calls, file sharing, and more, all done securely and in total privacy. Element has three different tiers of support for different environments, the most supported being the latest versions of Chrome, Firefox, and Safari on desktop OSes.
    Downloads: 8 This Week
  • 15
    JupyterLab

    JupyterLab computational environment

    ...Documents and activities integrate with each other, enabling new workflows for interactive computing. JupyterLab also offers a unified model for viewing and handling data formats. JupyterLab understands many file formats (images, CSV, JSON, Markdown, PDF, Vega, Vega-Lite, etc.) and can also display rich kernel output in these formats. See File and Output Formats for more information. To navigate the user interface, JupyterLab offers customizable keyboard shortcuts and the ability to use key maps from vim, emacs, and Sublime Text in the text editor.
    Downloads: 272 This Week
  • 16
    Easy DataSet

    A powerful tool for creating datasets for LLM fine-tuning

    Easy DataSet is a comprehensive open-source tool designed to make creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats — including PDF, Markdown, DOCX, EPUB, and plain text — and can intelligently segment, clean, and structure content into rich datasets tailored for downstream LLM training needs. The system includes automated question-generation capabilities, hierarchical label trees, and answer generation pipelines that use LLM APIs to produce coherent paired data with customizable templates. ...
    Downloads: 3 This Week
  • 17
    Collabora Online

    Collabora Online is a collaborative online office suite

    Collabora Online is a powerful online office suite that you can integrate into your own infrastructure or access via one of our trusted hosting Partners. Your digital sovereignty is our priority. We provide you with all the tools to keep your data secure, without compromising on features. Collabora Online’s text document editor provides a true WYSIWYG editing experience, making visualizing your document layout incredibly easy. Open any document, add comments and track changes from anywhere,...
    Downloads: 5 This Week
  • 18
    Superalgos

    Free, open-source crypto trading bot, automated bitcoin trading

    Free, open-source crypto trading bot, automated bitcoin/cryptocurrency trading software, algorithmic trading bots. Visually design your crypto trading bot, leveraging an integrated charting system, data-mining, backtesting, paper trading, and multi-server crypto bot deployments. Superalgos is not just another open-source project. We are an open and welcoming community nurtured and incentivized with the project's native Superalgos (SA) Token, building an open trading intelligence network. You will notice the difference as soon as you join the Telegram Community Group or the new Discord Server! ...
    Downloads: 1 This Week
  • 19
    MyBox

    Easy tools for PDF, image, file, network, data, and media

    Tags: javafx-desktop-apps, pdf, image, ocr, icc, barcode, color-palette, text, bytes, markdown, html, archive, compress, digest, video, audio, editor, converter, media. Source: https://github.com/Mararsh/MyBox. Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.
    Downloads: 55 This Week
  • 20
    An interface for entering monthly activity reports, with generation of a resulting PDF.
    Downloads: 1 This Week
  • 21
    Jmol

    An interactive viewer for three-dimensional chemical structures.

    Over 1,000,000 page views per month. Jmol/JSmol is a molecular viewer for 3D chemical structures that runs in four independent modes: an HTML5-only web application utilizing jQuery, a Java applet, a stand-alone Java program (Jmol.jar), and a "headless" server-side component (JmolData.jar). Jmol can read many file types, including PDB, CIF, SDF, MOL, PyMOL PSE files, and Spartan files, as well as output from Gaussian, GAMESS, MOPAC, VASP, CRYSTAL, CASTEP, QuantumEspresso, VMD, and many other...
    Downloads: 788 This Week
  • 22
    SuiteCRM

    The multi-award winning SuiteCRM is the world's best open source CRM.

    SuiteCRM, developed and maintained by SuiteCRM Ltd, is the world’s most popular open-source CRM solution. With nearly 2 million downloads and an estimated 5 million users globally, it is the go-to choice for businesses seeking flexibility without compromise. As a fully open-source platform, SuiteCRM allows organisations to scale and customise modules to fit their unique workflows. Whether deployed on-premise, as a SaaS solution, or in a private cloud, it ensures total data sovereignty and...
    Downloads: 182 This Week
  • 23
    QXRD is software for the acquisition and analysis of X-ray data taken with two-dimensional detectors. The software can drive a Perkin Elmer XRD series flat panel detector and can be remote-controlled via a socket interface, or directly from SPEC.
    Downloads: 6 This Week
  • 24
    HuMo-genealogy software

    Genealogy program

    HuMo-genealogy is an open-source, server-side genealogy program that dynamically displays genealogical data from a MySQL database as a website with numerous reports and charts. Webmasters can edit data online, and users may choose from several languages.
    Downloads: 41 This Week
  • 25
    Kiwix

    Wikipedia offline & more

    Kiwix is an offline reader for Web content. It's especially intended to make Wikipedia available offline. With Kiwix, you can enjoy Wikipedia on a boat, in the middle of nowhere... or in jail. Kiwix manages to do that by reading ZIM files, a highly compressed open format with additional metadata.
    Downloads: 133 This Week