Search Results for "metadata extraction tool"

Showing 471 open source projects for "metadata extraction tool"

  • 1
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, aims to let you build a web scraper in just a few lines of code, with an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the command line by passing URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
  • 2
    Amazon EC2 Metadata Mock

    A tool to simulate Amazon EC2 instance metadata

    Instance metadata is data about your instance that you can use to configure or manage the running instance. Instance metadata is divided into categories, for example, hostname, events, and security groups. You can also use instance metadata to access user data that you specified when launching your instance. For example, you can specify parameters for configuring your instance, or include a simple script. You can build generic AMIs and use user data to modify the configuration files supplied...
    Downloads: 0 This Week
  • 3
    yt-dlp

    A youtube-dl fork with additional features and fixes

    yt-dlp is a youtube-dl fork based on the now inactive youtube-dlc. The main focus of this project is adding new features and patches while keeping up to date with the original project.
    Downloads: 291 This Week
  • 4
    Pandoc

    The universal markup converter

    Pandoc is a universal document converter that converts files from one markup format into another. With Pandoc, you have a Swiss-army knife of a converter, able to convert practically any markup format into any other. Pandoc contains a Haskell library for conversions as well as a command-line tool that uses this library. It can convert to and from just about anything: lightweight markup formats, HTML formats, documentation formats, ebooks, TeX formats, word processor formats...
    Downloads: 159 This Week
  • 5
    DBeaver

    Free universal database tool

    DBeaver is a free, multi-platform database tool that supports any database having a JDBC driver. It is useful for developers, SQL programmers, database administrators and analysts. DBeaver comes with plenty of great features such as metadata and SQL editors, ERD, data export/import/migration and more. Plugins are available for certain databases, and there are also several database management utilities. DBeaver’s Enterprise Edition provides even more features and supports non-JDBC...
    Downloads: 113 This Week
  • 6
    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitles (hardsubs) from videos

    A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. It is a deep learning-based video subtitle extraction framework that covers subtitle region detection and subtitle content extraction. Text recognition runs locally with OCR, so there is no need to apply for or call a third-party API, or to access online OCR services such as Baidu...
    Downloads: 62 This Week
  • 7
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 9 This Week
  • 8
    ExifTool

    ExifTool meta information reader/writer

    ..., and cleaning or normalizing fields for archives. The tool is meticulous about character encodings, time zones, and tag groups, helping avoid silent corruption when moving metadata between ecosystems. It is scriptable and composable, with options to operate recursively, write sidecars, preserve originals, or do dry runs for safety. Professionals rely on ExifTool for digital asset management, forensic workflows, and any pipeline where metadata quality and traceability matter.
    Downloads: 41 This Week
  • 9
    S3cmd

    Command line tool for managing Amazon S3 and CloudFront services

    S3cmd (s3cmd) is a free command-line tool and client for uploading, retrieving, and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command-line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc. S3cmd is written in Python. It's an open-source project available under the GNU General Public License v2...
    Downloads: 48 This Week
  • 10
    CSV Lint

    CSV Lint plug-in for Notepad++ for syntax highlighting

    CSV Lint is a plug-in for Notepad++ that provides syntax highlighting, CSV validation, automatic column and datatype detection, support for fixed-width datasets, changing the datetime format and decimal separator, sorting data, counting unique values, and converting to XML, JSON, SQL, etc. It is a plugin for data cleaning and working with messy data files. Use CSV Lint for metadata discovery, technical data validation, and reformatting on tabular data files. It is not meant to be a replacement for spreadsheet programs like Excel or SPSS, but rather...
    Downloads: 34 This Week
  • 11
    Namida

    A Beautiful and Feature-rich Music & Video Player

    Namida is a modern, local-first music player that focuses on high-quality playback and robust library management. It scans device or desktop storage to build an organized library from folders and tags, presenting artists, albums, and tracks with artwork and rich metadata. The player emphasizes smooth navigation and playlist workflows, enabling favorites, custom lists, and quick queueing for flexible listening sessions. Audio features commonly include gapless playback and adjustable options...
    Downloads: 39 This Week
  • 12
    NetBox

    The premier source of truth powering network automation

    .... It is a web-based application that can be used to manage IP addresses and the devices and cables connected to them, as well as providing a data center infrastructure management (DCIM) tool. It supports virtualization, inventory management, and cable management. It has a web-based user interface and RESTful API, to easily integrate with other tools and automate tasks.
    Downloads: 35 This Week
  • 13
    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 25 This Week
  • 14
    get_iplayer

    A utility for downloading TV and radio programmes from BBC iPlayer

    get_iplayer is a command-line tool for downloading and streaming content from BBC iPlayer and BBC Sounds. It provides access to TV programs and radio broadcasts available on the BBC's platforms and allows users to archive content for offline use. The tool includes search, recording, and metadata tagging features and is popular among users looking to maintain access to BBC content globally.
    Downloads: 21 This Week
  • 15
    PDFPatcher

    A versatile toolkit for PDF manipulation

    PDFPatcher (aka “PDF补丁丁”) is a versatile toolkit for PDF manipulation—editing document metadata, bookmarks, page layout, content restrictions, rotation, compression, merging/splitting, image extraction, and more, all within an intuitive interface. Merge/split PDFs or images, preserve or add bookmarks, and set page dimensions. Batch style/color/target changes, regex/XPath search/replace, mid‑page positioning. Modify PDF metadata, page numbers, links, initial view mode, and remove open actions.
    Downloads: 12 This Week
  • 16
    ripgrep

    Regex pattern directory search tool that respects your .gitignore

    ripgrep is a line-oriented search tool that recursively searches the current directory for a regex pattern. By default, ripgrep respects your .gitignore and automatically skips hidden files, hidden directories, and binary files. ripgrep has first-class support on Windows, macOS, and Linux, with binary downloads available for every release. ripgrep is similar to other popular search tools like The Silver Searcher, ack, and grep. ripgrep supports arbitrary input preprocessing filters which...
    Downloads: 23 This Week
  • 17
    Yoast WordPress SEO

    Yoast SEO for WordPress

    The Yoast SEO plugin is the most popular SEO tool for WordPress, offering comprehensive tools to optimize content for search engines. It provides real-time page analysis, readability checks, and automated metadata handling to improve website visibility.
    Downloads: 14 This Week
  • 18
    uv

    An extremely fast Python package and project manager, written in Rust

    Downloads: 16 This Week
  • 19
    EasyTier

    A simple, decentralized mesh VPN with WireGuard support

    EasyTier is a simple, decentralized mesh VPN with WireGuard support. It connects devices on different networks into a single virtual network without depending on a central server.
    Downloads: 15 This Week
  • 20
    Dungbeetle

    A distributed job server

    Dungbeetle is a distributed job server developed by Zerodha for queuing and asynchronously executing large numbers of SQL read jobs, such as reports, against SQL databases. As jobs are executed, their results are written to separate results databases.
    Downloads: 1 This Week
  • 21
    ContextGem

    ContextGem: Effortless LLM extraction from documents

    ContextGem is an open-source framework designed to simplify the extraction of structured data and insights from documents using large language models (LLMs). It provides a flexible, intuitive API that minimizes boilerplate code, enabling developers to build complex extraction workflows efficiently. ContextGem supports various document formats and integrates with multiple LLM providers, making it a versatile tool for tasks like contract analysis, anomaly detection, and information retrieval.
    Downloads: 1 This Week
  • 22
    Lantern

    Tool to access videos, messaging, and other popular apps

    Can't access your favorite apps? Download Lantern to easily access videos, messaging, and other popular apps while at school or work. Lantern is an application that allows you to bypass firewalls to use your favorite applications and access your favorite websites. Lantern does not cooperate with any law enforcement in any country. Lantern encrypts all of your traffic to blocked sites and services to protect your data and privacy. Lantern passed multiple third party white box security audits...
    Downloads: 12 This Week
  • 23
    MyDumper

    MyDumper project

    MyDumper is a MySQL logical backup tool. It consists of two tools: mydumper, which exports a consistent backup of MySQL databases, and myloader, which reads the backup from mydumper, connects to the destination database, and imports the backup. Both tools use multithreading. MyDumper is open source and maintained by the community; it is not a Percona, MariaDB, or MySQL product. Parallelism (hence, speed) and performance (avoids expensive character set conversion routines, efficient code...
    Downloads: 11 This Week
  • 24
    ANTLR

    Parser generator to read, process, or translate structured text

    ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. From a grammar, ANTLR generates a parser that can build and walk parse trees. It is widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. The languages for Hive...
    Downloads: 8 This Week
  • 25
    Dagster

    An orchestration platform for the development, production

    ..., multi-tool engine that scales technically and organizationally. Dagster as a unified control plane: The ‘single plane of glass’ data teams love to use. Rein in the chaos and maintain control over your data as the complexity scales. Centralize your metadata in one tool with built-in observability, diagnostics, cataloging, and lineage. Spot any issues and identify performance improvement opportunities.
    Downloads: 9 This Week