LlamaIndex

Technology, Information and Internet

San Francisco, California
277,878 followers

AI agents for document OCR + workflows

About us

LlamaIndex delivers the world's most accurate agentic document processing platform. We bring together industry-leading agentic OCR with a natural language workflow builder to power intelligent agents that read and extract from complex documents, adapt to business logic, and scale reliably to production. Our SDK is downloaded more than 25M times every month and is used by the fastest-growing AI companies and the Fortune 50.

Website
https://www.llamaindex.ai/
Industry
Technology, Information and Internet
Company size
11-50 employees
Headquarters
San Francisco, California
Type
Public Company

Updates

  • Turn your PDF charts into pandas DataFrames with specialized chart parsing in LlamaParse! This tutorial walks you through extracting structured data from charts and graphs in PDFs, then running data analysis with pandas - no manual data entry required. 📊 Enable specialized chart parsing to convert visual charts into structured table data 🐼 Extract table rows directly from parsed PDF pages and load them into DataFrames 📈 Perform year-over-year analysis, calculate gaps between metrics, and create visualizations ⚡ Use the items view to get per-page structured data including tables and figures We demonstrate this using a 2024 Executive Summary PDF, extracting a fiscal year chart showing Budget Deficit vs Net Operating Cost data spanning 2020-2024, and reproducing the key financial insights. Check out the full tutorial: https://lnkd.in/eP--CdkC
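The analysis step described in this post can be sketched with pandas alone. This is a minimal illustration, assuming the chart has already been parsed into table rows; the column names and every number below are invented placeholders, not values from the PDF in the tutorial, and no LlamaParse call is shown.

```python
import pandas as pd

# Placeholder rows standing in for a parsed chart table. The figures are
# invented for illustration and are NOT taken from the PDF in the post.
rows = [
    {"fiscal_year": 2020, "budget_deficit": 10.0, "net_operating_cost": 12.0},
    {"fiscal_year": 2021, "budget_deficit": 8.0, "net_operating_cost": 9.0},
    {"fiscal_year": 2022, "budget_deficit": 4.0, "net_operating_cost": 7.0},
]

# Load the rows into a DataFrame indexed by fiscal year.
df = pd.DataFrame(rows).set_index("fiscal_year").sort_index()

# Gap between the two metrics for each year.
df["gap"] = df["net_operating_cost"] - df["budget_deficit"]

# Year-over-year percentage change of each metric (NaN for the first year).
df["deficit_yoy_pct"] = df["budget_deficit"].pct_change() * 100
df["cost_yoy_pct"] = df["net_operating_cost"].pct_change() * 100

print(df.round(1))
```

From here, `df.plot()` could produce the visualizations the post mentions, assuming matplotlib is installed.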

  • Build a private equity deal sourcing agent that automatically classifies investment opportunities and extracts key financial metrics using our LlamaAgents Builder. This step-by-step guide shows you how to create an agent that processes deal files like teasers and financial summaries: 🎯 Classify deals into buyout, growth, or minority investment strategies 📊 Extract critical metrics including revenue, EBITDA, growth rates, and debt levels 🚀 Deploy directly to GitHub and get a working UI without writing code 🔧 Iterate and refine your agent through natural language conversations The tutorial covers prompt engineering best practices, using example files effectively, visualizing agent workflows, and deploying to production. We demonstrate the complete process from initial prompt to testing the deployed application with real deal documents. Read the full tutorial: https://lnkd.in/eebVC553

  • LlamaIndex reposted this

    The Model Harness is Everything. We are already living in a world of incredible frontier models and incredible agent tools (Claude Code, OpenClaw). But the biggest barrier to getting value from AI is your own ability to do context and workflow engineering for the models. This is *especially* true the more horizontal the tool that you’re using. If you’re using a very generic tool like ChatGPT or Claude Code, you need to put a lot of work into clearly articulating your requirements and specifications so that the agent can actually solve the task to those specifications. Today that looks like being extremely thoughtful about the tools that you select, and writing English very precisely in a skills.md file to articulate these requirements to the agent. Some of the work around defining the business workflow is inherently time consuming. Think about any document SOP: simply writing the English can take hours to refine, iterate, and optimize. This is where more vertically focused agents come in; they handle the burden of equipping the agents with relevant prompts to solve a given workflow, so that you can just go in and use the application directly. Another approach is to build specialized services that offer *context* to these agents. This is the space that we (LlamaIndex) are operating in. We are providing the infrastructure to parse the most complex documents into agent-ready context. For other companies it could be offering web data, sales data, documentation, or codebases as a service. At a high level, any AI startup should provide context or workflows on top of these agents. We’re excited about building enduring tech even as the agent landscape evolves. If you’re specifically excited to unlock the billions of context stored within your documents, come talk to us! https://lnkd.in/gE9aSVV7

  • LlamaIndex reposted this

    OmniDocBench is getting saturated. VLMs keep getting better at document understanding, from OSS (DeepSeek-OCR2, GLM-OCR) to frontier (Gemini 3, Kimi 5.2, GPT-5.2). A popular benchmark to measure document understanding progress has been OmniDocBench. But we're quickly approaching the point where we need a new benchmark. 1. The latest models are pushing ~95% on OmniDocBench and are already overfitting to the benchmark, while still having real gaps in document capabilities. 2. The evaluation metrics of OmniDocBench depend completely on exact match and not on semantic correctness. The latter is much more important, especially in today's world where LLMs can reason over text tokens regardless of unimportant formatting differences. *(see example below; our own service LlamaParse is penalized even though our parsing is 100% semantically correct on scientific notation parsing)* There are so many real-world documents that haven't yet been solved by even the latest models. We'd welcome discussion on advancing document understanding benchmarks that are more diverse and properly score models on semantic correctness. Blog: https://lnkd.in/gRF7z7t8

  • Document OCR benchmarks are hitting a ceiling - and that's a problem for real-world AI applications. Our latest analysis reveals why OmniDocBench, the go-to standard for document parsing evaluation, is becoming inadequate as models like GLM-OCR achieve 94.6% accuracy while still failing on complex real-world documents. 📊 Models are saturating OmniDocBench scores but still struggle with complex financial reports, legal filings, and domain-specific documents 🎯 Rigid exact-match evaluation penalizes semantically correct outputs that differ in formatting (HTML vs markdown, spacing, etc.) ⚡ AI agents need semantic correctness, not perfect formatting matches - current benchmarks miss this critical distinction 🔬 The benchmark's 1,355 pages can't capture the full complexity of production document processing needs The document parsing challenge isn't solved just because benchmark scores look impressive. We need evaluation methods that reward semantic understanding over exact formatting, especially as AI agents become the primary consumers of parsed content. We're building parsing models focused on semantic correctness for complex visual documents. If you're scaling OCR workloads in production, LlamaParse handles the edge cases that benchmarks miss. Read our full analysis: https://lnkd.in/eRyRtv8K
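The difference between exact-match and semantic scoring can be illustrated with a small sketch. It is not OmniDocBench's metric or LlamaIndex's evaluation method; the helper names (`exact_match`, `parse_number`, `semantic_match`) are hypothetical and cover only whitespace and scientific-notation differences.

```python
import re
from decimal import Decimal, InvalidOperation

# Matches forms such as "1.5 × 10^3", "1.5 x 10^3" or "1.5 \times 10^{3}".
_SCI = re.compile(
    r"^\s*([+-]?\d+(?:\.\d+)?)\s*(?:[x×]|\\times)\s*10\s*\^\s*\{?([+-]?\d+)\}?\s*$"
)

def exact_match(pred: str, ref: str) -> bool:
    """Strict string equality, as in an exact-match metric."""
    return pred == ref

def parse_number(text: str):
    """Return a Decimal if the text is a plain or scientific number, else None."""
    m = _SCI.match(text)
    if m:
        mantissa, exponent = m.groups()
        return Decimal(mantissa).scaleb(int(exponent))
    try:
        return Decimal(text.strip().replace(",", ""))
    except InvalidOperation:
        return None

def semantic_match(pred: str, ref: str) -> bool:
    """Compare numeric values when both sides are numbers; otherwise
    compare the strings after collapsing whitespace."""
    a, b = parse_number(pred), parse_number(ref)
    if a is not None and b is not None:
        return a == b
    return " ".join(pred.split()) == " ".join(ref.split())

print(exact_match("1.5e3", "1.5 × 10^3"))     # strict comparison
print(semantic_match("1.5e3", "1.5 × 10^3"))  # value-based comparison
```

A production-grade scorer would need to handle far more than this (tables, HTML versus markdown, units), which is the gap the post describes.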

  • LlamaIndex reposted this

    🔊 On February 26th, we’re diving into one of the most overlooked parts of enterprise AI: document ingestion. Join us for a live session with StackAI and LlamaIndex! In this session, we’ll cover: • How to process massive document volumes in real-world environments • Why parsing and retrieval quality directly impact agent performance • Hard lessons from production deployments • How to scale ingestion without building a massive data engineering team February 26th, are you in? Register for free 👇

  • 🚀 LlamaAgents Builder just leveled up: File uploads are here! Our natural language interface for building agentic document workflows now supports file uploads. You can provide example documents as context, and the agent will use them as a starting point to design and tailor your workflow. The result? Applications that better match your real-world use case. The more representative your sample files, the more accurate your final app. 🎥 Watch the full walkthrough: https://lnkd.in/e5Ep2WKf 🦙 Get started with LlamaCloud: https://lnkd.in/eJ6zujZ5

  • LlamaIndex reposted this

    Anthropic recently published a report that analyzes how people are using and building increasingly autonomous agents in production, across various contexts. Unsurprisingly, software engineering accounts for the vast majority of both usage (50%+ of all tool calls) and people’s first exposure to autonomous agents (through Claude Code). But an underrated detail in the report is the growing usage of agents within backoffice automation, which is the second largest proportion of tool calls: 9.1% and growing. Backoffice automation is a fantastic use case for agents. A lot of backoffice work depends on routine operations over unstructured documents (invoices, claims packets, loan files). The best interface to automate these operations is enabling users to create deterministic workflows at scale, instead of solving ad-hoc tasks through chat. To make this work well, agents need to be semi-autonomous but low-risk. Humans need to trust that the agents can perform large-scale document extraction and processing across various types of backoffice work, so the agents need to surface sources and alert on low confidence scores, letting humans efficiently review and approve the outputs. The traditional Intelligent Document Processing (IDP) and Robotic Process Automation (RPA) industries are truly becoming obsolete, and agentic workflows are taking their place. Link to the report: https://lnkd.in/ggymKwsg Our mission with LlamaIndex is to build the document processing infrastructure to empower agentic knowledge work automation, from backoffice to beyond. Our core platform LlamaCloud has state-of-the-art document parsing and extraction capabilities to extract from all the messy documents administrators need to deal with. With LlamaAgents, we are also starting to build an agentic layer within our own document processing product, LlamaCloud, that lets users "vibe-code" these workflows through natural language.
If you’re interested, come check it out (https://lnkd.in/g9Wpqn7w) or talk to us: https://lnkd.in/gE9aSVV7
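The "surface sources and alert on low confidence scores" idea in this post can be sketched as a simple routing step. This is a hypothetical illustration, not LlamaCloud's API: the `ExtractedField` type, the `route_for_review` helper, the 0.9 threshold, and the sample values are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class ExtractedField:
    """One extracted value plus the evidence a reviewer needs (hypothetical)."""
    name: str
    value: str
    confidence: float  # assumed to lie between 0.0 and 1.0
    source_page: int   # page the value was read from

def route_for_review(fields, threshold=0.9):
    """Split fields into auto-approved and needs-human-review lists."""
    approved, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            approved.append(field)
        else:
            needs_review.append(field)
    return approved, needs_review

# Invented sample values for an invoice-like document.
fields = [
    ExtractedField("invoice_number", "INV-001", 0.98, source_page=1),
    ExtractedField("total_amount", "1,250.00", 0.72, source_page=2),
]

approved, needs_review = route_for_review(fields)
for field in needs_review:
    print(f"Review {field.name!r} (confidence {field.confidence}, page {field.source_page})")
```

Keeping the source page alongside each value is what lets a reviewer check a flagged field quickly rather than rereading the whole document.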

