LlamaIndex

Technology, Information and Internet

San Francisco, California 277,878 followers

AI agents for document OCR + workflows


85 employees on LinkedIn

About us

LlamaIndex delivers the world's most accurate agentic document processing platform. We bring together industry-leading agentic OCR with a natural language workflow builder to power intelligent agents that read and extract data from complex documents, adapt to business logic, and scale reliably to production. Our SDK is downloaded more than 25M times every month and is used by the fastest-growing AI companies and the Fortune 50.

Website: https://www.llamaindex.ai/
Industry: Technology, Information and Internet
Company size: 11-50 employees
Headquarters: San Francisco, California
Type: Public Company

Locations

Primary

San Francisco, California, US

447 Sutter St

San Francisco, California 94108, US



Updates

LlamaIndex

1d
Turn your PDF charts into pandas DataFrames with specialized chart parsing in LlamaParse! This tutorial walks you through extracting structured data from charts and graphs in PDFs, then running data analysis with pandas: no manual data entry required.

📊 Enable specialized chart parsing to convert visual charts into structured table data
🐼 Extract table rows directly from parsed PDF pages and load them into DataFrames
📈 Perform year-over-year analysis, calculate gaps between metrics, and create visualizations
⚡ Use the items view to get per-page structured data, including tables and figures

We demonstrate this using a 2024 Executive Summary PDF, extracting a fiscal-year chart of Budget Deficit vs. Net Operating Cost for 2020-2024 and reproducing the key financial insights.

Check out the full tutorial: https://lnkd.in/eP--CdkC
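The post does not include code, so here is a minimal pandas sketch of the analysis step only. It assumes the chart has already been parsed into table rows (a header row followed by data rows, all as strings) and does not call any LlamaParse API. The helper `chart_rows_to_frame` and the column names `Year`, `Budget Deficit`, and `Net Operating Cost` are hypothetical.

```python
import pandas as pd


def chart_rows_to_frame(
    rows: list[list[str]],
    year_col: str = "Year",                  # hypothetical column names
    deficit_col: str = "Budget Deficit",
    cost_col: str = "Net Operating Cost",
) -> pd.DataFrame:
    """Build a DataFrame from parsed chart rows and add YoY and gap columns.

    Assumes rows[0] is the header and the remaining rows are data, with
    numbers possibly formatted like "$1,234".
    """
    header, *body = rows
    df = pd.DataFrame(body, columns=header)

    # Strip currency symbols, thousands separators, and whitespace, then parse numbers.
    for col in (year_col, deficit_col, cost_col):
        cleaned = df[col].astype(str).str.replace(r"[$,\s]", "", regex=True)
        df[col] = pd.to_numeric(cleaned)

    df = df.sort_values(year_col).reset_index(drop=True)

    # Year-over-year percentage change; the first year has no prior value, so it is NaN.
    for col in (deficit_col, cost_col):
        df[f"{col} YoY %"] = (df[col] / df[col].shift(1) - 1) * 100

    # Gap between the two metrics for each year.
    df["Gap"] = df[cost_col] - df[deficit_col]
    return df
```

With matplotlib installed, `chart_rows_to_frame(rows).plot(x="Year", y=["Budget Deficit", "Net Operating Cost"])` would chart the two series side by side.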
1 Comment

LlamaIndex

2d
Build a private equity deal sourcing agent that automatically classifies investment opportunities and extracts key financial metrics using our LlamaAgents Builder. This step-by-step guide shows you how to create an agent that processes deal files like teasers and financial summaries:

🎯 Classify deals into buyout, growth, or minority investment strategies
📊 Extract critical metrics including revenue, EBITDA, growth rates, and debt levels
🚀 Deploy directly to GitHub and get a working UI without writing code
🔧 Iterate and refine your agent through natural language conversations

The tutorial covers prompt engineering best practices, using example files effectively, visualizing agent workflows, and deploying to production. We demonstrate the complete process, from initial prompt to testing the deployed application with real deal documents.

Read the full tutorial: https://lnkd.in/eebVC553
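To make the two outputs concrete, here is a hypothetical target schema that an extraction step like this could fill: one strategy label plus the metrics the post lists. It is a minimal Python sketch; the `DealSummary` class, its field names, and the units are assumptions for illustration, not the schema LlamaAgents Builder generates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Strategy(Enum):
    """The three deal strategies named in the post."""
    BUYOUT = "buyout"
    GROWTH = "growth"
    MINORITY = "minority"


def _opt_float(value: Any) -> Optional[float]:
    """Convert a raw extracted value to float, keeping missing values as None."""
    return None if value is None else float(value)


@dataclass
class DealSummary:
    company: str
    strategy: Strategy
    revenue: Optional[float] = None             # units/currency as reported (assumed)
    ebitda: Optional[float] = None
    revenue_growth_pct: Optional[float] = None
    total_debt: Optional[float] = None

    @classmethod
    def from_dict(cls, raw: dict) -> "DealSummary":
        """Validate raw JSON-like extractor output.

        Raises ValueError if the strategy is not one of the three labels
        (or a metric is not numeric) and KeyError if a required key is missing.
        """
        return cls(
            company=str(raw["company"]),
            strategy=Strategy(str(raw["strategy"]).strip().lower()),
            revenue=_opt_float(raw.get("revenue")),
            ebitda=_opt_float(raw.get("ebitda")),
            revenue_growth_pct=_opt_float(raw.get("revenue_growth_pct")),
            total_debt=_opt_float(raw.get("total_debt")),
        )
```

Keeping the strategy as a closed enum means an unexpected label fails loudly instead of flowing silently downstream; that is a design choice of this sketch, not something the tutorial prescribes.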
LlamaIndex reposted this
Jerry Liu
3d
The Model Harness is Everything

We are already living in a world of incredible frontier models and incredible agent tools (Claude Code, OpenClaw). But the biggest barrier to getting value from AI is your own ability to do context and workflow engineering around the models.

This is *especially* true the more horizontal the tool you’re using. If you’re using a very generic tool like ChatGPT or Claude Code, you need to put a lot of work into clearly articulating your requirements and specifications so that the agent can actually solve the task to your specifications. Today that means being extremely thoughtful about the tools you select, and writing very precise English in a skills.md file to convey these requirements to the agent.

Some of the work of defining the business workflow is inherently time-consuming. Think about any document SOP: simply writing the English can take hours to refine, iterate on, and optimize.

This is where more vertically focused agents come in; they take on the burden of equipping agents with the relevant prompts to solve a given workflow, so that you can just go in and use the application directly.

Another approach is to build specialized services that offer *context* to these agents. This is the space that we (LlamaIndex) are operating in. We are providing the infrastructure to parse the most complex documents into agent-ready context. For other companies it could be offering web data, sales data, documentation, or codebases as a service.

At a high level, any AI startup should provide context or workflows on top of these agents. We’re excited about building enduring tech even as the agent landscape evolves. If you’re specifically excited to unlock the vast amount of context stored within your documents, come talk to us! https://lnkd.in/gE9aSVV7
6 Comments

LlamaIndex reposted this
Toni Rosiñol
3d
🚀 TOMORROW let’s talk about the mess no one wants to deal with: document ingestion.

Scanned PDFs, old spreadsheets, data rooms with 6,000 files, financial statements, messy reports. The stuff that breaks your agent in production.

If you want to build agents that handle messy documents and still work when it matters… be there! Link in the comments.

StackAI LlamaIndex
12 Comments

LlamaIndex reposted this
Jerry Liu
4d
OmniDocBench is getting saturated

VLMs are getting increasingly good at document understanding, from OSS (DeepSeek-OCR2, GLM-OCR) to frontier (Gemini 3, Kimi 5.2, GPT-5.2). A popular benchmark for measuring document understanding progress has been OmniDocBench. But we're quickly approaching the point where we need a new benchmark.

1. The latest models are pushing ~95% on OmniDocBench and are already overfitting the benchmark, while still having real gaps in document capabilities.
2. OmniDocBench's evaluation metrics depend entirely on exact match, not on semantic correctness. The latter is much more important, especially in today's world where LLMs can reason over text tokens regardless of unimportant formatting differences. *(see example below; our own service, LlamaParse, is penalized even though its parsing of scientific notation is 100% semantically correct)*

There are so many real-world documents that haven't yet been solved by even the latest models. We'd welcome discussion on advancing document understanding benchmarks that are more diverse and that properly score models on semantic correctness.

Blog: https://lnkd.in/gRF7z7t8
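To illustrate point 2, here is a minimal Python sketch contrasting exact-match scoring with a looser, value-aware comparison for the scientific-notation case. The normalization rules (LaTeX-style `\times 10^{n}` and e-notation compared as floats, any other text compared after collapsing whitespace) are assumptions for illustration; this is not OmniDocBench's metric or LlamaIndex's evaluation code.

```python
import math
import re
from typing import Optional

# Matches forms like "1.2 \times 10^{3}" or "1.2 × 10^3" (after any $...$ is stripped).
_SCI = re.compile(r"^([-+]?\d*\.?\d+)\s*(?:\\times|×)\s*10\^\{?([-+]?\d+)\}?$")


def to_number(text: str) -> Optional[float]:
    """Parse plain, e-notation, or LaTeX-style scientific notation; None if not numeric."""
    s = text.strip().strip("$").strip()
    m = _SCI.match(s)
    if m:
        return float(m.group(1)) * 10.0 ** int(m.group(2))
    try:
        return float(s)
    except ValueError:
        return None


def exact_match(pred: str, gold: str) -> bool:
    """Character-for-character comparison: any formatting difference fails."""
    return pred == gold


def semantic_match(pred: str, gold: str, rel_tol: float = 1e-9) -> bool:
    """Compare numeric values when both sides parse as numbers, else text modulo whitespace."""
    a, b = to_number(pred), to_number(gold)
    if a is not None and b is not None:
        return math.isclose(a, b, rel_tol=rel_tol)
    return " ".join(pred.split()) == " ".join(gold.split())
```

Under these rules, a LaTeX rendering of 1.2 × 10³ and the string "1.2e3" fail `exact_match` but pass `semantic_match`. A real benchmark would need many more normalizations (tables, HTML vs. markdown, reading order), which is part of why the post argues for a new one.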
5 Comments

LlamaIndex

4d
Document OCR benchmarks are hitting a ceiling, and that's a problem for real-world AI applications.

Our latest analysis reveals why OmniDocBench, the go-to standard for document parsing evaluation, is becoming inadequate as models like GLM-OCR achieve 94.6% accuracy while still failing on complex real-world documents.

📊 Models are saturating OmniDocBench scores but still struggle with complex financial reports, legal filings, and domain-specific documents
🎯 Rigid exact-match evaluation penalizes semantically correct outputs that differ in formatting (HTML vs. markdown, spacing, etc.)
⚡ AI agents need semantic correctness, not perfect formatting matches; current benchmarks miss this critical distinction
🔬 The benchmark's 1,355 pages can't capture the full complexity of production document processing needs

The document parsing challenge isn't solved just because benchmark scores look impressive. We need evaluation methods that reward semantic understanding over exact formatting, especially as AI agents become the primary consumers of parsed content.

We're building parsing models focused on semantic correctness for complex visual documents. If you're scaling OCR workloads in production, LlamaParse handles the edge cases that benchmarks miss.

Read our full analysis: https://lnkd.in/eRyRtv8K
1 Comment

LlamaIndex reposted this
Bernard Aceituno
5d
🔊 On February 26th, we’re diving into one of the most overlooked parts of enterprise AI: document ingestion. Join us for an exciting live session with StackAI and LlamaIndex!

In this session, we’ll cover:
• How to process massive document volumes in real-world environments
• Why parsing and retrieval quality directly impact agent performance
• Hard lessons from production deployments
• How to scale ingestion without building a massive data engineering team

February 26th, are you in? Register for free 👇
6 Comments

LlamaIndex

5d
Boyang Zhang's internship with us here at LlamaIndex has already started, and we are thrilled with how much he's done. Thanks for everything so far; we're excited to see what else you can do! 🦙🚀
2 Comments

LlamaIndex

5d
🚀 LlamaAgents Builder just leveled up: File uploads are here!

Our natural language interface for building agentic document workflows now supports file uploads. You can provide example documents as context, and the agent will use them as a starting point to design and tailor your workflow.

The result? Applications that better match your real-world use case. The more representative your sample files, the more accurate your final app.

🎥 Watch the full walkthrough: https://lnkd.in/e5Ep2WKf
🦙 Get started with LlamaCloud: https://lnkd.in/eJ6zujZ5

LlamaIndex reposted this
Jerry Liu
1w
Anthropic recently published a report that analyzes how people are using and building increasingly autonomous agents in production, across various contexts.

Unsurprisingly, software engineering accounts for the vast majority of both usage (50%+ of all tool calls) and people’s first exposure to autonomous agents (through Claude Code).

But an underrated detail in the report is the growing use of agents for backoffice automation, which accounts for the second-largest share of tool calls: 9.1% and growing.

Backoffice automation is a fantastic use case for agents. A lot of backoffice work depends on routine operations over unstructured documents (invoices, claims packets, loan files). The best interface for automating these operations lets users create deterministic workflows at scale, instead of solving ad-hoc tasks through chat.

To make this work well, agents need to be semi-autonomous but low-risk. Humans can trust the agents to perform large-scale document extraction and processing across various types of backoffice work, but the agent needs to surface sources and alert on low confidence scores so that humans can efficiently review and approve the outputs.

The traditional Intelligent Document Processing (IDP) and Robotic Process Automation (RPA) industries are truly becoming obsolete, and agentic workflows are taking their place.

Link to the report: https://lnkd.in/ggymKwsg

Our mission with LlamaIndex is to build the document processing infrastructure that powers agentic automation of knowledge work, from backoffice to beyond. Our core platform, LlamaCloud, has state-of-the-art document parsing and extraction capabilities for all the messy documents administrators need to deal with. With LlamaAgents, we are also starting to build an agentic layer within LlamaCloud that lets users "vibe-code" these workflows through natural language.

If you’re interested, come check it out (https://lnkd.in/g9Wpqn7w) or talk to us: https://lnkd.in/gE9aSVV7
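The "semi-autonomous but low-risk" pattern described above can be sketched as a confidence gate: extracted fields at or above a threshold pass through, and the rest go to a human review queue with their source attached. This is a minimal Python sketch; the `ExtractedField` type, the 0-to-1 confidence scale, and the 0.8 default threshold are hypothetical and not part of LlamaCloud's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    name: str          # hypothetical field name, e.g. "invoice_total"
    value: str
    confidence: float  # assumed to be in [0, 1]
    source: str        # where the value came from, e.g. "page 3"


def triage(fields: list[ExtractedField], threshold: float = 0.8):
    """Split fields into (auto_approved, needs_review) by confidence."""
    auto_approved: list[ExtractedField] = []
    needs_review: list[ExtractedField] = []
    for field in fields:
        if field.confidence >= threshold:
            auto_approved.append(field)
        else:
            needs_review.append(field)
    return auto_approved, needs_review


def review_message(field: ExtractedField) -> str:
    """Human-readable alert that surfaces the value, its confidence, and its source."""
    return (f"Review {field.name}={field.value!r} "
            f"(confidence {field.confidence:.2f}, source: {field.source})")
```

Keeping the source on every field is what lets a reviewer check a flagged value without re-reading the whole document; the threshold trades review workload against the risk of letting a wrong value through.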
4 Comments



Funding

LlamaIndex: 4 total rounds

Last round: Series unknown, Jun 1, 2025

Investors: KPMG Ventures, Databricks Ventures



Employees at LlamaIndex

Jerry Chen

Donald Tucker

Dave Zilberman

Pierre-Loic Doulcet


Similar pages

LangChain

Hugging Face

Perplexity

Ollama

CrewAI

Anthropic

Mistral AI

DeepLearning.AI

Qdrant

OpenAI
