lopdf
A Rust library for PDF document manipulation.
A useful reference for understanding the PDF file format and the eventual usage of this library is the PDF 1.7 Reference Document. The PDF 2.0 specification is available here.
Example Code
- Create PDF document
use dictionary;
use ;
use ;
// `with_version` specifes the PDF version this document complies with.
let mut doc = with_version;
// Object IDs are used for cross referencing in PDF documents.
// `lopdf` helps keep track of them for us. They are simple integers.
// Calls to `doc.new_object_id` and `doc.add_object` return an object ID.
// "Pages" is the root node of the page tree.
let pages_id = doc.new_object_id;
// Fonts are dictionaries. The "Type", "Subtype" and "BaseFont" tags
// are straight out of the PDF spec.
//
// The dictionary macro is a helper that allows complex
// key-value relationships to be represented in a simpler
// visual manner, similar to a match statement.
// A dictionary is implemented as an IndexMap of Vec<u8>, and Object
let font_id = doc.add_object;
// Font dictionaries need to be added into resource
// dictionaries in order to be used.
// Resource dictionaries can contain more than just fonts,
// but normally just contains fonts.
// Only one resource dictionary is allowed per page tree root.
let resources_id = doc.add_object;
// `Content` is a wrapper struct around an operations struct that contains
// a vector of operations. The operations struct contains a vector of
// that match up with a particular PDF operator and operands.
// Refer to the PDF spec for more details on the operators and operands
// Note, the operators and operands are specified in a reverse order
// from how they actually appear in the PDF file itself.
let content = Content ;
// Streams are a dictionary followed by a (possibly encoded) sequence of bytes.
// What that sequence of bytes represents, depends on the context.
// The stream dictionary is set internally by lopdf and normally doesn't
// need to be manually manipulated. It contains keys such as
// Length, Filter, DecodeParams, etc.
let content_id = doc.add_object;
// Page is a dictionary that represents one page of a PDF file.
// Its required fields are "Type", "Parent" and "Contents".
let page_id = doc.add_object;
// Again, "Pages" is the root of the page tree. The ID was already created
// at the top of the page, since we needed it to assign to the parent element
// of the page dictionary.
//
// These are just the basic requirements for a page tree root object.
// There are also many additional entries that can be added to the dictionary,
// if needed. Some of these can also be defined on the page dictionary itself,
// and not inherited from the page tree root.
let pages = dictionary! ;
// Using `insert()` here, instead of `add_object()` since the ID is already known.
doc.objects.insert;
// Creating document catalog.
// There are many more entries allowed in the catalog dictionary.
let catalog_id = doc.add_object;
// The "Root" key in trailer is set to the ID of the document catalog,
// the remainder of the trailer is set during `doc.save()`.
doc.trailer.set;
doc.compress;
// Store file in current working directory.
// Note: Line is excluded when running tests
if false
- Merge PDF documents
use dictionary;
use BTreeMap;
use ;
use ;
- Modify PDF document
use Document;
// For this example to work a parser feature needs to be enabled
FAQ
-
Why does the library keep everything in memory as high-level objects until finally serializing the entire document?
Normally, a PDF document won't be very large, ranging from tens of KB to hundreds of MB. Memory size is not a bottle neck for today's computer. By keeping the whole document in memory, the stream length can be pre-calculated, no need to use a reference object for the Length entry. The resulting PDF file is smaller for distribution and faster for PDF consumers to process.
Producing is a one-time effort, while consuming is many more.