PDF to HTML

Posted Dec 18, 2014 16:19 UTC (Thu) by anselm (subscriber, #2796)
In reply to: PDF to HTML by brouhaha
Parent article: Plug-and-play sanitization of USB thumb drives

It should be possible, and not even exceptionally difficult, to strip JavaScript out of PDF files without having to convert them into a non-PDF format. It also shouldn't be too difficult to validate that the PDF and its embedded images, fonts, etc. are well-formed, and to strip out any malformed constructs that could break PDF viewers that aren't coded sufficiently defensively.

That sounds like a job for Ghostscript.
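As an illustration of the "find the JavaScript" half of the quoted suggestion, here is a minimal sketch that only *detects* PDF name tokens commonly associated with active content; it does not strip anything. The list in `RISKY_NAMES` and the function names are assumptions for illustration. It works on raw bytes, so it will miss names hidden inside compressed streams or object streams, and it can report false positives from text inside literal strings. It does undo `#xx` escapes in names (e.g. `/J#61vaScript`), a common obfuscation trick.

```python
import re
import sys

# Hypothetical watch list: names that often signal active content in a PDF.
# Not a complete inventory of everything a viewer might execute.
RISKY_NAMES = {"/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"}

# A PDF name is "/" followed by regular characters (no whitespace or delimiters).
NAME_RE = re.compile(rb"/[^\s/\[\]<>(){}%]+")
# Inside a name, "#xx" encodes a byte as two hex digits.
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> str:
    """Undo #xx escapes so that /J#61vaScript is reported as /JavaScript."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def scan_for_active_content(data: bytes) -> dict[str, int]:
    """Count occurrences of watched names in the raw (uncompressed) bytes."""
    counts: dict[str, int] = {}
    for match in NAME_RE.finditer(data):
        name = decode_name(match.group())
        if name in RISKY_NAMES:
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # Usage: python scan_pdf.py suspicious.pdf
    with open(sys.argv[1], "rb") as f:
        print(scan_for_active_content(f.read()))
```

Actually removing what such a scan finds means rewriting object references and cross-reference tables consistently, which is where "not even exceptionally difficult" starts to get harder in practice.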


PDF to HTML

Posted Dec 22, 2014 7:37 UTC (Mon) by kleptog (subscriber, #1183)

Indeed. I've used Ghostscript to post-process PDFs generated from other sources to do things like resample images and subset embedded fonts, mostly to make the resulting PDF smaller and load faster. But as a side effect it'll toss out any scripts and other crap lurking in there.

This actually re-renders the PDF to PDF, which seems safer than trying to strip stuff out.
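A minimal sketch of that kind of PDF-to-PDF re-render, driven from Python. `-sDEVICE=pdfwrite`, `-dPDFSETTINGS`, `-dSubsetFonts`, `-dSAFER`, `-dBATCH` and `-dNOPAUSE` are real Ghostscript options; the `/ebook` preset, the helper names `build_gs_command` and `rerender_pdf`, and the assumption that the binary is called `gs` are illustrative choices, not necessarily what the commenter used. Whether every kind of script is dropped may depend on the Ghostscript version, so it is worth checking the output rather than assuming.

```python
import subprocess


def build_gs_command(src: str, dst: str, pdf_settings: str = "/ebook",
                     gs_binary: str = "gs") -> list[str]:
    """Build a Ghostscript command line that re-renders src into a new PDF at dst."""
    return [
        gs_binary,
        "-dSAFER",                        # restrict file access by the interpreted document
        "-dBATCH",                        # exit when done instead of dropping to a prompt
        "-dNOPAUSE",                      # don't wait for input between pages
        "-sDEVICE=pdfwrite",              # output device: write a fresh PDF
        f"-dPDFSETTINGS={pdf_settings}",  # preset that, among other things, downsamples images
        "-dSubsetFonts=true",             # embed only the glyphs actually used
        f"-sOutputFile={dst}",
        src,
    ]


def rerender_pdf(src: str, dst: str, gs_binary: str = "gs") -> None:
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_gs_command(src, dst, gs_binary=gs_binary), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    rerender_pdf("untrusted.pdf", "rerendered.pdf")
```

Ghostscript itself has had security bugs in its handling of hostile input, so running this step inside a sandbox or a throwaway VM is a reasonable precaution for a sanitizing setup.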


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds