Plug-and-play sanitization of USB thumb drives
Malware is a nasty problem for all computer users, and there are countermeasures available (such as scanning email attachments) to help neutralize malware threats in many common tasks. But there are certain vocations that regularly require people to do risky things like accept a USB flash drive from a veritable stranger. Reporters exchanging information with alleged NSA whistleblowers in dark alleyways is the most dramatic example, but hardly the only one—consider, for example, how many flash drives of unknown provenance are exchanged or handed out at software conferences over the course of the year.
The safe approach to reading the contents of an untrusted flash drive is to only open it on a read-only, live-CD system (not connected to the Internet) and to scan and sanitize the files it contains before opening any of them. But doing this correctly can be arduous while on the road and tricky for those who are not technically inclined. This is where the CIRCLean project comes in: CIRCLean is a minimalist Debian system that turns the Raspberry Pi into an automated USB drive sanitization box.
CIRCLean is a project hosted by the Computer Incident Response Center Luxembourg (CIRCL), which is Luxembourg's national Computer Emergency Response Team (CERT). The goal is to provide a simple method for users to extract the important contents from an untrusted USB flash drive, filtering out any viruses, spyware, and other potentially hazardous hidden payloads—while not endangering the main system, such as the user's laptop.
CIRCLean accomplishes this by turning a Raspberry Pi into a single-purpose, black-box tool. The user plugs the untrusted "source" USB drive into one USB port on the Pi, plugs their own "target" USB drive into another port, and only then plugs the Pi into a power source. The Pi boots up (mounting the root filesystem read-only), processes the contents of the source drive, saves sanitized versions of the files onto the target drive, then shuts down. At present, the main threats targeted by the tool are malicious macros embedded in office documents (which are naturally more of a concern for Windows and Mac users) and PDFs with hidden executables (which, at least in theory, can contain JavaScript or even arbitrary executable PostScript code).
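The overall flow just described (walk the source drive, write sanitized output to the target drive, then power off) can be illustrated with a short Python sketch. It is not CIRCLean's code, which is a set of shell scripts; the names `sanitize_tree` and `copy_unchanged` and the `shutdown` hook are hypothetical, and both drives are assumed to be mounted already.

```python
import os
import shutil
from typing import Callable, List, Optional

# Hypothetical per-file handler: takes a source path and a destination
# directory, writes zero or more sanitized files, returns their paths.
FileHandler = Callable[[str, str], List[str]]


def copy_unchanged(src_path: str, dest_dir: str) -> List[str]:
    """Placeholder handler that copies the file as-is (illustration only)."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src_path))
    shutil.copy2(src_path, dest)
    return [dest]


def sanitize_tree(source_root: str, target_root: str,
                  handle_file: FileHandler = copy_unchanged,
                  shutdown: Optional[Callable[[], None]] = None) -> List[str]:
    """Mirror the source drive's directory layout onto the target drive,
    passing every regular file through handle_file, then call shutdown()."""
    written: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        rel = os.path.relpath(dirpath, source_root)
        dest_dir = os.path.normpath(os.path.join(target_root, rel))
        for name in sorted(filenames):
            src_path = os.path.join(dirpath, name)
            if os.path.islink(src_path):
                continue  # never follow links off the untrusted drive
            written.extend(handle_file(src_path, dest_dir))
    if shutdown is not None:
        shutdown()  # on the real device this would power the Pi off
    return written
```

Replacing `copy_unchanged` with a real converter is where per-file sanitization would plug in. Skipping symbolic links is this sketch's own precaution: a hostile drive could otherwise use links to pull files from the Pi itself onto the target.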
No monitor or input devices are required: CIRCLean can provide either of two possible feedback mechanisms to let the user know that the sanitization is complete. The first is through an LED attached to the GPIO headers on the Pi: when the LED is blinking, the process is still working; when the LED switches off, the sanitization is complete and the machine has shut down. Alternatively, the system can play MIDI music over the Pi's audio-out port; again, when the music stops, the process is complete.
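The LED behavior amounts to "blink while a background job runs, then go dark." The Python sketch below assumes a caller-supplied `set_led(on)` function (hypothetical here); on real hardware it might wrap a GPIO library call such as RPi.GPIO's `GPIO.output(channel, value)`, with the channel depending on how the LED is wired.

```python
import threading
from typing import Callable


def run_with_blinking_led(work: Callable[[], None],
                          set_led: Callable[[bool], None],
                          interval: float = 0.5) -> None:
    """Run `work` in a background thread, toggling the LED every
    `interval` seconds until it finishes, then leave the LED off."""
    done = threading.Event()

    def worker() -> None:
        try:
            work()
        finally:
            done.set()  # signal completion even if work() raises

    thread = threading.Thread(target=worker)
    thread.start()
    lit = False
    # Event.wait() returns False on timeout and True once the event is set.
    while not done.wait(timeout=interval):
        lit = not lit
        set_led(lit)
    thread.join()
    set_led(False)  # steady off: processing is complete
```

The same structure would serve the MIDI alternative: start playback instead of toggling an LED, and stop it when the event is set.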
The system is built on top of Raspbian (the Raspberry Pi distribution based on Debian), and even includes a subtle security measure intended to evade detection. At boot time, if the OS detects that the only USB devices attached are two mass-storage devices, it launches the CIRCLean sanitization process. If any other combination of USB devices is attached, it boots into the standard Raspbian desktop.
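The boot-time decision can be expressed as a small predicate. In this Python sketch each attached device is described by the USB interface class codes it reports (0x08 is the standard mass-storage class, 0x03 is HID); on Linux these values can be read from the `bInterfaceClass` attributes under `/sys/bus/usb/devices/`. The function names, the input format, and the choice to treat composite devices as non-storage are assumptions of this sketch, not details taken from CIRCLean.

```python
from typing import Sequence

USB_CLASS_MASS_STORAGE = 0x08  # USB-IF interface class code for mass storage


def is_plain_storage(interface_classes: Sequence[int]) -> bool:
    """True if a device exposes only mass-storage interfaces.

    Composite devices (for example, storage plus a class 0x03 keyboard
    interface) are rejected; that strictness is this sketch's assumption."""
    return len(interface_classes) > 0 and all(
        c == USB_CLASS_MASS_STORAGE for c in interface_classes)


def choose_boot_mode(devices: Sequence[Sequence[int]]) -> str:
    """Return "sanitize" when exactly two plain storage devices are attached,
    otherwise "desktop". `devices` holds one list of interface classes per
    external device; the Pi's own on-board USB hub and network adapter are
    assumed to have been filtered out by the caller."""
    if len(devices) == 2 and all(is_plain_storage(d) for d in devices):
        return "sanitize"
    return "desktop"
```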
How it works
The genesis of the idea evidently came from security consultant Maya Bonkowski, who spoke to the Raspberry Pi blog in August. Bonkowski said that the Pi was chosen as the hardware platform because of its portability and price. Traveling with a second laptop might be the obvious solution for some journalists or activists who need to sanitize strange USB sticks, she said, but a second laptop can attract suspicion (as well as being bulky). Notably, the Pi also comes without built-in wireless connectivity, which makes it easier to use without worrying about the network.
Bonkowski wrote the first version of the code in 2012 (calling it KittenGroomer), after which Raphaël Vinot took over as lead developer and moved the project to CIRCL. Vinot's code is available on GitHub (still under the name KittenGroomer) and is still the main development branch. CIRCL's official stable branch has been rebranded as CIRCLean. The last update was in October 2014, which added support for NTFS drives and included a handful of security fixes (two Bash vulnerabilities were fixed and the user account that processes the files was removed from sudoers).
The software in the repository is a suite of shell scripts designed to be run on a Raspbian system. The scripts create the necessary user account, install the package dependencies, and set up the required startup scripts. CIRCL also provides pre-built SD card images for those not interested in installing the software manually.
The goal of the sanitization step is to identify risky file formats and strip out any potentially hazardous content like macros or embedded executables. Currently, the code focuses on four specific file types: "office" documents (meaning word processor, spreadsheet, and presentation files), PDFs, auto-run files, and archive files. Auto-run files are risky for the obvious reason: they execute unknown code. The other three file types can encapsulate hidden executable content even while presenting other, seemingly innocuous (or even valuable) content to the user.
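A first step in such a pipeline is deciding which category a file falls into. The Python sketch below does it by file name alone; the extension lists are illustrative guesses rather than CIRCLean's own, and it counts `.jar` files as executables.

```python
import os

# Illustrative extension lists; CIRCLean's own lists live in its scripts
# and are not reproduced here.
OFFICE_EXTS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
               ".odt", ".ods", ".odp", ".rtf"}
ARCHIVE_EXTS = {".zip", ".7z", ".rar", ".tar", ".gz", ".bz2"}
EXECUTABLE_EXTS = {".exe", ".dll", ".com", ".bat", ".cmd", ".scr",
                   ".msi", ".jar"}
AUTORUN_NAMES = {"autorun.inf"}


def classify(path: str) -> str:
    """Return "autorun", "pdf", "office", "archive", "executable", or
    "other" (to be copied without conversion). Matching is
    case-insensitive and looks at the file name only."""
    name = os.path.basename(path).lower()
    ext = os.path.splitext(name)[1]
    if name in AUTORUN_NAMES:
        return "autorun"
    if ext == ".pdf":
        return "pdf"
    if ext in OFFICE_EXTS:
        return "office"
    if ext in ARCHIVE_EXTS:
        return "archive"
    if ext in EXECUTABLE_EXTS:
        return "executable"
    return "other"
```

Extensions are trivially faked, so a more careful tool would also inspect a file's leading "magic" bytes before trusting its name.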
CIRCLean uses Poppler to convert PDFs to HTML documents and LibreOffice to convert office files to PDFs, which are then converted to HTML by Poppler. Archive files are uncompressed with 7-Zip, then their contents are processed file-by-file, and the results placed into a new archive file on the target drive. Auto-run files on the source drive are simply ignored; all other document types are copied without conversion. Executables, although not converted, are renamed, with DANGEROUS both prepended and appended to the original file name.
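To make the routing concrete, here is a Python sketch that maps one source file to the external commands that would convert it, plus the naming rule for files that are copied instead. It only builds command lines (suitable for passing to `subprocess.run`) rather than running them. The `soffice --headless --convert-to pdf --outdir`, `pdftohtml`, and `7z x -o` invocations are ordinary usages of those tools, but the exact flags CIRCLean passes, its extension lists, and the underscores around `DANGEROUS` are assumptions.

```python
import os
from typing import List, Optional

# Hypothetical extension sets, for illustration only.
OFFICE = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".odt", ".ods", ".odp"}
ARCHIVES = {".zip", ".7z", ".rar"}
EXECUTABLES = {".exe", ".dll", ".bat", ".scr", ".jar"}


def dangerous_name(filename: str) -> str:
    """Wrap a name in DANGEROUS markers. The article says only that the
    marker is prepended and appended; the underscores are a guess."""
    return f"DANGEROUS_{filename}_DANGEROUS"


def plan(src: str, out_dir: str, scratch_dir: str) -> List[List[str]]:
    """Return the commands that would convert one file; an empty list
    means the file is copied (or skipped) rather than converted."""
    base, ext = os.path.splitext(os.path.basename(src))
    ext = ext.lower()
    if ext == ".pdf":
        # Poppler: -i ignores images, -noframes writes a single HTML file.
        return [["pdftohtml", "-i", "-noframes", src, os.path.join(out_dir, base)]]
    if ext in OFFICE:
        # LibreOffice writes scratch_dir/<base>.pdf; Poppler then converts that.
        pdf = os.path.join(scratch_dir, base + ".pdf")
        return [["soffice", "--headless", "--convert-to", "pdf",
                 "--outdir", scratch_dir, src],
                ["pdftohtml", "-i", "-noframes", pdf, os.path.join(out_dir, base)]]
    if ext in ARCHIVES:
        # 7-Zip extraction; each extracted file would go back through plan().
        return [["7z", "x", src, "-o" + os.path.join(scratch_dir, base)]]
    return []


def copy_target(src: str) -> Optional[str]:
    """Name to use on the target drive for an unconverted file, or None
    if it should be skipped (auto-run files)."""
    name = os.path.basename(src)
    if name.lower() == "autorun.inf":
        return None
    if os.path.splitext(name)[1].lower() in EXECUTABLES:
        return dangerous_name(name)
    return name
```

In this guessed format, appending the marker after the extension has a useful side effect: `setup.exe` becomes `DANGEROUS_setup.exe_DANGEROUS`, which no longer ends in `.exe`, so double-clicking it should not launch it directly.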
That is a relatively short list of file types to sanitize, but it accounts for the largest threats (particularly in the Windows world). There are also ways for image and multimedia files to contain malware, of course. Bonkowski said on the Raspberry Pi blog that there were already other tools that can convert such media files to safe formats before opening them—but it is nonetheless a curious omission. There are also issues open on GitHub to deal with other file types, such as Java, which in early versions of CIRCLean was not correctly treated as an executable file type (although one might well ask whether it is ever a good idea to run Java code supplied by a stranger).
As a practical matter, it may be more of a problem that CIRCLean's file-conversion step could lose important information if, for example, the LibreOffice conversion is imperfect. With undocumented proprietary file formats—particularly with recent revisions—even LibreOffice occasionally fails to understand some obscure structures.
The known issues include the fact that images are not extracted from PDF files (only text is) and that only the first page of a multi-page spreadsheet is properly converted to HTML. On this latter point, however, the project notes that this should be enough to determine whether the contents of the file are interesting enough to follow up on and, if so, that follow-up can be done later, when additional precautions can be taken.
A similar case could be made for not sanitizing other less-common formats—Photoshop macros, for example. But the biggest omission at this point seems to be the handling of HTML files and email, which can contain active content as well as links to remote content that could be used to track the user. And HTML is widespread enough as a document format to be plausible content on a USB stick (perhaps converted by Microsoft Word).
The correct approach for the user would be to only open HTML documents in an offline browser with JavaScript deactivated; perhaps that is well-known enough these days that a special tool is not required. After all, the Edward Snowden and WikiLeaks stories of the past few years have raised the profile of a number of valuable security tools like Tor and TAILS.
There are still areas where CIRCLean can be improved. For example, there is an issue open to deal with BadUSB-style attacks, in which a thumb drive mimics another device type (such as a keyboard) with malicious intent. Vinot has indicated one possible solution already: blacklisting all non–mass-storage USB kernel modules. Without USB HID support in the kernel, a malicious drive cannot mimic a keyboard. In an email, Vinot described a few other ideas, such as converting PDFs to the more restrictive PDF/A format before converting them to HTML.
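Module blacklisting on Linux is normally done with a modprobe.d configuration file. The Python sketch below generates such a fragment for the HID part of the problem; the module list (`usbhid`, `hid_generic`, `usbkbd`, `usbmouse`) and the idea of saving the output as something like `/etc/modprobe.d/no-usb-hid.conf` are illustrative assumptions, not Vinot's actual proposal. It only helps if those drivers are built as loadable modules; drivers compiled into the kernel ignore modprobe configuration.

```python
from typing import Iterable

# Illustrative list of Linux HID driver modules (an assumption, not a
# complete inventory of what a BadUSB device could exploit).
HID_MODULES = ("usbhid", "hid_generic", "usbkbd", "usbmouse")


def modprobe_blacklist(modules: Iterable[str] = HID_MODULES) -> str:
    """Render a modprobe.d fragment that keeps the given modules out.

    "blacklist" stops automatic loading via device aliases; the
    "install <module> /bin/false" line also makes explicit modprobe
    requests for the module fail."""
    lines = []
    for module in modules:
        lines.append(f"blacklist {module}")
        lines.append(f"install {module} /bin/false")
    return "\n".join(lines) + "\n"
```

A full version of the idea would cover every non-mass-storage USB driver, since a BadUSB device can impersonate, for example, a network adapter as easily as a keyboard.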
CIRCLean serves a purpose distinct from both Tor and TAILS; its ideas may influence those projects in interesting ways, but the niche it fills is important, too: that of a file-sanitization appliance that works quickly and simply. One report cited on the CIRCLean site notes that up to 66% of USB keys in the wild may contain malware, so it is hard to be too careful.