Using Tails? Be careful with embedded metadata!

Embedded metadata can reveal who edited a document. We tested the Metadata Anonymisation Toolkit (MAT), which is included in Tails, a Linux distribution for anonymous communication, to see how well it removes such metadata. We found and reported a security issue. The MAT developers intend to fix it.


Tails is a special-purpose Linux distribution that allows people to communicate anonymously and securely. That is why it is used by journalists and activists; Edward Snowden, for instance, has used it. Tails does not protect you automatically, though: mistakes can make your communication insecure, so you need to know what you are doing. Our tests with metadata underlined this.

Tails is safe – with caveats

Let’s get this straight: Tails is still safe, as far as we can tell. However, those who rely on data protection, anonymity and security have to take special care to use it correctly, to avoid being lulled into a false sense of security. During the last AKtiVCongrEZ (an annual meeting of German activists organised by Digitalcourage) in February 2016, we tested the Metadata Anonymisation Toolkit (MAT) that comes with Tails but can also be installed on other operating systems. MAT removes metadata from images, audio files and office documents. Unfortunately, we found a security problem in MAT that will hopefully get fixed soon.

Why is metadata a problem?

Many applications and devices add extra information when you create or edit images, audio files, spreadsheets or formatted text documents. This metadata can include author, creation date, file system location, software used, license or geodata (physical location). This can be helpful, but it can also be used to trace the history and usage of a file to find out who used it when and where. Whistleblowers who want to hand on confidential documents, journalists who want to protect their sources or people who publish photographs on the Internet but do not want to reveal the exact location – they all should remove embedded metadata. Tails includes the MAT application for that purpose.
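To make this concrete, here is a minimal Python sketch of where such metadata sits inside a JPEG file. It walks the segments in the file header and reports EXIF, XMP and comment segments. The function name `jpeg_metadata_segments` and the sample bytes are our own illustration, not part of MAT or Tails, and real files can carry metadata in other places that this sketch does not cover.

```python
import struct


def jpeg_metadata_segments(data: bytes) -> list:
    """Return the kinds of metadata segments found in a JPEG header."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the segment structure; stop rather than guess
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, header is over
            break
        # The 2-byte length counts itself but not the 2-byte marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.append("EXIF")  # camera model, timestamps, GPS position, ...
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found.append("XMP")   # author, editing software, history, ...
        elif marker == 0xFE:
            found.append("COM")   # free-text comment
        pos += 2 + length
    return found


def _segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG segment (hypothetical helper for the sample below)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# Hand-made sample: SOI, an EXIF segment, a comment segment, then SOS.
sample = (
    b"\xff\xd8"
    + _segment(0xE1, b"Exif\x00\x00" + b"\x00" * 8)
    + _segment(0xFE, b"made by me")
    + b"\xff\xda"
)
print(jpeg_metadata_segments(sample))  # ['EXIF', 'COM']
```

A file that has been cleaned properly should yield an empty list here.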

MAT’s approach: all or nothing

MAT’s advantage is that it covers all your usual file formats, whereas other tools (exiv2, jhead etc.) are more specialised. MAT follows an “all or nothing” approach to metadata: it tries to remove everything, even innocuous metadata (such as aperture and exposure time for photographs). Also, by default it overwrites the original files with its output – no questions asked, no backup created.

MAT under scrutiny

We took a closer look at the Metadata Anonymisation Toolkit and found a security problem: MAT does not remove all metadata under all circumstances. Fortunately, MAT can show you what it missed, but it will do so only if you actively re-check the document after treatment. Our advice is to do this check after each use of MAT.

Until an improved version of MAT becomes available, we recommend the following workaround: If possible, use MAT to remove metadata from each image or other file before embedding it into a container format such as PDF. If you only have a PDF file (or other container) to start with, things get more difficult: maybe you can extract the parts you need, using the tools pdfimages and pdftotext for example. Then clean the parts and recombine them into a new PDF using LibreOffice or Scribus, or simply collect the cleaned parts in a Tar archive. As the last step before sending, check that every file you are about to send is clean.
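The re-check can also be scripted. The following Python sketch assumes that exiftool is installed and calls it with `-ee` (also look into embedded files), `-j` (JSON output) and `-G` (prefix each tag with its group). The helper names, the list of ignored groups and the sample JSON are our own assumptions for illustration. The sample is made up and is not real exiftool output.

```python
import json
import subprocess

# Tag groups that describe the file on disk or exiftool itself.
# Assumption: everything else is treated as a leftover worth looking at.
IGNORED_GROUPS = {"ExifTool", "File"}


def leftover_tags(exiftool_json: str) -> dict:
    """Map each reported file to the tags that are still present."""
    report = {}
    for entry in json.loads(exiftool_json):
        source = entry.get("SourceFile", "?")
        report[source] = sorted(
            key for key in entry
            if key != "SourceFile" and key.split(":", 1)[0] not in IGNORED_GROUPS
        )
    return report


def recheck(path: str) -> dict:
    """Run exiftool on a supposedly clean file and list what is left."""
    result = subprocess.run(
        ["exiftool", "-ee", "-j", "-G", path],
        check=True, capture_output=True, text=True,
    )
    return leftover_tags(result.stdout)


# Made-up example of what a "cleaned" PDF with a forgotten image might yield.
sample = (
    '[{"SourceFile": "cleaned.pdf", "File:FileType": "PDF", '
    '"PDF:PageCount": 1, "EXIF:Make": "ExampleCam"}]'
)
print(leftover_tags(sample))  # {'cleaned.pdf': ['EXIF:Make', 'PDF:PageCount']}
```

On a real file you would call `recheck("cleaned.pdf")`. The filter is deliberately strict, so it also lists harmless entries such as the page count. Read the list yourself and decide what matters before you send anything.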

Bug report submitted, fix will be developed

We reported the issue to the MAT developers and entered it as a feature request in MAT’s bug tracker. We were able to recommend a software library to the main developer and thus convince him to tackle the problem. He marked the issue to be resolved for the next major release, 0.7. We are happy to see that the developers acted on our report by publishing a clear warning on the MAT website.

Our test setting

This is how we tested: We created a PDF document that contained an image. As usual, the PDF document as well as the image file contained metadata. We tried to remove these using MAT. While MAT did remove PDF-specific metadata, the metadata in the embedded image remained intact. When we used MAT to re-examine the “cleaned” file, it detected the remaining metadata.

How is this possible?

PDF is a container format that can embed other files. For viewing metadata, MAT uses a third-party tool called exiftool, which also looks at embedded images and documents. For removing metadata, MAT does the job on its own, and its code apparently does not clean embedded images and documents in addition to the container itself.
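The effect can be illustrated with a few lines of Python. This is a rough heuristic of our own, not MAT's or exiftool's actual method: it looks for data streams in a PDF that start like a JPEG and still carry an EXIF header. It assumes the image is stored as a plain JPEG stream, which is common but not guaranteed. The function name and the fake PDF are hypothetical.

```python
import re

# A PDF stores each embedded object's raw data between "stream" and "endstream".
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def embedded_jpegs_with_exif(pdf_bytes: bytes) -> int:
    """Count embedded JPEG streams that still contain an EXIF header."""
    count = 0
    for match in STREAM.finditer(pdf_bytes):
        body = match.group(1)
        # JPEG data starts with FF D8; EXIF data is introduced by "Exif\0\0"
        # near the start of the file, so only the first 64 KiB are searched.
        if body.startswith(b"\xff\xd8") and b"Exif\x00\x00" in body[:65536]:
            count += 1
    return count


# Fake, minimal stand-ins: a JPEG header with an EXIF segment, wrapped in a
# PDF-like container that has no document-level metadata of its own.
fake_jpeg = b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00" + b"\x00" * 8 + b"\xff\xd9"
fake_pdf = (
    b"%PDF-1.4\n1 0 obj\n<< /Subtype /Image /Filter /DCTDecode /Length "
    + str(len(fake_jpeg)).encode()
    + b" >>\nstream\n" + fake_jpeg + b"\nendstream\nendobj\n%%EOF\n"
)
print(embedded_jpegs_with_exif(fake_pdf))  # 1
```

The container itself looks clean, yet the image inside it still brings its own metadata along. That is the situation we ran into in our test.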

Further information

Update: On 1 September 2018, MAT2 was released. This new version changes a lot of things. PDF cleaning was re-enabled and seems to work reliably now (i.e. metadata in embedded media files is removed, too). New website: MAT2 requires Python 3.5 or newer. Installation instructions are available for several Linux flavours.