AMITIAE - Tuesday 15 December 2015

Cassandra: Metadata - Saviour and Scourge



By Graham K. Rogers


A couple of times a year I am assigned to teach a course to computer engineers on Ethics & Morals. This came about initially because of my previous career as a policeman, and because of the writing I do on computer-related subjects. One of the topics I like to include is the use of metadata, because of my interest in photography. In the last couple of years, however, especially since the revelations of Edward Snowden, this has taken on a new meaning.

As part of the assessment process necessary in almost any course that is taught, I assign the students to summarise what they have learned and to comment both on the specifics and on the value to them of the course as a whole. Students tend to recognise the potential that metadata has, and most recycle the comment of General Hayden, which I show in a video clip, "We kill people based on metadata", although they omit the following comment, "but that's not what we do with this metadata." I am never sure if I am reassured by that parenthetical addition.

As I was reading through the student submissions this term, a number of questions came to mind about the purpose and the ethical uses. I also considered ways in which metadata should be used.

I use metadata a lot when taking pictures. It can remind me of the lens I used; the aperture and time for the shot; of the location; the pixel count; the file type and size; and many other ways in which I can examine the technical attributes of a digital photograph long after it has been taken. I also use metadata that a file contains as a way to refine a search.


Some of the EXIF metadata for one photograph

I work only on Macs (and iOS) so am not able to comment on Windows file types, although I am aware, for example, that the components of Office do contain (and retain) file information. It is the retention that has sometimes caused the revelation of illicit activities, such as when an academic changed the grade of his son: the date and previous grade were still accessible when forensic tools were applied. It is clear that any file created on a computer contains a great deal of metadata. Some of this may be seen by using the Search facility of OS X.
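As a rough illustration of how much a file system records about even the simplest file, here is a minimal Python sketch that reads the size and timestamps kept for a file. The function name `basic_file_metadata` and the throwaway sample file are assumptions made for this example. It is not a forensic tool, and metadata stored inside a document itself (as with Office files) is kept separately and is not shown here.

```python
import os
import tempfile
from datetime import datetime, timezone


def basic_file_metadata(path):
    """Return a small dictionary of metadata the file system keeps for a file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # st_mtime: time of the last change to the file's contents
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # st_atime: time of last access (some systems update this lazily)
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }


if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("grade: B")
        sample_path = handle.name
    try:
        for key, value in basic_file_metadata(sample_path).items():
            print(f"{key}: {value}")
    finally:
        os.remove(sample_path)
```

Even this small amount of information (when a file was last changed, and how large it is) can be enough to show that a document was altered after the date its author claims.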

The normal Spotlight search provides a list of files that might be suitable depending on the user's search parameters: usually keywords. Deeper searching may also be needed if the basic search does not produce a proper result. On a Mac this is done using a Finder panel and the Search box (or Command + F). This immediately causes a problem as a user may not remember the file name; and Contents may not be valid if the file needed does not contain text.


Pressing a + icon reveals additional search parameters, with Kind as the default. A second button gives a list of types. For Kind this is Any, Application, Archive, Document, Executable, Folder, Image, Music, Movie, PDF, Presentation, Text, Other. Alternatives to Kind are Last Opened Date, Last Modified Date, Created Date, Name, Contents and Other. Selecting any of the options reveals a number of ways in which data can be entered to refine the search. Each of these options and variations depends on the metadata available.

The Other option has a wider selection of options, some of which may only apply to specific file types. When Other is selected, an A-Z list of attributes is opened, from which the user may select one (or more). To illustrate the strength of such search methods and the variety of options, I show the students the item concerning (photographic) Flash: "Whether the picture was taken using a flash".


Other such idiosyncratic search parameters include Altitude (the Altitude of the item in meters above sea level, expressed using the WGS84 datum. Negative values lie below sea level), Tempo (Tempo of the music in the audio file in Beats Per Minute), or Where From (where the item came from). If I use "apple" as part of the data limiter with that last option, some 15 items are found on my hard disk: images, icons and 2 zip files from Apple.
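The way each added parameter narrows a search can be sketched in a few lines of Python. This is only an illustration of the idea, not how Spotlight or the Finder is implemented: the `FileRecord` class, the criterion functions and the sample index are all hypothetical, and the attribute names are loosely modelled on the Finder options described above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class FileRecord:
    """Hypothetical stand-in for the metadata an index might hold about one file."""
    name: str
    kind: str
    where_from: Optional[str] = None
    flash_used: Optional[bool] = None   # only meaningful for photographs
    altitude_m: Optional[float] = None  # metres above sea level


Criterion = Callable[[FileRecord], bool]


def kind_is(kind: str) -> Criterion:
    return lambda record: record.kind == kind


def where_from_contains(text: str) -> Criterion:
    needle = text.lower()
    return lambda record: (
        record.where_from is not None and needle in record.where_from.lower()
    )


def flash_was(used: bool) -> Criterion:
    # Records with no flash information (None) never match.
    return lambda record: record.flash_used is used


def search(records: Iterable[FileRecord], *criteria: Criterion) -> List[FileRecord]:
    """Keep only records that satisfy every criterion; each one narrows the result."""
    return [r for r in records if all(test(r) for test in criteria)]


if __name__ == "__main__":
    # Sample data invented for the example.
    index = [
        FileRecord("logo.png", "Image", where_from="https://www.apple.com/"),
        FileRecord("beach.jpg", "Image", flash_used=False, altitude_m=2.0),
        FileRecord("party.jpg", "Image", flash_used=True),
        FileRecord("update.zip", "Archive", where_from="https://support.apple.com/"),
    ]
    for record in search(index, kind_is("Image"), where_from_contains("apple")):
        print(record.name)  # prints logo.png
```

The point of the sketch is that none of the criteria look at what a file contains: every match is made on metadata alone.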

The range of metadata contained in some files is therefore considerable, so what is the purpose? Metadata is a type of information that contains "an underlying definition or description". This may not, however, illustrate the concept sufficiently. Reading the page lower down, the suggestion is (as per above) to assist in "finding and working with particular instances of data". This is aimed at specific users and the management of data on their computers. However, the page also adds the important idea of metadata within webpages and the obvious function of search, although with the more sophisticated algorithms used nowadays, web metadata may be less significant than (say) 10 years ago.

The purpose of metadata is therefore to assist users: either by management of files within a user's computer; or by external organisation, such as by way of search. However, the types of files available and the range of metadata that can be accessed have given rise to a number of other uses, which may be ethical, illegal or (at best) questionable. My examples above, of computer users searching for files within their own computers, are exactly what metadata is for; the external (Web) facilities likewise make it easier for users to work with files, even though web pages are online.

Unfortunately, we are not in a perfect world, and metadata may become part of evidence in criminal trials. While photographic evidence was originally presented with the declaration that the experts had examined the untouched negatives which were in their possession, the advent of digital images has changed this. It was acceptable for an expert to point out if a photograph had been altered from the original, so there would seem to be little difference with the digital counterparts, especially when any alteration may be evidence of criminal intent.

The same would apply to digital versions of typed documents. Before the computer, comparisons might be made of handwriting, or of the specific key impressions on a page to prove which specific typewriter had been used (attributes that the Romanian Department of State Security under Nicolae Ceaușescu found invaluable). While some printers do produce different output, this is usually hard to detect: far more reliable would be the use of metadata, showing which computer or operating system had been used, the times of creation or alteration and other attributes of a specific file in question.

Where the ethical becomes unethical is when metadata is used or altered to conceal such potential criminal activity that the expert witness would otherwise uncover. Of late, however, there has also been the valuable use of DNA in investigations. This most personal of metadata can be used both to prove a person's involvement in a crime and to disprove that a suspect was connected. While police have been keen to add this to the crime-detection armoury, some have been less willing to allow its use to prove the innocence of someone already convicted.

There are clear-cut reasons for the use of DNA. Like fingerprints, no two examples are alike. Even examples from identical twins, which were previously thought of as undifferentiated, can now be analysed to reveal subtle epigenetic changes.

Collecting and using metadata is now recognised as a way to glean information about those conducting illegal acts from abroad. Suitable targets may include hackers attacking the system or those planning terrorist acts. Once a target is inside the country, the balance shifts, particularly for UK and US security authorities. For criminal acts, there is the option of the warrant: issued by a court, this is proper judicial support for law enforcement.

The grey areas appear when governments (by way of their proxy agencies) not only collect data on their own citizens but store it as a form of insurance: in case a crime comes to light in the future; or one of those persons whose metadata is collected turns out to be some form of criminal. It is clear that more information than is needed has been collected by legal but questionable means (e.g. when internet packets are conveniently routed outside the country, even for milliseconds).

It has long been suspected that the security agencies share information on their respective citizens to circumvent the long-held notion that "we do not spy on our own." At the reading of the long-expected Investigatory Powers Bill in Parliament it was revealed by the UK Home Secretary, Theresa May, that she and her predecessors had approved the bulk collection of communication data in the UK since 2001 by MI5 and GCHQ. This was apparently not illegal, but the laws that applied had holes so large a coach and horses could be driven through them. It is with this in mind that the new law will apparently provide oversight. Some are less than reassured by this.

Authorities insist that the collection of data is for protection; but there is little evidence that this works. There are frequent claims that 20, 30, 40 attacks have been uncovered (hence we have been protected), with no evidence (for security reasons) that this is actually the case. Instead, there are numerous attacks that are successful, followed each time by government insistence that more powers are needed to protect us, when clearly this is not being done.

The dichotomy is captured succinctly by Amulya Gopalakrishnan of the Economic Times of India:

Intelligence agencies create this false confrontation between national security and personal security, says tech analyst Nikhil Pahwa. He argues that the link between encryption and intelligence failure is a smokescreen; the real problem, as witnessed in the Paris attacks, is that security agencies are unable to sift the signal from the noise despite the data they do have.

One of the questions I listed to guide me when I began writing this commentary was "Is metadata dangerous?". This is the central problem. My students, writing their own comments on the course, all remembered certain quotes I had used, such as "All governments lie" and "We kill people based on metadata" from former head of the National Security Agency General Michael Hayden. Not many of the students, however, included his rider, "but that's not what we do with this metadata".

He perhaps meant it to be a witty rejoinder to David Cole's summary of the NSA's operational capability and the risks that metadata posed, but the effect was chilling. The 4-minute clip is available on YouTube.

Whether or not General Hayden was referring to location data (the metadata that might be available when a phone links to a specific carrier tower) or building a picture of suspects and their contacts, or more, was not made clear. Nonetheless, the overall effect perhaps did more damage to the security case than was imagined. The idea that metadata is collected on thousands of persons, with no criminal background, no intent, and no contact with terrorist organisations is repellent; but the authorities sell the concepts in such a way that it seems unreasonable (for some) to object.

Architecture of Radio - Mapping Signals that Surround us

Yet all the retention of data has made little difference to terrorist attacks. They happen all too often, despite the insistence of the authorities that the data is needed; and their claims that attacks had been prevented.

The differences occur after the attacks, when evidence is collected, data is retrieved and the dots are joined using some of that stored data: a time when data could be retrieved by legal means such as warrants. Law enforcement agencies would claim that resorting to such legal niceties would slow them down; but speed is no longer of the essence as the attacks have already taken place. A thorough investigation with a methodical approach would be preferable.

There are specific ways in which metadata can and should be used: tracking known criminals (including terrorists and their sympathisers); adding to a picture of the criminal after the action has been committed and thereby strengthening the evidence picture that is to be presented to the courts, or used to bring others to book; as clear evidence in cases, either to prove or disprove (such as the time on a photograph - as per above).

Where the retention of metadata is undesirable is when it is used as a source for a fishing expedition months or years after its initial collection. That leads to the surveillance state with all the implications of a society that is constantly looking over its collective shoulder. That fear erodes all ideas of a free society and in itself is as unsafe as any attack.

Graham K. Rogers teaches at the Faculty of Engineering, Mahidol University in Thailand. He wrote in the Bangkok Post, Database supplement on IT subjects. For the last seven years of Database he wrote a column on Apple and Macs. He is now continuing that in the Bangkok Post supplement, Life.




All content copyright © G. K. Rogers 2015