Thursday, 24 March 2016

All about the metadata

If it is true that granularity of language reflects the importance of the subject matter, then metadata, not content, is at the heart of the Investigatory Powers Bill.

For content the Bill provides a few definitions: Content, Relevant Content, Intercepted Content and Protected Material. 

For metadata we have a richer set: Communications Data, Relevant Communications Data, Internet Connection Records, Entity Data, Events Data, Systems Data, Related Systems Data, Equipment Data, Secondary Data and Identifying Data.

The emphasis on metadata is perhaps unsurprising, since the Intelligence and Security Committee told us in its March 2015 report that metadata is indeed more valuable than content to the intelligence agencies in their mission to join up the dots and spot potential malefactors:







The plethora of definitions (not to mention the proliferation of cross-linked sub-definitions) does not make for easy understanding. 

In an attempt to untangle the spaghetti heap I have been experimenting with flowchart visualisations of the more significant and complex data definitions. More of that anon. 

The table below shows where the major varieties of telecommunications data fit in the scheme of the Bill. For simplicity it focuses mainly on bulk powers and also omits definitions of overseas-related communications, overseas-related equipment data and overseas-related information in the bulk equipment interference part of the Bill.  

In general terms the types of metadata obtainable under the bulk interception and interference warrants are broader than those under the powers and bulk warrant for acquisition of communications data.

Power: Communications data retention notice (78(1))
Subject matter:
Relevant Communications Data (78(9))
  • Communications Data (223(5))

Power: Communications data acquisition - authorisation and notice (53)
Subject matter:
Communications Data (223(5))
  • Entity Data (223(3))
  • Events Data (223(4))

Power: Restrictions on use of S.53 power to access or process internet connection records (54(4))
Subject matter:
Internet Connection Records (54(6))
  • Communications Data (223(5))

Power: Bulk communications data acquisition warrant (138)
Subject matter:
Communications Data (223(5))
  • Entity Data (223(3))
  • Events Data (223(4))

Power: Bulk interception warrant (119)
Subject matter:
Communications (223(2))
Content (223(6))
Intercepted Content (137(1))
Relevant Content (134(5))
Secondary Data (120(3))
  • Systems Data (225(4))
  • Identifying Data (225(2) and (3))
Related Systems Data (119(6))
  • Systems Data (225(4))

Power: Bulk equipment interference warrant (154)
Subject matter:
Communications (223(2))
Protected Material (170(9))
  • [not] Equipment Data (155(5))
  • Private Information (173(1))
Equipment Data (155(5))
  • Systems Data (225(4))
  • Identifying Data (225(2) and (3))
Information

Power: Warrant for retention or examination of bulk personal datasets (175)
Subject matter:
Bulk Personal Dataset (174)

It can be seen that around half a dozen different kinds of power or authority provide routes for the compulsory retention and acquisition of various kinds of metadata. They all have in common that the Bill’s restrictions on selecting and accessing bulk content (an individual located within the British Islands at the time of selection cannot normally be targeted without a further warrant) do not apply.
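For readers who prefer to query the scheme rather than read it, the metadata rows of the table can be transcribed into a small lookup structure. This is only an illustrative sketch: the names SCHEME and powers_reaching are my own, the content rows, the internet connection record restrictions and bulk personal datasets are left out, and nothing here is a substitute for the text of the Bill.

```python
# Metadata rows of the table: power -> top-level definition -> sub-definitions.
# SCHEME and powers_reaching are hypothetical names used only for illustration.
SCHEME = {
    "Communications data retention notice (78(1))": {
        "Relevant Communications Data": ["Communications Data"],
    },
    "Communications data acquisition - authorisation and notice (53)": {
        "Communications Data": ["Entity Data", "Events Data"],
    },
    "Bulk communications data acquisition warrant (138)": {
        "Communications Data": ["Entity Data", "Events Data"],
    },
    "Bulk interception warrant (119)": {
        "Secondary Data": ["Systems Data", "Identifying Data"],
        "Related Systems Data": ["Systems Data"],
    },
    "Bulk equipment interference warrant (154)": {
        "Equipment Data": ["Systems Data", "Identifying Data"],
    },
}


def powers_reaching(definition: str) -> list[str]:
    """Return the powers whose subject matter names the given definition."""
    hits = []
    for power, subjects in SCHEME.items():
        for top_level, parts in subjects.items():
            if definition == top_level or definition in parts:
                hits.append(power)
                break  # one match per power is enough
    return hits


if __name__ == "__main__":
    for name in ("Systems Data", "Entity Data"):
        print(name, "->", powers_reaching(name))
```

On this transcription, Systems Data and Identifying Data are reachable only through the bulk interception and bulk equipment interference warrants, while Entity Data and Events Data appear only under the communications data powers.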

This is a diagram of the overall metadata ingestion scheme of the Bill.



















Turning to the definitions, the Clause 78 power to direct retention of communications data rests on the definition of Relevant Communications Data. Internet Connection Records are a subset of Relevant Communications Data to which Clause 54 applies some access restrictions (although fewer in the Bill than in the draft Bill).

















Relevant Communications Data in turn depends on the dividing line between Content and Communications Data. The definition of content interfaces separately with Systems Data. The draft Codes of Practice released with the Bill suggest that it is possible for communications to consist entirely of Systems Data and so contain no content.


















What the definition of content lacks in companions it makes up for in conceptual difficulty.  The Parliamentary Joint Committee scrutinising the draft Bill remarked:









Communications Data consists of either Entity Data or Events Data, to which different levels of authorisation apply under the targeted communications data access regime in Part 3 of the Bill. This is the equivalent of the current RIPA communications data access regime under which over 500,000 access demands are made on communications service providers annually.
















Turning to bulk powers, the bulk communications data acquisition warrant authorises the obtaining of Communications Data. A bulk interception warrant authorises the interception of Secondary Data in addition to content. Secondary Data is the Bill’s version of what under RIPA is known as Related Communications Data. Secondary Data consists of either Systems Data (as before) or Identifying Data. Unlike RIPA, the Bill will allow metadata contained within the content of a communication to be scraped and no longer treated as content.

















Similarly a bulk equipment interference warrant authorises the obtaining of Equipment Data, a close cousin of Secondary Data.



















Last, a bulk interception warrant also authorises the obtaining of Related Systems Data from telecommunications operators. 
















That's all about the metadata.

The chief remaining omission from the visualisations is Protected Material in S.170(9). This is the bulk equipment interference warrant equivalent of Content. As such it defines the material for which a targeted examination warrant is necessary if it is to be selected for examination by reference to an individual known to be located in the British Islands.

The definition contains a triple negative that presents a considerable challenge to parse and represent graphically. Instead, here is the unadorned raw text to ponder:
“protected material” means any material obtained under the warrant other than material which is -

(a) equipment data;
(b) information (other than a communication or equipment data) which is not private information.
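
One way to unpick the negatives is to write the definition as a yes/no test. The following is a minimal sketch of my own reading of the quoted text, not an authoritative construction: the Material class and its three flags are hypothetical, and it assumes that anything obtained under the warrant which is not a communication counts as "information".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    """Hypothetical flags describing one item obtained under the warrant."""
    is_equipment_data: bool
    is_communication: bool
    is_private_information: bool


def is_protected_material(m: Material) -> bool:
    # Limb (a): equipment data is carved out.
    if m.is_equipment_data:
        return False
    # Limb (b): information that is not a communication (and, given limb (a),
    # not equipment data) is carved out unless it is private information.
    if not m.is_communication and not m.is_private_information:
        return False
    # Everything else obtained under the warrant is protected material.
    return True


if __name__ == "__main__":
    print(is_protected_material(Material(False, True, False)))
    print(is_protected_material(Material(False, False, False)))
```

Read this way, the definition reduces to: not equipment data, and either a communication or private information.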
Relevant Content crops up in relation to targeted examination warrants in Part 1. It means 'any content of communications intercepted by an interception authorised or required by a bulk interception warrant'. 

Intercepted Content, in relation to a bulk interception warrant in Part 6, is defined almost identically: 'any content of communications intercepted by an interception authorised or required by the warrant'.
