Cyberleagle: Another round of data retention

[Updated 4 December 2014]
[Further updated 20 January 2015 to add tweet.]
[Also updated 5 January 2015 with this brief commentary on the Home Office Factsheet:

Page 1: Top Lines

"IP resolution is the ability to identify who in the real world was using an Internet IP address at a given point in time." Data retention at best identifies the device or connection being used and any associated subscriber details. The subscriber is not necessarily the user. Page 2 of the Factsheet is accurate: "This data can help identify who has made a communication, when, where and how." (emphasis added)

Page 1: Background

"However, some IP addresses are shared and allocated dynamically." True, but dynamic allocation is not what Clause 17 is about. Dynamic IP address allocation is sequential temporary allocation of a public IP address to one customer after another. Dynamic IP addresses are already explicitly mentioned in the DRIPA datatypes (Data Retention Regulations 2014, Schedule, Paras 13(1)(b) and 11(3)). It is evident from the diagram on page 3 of the Factsheet that the problem being addressed by Clause 17 is simultaneous sharing of a single public IP address by multiple ISP customers.

Page 3: Diagram

"At 4pm 2,500 people are using a single IP address on the internet." Exactly. The issue is simultaneous sharing of a single IP address, not dynamic (sequential) allocation of an IP address.

"The e-mail service provider now provides police with IP address and port number used to send the e-mail and accurate time." In order to do this the e-mail service provider in the diagram example will have had to retain IP address, port number and timing data. Will such providers, as well as internet access providers, be subject to mandatory retention?

"Police seek details from internet access provider. Internet access provider now identifies the individual using the unique combination of IP address and port number provided at 4pm." The internet access provider identifies the customer, who may be but is not necessarily the individual who used the device in question.]

Four months after DRIPA and 18 months after putting down a marker in the May 2013 Queen’s Speech, the UK government has embarked on a new round of legislation for mandatory retention of communications data. This time it is under the banner of IP address matching.

The Counter-Terrorism and Security Bill had its Second Reading yesterday and is expected to go into Committee on 9 December. Clause 17 will extend DRIPA to new categories of communications data.

DRIPA’s existing data retention obligations, rushed through Parliament in four days in July, are of course controversial. They are the subject of a threatened legal challenge by David Davis MP and Tom Watson MP. The proposal to add IP address matching dates back to a recommendation of the Joint Committee on the draft Communications Data Bill in December 2012.

What new categories of communications data would have to be retained?

Clause 17, like so much UK legislation in this field, is difficult to understand. The Explanatory Notes and the Impact Assessments are more detailed, but still confusing. (The Home Office has subsequently issued a Factsheet.) MPs suggested in the Second Reading that the drafting of Clause 17 needs to be examined critically. They are right.

The overall aim seems to be to mandate retention of data that can link a given communication made via a simultaneously shared public IP address to one of many devices or connections that may have been using that IP address at a given time. Clause 17 labels this “relevant internet data”. We might call it linking data.

This appears to break down something along the following lines (the first two of these are illustrated in the useful diagram in the Home Office Factsheet).

Some ISP and mobile operator systems don’t allocate one public IP address to one customer device or connection, but have many customers sharing an IP address simultaneously. They could be required to retain linking data such as port numbers.
Even if an ISP retains IP address and (say) port number records, it cannot be sure of identifying a single device or connection unless law enforcement can provide it with both a port number and an IP address to look up. So a cloud storage or web e-mail provider accessed by the user could also be required to retain logs of linking data visible to it, such as port numbers.
Operators such as public Wi-Fi hotspots could be required to log MAC addresses.
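
To make the first two points concrete, here is a minimal Python sketch of how retained linking data might narrow a shared public IP address down to one customer connection. Everything in it is an assumption for illustration: the record layout, the field names, the `candidates` function and the sample values are hypothetical, and neither Clause 17 nor the Factsheet specifies any format. The IP addresses are taken from a range reserved for documentation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class NatLogEntry:
    """Hypothetical linking record kept by an internet access provider."""
    public_ip: str
    public_port: int
    start: datetime
    end: datetime
    connection_id: str  # identifies a customer connection, not a person


def candidates(log: List[NatLogEntry], public_ip: str, when: datetime,
               public_port: Optional[int] = None) -> List[str]:
    """Return the connection ids consistent with the query.

    Without a port number, every connection sharing the address at
    that moment is a candidate.
    """
    return sorted({
        e.connection_id
        for e in log
        if e.public_ip == public_ip
        and e.start <= when < e.end
        and (public_port is None or e.public_port == public_port)
    })


four_pm = datetime(2014, 12, 3, 16, 0)
log = [
    NatLogEntry("203.0.113.7", 40001, datetime(2014, 12, 3, 15, 58),
                datetime(2014, 12, 3, 16, 5), "conn-A"),
    NatLogEntry("203.0.113.7", 40002, datetime(2014, 12, 3, 15, 59),
                datetime(2014, 12, 3, 16, 1), "conn-B"),
    NatLogEntry("203.0.113.7", 40003, datetime(2014, 12, 3, 16, 0),
                datetime(2014, 12, 3, 16, 2), "conn-C"),
]

# IP address and time only: three connections remain possible.
print(candidates(log, "203.0.113.7", four_pm))
# IP address, time and the port number logged at the far end: one remains.
print(candidates(log, "203.0.113.7", four_pm, 40002))
```

Note that even the narrowed result is a connection identifier. Linking it to a subscriber is a further step, and the subscriber is not necessarily the person who used the device.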

Weblog data (records of websites accessed by customers) would be excluded from mandatory retention by internet access providers such as ISPs and mobile operators.

The Overarching Impact Assessment provides this summary:

“IP Resolution: Allow for a power to require communications service providers to retain the data necessary to attribute an IP address to an individual.”

Taken literally, that is a power to require the impossible. We don’t have IP addresses tattooed on our foreheads. Even if we did, that would not identify us, as opposed to someone else, as the user of the device at any given time. An IP address at best identifies a device or a connection. The ISP may then be able to link that with the identity of its subscriber customer, but no more. The subscriber may or may not be the user. The Factsheet diagram, unfortunately, perpetuates the myth that an IP address identifies a user.

DRIPA in fact already covers retention of subscriber data for IP addresses (both where the IP address is static and where it is dynamically allocated in sequence to different customer devices and connections). What it doesn’t cover is the single public IP address simultaneously shared among many of an ISP’s customers.
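
The difference between the two cases can be sketched in a few lines of Python. With sequential (dynamic) allocation the periods for which an address is held do not overlap, so an address and a time pick out at most one customer connection. With simultaneous sharing the periods overlap and the same query returns several. The `Lease` record, the `holders_at` function and the sample values are hypothetical, chosen only to illustrate the point.

```python
from datetime import datetime
from typing import List, NamedTuple


class Lease(NamedTuple):
    """Hypothetical record of a connection holding a public IP address."""
    ip: str
    start: datetime
    end: datetime
    connection_id: str


def holders_at(leases: List[Lease], ip: str, when: datetime) -> List[str]:
    """Return every connection id recorded as holding `ip` at `when`."""
    return [l.connection_id for l in leases
            if l.ip == ip and l.start <= when < l.end]


def at(hour: int) -> datetime:
    return datetime(2014, 12, 3, hour, 0)


IP = "198.51.100.20"  # address from a range reserved for documentation

# Dynamic allocation: one customer after another.
sequential = [Lease(IP, at(9), at(13), "conn-1"),
              Lease(IP, at(13), at(18), "conn-2")]

# Simultaneous sharing: both customers hold the address all day.
shared = [Lease(IP, at(9), at(18), "conn-1"),
          Lease(IP, at(9), at(18), "conn-2")]

print(holders_at(sequential, IP, at(16)))  # a single connection
print(holders_at(shared, IP, at(16)))      # more than one connection
```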

The Bill is meant to be only about IP address matching. So it is not immediately obvious why the Impact Assessments say that the Bill will expand DRIPA to cover a wider range of internet services. On the other hand Clause 17 does not seem to do this, since it only amends the categories of data to be retained. DRIPA has already adopted an extremely broad underlying definition of telecommunication services.

The new obligations would be subject to the same 31 December 2016 sunset clause as DRIPA. As with DRIPA itself, mandatory retention will apply only to data generated or processed in the UK by public providers in the process of providing the telecommunications services concerned; and then only to those on whom the government serves a notice. The Impact Assessment says that the service providers most likely to be affected by the Bill have been consulted.

That is my current stab at what Clause 17 is trying to do. However it is a puzzling piece of drafting. Here are some questions worth considering.

What is ‘relevant internet data’?

Clause 17(3)(b) defines this as communications data relating to an internet access service or an internet communications service which:

“may be used to identify, or assist in identifying, which internet protocol address, or other identifier, belongs to the sender or recipient of a communication (whether or not a person)”.

This is the most curious part of Clause 17. The problem is surely not identifying which IP address ‘belongs’ to a given sender or recipient of the communication, but identifying which device or connection (of many) was used to make a given communication via a given shared public IP address. Is it drafted the wrong way round?

What is an ‘identifier’?

The Clause says that “identifier” means “an identifier used to facilitate the transmission of a communication”. More helpfully, Clause 17(3)(b) tells us that an IP address is an identifier. The Explanatory Notes seem to conflate linking data and the shared identifier that we are trying to tie to a device or connection:

“… An IP address can often be shared by hundreds of people at once – in order to resolve an IP address to an individual other data ("other identifier" in this clause) would be required.”

Whatever the ‘other data’ may be, surely it is not the ‘other identifier’ in Clause 17(3)(b)?

What else might be covered by ‘identifier’? A MAC address, although it operates at a lower (data link) layer than an IP address, would seem to qualify. But Clause 17 is not avowedly about retention of new categories of identifiers, only retention of data capable of linking shared identifiers (such as IP addresses) to an individual device or connection. If a MAC address is itself an identifier, does that prevent it being linking data? The Explanatory Notes suggest that a MAC address could also be linking data:

“Data necessary for the resolution of IP addresses could include port numbers or MAC (media access control) addresses.”

Are there circumstances in which a MAC address could be used to identify the particular device that sent a communication via a shared IP address? Public Wi-Fi hotspots seem a likely candidate. However a MAC address would presumably be less useful than a port number, assuming that the MAC address is not visible from outside the hotspot and so could not be logged at the other end of the communication.

What are an internet access service and an internet communications service?

These are the foundation stones of Clause 17. Communications data cannot be required to be retained unless it relates to an internet access service or an internet communications service. These terms are also critical to the scope of the weblog data exclusion. Many will be surprised, therefore, to find that neither term is defined.

What do the terms mean? The glib answer is ‘whatever they meant in the EU Data Retention Directive’. That is their origin. They were used (but not defined) in the Directive.

The 2009 Data Retention Regulations, which implemented the Directive, followed its terminology. When the Directive was invalidated, DRIPA re-enacted the datatypes that were in the Schedule to the 2009 Regulations. So the 2014 Data Retention Regulations that were made under DRIPA again used the two terms, notably in the definition of ‘User ID’: “a unique identifier allocated to persons when they subscribe to, or register with, an internet access service or internet communications service.” Perhaps unsurprisingly given the government’s commitment to re-enact the 2009 datatypes identically, the 2014 Regulations again left the terms undefined.

That is a plausible historical reason why the terms have been left undefined in Clause 17. But even though there is a breadcrumb trail back to the Directive, the lack of definitions in the Directive means that uncertainty remains particularly over ‘internet communications service’. Does it relate to any type of communication, or is it more limited, for instance to e-mail, messaging or telephony providers? The diagram in the Factsheet uses the example of an e-mail provider. However the Impact Assessment suggests that the government believes it has a broad meaning, covering for instance cloud storage services:

“For example w[h]ere a user uploads an illicit file to a cloud server that server provider, if subject to a data retention notice, would be required to retain sufficient information to enable the internet access provider to identify the user.”

We look forward to illumination of these and no doubt other points as the Bill proceeds. Meanwhile, the bigger question of whether any of this is compatible with the European Convention on Human Rights and the EU Charter of Fundamental Rights remains to be fought out.

[My 8-part series of tweets on Clause 17:]

1/8 Is it about dynamic (sequential) IP address allocation? No. Already covered in DRIPA and so excluded from Cl 17.
— Graham Smith (@cyberleagle) January 20, 2015

2/8 The Home Office Factsheet suggests Cl 17 is about simultaneous use of one public IP address by many customers.
— Graham Smith (@cyberleagle) January 20, 2015

3/8 But you'd never guess that from reading Cl 17. What else might it cover? Its vague drafting gives little clue.
— Graham Smith (@cyberleagle) January 20, 2015

4/8 The Fact Sheet shows it is meant to cover not just internet access, but cloud/web e-mail providers who generate or process data in UK.
— Graham Smith (@cyberleagle) January 20, 2015

5/8 Cl 17 isn't limited to data linking a device or connection to a public IP address. Includes 'other identifiers' as well as IP addresses.
— Graham Smith (@cyberleagle) January 20, 2015

6/8 What is an 'other identifier'? A MAC address, said the Minister on 9 Dec. The EN seems to suggest a MAC address is linking data. Both?
— Graham Smith (@cyberleagle) January 20, 2015

7/8 'Other identifier' is said to 'future proof' Cl 17 by making it 'technologically neutral'. In a provision sunsetted in Dec 2016?
— Graham Smith (@cyberleagle) January 20, 2015

8/8 RIPA was drafted to be technologically neutral. The result was a statute universally acknowledged to be impenetrable. #BeenHereBefore
— Graham Smith (@cyberleagle) January 20, 2015

[Updated 4 December 2014 with references to the Home Office Factsheet and minor clarifications and edits. Further updated 5 January 2015 with comments on the Home Office Factsheet. Further updated 20 January 2015 to add tweet.]

6 comments:

Tim M 3 December 2014 at 12:39
Where a home or a business has all their devices behind a NAT router (ie just about always) then recording the MAC address will only be the MAC address of the router, not the source device, and similarly the source port will only be the source port number on the router, not on the "true source" device.

I think what the clause may be aimed at is the situation where the ISP itself is putting customer connections behind NAT... I believe BT were looking at doing this. In such a case, while your router thinks it has an Internet (as opposed to private) address, in fact its external address is just an address on a larger private network, and the true "internet address" is that of the ISP's NAT device, and is shared with multiple other connections. In this case, the clause is saying "not only must you maintain the 'public network address' of a connection, but where you yourselves are doing NAT on customers, you must, for each connection, also retain the information about the NAT state tables at that time so that you can tell us which of your subscriber connections was involved".

Doesn't sound so easy to retain (as NAT state tables are very much more transient than DHCP address leases etc), but it would avoid the query of "we're after this one IP address" being answered with "oh, that's any one of these 1000 customers behind that particular NAT device".

Tim M 4 December 2014 at 00:50
Worth considering if this is trying to start a "death by a thousand slices" attack on first the carriers, then ISPs, then providers (hotels etc) then individual accounts, with the argument "everybody else above you does this so why don't you".

Stupid of course (hard to ban TOR type services under TCP/IP model, but then it was designed that way) but that doesn't stop a world of FUD & pain etc on the way...

Unknown 5 December 2014 at 09:02
In case it is of use to anyone reading your fascinating analysis, Graham, I prepared a consolidated version of DRIPA earlier in the week, incorporating the amendments which the current text of the Counter-Terrorism and Security Bill would, if passed in its current form, make.

It is available to anyone who might want it, here: http://neilzone.co.uk/consolidated_DRIPA_as_modified_by_CTS_bill_as_at_20141203.pdf

Best wishes

Neil