Sitting down to be counted

Now we are all counted. The location of everybody in the country on Sunday is plotted with absolute precision. Public services over the coming decade can be fine-tuned to take account of who is where.

I am in favour of that. I have no problem with the general principle of a census. But the census operates through the coercive power of the state, and that is a power to be used lightly or minimally. So the key question for me is whether the census is fully sufficient, but no more than necessary.

There are two directly relevant questions here: what data is kept, and what data is disclosed.

What data is kept is fairly straightforward: all of it. That is, after all, the point of the exercise.

What data is disclosed is also apparently straightforward: none of it. Confidentiality is widely recognised as a central part of the bargain in requiring people to give up personally sensitive information.

But none of it isn’t actually the answer at all. Another answer, just as valid, is all of it – just not for a hundred years. And another answer again is much less clear cut. Douwe Korff has argued at length and in detail [pdf] that the apparent legal safeguard of confidentiality is much less robust than it first appears. However far his analysis is right in the short term, over the long term, confidentiality clearly cannot be relied on as an absolute.

But there is a prior question of whether there is a reason for storing all the data which has been collected in the first place. It needs to be personally identifiable in the short term to ensure compliance – that is the essence of the difference between a census and a survey. But none of the medium term subsequent value comes from the fact that the data is personally identifiable; indeed, ONS goes to some lengths to ensure that it isn’t – so there is no prospect of an exquisitely personalised bundle of public services being delivered to me as a result of completing the census.

Some data elements may benefit from more localised analysis than others. I don’t imagine that the purpose of the religion question is to help the Church of England plan its parish network, but I do imagine that asking for a precise work address is to allow fairly fine-grained travel-to-work analysis. But even the more detailed requirements are not at the level of a house or even a street. There is a need to confirm that the household at my address provided answers, but no need that I can immediately see ever subsequently to link the specific answers back to the specific address.

Of course, even anonymised data isn’t necessarily very anonymised. Reverse engineering anonymity turns out to be unexpectedly straightforward [pdf], but that’s hardly sufficient reason to keep superfluous personal data.

More significantly, perhaps, from one perspective that data is not superfluous at all: the perspective of the next century. It is not for nothing that the first spectacular failure of a UK online government service was when the publication of the 1901 census collapsed under the weight of massive demand.

But this is another way in which the world has changed. Keeping census data secret for a hundred years and then publishing it used to make some kind of sense, but does so much less now, for at least two (slightly contradictory) reasons.

The first is that we won’t all be dead by 2111. I am pretty confident that I will be, but it’s a safe bet that there are many thousands of people alive now who will still be alive then. Perhaps they won’t care – but that does rather raise the question of what the point of the hundred year rule is in the first place.

The second is that the genealogists of the 22nd century might just turn their noses up at the census: the deluge of social data may mean that the sense of discovery about lives of the past may have vanished altogether by then. By then, as well, the very idea of a census may well have come to be seen as a historical curiosity, if it turns out that this census was the last.

What, though, if they don’t? What if census data continues to be seen as valuable for social and family historians? That perhaps takes us back to where we came in. It is one thing for the coercive power of the state to require the provision of information to support the planning of public services; it is rather another thing to deploy that power to support the hobbies of generations yet unborn.

In other contexts, the argument would not need to be made at all. The fifth data protection principle would apply, requiring that

Personal data processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes.

But by virtue of s33(3) of the Data Protection Act, the fifth principle does not apply, a fact which rather oddly is included in the census factsheet on confidentiality and data security [pdf] as if it added to rather than detracted from information security. Despite that – or perhaps in part because of it – the case for jettisoning personal identifiers from census data seems a strong one to me.

Am I missing a similarly strong counter argument?