Paper cutouts (thinking on paper part 2)

I have had some great – and challenging – feedback to my recent post about thinking on paper, including a session at govcamp (and some rich conversations before and after it) and a discussion with a group of my work colleagues about our personal approaches to managing information at work.

Govcamp reflections

There was an energetic discussion at govcamp about managing information in government (or anywhere else still a bit trapped by the paper paradigm). My post about thinking on paper provided some of the context, as did Glyn Jones’ on helping civil servants help citizens. As ever at govcamp, the conversation quickly left those starting points far behind, but unusually, it’s possible to keep track of where it went. This year the session note-taking really worked for the first time – a great feat of co-ordination by the organising team, and a virtuoso display of keeping up with a fast-moving conversation in the notes of this session taken by Harry Harrold and Sharon Dale.

Govcamp discussion - paper and information

Picture by David Pearson

One of the most helpful parts of the discussion for me was the one which didn’t happen. Nobody thought this was either an invented problem or a solved one. Even in a group heavily skewed to the technically adroit and self-aware, there was a sense that coping with – and contributing to – organisational information was a continuing struggle.

Points from the discussion which particularly resonated with me included

  • We share information for a reason: information sharing is not an abstract good (or at least, will not work if that is all that it is perceived to be). Being clear about both individual and organisational purposes in creating, storing, finding and archiving information is essential to finding more effective ways of doing it.
  • Information belongs to people and says something about them: do we as individuals have any right to control how we are presented and seen through our information, or is this (in this sense at least) a public space? Is there a right to be forgotten, and if so, what might it be?1 There was a pretty robust response from others in the group to the effect that we could not – and should not – hope to manage reputation in this way, but the challenge is not one we should forget.
  • We can avoid the cost of organising information by not organising it: there is only any point in agonising about how best to organise information if we do actually need to organise it. But that is, of course, the key question. I had Benedict Evans’ thought rattling around in my mind: ‘All curation grows until it requires search. All search grows until it requires curation.’
  • Pages are not units of information: we constrain our thinking if we let ourselves get trapped into thinking in terms of paper and pages. That was one of the main points of my first post, but a couple of powerful examples came out of the discussion. The first was the statute book: law is inherently intertwined and, as John Sheridan has so often demonstrated, treating it as page-based documents makes it too easy to overlook the potential power of transclusion. More generally, it’s hard to think about small, linkable pieces of information when those small pieces are trapped in documents, and those documents are the units of information management.
  • A human guide can be more valuable than an index: there was a lovely example at govcamp of how human guidance could make a huge positive difference, a handover of work where the outgoing person had made a set of short videos explaining the structure and organisation – and above all, documentation – of the work, turning what could have been a painful transition into a simple and pleasant experience.
  • If everybody helps everybody else, everybody gets helped: much of the govcamp discussion touched in one way or another on the core point that information management is fundamentally human and social. As so often, the technology is not inherently complicated; the hard bit is to have a clear understanding of users’ needs and ambitions. Success requires reciprocal altruism: I get no benefit from helping you find the information I have created or understand; I get benefit from you helping me find what you know and understand. The challenge is in creating the culture and incentives to make that become the norm.

Working in the real world

Even before the beginning of govcamp, Catherine Howe had pointed out a gaping hole in the original blog post. As she quite rightly discerned, I had been thinking very much from the point of view of an organisation which wants and needs to have its information organised, and had missed the perspective of people faced with a deluge of information and needing to find coping structures which fitted their personal preferences and styles of working. Since I am one of those people just as much as anybody else, that really was a bit of an oversight. Rather than just write down my own prejudices though, I got a group of my very helpful colleagues to talk about how they manage information and how they feel about it.

It was another rich conversation, but three points struck me particularly forcefully:

  • People are different: some people feel disorder viscerally and are profoundly uncomfortable with anything but an empty inbox. Others are comfortable treating their email as a swamp, with murky contents and an ever-present risk of something unexpected floating up from the depths. No technology is going to induce either group to become the other, so any solution has to be capable of dealing with both – and everybody in between.
  • Systems create habits: not only does familiarity trump usability, but systems create assumptions of what is normal (and what becomes instinctive and apparently natural). An approach which fails to align with those assumptions risks rejection, regardless of whether on some supposedly objective measure the new thing is better. So for some in the group, using personal email folders felt like a natural way of organising information, but using a shared folder structure did not. Knowing that something ‘should’ be done differently may be enough to induce a mild frisson of guilt, but it’s not enough to change behaviour.
  • Discouragement is easy: letting things pile up gets very quickly to a tipping point where they don’t get done at all. Filing one document when already working on it for other reasons is easy. Sorting out what to do with ten documents is a chore which organised people will do as part of their routine. Faced with a hundred – or the accumulation of a few weeks, or even a few days – the overwhelming response is to do nothing.

That’s not the whole story, of course, or anything close to it, but it does provide the beginnings of some insights about why this is hard, and why monolithic system design will always tend to disappoint.

  1. In many cases the names of individuals will be redacted at the point records are released at the National Archives, so in one sense the concern may be misplaced. But there is also the potential for an emergent internal personal profile and reputation. That’s a good thing, not a bad thing – it’s effectively one of the five principles in the previous post – but managing personal sensitivities will be an important part of getting it right.

Thinking on paper

We used to know where to put things if we thought we might want to find them again later. It was called filing. Filing got less fashionable in government about 15 years ago, not coincidentally at about the time that paper was getting increasingly displaced.1

The way we think about and manage work and information has changed a lot since then, but to a surprising extent the idea of paper has survived much longer than the reality – all too often, we organise information as if it were on paper, even when it never has been and there is no expectation that it ever will be.

That has the advantage of keeping things simple and familiar. But it also has the much bigger disadvantage of obscuring opportunities to do things better.

There are five big differences that a new approach needs to reflect:

  1. paper is not the medium
  2. files and directories are not the location
  3. metadata is data
  4. people are search engines
  5. friction is failure

Let’s look briefly at each of those, and then put them together to explore the wider implications.

Paper is not the medium

Word processing programs and web sites both have pages, but they are very different things. Word can’t do anything unless it knows what size paper you have in mind, and what you see on screen will then be driven by that paper size, even if the document never has been and never will be printed. The physical structure (the point where there is no more room on a page) has no connection with the logical structure (the point where one section ends and another begins). And in most systems, you have to take possession of a document in order to open it. Moving and sharing such documents is still all too often stuck in a world of email as transport system and endlessly self-replicating document as payload.2

There is a small advantage in doing that: it allows the same mental models to be effective in the new world as in the old. But there is a much larger disadvantage: it gets in the way of developing new mental models which are better aligned to the greater power of search and organisation which new tools allow, and it means we risk getting stuck in thinking about the new world as if it shared the weaknesses of the old.3

In short, as Mark Foden has put it:

It is time to move from circulating documents to visiting texts.

Files are not the location

Paper documents live in files. The key to finding the document is to find the file. And if you might need the file, you need a filing cabinet reasonably close to hand where it can be safely stored with lots of other files, almost certainly on related subjects. That of course has enormous consequences for the organisation and physical structure of work: if the unit of work is a paper file and that file is a unique (and therefore precious) assembly of information, the location of work is driven by the organisation of information.

Translating the analogy of the filing cabinet into the digital world is arguably even more pernicious than the analogy of paper. A hierarchical folder structure reinforces the idea that there is a single, canonical, right place for a piece of information. More subtly, it supports the idea that that right place tells you all you might want to know about the nature of the information to be found there. Yahoo gave up the attempt to create a structured directory of the web many years ago: it turned out that the connections within and between web pages were a more useful source of information than knowledge of their location – and so began the ascendancy of Google. The absence of those strands of connection – of hyperlinks – is the fundamental reason why searching a file structure is always more frustrating than searching the web. The web was, of course, invented as a tool to support the organisation of work, for just this reason, but that’s another story.

In supermarkets and filing cabinets, physical location is a direct representation of the approach to cataloguing. Hierarchical filing structures aspire to be like supermarkets, applying an unneeded constraint of the physical world, when they could be much more like Amazon, where the physical layout of goods in the warehouse has nothing at all to do with how they are found on the website.

Would we get further by focusing on tagging rather than filing? Tagging is social, collaborative and can be game-like; filing is lonely, bureaucratic and dull. Prompted by this recent post on applying tagging to books, I wonder: could tagging be more effective in a working environment too?

The title of that post – everything is miscellaneous – is a reference to a book by David Weinberger published in 2007. I wrote a blog post about it at the time which I don’t think has dated too badly. This is the sophisticated version of the argument that we should give up on fixed classification as a way of finding stuff: essentially that even now the way we think about filing and retrieval is dominated by the constraints of paper. There’s even a rather splendid five minute video which summarises the argument.
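The practical difference is easy to see in miniature. Here is a minimal sketch in Python – the document and tag names are invented for illustration – of why a hierarchy and a tag set behave so differently: the folder gives each document one canonical route, while tags give it as many routes as it has facets.

    # A folder hierarchy gives each document exactly one canonical location.
    folders = {
        "Policy/Transport/Buses/2013": ["route-2-consultation.doc"],
    }

    # Tags let the same document be reached from any of its facets.
    tags = {
        "route-2-consultation": {"policy", "transport", "buses", "consultation", "2013"},
    }

    def find_by_tags(tagged_docs, *wanted):
        """Return the documents carrying all of the requested tags."""
        return [doc for doc, doc_tags in tagged_docs.items() if set(wanted) <= doc_tags]

    # One document, many routes to it: any combination of its facets will do.
    print(find_by_tags(tags, "consultation", "2013"))   # ['route-2-consultation']
    print(find_by_tags(tags, "buses"))                  # ['route-2-consultation']

Nothing here is sophisticated; the point is simply that the second structure has many entrances and the first has only one.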

Metadata is data

Tags (and file locations) are of course forms of metadata, as are authors, creation dates, and a host of other data scraps which get attached to files. The level of creative imagination needed to dream up three keywords was itself an insuperable hurdle to the effective adoption of document management systems.  That’s not to say that metadata is unimportant. Some of the obvious stuff – date ranges, authors’ names and more – can be extremely valuable. But there is potential which we have scarcely begun to explore and make use of. I may be the creator of two apparently very similar documents (perhaps even so similar that it’s impossible to identify which (if either) is canonical). Understanding who has visited that text may be critical in searching an otherwise undifferentiated mass. Understanding who has contributed to it (and when, and in what order) may be what separates the historically interesting from the ephemeral.

Data, in other words, which captures the history of a document (or better still, of an idea, of a policy, of a ministerial decision) can be useful and powerful. Texts can tell their own story – with not an arbitrary keyword in sight.
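To make that slightly more concrete, here is a sketch of what history-as-metadata might look like – the field names are invented, not drawn from any real document management system:

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class DocumentHistory:
        """Hypothetical record of a document's life, kept alongside its text."""
        title: str
        contributors: list = field(default_factory=list)  # (name, date), in order
        readers: set = field(default_factory=set)         # everyone who has visited it

        def story(self) -> str:
            steps = ", then ".join(f"{who} ({when})" for who, when in self.contributors)
            return f"'{self.title}': drafted by {steps}; visited by {len(self.readers)} people."

    doc = DocumentHistory("Submission on bus funding")
    doc.contributors += [("analyst", date(2014, 3, 3)), ("policy lead", date(2014, 3, 5))]
    doc.readers |= {"private office", "finance"}
    print(doc.story())

The record is accumulated as a side effect of doing the work – nobody is asked to invent a keyword.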

People are search engines

Traditionally, we have thought of document repositories as self-contained entities. Whether they are filing cabinets or databases, the raw material for finding stuff is contained within the stuff to be found.

In the long run, that is unavoidably true – you will get no help from the authors of papers written a hundred years ago about why they are as they are and what thinking lies behind them. But in the short run, it’s not true at all: knowing who knows stuff is often as useful as knowing where that stuff might be filed. Knowing who else has worked on similar issues in the recent past and getting their insights into what the key documents might be is often the most useful thing which can be done – but whether it is practically possible is often a matter of chance. And while filing is a chore for most people, a request to share knowledge is a mark of respect and is usually welcomed as such.

So if we want to make better use of the knowledge stored in our systems, we need to be better at finding the people who created or who understand that knowledge. And if people are the best search engines, we need the best search engine to find people.
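A sketch of what such a search engine might do with metadata we already hold (the documents and names below are made up): invert the record of who worked on what, so that a topic leads to people rather than just to files.

    from collections import defaultdict

    # Assumed starting point: documents already carry tags and contributors.
    documents = [
        {"tags": {"buses", "funding"}, "people": {"Alice", "Bob"}},
        {"tags": {"buses", "timetables"}, "people": {"Bob"}},
        {"tags": {"rail", "funding"}, "people": {"Carol"}},
    ]

    def people_index(docs):
        """Invert document metadata into a topic -> people index."""
        index = defaultdict(set)
        for doc in docs:
            for tag in doc["tags"]:
                index[tag] |= doc["people"]
        return index

    index = people_index(documents)
    print(sorted(index["funding"]))  # ['Alice', 'Bob', 'Carol'] - who to ask first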

Friction is failure

All of these problems were supposed to have been solved many years ago. Yet somehow they persist, seemingly impervious to the wonders of new technologies and new ways of working. The underlying argument of this post is that one important reason for that is that we still too easily remain trapped in framing the problem around paper, files and working practices – and that as long as we do so, the promise of digital will remain only a promise.

But that’s not the only reason why we are still grappling with these issues. Another critical one is that we underestimate the power of friction. People are very good at balancing costs and benefits, even (perhaps especially) if they do so completely unconsciously. If benefits accrue in the indeterminate future and costs are incurred today, those costs will be avoided to the greatest extent possible. And if the present value of those benefits is at or close to zero, the acceptable cost will also be zero.

My instinct is that that explains why it has proved so difficult to persuade people even to do things which appear trivial to those who have designed the systems. Assigning keywords or navigating to the right folder are tasks which take just seconds – but if those seconds are pure cost, they will be avoided as far as possible.

There are two ways of dealing with that. The more obvious is to attempt to reduce the cost. It’s true that trying to get people to do a smaller and easier thing is more likely to succeed than getting them to do a bigger and harder thing, but if they are resistant to doing anything, it will still be a long and continuing struggle to make any kind of difference.

The second way is less obvious, but I suspect is more powerful. Instead of reducing the cost, let’s aim to increase the benefits. The time value of money (or in this case, the time value of time) strongly suggests that the scale of increase needed for benefits far in the future is likely to be unachievable. So instead, let’s look for ways of bringing the benefits into the present. That makes it much more a social challenge than a technical one. If we recognise, reward and value people who manage knowledge effectively, and if we set real expectations that a piece of work will be seen as successful only if it is captured in a way which maximises its medium-term (and longer-term) information value, then perhaps the trade-off changes.

Shredding the paper

So where does all that get us? My conclusions are still tentative; we need to do more to test and explore how we best manage information in a digital working environment, and learn from those who are already doing it well. I am pretty sure, though, that there are three traps we need to escape from:

Once we have got all that out of the way, we are in a position to be much more positive – addressing the challenge of using information to help civil servants help citizens.

The point of all this is not to prescribe – still less proscribe – how we might want to manage these things in the future. That thinking needs to be done, but this post is not it. The point is rather that we are more trapped in thinking about information in ways constrained by the office of the mid-twentieth century than we like to realise. If we want better solutions, we first need to find better problems.

  1. Not just in government of course. But as in other areas, the same external drivers can result in very different rates of change.
  2. This is beginning to change with the growth of cloud based services, but the cultural experience is still to download a document, then open it, and only then be able to edit or create.
  3. Getting rid of paper as a metaphor for digital information is a very different matter from getting rid of paper as a physical object. There are still valuable uses for paper, which is why the paperless office is another piece of the future which keeps receding, but that metaphorical use is not one of them.

Story and history

This post is mainly about The Imitation Game but was written before I had actually seen it. So it’s not a film review in any normal sense. Having since watched the film, I have added a short update at the end, which isn’t a film review either.

What’s the difference between history and a good film? Quite a lot, quite often, is the unsurprising answer. Films – other than documentaries – are there to entertain rather than educate and their success is measured in tickets sold, not consciousness raised.1

Some films (and novels, paintings, poems) tell stories based more or less solidly on real events, but even with the best will in the world, historical precision and popular entertainment are not always easily aligned. Sometimes the real events are just a backdrop to a predominantly fictional story, and it is clear that no deeper lesson is intended. But sometimes there is an apparent intention to tell a true story in a broader sense, not suggesting that every word and every character is drawn from life, but certainly giving the impression that the main actors and actions are firmly grounded on historical foundations.

And so to The Imitation Game, which is partly about the life of Alan Turing and partly about code breaking at Bletchley Park during the second world war, in which he played a central role. Both Turing and Bletchley are very real, as is their significance in the history of the war.2 There is an important story to tell, with elements of personal and institutional history which make it a compelling one. Inevitably and unsurprisingly, it is a complicated story with many players. Turing’s role was critical but not, by itself, sufficient. His work built on pre-war cryptography by Polish mathematicians and was made usable by those who turned his theoretical concepts into working machines. Thousands of people worked at Bletchley Park, not just one.3 For those and other reasons, Sue Black’s verdict on the accuracy of the account given in The Imitation Game is damning:

The story of Turing physically building the Bombe machine, or “Christopher” as it was called in the film, formed a large part of the central story of the film. This is, to my knowledge, completely inaccurate. […]

The story running through the film of one main codebreaker, Turing, with a team of four or five, producing a machine that won the war, is a ridiculous oversimplification of what actually happened. More than ten thousand people worked at Bletchley Park, more than eight thousand of them were women. We didn’t really get a flavor of that coming through at all from the film. There were many teams of codebreakers working on different areas of codebreaking. […]

Gross over simplification of stories, people and facts, focusing on Turing’s one (platonic) heterosexual relationship and not giving any time to his homosexual relationships, attributing work carried out by several people who still have had almost no recognition for their enormous contribution to Turing, I could go on, and on, the film has many faults.

Sue is no casual commentator. She knows her stuff. So after having lacerated the historical inaccuracy and damned it as ‘a clichéd bubblegum version of the story’ with a ‘sometimes ham-fisted script’ in which ‘Turing’s character is so much a stereotypical English eccentric that I found it insulting to his memory’, it’s pretty obvious that she is going to tell the rest of us to stay well away from it.

But she does no such thing.

I have to say that overall I loved it. Thinking about The Imitation Game from the point of view of how it presents such an important part of our history in a user friendly and easily digestible way to the average person in the street gets me very excited. […]

The Imitation Game is probably the most fundamental contribution we have so far to the public understanding of the importance of Bletchley Park. I hope that it wins Oscars, breaks box office records and brings the story of our wonderful British hero Alan Turing into the public consciousness.

That contrast makes Sue’s blog post one of the most thought provoking film reviews I have ever read. I hope I am not being unfair if I summarise her position as being that The Imitation Game is a deeply flawed film with serious inaccuracies, but should nevertheless be recommended, because it is better that people have an imperfect understanding of Turing and Bletchley Park than that they should have no understanding at all.

That raises some really interesting – and really difficult – questions. Do film makers have a responsibility for historical accuracy? Does anybody else? Does it matter if history is broad brush if the gist of it is right? Do the answers to those questions change for more distant history?

On the face of it, the first question is easy. The thought that the history police should scrutinise scripts and rule on historical disputes is clearly risible. But that doesn’t stop films creating real concerns. Enigma machines are at the centre of another example, from fourteen years ago, when the US film U-571 was pilloried in the UK for what was seen as appropriating a British victory in capturing an Enigma machine from a German submarine, and representing it as an American achievement. The fact that the film did not even purport to represent a real incident did not stop a political outcry, including at Prime Minister’s Questions. The switch had been made for very simple commercial reasons: American audiences are more likely to pay to see films featuring American heroes.

The fact that commercial factors might influence the content and structure of a film can hardly be shocking news. But the fact that there are concerns is a useful reminder that history matters, that our understanding of who we are and how the world works is in part a function of our understanding of the past. Trying to create and manage that understanding is not always a neutral and disinterested activity. A couple of days ago, my son brought home a copy of an article by John Sweeney, handed out in a GCSE Soviet history lesson, ‘Russian textbooks attempt to rewrite history’:4

They call it “positive history” and the man behind it is Putin. In 2007, the former secret police chief told a conference of Russian educationists that the country needed a more patriotic history. Putin condemned teachers for having “porridge in their heads”, attacked some history textbook authors for taking foreign money — “naturally they are dancing the polka ordered by those who pay them” — and announced that new history textbooks were on their way. Within weeks, a new law was passed giving the state powers to approve and to disallow history textbooks for schools.

Systematically bending history to the service of a current state ideology is clearly different from being cavalier with the truth in the production of entertainment. But in their very different ways, they present a version of the same challenge. If history matters at all, truth matters. It matters that there was a state-induced famine in Ukraine. It matters that the Soviet Union did not win the second world war single handed. And it matters that Alan Turing was not complicit in treachery, it matters that the work of others was attributed to him, it matters that those others were brushed out of the story.

So back to the dilemma presented by Sue’s review. Is it right to ignore major inaccuracies in the telling of a story if that’s the only way of telling the story at all? What if Turing’s sexuality had been ignored altogether? What if he had been left out of the core narrative altogether? Would it still be better to tell the story than not? Does it make any difference if the distortion results from being selective about things which are true rather than from including things which are false?

I don’t think there are easy answers to those questions. There is not, and cannot be, a pure and perfect history of any event: history, as I have argued before, can be no more than what historians write and can never be anything other than selective. I still feel uneasy celebrating the learning of history through a medium which is careless of history. On balance though, and with some reluctance, I conclude that Sue is right. If the choice were between two powerful dramatic presentations, one more accurate than the other, it would be easy. But when the only choice we have is between flawed understanding and no understanding, the flaws need to be fundamental before we should favour ignorance.

So having got all that out of the way, maybe I should just go and see the film.

Update, 3 January 2015

I did go and see the film and am rather less inclined to recommend it as a result. Taken as pure fiction, it’s entertaining enough, though with gaping holes in characterisation and plot. But its premise is that it is depicting real lives and real events, and on both counts it falls down badly. Somebody coming to the film with no knowledge of the history would learn that code breaking was important, that cryptography is fundamentally mathematical, and that in many ways the people at Bletchley Park were inventing modern code breaking as well as doing it. But almost every detail of how those things were done is either wrong or misleading. Film makers like lone (and preferably eccentric) geniuses who achieve through inspiration, they dislike teams who achieve through sustained and systematic work. That’s not because they are mad or bad, but because film is a more effective medium for some kinds of narratives than for others.

If you knew nothing about Alan Turing before you watched The Imitation Game, you would know more about him by the end than you did at the beginning. But a lot of what you thought you knew would be wrong, and you would have little basis for separating truth from fiction. At the end of the original post, I said that ‘the flaws need to be fundamental before we should favour ignorance’. On reflection, that’s not the real choice: to my mind the flaws were pretty fundamental, but that doesn’t mean that I favour ignorance. Fiction is not history, even when it is historical fiction. Like Sue Black, I would rather see a world in which more people knew more history. If The Imitation Game creates an appetite for history, that is a good thing. But it is not its purpose to satisfy that appetite, so it is no surprise that it does not do so.

  1. Even documentaries can never tell the whole truth, even if they were able to tell the truth and nothing but the truth, as I have discussed before.
  2. There is a much-repeated claim attributed variously to Churchill and Eisenhower (and to ‘historians’) that Allied code breaking shortened the war by two years or more. The firmest attribution, and the clearest and best-argued version of the claim, is by Sir Harry Hinsley, whose view is that:

    the war would have been something like two years longer, perhaps three years longer, possibly four years longer than it was

    That is of course debatable, partly because counter-factual history is always debatable, partly because it assumes that the war would have been won the same way more slowly, as it was in fact won more quickly (rather than, for example, by using nuclear bombs against Germany), and partly because other historians draw different conclusions from the evidence – Paul Kennedy, for example, in his Engineers of Victory (p. 358) is explicitly dismissive of Hinsley’s claim.

  3. And of course that summary is itself a ridiculous oversimplification.
  4. The article is behind the Times paywall, but there is what appears to be a complete copy here. That piece is from five years ago, but more recent press coverage does not suggest any change of direction.

Making connections

The web is a failed information management system.

What is odd about that statement is not that the attempt has failed – I don’t think I have ever heard of any other fate for an information management system – but that the fact of the attempt has been so completely forgotten.

Information is everywhere, of course. The public web, or at least some parts of it, is densely populated with links. Following a chain of them long beyond the answer to any question you might have started with is the road trip of the internet age. But beyond a still-thin surface layer, many end points of links remain resolutely no through roads.

Communications network

The idea of hyperlinks long predates the web. The hypothetical Memex engine dates back to 1945 and a recent article in the Atlantic takes the story back to the nineteenth century. More recently, everybody knows that Tim Berners-Lee invented the world wide web,1 but there is much less understanding of what it was he thought he was inventing.

Berners-Lee described the problem he was trying to solve in his famous paper proposing a new information management system for CERN:

CERN is a wonderful organisation. It involves several thousand people, many of them very creative, all working toward common goals. Although they are nominally organised into a hierarchical management structure, this does not constrain the way people will communicate, and share information, equipment and software across groups.

The actual observed working structure of the organisation is a multiply connected “web” whose interconnections evolve with time. In this environment, a new person arriving, or someone taking on a new task, is normally given a few hints as to who would be useful people to talk to. Information about what facilities exist and how to find out about them travels in the corridor gossip and occasional newsletters, and the details about what is required to be done spread in a similar way. All things considered, the result is remarkably successful, despite occasional misunderstandings and duplicated effort.

A problem, however, is the high turnover of people. When two years is a typical length of stay, information is constantly being lost. The introduction of the new people demands a fair amount of their time and that of others before they have any idea of what goes on. The technical details of past projects are sometimes lost forever, or only recovered after a detective investigation in an emergency. Often, the information has been recorded, it just cannot be found.

The solution he described combined technology and usability, recognising from the outset that people would only use something which was attractive and useful:

The aim would be to allow a place to be found for any information or reference which one felt was important, and a way of finding it afterwards. The result should be sufficiently attractive to use that the information contained would grow past a critical threshold, so that the usefulness of the scheme would in turn encourage its increased use.

That’s a fine ambition which became an information revolution and it’s pretty clear that the ‘critical threshold’ was passed quite a while back. But the initial problem Berners-Lee described still sounds uncannily familiar today, and is still a long way from being solved. As I wrote a while back:

One of the purposes of this blog is to help me find things I half remember thinking five years ago. I have no equivalent tool at work for finding my thoughts, let alone anybody else’s. That’s an important reason why so much energy is devoted to the reinvention of wheels.

There has been a flurry of recent coverage for the brave study by the World Bank which shows that a third of their policy reports are never downloaded and almost 90% are never cited (though if I have understood their methodology correctly, my citing their paper on the citation of papers would not be counted as a citation, so the precise numbers should not be taken too seriously). But although the coverage has included wry comments about the fact that a report about how pdf documents are little read is itself a pdf document, I haven’t seen any recognition of a more fundamental problem. The introduction to the World Bank report is a statement of why knowledge and the sharing of knowledge matter, including (with the emphasis in the original):

Internal knowledge sharing is essential for a large and complex institution such as the Bank to provide effective policy advice. Bottlenecks to information flows create inefficiencies, either through duplication of efforts and diverting resources from knowledge creation itself.

With that thought in mind, it turns out that that report does not link to any of the published material it refers to. It has a long list of references, many of them to other papers by the World Bank itself, but in virtually all cases they are textual descriptions, not links.2 It’s a dead end not because it’s a pdf, though that doesn’t help, but because it is constructed as an end point, not as a node in a network.

I have laboured that point a bit not because I care greatly about the information management practices of the World Bank, but because I suspect they are distinctive more in the visibility of what they do, than in the doing of it.

Most of the material I see in my working life is self-contained and very little of it makes explicit connections to other information.3 There are two big reasons for that (as well, no doubt, as a host of smaller ones).

The first is technical. You can only link to something if you know where it is now. There is only any point in linking to anything if you can be confident that it will still be there next week and next year (and in some cases, next decade). That requires information to have a permanent, canonical location at an appropriate level of granularity and for the arrangement of information to be more durable than the arrangement of work.
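That is essentially the persistent identifier pattern. A minimal sketch, with invented identifiers and URLs: links carry a permanent identifier, and a resolver – not the links themselves – knows where the document happens to live today.

    # Links point at permanent identifiers; only the resolver tracks locations.
    locations = {
        "doc:2014/bus-review": "https://intranet.example/transport/reviews/bus-2014",
    }

    def resolve(permanent_id: str) -> str:
        """Translate a permanent identifier into wherever the document lives now."""
        return locations[permanent_id]

    # When the team reorganises and the document moves, one mapping changes
    # and every link that was ever made to it keeps working.
    locations["doc:2014/bus-review"] = "https://archive.example/2014/bus-review"
    print(resolve("doc:2014/bus-review"))

The arrangement of information thereby outlives the arrangement of work: moving the document is an update to the resolver, not a breakage of every link.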

The second is cultural. You will only link to something if doing so is seen as valuable (and if doing so both is and is perceived to be easy to do). Links are most likely to be seen as valuable by people who might choose to follow them. Following links is easy for somebody reading on a screen, but impossible for somebody reading on paper. Reading on a screen is easier if the material is designed to be read that way, not just in layout but in information richness.  So there is little chance that links will flourish in an environment where most information is designed for presentation on paper (even if it is actually sometimes consumed on screen).

Any solution to the information management problems of organisations needs to address both the technical and the cultural issues. The technical solution is necessary, but wholly unsurprisingly, it falls very far short of being sufficient. Even with the network in place to support a much more web-like approach, we cannot hope to consume information that way until we start producing it differently.

But if we succeed, there are prizes well worth having here, which go far beyond better information retrieval. As Tim Berners-Lee speculated a quarter of a century ago:

In providing a system for manipulating this sort of information, the hope would be to allow a pool of information to develop which could grow and evolve with the organisation and the projects it describes. For this to be possible, the method of storage must not place its own restraints on the information. This is why a “web” of notes with links (like references) between them is far more useful than a fixed hierarchical system.

The need hasn’t changed in the last twenty five years. Perhaps we should try the solution.

  1. Apart from the people who persist in believing that he invented the internet.
  2. There are precisely two clickable links, both to posts in the same blog – but bizarrely the links are to the blog’s homepage rather than the specific posts being cited, so even those don’t help as much as they should.
  3. The one big exception to that is emails which contain long chains of their predecessors, but the less said about that the better.

A bus company with a train set

Quick question: what’s the dominant form of public transport in London?

And an irresistible second quick question: what is wrong with this picture?

TfL beta site top banner

We will come back to the second question, but if your answer to the first was the tube, you can be forgiven. That’s the most distinctive, most high profile part of what Transport for London provides, as well as being where most of the money goes. But the heavy work of moving people around London is done by buses: there are about twice as many bus journeys as tube journeys and there are almost 20,000 bus stops against a mere 270 underground stations.

One reason for being misled into thinking that the tube is all that matters is that that is what TfL itself seems to think. I have written before about how the ticketing system treats buses as an afterthought and the poor information design of bus arrival signs, but a fairly cursory look at the TfL website shows the depth of its assumption that the tube is what matters.

Contrary to how it might appear, this is not actually a TfL bashing post, it’s a complex information management post. The state of the underground is easy to communicate. The state of the bus network is considerably harder both to establish factually and to communicate clearly.

TfL - Live travel news

Let’s start with live travel news. The screen is dominated by the list showing the status of each underground line and a large map of service disruptions – even when there are no disruptions on the map (click on any of the screenshots to get a full-size version). Other forms of transport are accessible through tabs across the top – with buses getting a tab less than half the size of the tediously named Emirates Air Line. There is a live bus arrivals link at the bottom of the page, but it’s off the bottom of every screen I have ever used, and I had never noticed it before taking the screen shot.

TfL - Live travel news - buses

Clicking the buses link takes us to a rather muddled screen. Leaving aside the tube planned works calendar and the tube improvement plan (but is there really nothing which might have been said about buses in those spaces?), there is a link to live departure boards (which is a generic page dominated by the tube despite having already established a primary interest in buses). More promisingly, you can put in a bus route and check for disruptions, though the result of doing so is more than a little strange:

There are currently no relevant disruptions or this is not a valid route

Since it doesn’t seem unreasonable to assume that TfL knows what bus routes it runs, they should presumably be able to tell which of those alternatives is correct, and be willing to share that knowledge.

But perhaps I am being unfair. TfL is developing a new version of its website, currently in public beta; perhaps that will provide better navigation and information. The beta certainly looks much smarter and fresher, but on what is available so far, the primacy of the tube appears to be alive and well.

TfL status update beta
TfL home page beta

The beta home page has a smart status display which focuses on lines with problems rather than giving equal prominence to lines with no problems. The status update page is just as dominated by the tube as the old site – even the row of tabs linking to other transport modes has been replaced with a drop down menu. And rather tellingly, both the url structure and the breadcrumb trail firmly position buses (and everything else) as subordinate to the tube as default.

Default status update
Status update 2

So much for the website. What about the information?

It is of course massively easier both to gather and to present information about trains than about buses. A simple status indicator for each line doesn’t take up much space and already tells you quite a lot; drilling down from that can quickly tell you all there is to know. That approach cannot possibly work for hundreds of bus routes and thousands of bus stops. When things are disrupted, trains still sit neatly on their tracks, but buses can wander around all over the place. That means it’s also generally much harder to describe what is going on instead in a way which is simple and comprehensible. A search on route 2, for example, turns up this gem:

Until December 2013, bus stopping arrangements will be changed due to major works. For southbound 2 36 185 436 N2 and N136 buses, please use stop L in Vauxhall Bridge Road. For northbound 2 16 36 52 82 and 436 buses, and westbound 148 buses please use stop H in Wilton Road or Stop Q in Grosvenor Gardens. For eastbound Route 11 211 N11 and N44 buses, use stop S in Buckingham Palace Road, to the west of Victoria Station. For northbound Route 24 buses, please use stop J in Wilton Road, and for southbound Route 24 buses, please use stop U in Vauxhall Bridge Road. For southbound Route 44 170 C1 C10 and N44 buses, please use stop R in Buckingham Palace Road, to the west of Victoria Station. For eastbound Route 148 buses, please use stop N opposite Westminster Cathedral. For northbound Route C2 buses, please use stop Q in Grosvenor Gardens. Route 507 will start from Victoria Station and operate via Rochester Row, towards Waterloo only, to Horseferry Road.

Apart from the fact that only one sentence of that is relevant to the route I searched on, it’s hard to make sense of any of it without fairly detailed knowledge of roads and bus stops around Victoria. There is a link to a map of the route, but as it’s marked “(does not show disruptions)” that’s not a great deal of help.

It’s easy to carp, of course. Providing enough information to be useful but not so much as to confuse is a tricky balance to get right, and relying solely on words to do it makes it harder still. Hard is not the same as impossible though. There’s plenty of scope for improvement through better information design. But while buses are treated as a perennial afterthought, the problem may not get the focused attention it needs.

Semi-intelligent bus location map

In other areas, TfL has recognised that it is better at transport management than information design. There has been an explosion of creativity since they opened their data to third party developers, but that doesn’t include the critical information about where the buses actually are. Matthew Somerville has made a splendid attempt at interpolating location from bus stop arrivals, but it’s closer to conceptual art than a practical tool. Since TfL undoubtedly does know where its buses are, it would be far better to allow access to the information than to be reduced to inferring it inaccurately. With some really smart programming, that might even allow for emergent disruption reporting, with diversions appearing on a map because that in practice is what buses were doing rather than because the disruption had been published.
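I don’t know the detail of how that map actually works, but the general idea of interpolating a position can be sketched simply – the coordinates and times below are invented, and the assumptions are exactly what makes the real thing so approximate:

    from datetime import datetime

    def interpolate_position(stop_a, stop_b, eta_a, eta_b, now):
        """Guess where a bus is between two consecutive stops.

        stop_a, stop_b are (lat, lon) tuples; eta_a, eta_b are the predicted
        arrival times at each. Constant speed on a straight line between the
        stops is assumed - which is why the result is closer to conceptual
        art than a practical tool.
        """
        fraction = (now - eta_a).total_seconds() / (eta_b - eta_a).total_seconds()
        fraction = max(0.0, min(1.0, fraction))  # clamp if outside the segment
        return (stop_a[0] + fraction * (stop_b[0] - stop_a[0]),
                stop_a[1] + fraction * (stop_b[1] - stop_a[1]))

    # A bus due at stop A at 09:00 and stop B at 09:03, observed at 09:01,
    # is placed a third of the way along the (straight) line between them.
    print(interpolate_position((51.4952, -0.1441), (51.4965, -0.1350),
                               datetime(2013, 10, 1, 9, 0),
                               datetime(2013, 10, 1, 9, 3),
                               datetime(2013, 10, 1, 9, 1)))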

And so finally to the second question: what’s wrong with the picture at the top of the post? No points at all for recognising what it is a picture of. My first thought was that it was a bodged stitch up of more than one photograph, which seemed both complicated and pointless and so a bit unlikely. But then I realised that the image had been flipped – with the most obvious clue being that the traffic is on the wrong side of the road. Maybe that’s a sign that TfL has ambitions to copy Sweden.  Or maybe not.

History, weak

It’s history week at the Cabinet Office, a series of internal events designed to remind the current generation of policy makers both that there is always something to learn from history and that their work will become history in its turn. It being Cabinet Office, there are ways of emphasising history not open to every organisation: we sat in the room from which Churchill went out onto the balcony to announce victory in Europe to the crowds below.

But it was a couple of tables at the back of the room which prompted this post. Casually strewn across them (but not so casually that white cotton gloves were not strewn around as well) was an eclectic set of historic documents. One group were records from 1984, on their way to the National Archives to be released next year under the thirty year rule. They were in files which were visually indistinguishable from those produced decades earlier and which would continue to be produced for a decade or so longer.

And I was reminded of a post I wrote three years ago about the end of the file as a unit of work organisation and the implications for our ability to know what we know. I think it still bears reading.

If progressing the work continues to diverge from creating records of what has been done, the raw material of history may be thinner in future than it has been for centuries (and history here means medium term institutional memory as much as it does the work of historians). That problem will not be solved by exhortations to do better filing: it will be solved, if at all, by tools which support what people are trying to do in the short term while quietly adding what may be needed for the longer term – which is easier said than done.

Three years on, I have seen nothing which makes me think that problem is going to go away, though I would be delighted to be told that I am wrong. Historians and policy makers will both need new skills and new tools to operate effectively in that world, with landscapes much less clearly mapped than they once were.

That’s not really the end of history, of course. As I said back then,

History will, of course, look after itself. It always has. But the future history of our time will be different from our histories of past times, and that will not be because we have an eye to the future, but because we are always relentlessly focused on the present.

The Guardian pwned my blog

Update:  Since posting this this morning, I have had two people contact me from the Guardian – one in a comment to this post and one by email.  As a result, I am reassured that what I experienced was a bug they are keen to fix rather than indifference to the context in which Guardian material might find itself.  The email response suggested that the most recent version of the plugin – 0.3 – already fixed the problem.  I am not sure that’s quite right, so continue to advise extreme caution – but the intention is clearly there to make the plugin work as I argued it should.

I am removing the Guardian wordpress plugin which I wrote about a couple of days ago. It has a couple of major flaws, and I would discourage anyone from using it until they are fixed.

The Guardian is perfectly entitled to manage the presentation of its own material. The terms and conditions for the use of its data leave no scope for doubt about their absolutely fixed intention of keeping that control (even if the language of those terms and conditions feels slightly at odds with the concept of an open platform). Nowhere in those extensive conditions though does it state that the Guardian claims the right to extend that control to the host blog. But that is what the plugin does.

As I noted before, embedding a Guardian article brings with it a title for the blog post of which the article forms a part – but only a part – tags and an excerpt. None of those were what I wanted for the post I wanted to write, so I deleted them all. Not ideal from my point of view, but it was, I presumed, an attempt to be helpful. Having set them to what I wanted them to be, I now discover that the Guardian plugin has taken it upon itself to change them all back again. I don’t find that acceptable.

It gets worse.  My next act was to deactivate the plugin.  That caused it to remove the Guardian article – which is fair enough. It’s not hard to identify the text which belongs to the Guardian.  It begins:

<!-- GUARDIAN WATERMARK -->

and ends:

<!-- END GUARDIAN WATERMARK -->

It could hardly be much clearer – but the plugin takes no notice of that, and instead completely deletes the entire post, including all that I had written.
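To show quite how little would have been needed, here is a sketch – in Python rather than the plugin’s own PHP, purely for illustration – of removing only the delimited article while leaving the rest of the post intact, assuming the watermark comments appear literally in the stored post:

    import re

    # The Guardian's content is delimited by literal HTML comments, so removing
    # it - and only it - is a single substitution over the stored post body.
    WATERMARKED = re.compile(
        r"<!-- GUARDIAN WATERMARK -->.*?<!-- END GUARDIAN WATERMARK -->",
        re.DOTALL,
    )

    def strip_guardian_content(post_body: str) -> str:
        """Remove the embedded article, keeping the author's own words."""
        return WATERMARKED.sub("", post_body)

    post = ("My own introduction.\n"
            "<!-- GUARDIAN WATERMARK -->the syndicated article<!-- END GUARDIAN WATERMARK -->\n"
            "My own conclusion.")
    print(strip_guardian_content(post))  # the introduction and conclusion survive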

It’s not that the Guardian doesn’t expect bloggers to put their own context and commentary round articles: their own documentation makes clear that that is exactly what they expect.  And the use case of doing nothing more than republishing articles strikes me as an odd and unlikely one. But regardless of that, the entire text is swept away.

I hope there is nothing more here than carelessness either in design or in testing, but I am going back to the old fashioned way of quoting and linking, following the advice in one of the comments on the Guardian page about the plugin:

I really fail to see the point of this plug-in. If I want to post excerpts from Grauniad articles on my wordpress blog, I copy and paste. I can change anything I like; I don’t need an effing key; I don’t have to put up with any ‘…ads and performance tracking…’; and I decide what gets deleted, not you…

Small pieces, joined not quite loosely enough

Here’s a small cautionary tale of unintended consequences. It explains why the particularly eagle eyed will have seen a post on the blog this morning which quickly disappeared – though not quite quickly enough to stop it propagating round the web.

Over the weekend, I installed the new Guardian wordpress plugin, more out of curiosity than because I thought I had much use for it. But then I came across an article about repurposing and representing text.  The temptation to repurpose and represent it was irresistible, so I wrote a couple of introductory paragraphs and thought no more of it. Then on the bus to work this morning, I remembered that I hadn’t actually posted it, and used my phone to change its status.  So far, so good.

Then I checked on the published version of the post. There it was, on the mobile version of the site (which uses the WPtouch theme) – but although the title was right, the words were not mine – in fact I did not recognise them at all.  They referred to the Guardian article, but did not come from it. I couldn’t work out what had happened and my bus stop was approaching, so I unpublished the post and went to work. But although the post had been live for no more than a minute or two, that was time enough for the RSS feed to have been picked up by the Google Reader account which drives Public Sector Blogs, which generates a tweet which tells the world (or that rather small corner of it which takes an interest in such things).

The strange words turn out not to be quite so mysterious after all.  The version of the article on the Guardian website has an introductory sentence which does not appear in the body text – the words above the byline in the screenshot.  It turns out that the Guardian plugin uses that text to populate the ‘Excerpt’ field – and since that field is one I never use and is collapsed in my normal view of the wordpress dashboard, I had no idea it was there.  The WPtouch plugin uses that short excerpt to populate the home page view of the blog on a small mobile screen.  All perfectly sensible, no harm done, a very minor storm in a very small tea cup.

But there is – I think – something interesting which comes from all of this.  It is that my understanding of what the Guardian is trying to do with its plugin is radically different from their understanding.

From the point of view of the Guardian, I assume, they are seeing a new way of syndicating their articles. For them, perhaps, the article and thus its metadata are what really matters. It makes perfect sense to force excerpt text, tags and a title onto the blog post in which their article is embedded, because the post is essentially the article. And it makes sense not because they are bullies, but because they are trying to be as helpful as they possibly can be.

From my point of view, I know, I am seeing a new way of illustrating my blog posts.  For me, it is my blog post which really matters – not because of any intrinsic superiority, but because if all I wanted to do was point to articles on the Guardian’s website, pointing to them is all I would do.  So the chances of the preamble to the article being the most appropriate excerpt for the post as a whole are vanishingly small, and the idea that the Guardian has the right to pre-empt my chosen title suggests that they see themselves as rather more important than I do.

The Guardian also requires their article to appear in full, with links, copyright notice, tracking codes and adverts left intact and uninterrupted – in effect to require the blog owner to cede control over the space in which their article is reproduced. I don’t have a problem with that requirement, and for anyone who does, the simple solution is of course to link to articles rather than reproducing them.

But I would like to see the same respect and lack of interference with my content from them as they expect from me. It’s early days: the version number of the plugin has climbed from 0.1 to 0.3 over the last 48 hours, and there is plenty of opportunity – and I don’t doubt plenty of willingness – to tweak and improve.

All of this in the context of being strongly sympathetic to the Guardian Open Platform, partly because it is fascinating watching a newspaper trying to reinvent itself in real time, but even more because, as I wrote last month, the approaches the Guardian is pioneering have much wider implications, not least for public service providers. Some of these same issues about the syndication of content and the interests of the different parties involved were behind some of the discussion today at NESTA’s digital disrupters event, for example.

Normal service will now be resumed, with the post which caused all the trouble this morning appearing shortly after this one.

Information on full power

The final version of the Power of Information Taskforce report is out, with recommendations in six main areas:

  • enhancing Digital Britons’ online experience by providing expert help from the public sector online where people seek it;
  • creating a capability for the UK public sector to work with both internal and external innovators;
  • improving the way government consults with the public;
  • freeing up the UK’s mapping and address data for use in new services;
  • ensuring that public sector information is made as simple as possible for people to find and use;
  • building capacity in the UK public sector to take advantage of the opportunities offered by digital technologies.

No chance to read it yet, let alone compare it with the original draft (which is still available with all the comments on it), so I am still at the level of first impressions – which of course matter a lot, not least for all those who will never read the whole thing. On the substance, it looks first-rate: it has a clear and coherent set of recommendations, each of which is cogently and succinctly argued.

The one apparent weakness is the executive summary.  It harks back to a distant time when a summary was exactly that, with none of this 'executive' nonsense tagged on the front:  if you read it, you have a sense of what is in the report.  But it isn't written as a hook to pull in somebody who doesn't already know why they should be interested.  There's an argument for not scaring the horses too much:  the full implementation of all the taskforce recommendations would add up to a radical change in the way government does business.  But the recommendations won't get implemented without communicating a sense of excitement and a sense of why these changes are unavoidably the right things to be doing.

Maybe that needs to be a separate and slightly different document – but I am pretty sure that it is a necessary part of the marketing drive which is needed to make all this work.   As I observed on the draft in a different context, there's a need to get the reading right as well as the writing.