A lot of time and energy has gone into thinking about how text-based information should be structured and organised at the technical level. That’s not a bad thing, but it risks giving too little attention to the fact that a vital reason for storing information is to be able to find it and use it – and to connect it with related information.

That’s a bit different from the position on more structured information, where one of the things which governments – and other institutions – have learned rather painfully over the last few years is the importance of abstracting data from the technology in which it is stored and processed. Setting clear standards for open data and creating the expectation that it will be accessible through APIs changes the nature of the game. It makes it possible to use and reuse data without needing to be concerned about how the original data is stored and managed, still less about aligning the technical architecture of the receiving system with that of the producing system.

That’s not to say that everything is perfect, of course, but the principle of open data is well established and much work has been done on the practical implications for government as a provider of information. The ambition is clear – summarised in this only very slightly cheesy video from Sprint 16 last month:

None of that quite works though when we look at other kinds of information sharing within (and potentially beyond) government. Documents are more constrained by their underlying technology than is open data and – crucially – relations between documents are more constrained still.

There is though some progress here too. GDS has recently published guidance on Sharing or collaborating with government documents which sets itself a clear challenge:

Citizens, businesses and delivery partners must be able to interact with government officials and services, or those working on behalf of government, sharing appropriately formatted, editable documents.

Officials within government departments also need to work efficiently, sharing and collaborating with documents. Documents in this context include word processed text, spreadsheets and presentations.

The primary solution proposed in the guidance is to standardise on open document formats to maximise interoperability, though there is also a very brief reference to collaboration:

For information being collaborated on between departments, browser-based editing is preferable and becoming more widely used, but this option is not yet available to all.

In practice though, file formats are not the primary – or even a significant – constraint on sharing documents.1 The obstacles to more effective information management and to more effective collaboration are to be found elsewhere.

Instead, the main issues we need to address sit above and below technical document structures and file formats. There are three I want to touch on here:

  • Text is held as documents
  • Documents are held in closed systems
  • Documents are self-contained

Taking those three together will then get us to the slightly startling conclusion that the world wide web might be rather a good idea.

Text is held as documents

The first point – that the metaphors of documents (and pages and files) now get in the way of thinking about effective information management rather than being an inevitable architecture for it – is one I covered in my previous post, so won’t repeat here. But for a short sharp summary of that point – and of the possibilities which framing the issues differently could open up – you could do a lot worse than spending 75 seconds watching this video from TextThing:2

In this context, the key point though is that the document is the unit of information management. There is no easy way of liberating the information within it which is analogous to the way an API can liberate the information contained in a database.3

Documents are held in closed systems

Documents held by organisations are usually trapped within them. Within government, the frustrations caused by the boundaries between departments remain largely as they were when I wrote about them almost three years ago. There are some promising signs – I can choose to share a text I am working on with somebody in another organisation (though whether the recipient can actually use it in that form is another matter) – but it’s much harder, for a mixture of technical and organisational reasons, to share contexts and bodies of material.

That’s part of a bigger problem, that it’s less and less helpful to think either of publication as a binary state, or of published things existing in a special place that is intrinsically different from other places. Instead, it may start to be more helpful to think of publication as a state, or set of permissions, which can be set more or less broadly as the content of a text and its stage in the information lifecycle may require.
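One way to picture publication as a setting rather than a binary state is a small sketch like the one below. It is only an illustration: the `Audience` levels, the `Text` class and its methods are all hypothetical names invented here, and real permission models are a good deal messier than a single ordered scale.

```python
from dataclasses import dataclass
from enum import IntEnum


class Audience(IntEnum):
    """Hypothetical audience levels, ordered from narrowest to broadest."""
    AUTHOR = 0
    TEAM = 1
    DEPARTMENT = 2
    GOVERNMENT = 3
    PUBLIC = 4


@dataclass
class Text:
    title: str
    # How broadly the text is currently shared; starts with the author alone.
    shared_with: Audience = Audience.AUTHOR

    def visible_to(self, reader: Audience) -> bool:
        # A reader can see the text if sharing reaches at least as far as them.
        return self.shared_with >= reader

    def widen(self, audience: Audience) -> None:
        # Sharing only ever broadens here; narrowing is left out of the sketch.
        self.shared_with = max(self.shared_with, audience)


draft = Text("Notes on document sharing")
draft.widen(Audience.DEPARTMENT)
```

In this sketch "published" is just the broadest setting on the same scale as everything else, so the text never has to move to a special place in order to be shared more widely.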

Documents are self-contained

Most of the texts I see at work would look remarkably familiar to a civil servant from any point in the last century or so. The content might be surprising, the informality of both style and address might be disconcerting, but their essential structure and purpose would be instantly recognisable. One of the reasons for that is that they are not generally thought of – by authors or by readers – as nodes in a network. It is unusual to have many references, and vanishingly rare for references to be hyperlinked.[citation required]

That means that any individual text is essentially a dead end. There might be some means of finding your way there, but once there, there is no way forward. For humans, that makes it harder to navigate their way through related texts. For systems, that makes it much harder to support search and navigation based on context and relevance as well as pure content. Given that that is essentially what enabled Google to take over search, that’s quite a big gap.
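To make the point about context concrete, here is a minimal sketch of link-based scoring, loosely in the spirit of PageRank rather than a description of how any real search engine works. The function name `link_scores`, the damping value and the example text names used with it are all assumptions made for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score texts by the links pointing at them (a simplified PageRank-style loop).

    `links` maps each text's name to the names of the texts it links to.
    """
    graph = {src: set(targets) for src, targets in links.items()}
    texts = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(texts)
    scores = {t: 1.0 / n for t in texts}
    for _ in range(iterations):
        # Texts with no outbound links are dead ends; share their score evenly.
        dead_end_mass = sum(scores[t] for t in texts if not graph.get(t))
        new_scores = {}
        for t in texts:
            # Each linking text passes on an equal share of its own score.
            inbound = sum(
                scores[src] / len(targets)
                for src, targets in graph.items()
                if t in targets
            )
            new_scores[t] = (1 - damping) / n + damping * (inbound + dead_end_mass / n)
        scores = new_scores
    return scores
```

Given a set of texts with no links between them, every text ends up with the same score, so the structure tells a search tool nothing. As soon as some texts link to others, the scores separate, and that difference is exactly the contextual signal that self-contained documents throw away.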

This post is about the same length as the most recent document I wrote at work. This one contains nine conventional hyperlinks, two embedded tweets, one of which in turn embeds a video, and one directly embedded video.4 The other one contains no links at all. Each of them feels natural and normal in its own context. More subtly, but very importantly, I am writing the post in the WordPress editor, which is optimised for linking and embedding. You can get the same result using Word, but it’s a much more painful struggle.

Accidentally reinventing the world wide web

So perhaps what we are after is a web of texts, connected to each other much more at the level of content than of file structure, with tools for searching and exploration which still take some account of structure and metadata but which are primarily focused on content and significance.

This is not a new idea. Somebody has been here before us:

In providing a system for manipulating this sort of information, the hope would be to allow a pool of information to develop which could grow and evolve with the organisation and the projects it describes. For this to be possible, the method of storage must not place its own restraints on the information. This is why a “web” of notes with links (like references) between them is far more useful than a fixed hierarchical system.

That somebody was, of course, Tim Berners-Lee, and his idea is what became the world wide web. But it’s worth remembering that the original concept was all about the web; the world wide bit came much later. What Berners-Lee thought he was inventing was an information management system – the fact that the web has since grown beyond all recognition is a measure of the power of that idea, not of its failure.

Managing information – and letting information manage itself

One of the things the experience of large information management systems has taught us is that users are intolerant of friction, that tagging and metadata are seen as burdens not as investment. That’s one of the reasons – though not of course the only one – why 17 years after the government decided that records should be kept electronically, the means of doing so still feel uncertain and immature. As a result, a lot of thought has been given to identifying the absolute minimum users can be asked to do to make a useful contribution to the management of shared information, with the fear that any request to do more than that absolute minimum will result in nothing being done at all.

We may not be able to make that question go away completely and immediately, but perhaps we can find a better question to replace it. How might we better write text in a way which lets the information manage itself?

This is now the fourth in a loosely linked series – the earlier ones are Thinking on Paper, Paper Cutouts and Footnote on Paper.

The header image is taken from Tim Berners-Lee’s original 1989 paper proposing the creation of what became the world wide web.

 

  1. For better or worse, Microsoft Office provides the de facto standard – which is not to say that it is universally accessible or that a fully open standard would not be a better approach, but is to recognise that on a day to day basis, particularly within government, file formats are not in themselves the main barrier.
  2. TextThing is both a concept and a nascent product. I have been involved in discussions with the team on both – I am very interested but wholly disinterested.
  3. In theory, transclusion is possible, but the barrier between theory and practice is a substantial one.
  4. To say nothing of the fact that each of the tweets contains six or more links of its own.