Modernise data publishing and reuse

 

Modernising information publishing

“In the twenty-first century, information is the force powering our democracy and our economy. Both the private and the public sector increasingly rely on information and knowledge, and create value through their ability to manage these valuable assets. Successful societies and economies in the future will depend on how well they enable information to be appropriately shared.”

Sir Gus O’Donnell, Cabinet Secretary, in ‘Information matters: building government’s capability in managing knowledge and information’

The public sector produces very large quantities of information, for which the web has become a critical distribution channel. Websites have changed a great deal in recent years: successful sites have become data systems that deliver a service to the customer in many different places by allowing reuse of information. The government’s use of the web is about more than the application of a set of communication tools such as blogs and wikis. The web has an architecture based on resources and links, which makes it a highly effective platform for data. Some of the most successful online tools work well because they are designed and engineered in keeping with this architecture. Examples include the photo-sharing website Flickr and the social networking service Twitter. These services separate data from presentation and provide APIs to the underlying data; the APIs make each service more useful and help drive traffic to the site.
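The separation of data from presentation described above can be sketched in a few lines. This is a minimal illustration, not any particular service’s implementation; all names and the sample record are hypothetical.

```python
# A sketch of the pattern described above: the same underlying record is
# exposed twice, once as machine-readable data for reuse by third parties
# and once as a human-facing page. All names here are illustrative.

import json

# The raw record, held independently of any presentation.
record = {"title": "Road closures, week 12", "count": 7}

def as_api_response(data):
    """Machine-readable view: plain JSON that any third party can consume."""
    return json.dumps(data)

def as_web_page(data):
    """Human-readable view: the same data wrapped in presentation markup."""
    return f"<h1>{data['title']}</h1><p>{data['count']} closures this week</p>"

print(as_api_response(record))  # what a widget or mash-up would fetch
print(as_web_page(record))      # what a visitor to the site would see
```

Because the two views draw on the same record, a redesign of the page leaves the API, and every reuser of it, untouched.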

Generalising this, a person may be looking at a company’s product information on the company’s own website, or seeing it embedded in a widget on someone else’s site or blog. For example, a community website might carry a feed of traffic reports for its area from the BBC, a widget from a bookstore offering books relevant to that area, or a feed of planning applications from the local authority. The information from the bookstore, the BBC or the local authority would be the same as on their own sites; it is simply being re-presented automatically in a third-party location. The more sites the information appears on, the more people will see it.
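The automatic re-presentation described above amounts to fetching a feed and rendering its items in the host site’s own layout. The sketch below parses a hypothetical feed; the items stand in for, say, a local authority’s planning applications, and are not real data.

```python
# A sketch of automatic re-presentation: a third-party site takes a feed
# and renders the same items in its own layout. The feed content is a
# hypothetical stand-in for a local authority's planning applications.

import xml.etree.ElementTree as ET

FEED = """<rss><channel>
  <item><title>Application 41/09: loft conversion</title></item>
  <item><title>Application 42/09: shop signage</title></item>
</channel></rss>"""

def re_present(feed_xml):
    """Render the feed's items as a simple list for embedding elsewhere."""
    root = ET.fromstring(feed_xml)
    titles = [item.findtext("title") for item in root.iter("item")]
    return "\n".join(f"* {t}" for t in titles)

print(re_present(FEED))
```

The publisher maintains one feed; any number of community sites can embed it, each in their own style.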

The government web estate needs to move far closer to conforming with ‘The Architecture of the World Wide Web’ (2004) or Tom Coates’s nine-point plan in ‘Native to a Web of Data’ (2006). The world has moved from a controlled environment, with a relatively small number of publishers selecting who and what gets published, to one of massively democratised and decentralised publishing on the web. Web 2.0 tools such as blogs, wikis and Twitter sit at the far end of this trend: anyone can say anything about anything, at little or no cost.

These developments have led to different information structures for websites that provide and receive information. The Office for National Statistics is consulting on a new model for access to the 2011 census data, involving an interface that allows reusers to get at the underlying de-personalised data rather than having to go through the ONS’s own top-level website (see consultation here). Its Chief Technology Officer reports:

ONS is developing a data explorer that will itself be founded on an API which I hope will be published. It will be capable of operating across all ONS outputs, and so is not limited to our plans for the next Census (we hope to have it out there, and through a few releases before we reach Census outputs)

Such new structures enable easy reuse of information by third parties.  The Taskforce discussed on its blog a new information model for public sector websites to design in reuse of information.

Designing in reuse

This issue is discussed in detail on the Taskforce blog.

Diagram 1 – the ‘traditional approach’


The emphasis of much web development to date has been on the presentation of the data to the public. The assumption was that a particular website would be the unique interface to a particular set of data, which meant that little or no thought might have been given to how anyone else would use the data set in question. Sometimes the data and any analysis of it could be unpicked from such a site, but in many instances this would be extremely difficult.

Diagram 2 – a Power of Information model


Thinking has moved on over recent years with a developing understanding of the importance of separating data from its presentation. If nothing else, this allows for simpler changes to the presentation layer as, for example, websites are redesigned.

PRESENTATION LAYER – the public-facing front end, typically a set of web pages

ACCESS LAYER – all the information needed to access the output of the analysis, including technical, legal and commercial aspects

ANALYSIS LAYER – any form of interpretation of the raw data, typically for summary presentation

ACCESS LAYER – all the information needed to access the raw data, including technical, legal and commercial aspects

DATA LAYER – the raw data sets

The Taskforce judges that to realise the power of much public information, a different approach is needed to the way public data sets are treated when published on the web. There is a need for several access layers to the data. These layers must address all the issues necessary to enable use of the data: typically technical issues such as file formats, intellectual property issues such as copyright, and, where applicable, commercial issues such as pricing. The access layer is discussed in more detail here. Access to the data allows many other actors to create their own analyses of it. A further access layer could then allow reuse of the output of that analysis activity; it must again address any technical, intellectual property and commercial issues. With the access layers in place there is scope for multiple web presentations of the data, and additional value can be generated through the ability to interact with a community around the data.
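To make the access layer concrete, the sketch below shows what such a record might contain, following the report’s description of technical, intellectual property and commercial aspects. Every field name and value here is an illustrative assumption, not a published standard.

```python
# A sketch of an "access layer" record: the technical, intellectual-property
# and commercial information a reuser needs before touching the data itself.
# All field names, values and the URL are illustrative assumptions.

access_layer = {
    "technical": {
        "formats": ["csv", "json"],                 # file formats on offer
        "endpoint": "https://example.gov.uk/data",  # hypothetical URL
    },
    "legal": {
        "licence": "Crown copyright, reuse permitted",
    },
    "commercial": {
        "price": 0,  # free at the point of reuse
    },
}

def can_reuse(descriptor, fmt):
    """Check whether a reuser's preferred format is available."""
    return fmt in descriptor["technical"]["formats"]

print(can_reuse(access_layer, "json"))
```

Publishing such a descriptor alongside the data lets a prospective reuser answer the technical, legal and commercial questions programmatically, without contacting the publisher.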

The power of the information is fully realised when all layers are in place and the architecture is designed to offer opportunities for interaction.

Recommendation 13

As the internet changes, so should the way information is published. The Taskforce has developed with stakeholders a model to inform online publishing. This breaks information out into several layers, with external interfaces at each layer, allowing reuse both of the raw data and of the intervening software interfaces. OPSI should develop and further test the model, and publish it with a delivery mechanism, implementation plan and explanatory material by end June 2009. It should become the standard to which new systems, or re-implemented versions of existing systems, are built, from a date determined by the CIO Council.