Modernise data publishing and reuse

 

Finding public information for reuse

Large scale publishing of public information

Public information distributed across thousands of websites is expensive and time-consuming to gather for reuse. The cost can be so high that little or no reuse occurs. The Show Us a Better Way competition revealed this to be a problem when people seek information about complex public service choices. One of the winning entries, School Guru, demonstrates the scale of the challenge when choosing a school. Taskforce members with experience of building large mashups identified the high cost of searching for and acquiring data as a major barrier to innovation in its reuse.

Information presented in one place is much easier to reuse. The District of Columbia in the USA provides a vivid example of aggregating data for reuse in its data catalogue. Its Chief Technology Officer has pulled together all of the District’s major data sets onto one web page and provided the data for free as a choice of feeds and downloads. This makes it very easy for people to use information in a way that suits them. Using modern techniques and storage, it is relatively easy and inexpensive for government to aggregate performance and other data as it is produced, and then to make it freely available for reuse in virtual or physical data repositories.

Professor Nigel Shadbolt of the University of Southampton referred the Taskforce to the use of data repositories in the academic sector to aggregate resources for research. The Open Knowledge Foundation held a useful workshop with the Taskforce on finding and reusing information. The workshop discussed the use of data catalogues, such as the Comprehensive Knowledge Archive Network (CKAN), which point people to where information can be found. The workshop demonstrated that finding public sector information is not straightforward and requires a detailed knowledge of how government works. The OPSI Public Sector Information Unlocking Service, although welcome, is intended to solve only part of this problem.

The challenge of ensuring information is discoverable and remains available over time will be met by a combination of catalogues and physical data repositories. Examples of each already exist across the public sector in the information management strategies of individual organisations. Some initiatives aim to bring consistency, such as the Information Asset Register overseen by OPSI, part of the National Archives. Further detail on information asset registers can be found in a paper produced for the ePSIplus network. In spite of these efforts, potential reusers, who may not have detailed knowledge of the structures of government, still face significant challenges in finding and understanding relevant and useful information sources.
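The distinction between a catalogue and a repository can be illustrated with a minimal sketch. It is not drawn from CKAN or from any OPSI design: the record fields, the `find_entries` helper, the sample entries and the example.org addresses are all hypothetical, chosen only to show how a catalogue might list the location and characteristics of data sets that are held elsewhere.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogueEntry:
    """One hypothetical catalogue record: it describes a data set and
    points to where it is held, but does not store the data itself."""
    title: str
    publisher: str
    location: str  # address of the data set (illustrative only)
    formats: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


def find_entries(catalogue: List[CatalogueEntry], term: str) -> List[CatalogueEntry]:
    """Return entries whose title or keywords contain the search term,
    ignoring case."""
    term = term.lower()
    return [
        entry for entry in catalogue
        if term in entry.title.lower()
        or any(term in keyword.lower() for keyword in entry.keywords)
    ]


# Invented sample records, not real data sets or real addresses.
CATALOGUE = [
    CatalogueEntry(
        title="School performance tables",
        publisher="Example education department",
        location="https://example.org/data/school-performance.csv",
        formats=["csv"],
        keywords=["education", "schools"],
    ),
    CatalogueEntry(
        title="Local authority performance indicators",
        publisher="Example local government department",
        location="https://example.org/data/la-indicators.xml",
        formats=["xml"],
        keywords=["council", "performance"],
    ),
]

if __name__ == "__main__":
    for entry in find_entries(CATALOGUE, "school"):
        print(entry.title, "->", entry.location)
```

A reuser searching such a catalogue needs no knowledge of which organisation holds a data set. A repository, by contrast, would also store and maintain the files themselves.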

The Taskforce recommends that the government build on this existing work by establishing a public sector information repository and catalogue function based around the Office of Public Sector Information, part of the National Archives. OPSI has the expertise in modern information publishing and, as an offshoot of the National Archives, can take a long-term view of custodianship. We understand that officials in OPSI have already sketched out the architecture to deliver such a service at minimal expense.

The Taskforce is pleased that the Pre-Budget Report contains a commitment from Communities and Local Government (CLG) to move forward in publishing the performance data it obtains for the Comprehensive Performance Assessment (CPA). If this performance data were published in a well-structured way, it should be possible to produce a map of public services to help inform people’s choices.

Recommendation 14

The government should ensure that public information data sets are easy to find and use. The government should create a place or places online where public information can be stored and maintained (a ‘repository’) or its location and characteristics listed (an online catalogue). Prototypes should be running in 2009.