Which datasets are of most value to you or your organisation? Why?
Once again (see my response to Question 9), I think one has to go back to first principles; the most valuable datasets are those that allow one to link disparate datasets. This is why geographic data (streets, addresses, postcodes) has been a) the focus of so much investment by private and public bodies; b) the focus of efforts to establish profitable monopolies; and c) the cause of so much conflict between the private and public sectors.
The Internet works because there has been a consensus from the beginning on the rules governing domain names and the addressing of nodes of the network. If there had been a monopoly rent to be extracted from the allocation of domain names, we would today have dozens of competing networks, probably with incompatible protocols and standards.
So, almost by definition, the most valuable datasets produced by the government relate to addressing. They would be a whole lot more valuable to the economy if they were standardised and universally accessible in the way Internet addressing is. This should be the first task of the Public Data Corporation, as it would immediately contribute to the efficiency of government, with measurable immediate savings, and would standardise one aspect of the way all government datasets are maintained and distributed.
Transport infrastructure related data.
Geographic information – but that’s because we work with transport and utility organisations.
It would cause funding problems for Ordnance Survey but IMO all OS data should be freely available. Then we would be able to have a national infrastructure dataset. The amount of time spent translating and converting between data from different utilities (water, gas, electricity, telecoms etc.) is truly colossal.
A freely available national mapping database would allow all organisations to align their infrastructure against a common background. It wouldn’t have to be held centrally, but to know that you could download data from an electricity supply company and know that it would align with the sewer network… well – I can dream.
Geographical data – mostly marine but some land.
I agree with Christopher Roper’s criterion for the most valuable data sets: those that allow one to link disparate datasets.
Registry Trust has a monopoly of Court Records; Companies House has the statutory register of company records including their directors. To create an entity-relationship diagram, the basis for linking data sets and adding to their value, one needs access to the unique identifiers and, particularly, the data used for disambiguation. The GIS community has led the way with INSPIRE but many other domains, like the law and commerce, are far behind.
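To make the point about unique identifiers concrete, here is a minimal sketch in Python of linking two data sets on a shared key. Everything in it is an assumption for illustration: the records, the field names and the `company_id` key are invented, and real registers may expose quite different identifiers.

```python
# Hypothetical records; the identifiers, names and fields are invented.
companies = [
    {"company_id": "C001", "name": "Acme Ltd", "director": "John Smith"},
    {"company_id": "C002", "name": "Acme Limited", "director": "Jane Doe"},
]
judgments = [
    {"company_id": "C001", "amount": 1200},
    {"company_id": "C003", "amount": 450},
]


def link_on_identifier(left, right, key):
    """Join two lists of records on a shared identifier (an inner join)."""
    # Index the right-hand records by identifier so each lookup is direct.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    linked = []
    for row in left:
        # Records with no counterpart on the other side are left out.
        for match in index.get(row[key], []):
            linked.append({**row, **match})
    return linked


linked = link_on_identifier(companies, judgments, "company_id")
```

Only the record with an identifier present in both sets is linked. The two similar company names would be very hard to tell apart reliably without the identifier, which is why access to it matters.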
All public sector information asset register records should either use the definitive classification schemes to describe the information or indicate that the user needs to interpret the data with care.
Among the most widely used public data sets in the UK are the census records which are crucial to millions of family historians. Sorting out the generations of John Smiths who lived at The Cottage on the High Street in 1841 with a spouse and several children, some also called John, illustrates the problem which is being tackled with varying degrees of success by crowd-sourcing and using other sources, like birth, marriage and death records. In particular, deliberate and accidental errors in the public records, both systematic and careless, require redundant data to resolve. Sometimes these come from photographs or indentures.
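The John Smith problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented, the `match_births` helper is hypothetical, and the two-year tolerance is an arbitrary choice rather than a recommended value.

```python
# Hypothetical records, invented for illustration only.
census_1841 = [
    {"name": "John Smith", "age": 45, "address": "The Cottage, High Street"},
    {"name": "John Smith", "age": 12, "address": "The Cottage, High Street"},
]
birth_records = [
    {"name": "John Smith", "birth_year": 1796, "record": "B-1"},
    {"name": "John Smith", "birth_year": 1829, "record": "B-2"},
]


def match_births(census, births, census_year=1841, tolerance=2):
    """Pair each census entry with same-name birth records whose year is
    within `tolerance` years of the birth year implied by the stated age."""
    results = []
    for person in census:
        implied_year = census_year - person["age"]
        candidates = [
            b["record"]
            for b in births
            if b["name"] == person["name"]
            and abs(b["birth_year"] - implied_year) <= tolerance
        ]
        results.append((person["age"], candidates))
    return results


matches = match_births(census_1841, birth_records)
```

Name and address alone cannot separate the two entries; the second source supplies the redundant detail (a birth year) that does. With errors in either source the candidate list may come back empty or contain several records, which is where human judgement and further sources come in.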
The availability of Web 2 and related advanced knowledge technologies will highlight the problems as well as helping to solve them.
Register of Data Controllers: as a company we work in the field of Data Protection. As an individual I must often rely on the Register entry to give me information about a Data Controller – though the entry details required are inadequate if I really want to know how personal data will be processed.
Ironically, the most important datasets are often those which are currently charged for: geographic data, corporate data, etc. The problem with this is that access is restricted to those with the ability to pay, for whom the cost is a useful protective barrier to competition. This means we get rent-seeking business models, rather than innovation, new businesses, and disruptive new models. It also means that the public and community groups are excluded from access, and particularly from being able to combine these datasets with other ones.
Marine and geographical data. In particular, tidal constituent databases which were at one time open to the public. This data is not currently replaced by any available data set, including ones that are provided for a fee.
Usage – and, by extension, value – is predicated on what we already know about a specific set of data and how we’ve used that data in the past. Some datasets will become more useful when viewed in addition to others – so developing a good picture at this stage, I think, is unreasonable. After all, one of the primary aims of this whole venture is to stop seeing singular data sets in isolation and to explore how we can create new value through the combination of data.
Rather than expend too much energy on encapsulating a picture of what people want, I’d suggest that we concentrate on learning from what people use – and developing a decent system to do this.
We should also remember that a data set’s value should be very much dependent on its accuracy, but sometimes inaccurate data retains a higher value because it’s used as the ‘gold standard’. The census, for example, has enormous value attached to it (not least by the way the data is used) but it’s a matter of record that the data is often skewed by the way that it has been collected. While the data may actually be less valuable, it retains much of its value because it is still used to develop policy.
All of them. I understand the desire to prioritise but they’re all important because they all empower citizens to hold authorities to account.
If forced I’d have to say any datasets that show where public money is spent are going to be very high on the list.
Income and expenditure, departmental spend and the ability to track this over time. We have developed an application which allows departments to compare their results. We are hoping to present this to the data owners and departments to allow financial monitoring.
Climate, soil, geology, digital elevation models, land use classes, vegetation (NVC).
These facilitate the design of landscape-scale spatial decision support systems for planning and natural resource management. Datasets on the presence of mine workings and other hazards are invaluable from a health and safety perspective.