How easy is it to find out what datasets are held by public sector organisations?
Most of our work has been for Local Authorities, and it is they who usually provide the data for the project or contract that we are working on, so in that respect it has not been a problem. However, occasionally, we have been asked to find additional data ourselves. For example, some years ago, a Local Authority wanted to know more about military bases and depots used by UK and foreign (e.g. American) forces within their area. This required contacting the MOD, English Heritage, and others to assemble what information was available. This took considerable time and the result, in the end, was probably incomplete.
Thus a central repository of all such information would be an extremely useful resource.
Very difficult – you really need to know what data you want and who holds it. For example, the UKHO's copyright and licensing department swore blind that they did not produce a dataset that I knew they did, and I knew the person in the organisation responsible for it.
It is almost impossible to find out what datasets are held by the overwhelming majority of public sector information holders (PSIHs).
In the UK, about 200,000 PSIHs are subject to the FOI Act. There is no comprehensive list and many appear and disappear in a year. The ICO told the 400 or so Local Authorities to include their information asset registers in their publication schemes along with their other assets. Recently I searched all 400+ publication schemes for references to their IARs. About a score of them referred to their IARs, most only quoting the IC’s instruction. Only one included an IAR. I suspect that the same hit rate would be found in Health Authorities, emergency services, parish councils and so on.
Part of the reason is that the person responsible for information governance is trained in the pertinent fields of law rather than in information technology. In my own local authority, the information officer did not know that the systems designers used the information contained in an information asset register record in a different form. The original procurement documents for information systems list the information to be processed (in the request for proposal), how the information will be processed (in the technical solution), or both. The information asset register record can be deduced from the sales literature of the supplier of a packaged solution. It can also be deduced from the IAR of a similar organisation. There are very few unique applications in the wider public sector.
Much PSI is held on desktop computers running office applications. Where these are provided by Microsoft, for example, there is metadata associated with the application if the office procedures ensure that it is collected. Otherwise, it is straightforward to create automatically.
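To illustrate the point about office-application metadata, here is a minimal sketch of how it might be read automatically. It assumes files in the Office Open XML formats (.docx, .xlsx, .pptx), which are zip archives that conventionally keep title, author and date fields in docProps/core.xml. The function names, the record field names and the sample document are hypothetical, chosen only for illustration; they do not reflect any particular authority's IAR layout.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the core-properties part of an Office Open XML file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Record field name -> element to read. The field names are illustrative only.
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "subject": "dc:subject",
    "keywords": "cp:keywords",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Read document metadata from a .docx, .xlsx or .pptx file.

    `source` may be a path or a binary file-like object. Assumes the
    metadata sits at the conventional docProps/core.xml location.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    record = {}
    for name, path in FIELDS.items():
        element = root.find(path, NS)
        # Missing or empty elements become empty strings.
        record[name] = (element.text or "") if element is not None else ""
    return record


def make_sample_package():
    """Build a tiny in-memory package standing in for a real office file."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<cp:coreProperties'
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
        ' xmlns:dcterms="http://purl.org/dc/terms/"'
        ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
        "<dc:title>Planning applications 2010</dc:title>"
        "<dc:creator>Hypothetical Officer</dc:creator>"
        '<dcterms:created xsi:type="dcterms:W3CDTF">2010-10-01T09:00:00Z</dcterms:created>'
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for field, value in read_core_properties(make_sample_package()).items():
        print(f"{field}: {value}")
```

Run over a shared drive, something along these lines could produce a first draft of a register, although the fields are only as good as what staff or the application filled in.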
OPSI publishes some of central government’s IARs in Inforoute.
It is difficult. The FOIA does not oblige public sector organisations to provide links to available datasets in their publication schemes – they only have to describe what they offer in very broad terms. Although most organisations do provide these links, my local council, for example, has chosen not to.
Still very difficult. Many, many datasets are not listed on data.gov.uk, and many core datasets (e.g. the FSA register) are not under the Open Government Licence, let alone downloadable in easily reusable form.
Whilst data.gov.uk makes it easier to find data which is now open in some form (although the meta-data in data catalogues is poor, and even when good, tends to assume a potential user is familiar with government terminology), it remains tricky to find out about data which might be held, but has not yet been made public.
Listing data held, even when not yet openly available, would be extremely useful. The OPSI Information Asset Register/Inforoute site which apparently did this to some extent appears no longer to be active/available – and it wasn’t clear from looking at departmental sites the extent to which IARs are being kept updated.
Very hard in some cases. Many datasets on data.gov.uk look promising but then drag you through a number of hoops, with no certainty that the data is what you want anyway, and you end up with a spreadsheet or PDF that is nearly impossible to understand.
data.gov.uk is a brilliant first step, but it is only that. More work should be done to liberate central government data. Additionally, a true data catalogue should be developed, making it easier to find what is available, giving visibility of the available data and providing an explanation if data is removed.
While it is often possible to locate a dataset, occasionally the extent of the data is not clear (e.g. attributes which may have been omitted for confidentiality or to avoid introducing business complexity). Summary datasets are useful, but their application is generally limited to broad statistics and they only partly answer some questions.
Generally difficult. In London, it is now much easier due to the work of the London DataStore discovering and unlocking data. This is a function that needs rolling out across government, and should be a core function of the proposed PDC. The data.gov.uk team have also done sterling work on this for central government departments.
It is fairly difficult, as it largely relies on you knowing what you are looking for and where to get it from. It very much prevents “accidental discovery” of datasets through browsing a central data portal.