What gets in the way of you or your organisation accessing datasets or data products?
The Greek philosopher Thales said that “place is the most important of all things since it contains all other things”. This is as true now as it was over two and a half thousand years ago. Every event, every transaction and every piece of data is connected to at least one place, usually to many.
In the journey to create information from data and knowledge from information there are two ever-present entities, place and time. If these are not standardised we are at risk of what Umberto Eco defines as aberrant decoding: essentially, I use a word meaning one thing and the receiving party interprets it differently. Whilst this may be amusing in a French farce, it is not so funny in real life. Ask DCLG, which gave £2.5m+ to the wrong Newcastle (Staffordshire instead of Tyneside), which then refused to give it back.
The issue of standardising time has largely been resolved by the use of the 24 hour clock, albeit use of the American date format may cause issues. The Government is in the process of standardising place by creating a National Address Gazetteer (NAG), which is good news for Government entities, who will have access to this data as part of national agreements.
However, commercial and societal organisations and the citizen will not have the same access. It is always possible to buy this data, currently as one of three data sets: OS’s address layer, Local Government’s NLPG and the Royal Mail PAF. However, there are two barriers to doing this. The first is the price, which will block all except the deepest pockets. The second is the fact that all these products, as will the NAG, include the PAF, and so their contracts for use include the following clause:
6.3 The End-User shall not at any time copy, reproduce, publish, sell, let, lend, extract or otherwise part with possession of the whole or any part of the Data or relay or disseminate the same to any other party, except as is expressly permitted by the terms of this Agreement. The End-User may make a reasonable number of back-up copies of the Data for security purposes. The End-User may only use such archived back-up copies of the Data for archive retention and retrieval purposes.
This effectively precludes the use of any of these data sets to enhance internal data that is then published or shared. It also prevents the use of the file in linked data activities on the web. A client of mine, a major consultancy (£10bn+ per year turnover), refused to sign this contract and instead bought the PAF data from a VAR at additional cost, to have them carry the risk involved in this contract.
So Government will be universally using a spatial data set that will not be widely available outside of government. This will make it difficult to analyse much of the data available via the Public Data Corporation initiative; in fact, whole areas of analysis and discovery will be blocked.
It seems invidious that a single organisation, Royal Mail, can claim intellectual property rights over every address in the UK and do so in a way that reduces benefits for the wider community.
Finally, much of the “big society initiative” depends on the identification and mobilisation of groups of citizens; to do this efficiently, files of addresses are needed. This is recognised in the electoral process, where the electoral roll is made available to each candidate’s organisation. Why can’t a similar need, without names, be recognised for societal organisations of all types, and the Geoplace data set be provided free without encumbering contracts?
Quoting Robert James below:
“It seems invidious that a single organisation, Royal Mail, can claim intellectual property rights over every address in the UK and do so in a way that reduces benefits for the wider community.”
Yes – it really is incredible isn’t it?
Complicated or unclear licence terms are a major barrier.
Public sector organisations often have a myriad of different datasets and it becomes extremely difficult to assess which is best for a particular purpose. Sample datasets that could be downloaded for evaluation purposes would be an ideal solution.
First there is the issue of finding who owns what data, already discussed. Often I find an organisation will admit to owning data only if you ask a very precise question, which precludes any initial broad enquiries – presumably because those dealing with the enquiries do not know anything about the data, and so just look things up in a computer, and if there is no match they respond with “The computer says no”.
Also appropriate licensing – with the UKHO, on the one hand they are trying to generate revenue from their own product range, and on the other they want to generate revenue from licensing out data (other responsibilities such as making data freely available for safe navigation seem to have been forgotten these days), so often licences are excessively complex and restrictive to preserve their commercial interests. This also results in them refusing to license out data that they use in their own commercial products.
1. availability eg my local council doesn’t publish tree preservation orders – why not?
2. knowing what to look for – I rely on search engine results a lot of the time.
3. public sector organisations make minimum use of their FOI publication scheme – one would almost think they were reluctant to publish.
2) Other legal issues, such as IP by third parties (e.g. contract data), defamation (e.g. what privilege to republish reports from e.g. Standards Body for England)
3) Bad use of Data Protection issues
Using Data Protection – wrongly – as an excuse to avoid providing the data (DP is important and can be a valid reason not to make data available, but government hides behind it far too often)
Differential licensing or access permissions. I often find that as a commercial company I am refused access to a data set where an academic is permitted to do exactly what I wished to do. Particularly galling if the academic then carries out the analysis and passes it on to a competitor – or misuses their ‘academic’ access to engage in commercial activities.
Making data available to the public but then refusing to permit business to license the same freely available information – leaving the option of scraping the data in breach of copyright or buying the data from some cowboy who has done so.
Differential pricing. One price if you only provide services to a selection of public sector organisations. Much higher price if you dare to have even one client in the private sector.
Cost recovery pricing models that result in sudden changes in price without warning as the number of users varies.
The inability to distinguish in licences between (re)using the raw data and combining the data with a number of others to create something new – and any insistence that the govt agency owns this new thing even if govt data is a small fraction of the inputs to it.
Lack of govt resources being allocated to supply people with data that has already been created
Is any work being done to simplify the position on derived data licensing?
Agree with all three, and I’d add a fourth: usability of the data. I.e. how easy it is to actually understand. We need more context with the data such as how, why, when it was collected, who for and who paid for it. Data is rarely ideologically or politically neutral so we need to be aware of potential bias.
Intellectual Property Rights: even government departments can’t exchange data effectively, as they end up trying to cross-charge each other. It is also very complicated and expensive trying to understand licensing constraints and ensure compliance around derived data.
The result is that domains such as forestry, which rely upon the intersection of different environment data (soil/climate etc), become unwieldy because of IPR constraints, and derivative products become prohibitively expensive to develop.
If base data could be freely available, the costs of managing licensing (across departments) could perhaps be eliminated.
The biggest problem is that government still does not understand that data is 21st century infrastructure, and as such needs to be freely available to enable economic activity and social benefits. Key pieces of infrastructure are still behind paywalls eg addresses, train times, company accounts, court transcripts and until the raw data is freed up the data creators inside government will continue to act like monopolists, which is a brake on innovation.
Generally the lack of knowledge of what the datasets contain, when and how often they are updated and where they are available from.