How do you, or would you, decide whether a dataset has value for you or for your organisation? What affects how valuable it is, for example timeliness, granularity, format?
I’m always looking for low-level geography (e.g. LSOA) to allow for detailed analysis in my work. A standardised format is also important, otherwise my team spend their time cleansing. I can live with data being a little behind the curve if it’s available by small area, but ideally we want all three – timely, local and standardised!
Granularity and accuracy are usually most important.
All data is useful. The most important thing for me is to understand its provenance, i.e. am I looking at something that has been aggregated or filtered in some way? ‘Raw’ data with a timestamp is the most useful.
I would need to evaluate a sample of the dataset concerned and to assess how the information could be utilised in the context of our product in order to maximise user benefits. The cost of obtaining licenses would need to be weighed against the resale price and likely market value of our product. There is sometimes a relatively high minimum cost for the license and I would like to see a greater emphasis on a sliding scale of charges based on unit sales whereby the license costs could be more easily built in to the sale price. I would also like to see a greater level of co-operation on more of a partnership basis whereby the license provider works hand in hand with the private sector organisation to develop the product.
The most valuable data is that which is up-to-date. This is not to say that older data is irrelevant, but to ensure its relevance it must be ‘time-stamped’ and in a format which does not require large amounts of time and energy to translate into a useful form. This is not so much of a problem now as it was 20 or so years ago, but it is this older data which often requires the most work.
In our context the mapping formats used by the Ordnance Survey, MapInfo and ArcInfo are the most easily used and translated but, as others have said, the street address gazetteers of various sorts do require time and effort.
Two aspects affect us:
1. Accuracy – how accurate is it, when was it last updated, how complete are the updates, what resolution does the data have?
2. Licensing costs (both financial and admin effort) – the UKHO’s web site makes it easy to license low-value data sets for non-navigation use, but if they decide that the data is for navigation then we have a 40-page set of licence forms to print off and sign on each page in duplicate, and send to them with supporting data, and the licence cost is often under £100 a year – a waste of everyone’s time and money. The chief of the UKHO recently stood up and said that where turnover is under €100,000/year no licence fees are charged, but this is nonsense.
Levels of granularity are important – I usually want fairly detailed facts. Links to further sources of data would be helpful – or to a person to contact. Timeliness depends, of course, on the nature of the dataset and its volatility.
Does it solve a problem for people (e.g. would making the FSA register database open make it more accessible and more widely used)? Can it be combined with other data to throw light on that data or on the new datasets (consistent IDs are very important here)? Is a lot of work needed (e.g. does it need to be scraped, or is it downloadable)?
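The point about consistent IDs is easy to illustrate with a minimal sketch. Everything below is made up for illustration – the area codes, counts and the idea of joining an inspections dataset to a population dataset are assumptions, not real FSA or ONS data:

```python
# Two hypothetical open datasets keyed on the same consistent identifier.
population = {"E01000001": 1500, "E01000002": 1800}   # area code -> residents
inspections = {"E01000001": 12, "E01000002": 7}       # area code -> inspections

# With a shared identifier the join is trivial; without one you are left
# fuzzy-matching on place names, which is slow and error-prone.
combined = {
    code: {
        "population": population[code],
        "inspections": inspections[code],
        "per_1000": 1000 * inspections[code] / population[code],
    }
    for code in population.keys() & inspections.keys()
}

print(combined["E01000001"]["per_1000"])  # 8.0
```

The whole combination step collapses to a dictionary comprehension precisely because both datasets use the same ID scheme; that is what “consistent IDs” buys you.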
This question assumes data is primarily valuable to organisations: but public data is frequently of great value to communities, informal groups, networks and other bodies which don’t behave strictly like organisations or with organisational decision making processes.
The value of data to many groups is often defined in what the data can do and how it can be used: can this data secure better services, for example. The value of the data is in whether it can bring about the change or not; whether or not it will be used to do that can depend on how easy it is to use – but whether that’s a function of timeliness, granularity or format varies wildly by group and by the dataset in question.
(One rule of thumb though that might be useful in thinking about value of data in democratic/community/social change contexts is to say that the data is most valuable when it is available to citizens at the same time, at the same quality, and with the same supporting meta-data, notes and analysis, that it is available to policy makers.)
I agree, but would go further. In my line of work an LSOA is rather too large and crude a geographic unit.
To provide a low-level geography assisting local decisions and analysis, greater granularity than LSOA is necessary. In my experience the trade-off can be a lack of standardisation.
Since cleansing is a part of understanding how to utilise the data, I’m happy to accept the workload on non-standard data format if I get really local information.
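As a rough illustration of the cleansing workload being accepted here, a minimal sketch of the kind of work non-standard local data tends to need – normalising stray whitespace and mixed date formats. The column names, formats and values are invented for the example:

```python
import csv
import io
from datetime import datetime

# A hypothetical non-standard extract: inconsistent spacing, two date formats.
raw = """area, recorded
 Ward A ,01/03/2024
Ward B,2024-03-02
"""

def parse_date(value: str):
    # Try the formats actually observed in the file, most common first.
    for fmt in ("%d/%m/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")

rows = []
for row in csv.DictReader(io.StringIO(raw), skipinitialspace=True):
    rows.append({"area": row["area"].strip(),
                 "recorded": parse_date(row["recorded"])})

print(rows)
```

Small as it is, this is the kind of per-dataset glue code a team ends up writing whenever the supplier has no standard format; the trade-off in the comment above is accepting this cost in exchange for more local data.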
The most obscure of datasets can have immense value to someone.
If someone would like a dataset, I’d suggest they be required to offer some form of document that says what it is and why they want it (maybe three questions, max one side of A4); and if that data is to be made available for free, then it is made available.
The important part of this question is in fact data that is not available for free. Where an organisation says that charging X for this dataset will produce Y benefit, they *must* be held to that statement: if those benefits are not forthcoming from restricting who can see a dataset via payments, then the data should be opened. There should be a review process, after a defined period, into every decision which doesn’t result in open access under the Open Government Data License – say a lightweight process after a year, and a more intense process after three.
There is value in all kinds of data. Value that isn’t discovered until the data is picked up by someone who realises “hey, I can do this now!”
Instead of asking “is this worth anything to anyone” you should be saying “this *might* be useful to a *lot* of people so it’s worth releasing.”
Accuracy is important to us as we are looking at planning and outturn-type data. It is key that we have a timestamp, that the data is detailed, and that a timetable is released for the updating of the dataset. As we are looking at time series data, it is important that multiple years are available.
Data formats: GML, ODF – preferably open-standard data formats.
A valuable dataset would have some accompanying information describing how it was constructed (in the case of derived datasets, e.g. where climate data is interpolated to a finer resolution). The granularity of the data has to be high for tactical applications (e.g. what tree species to plant at a specific location); lower quality will suffice for broader strategic landscape questions.
From a commercial point of view the information has to be authoritative, maintained and accessible. We are more likely to be able to add value if the data is ‘actionable’ in some sense as transport data affecting your commute is.
As for most things, source and completeness are of critical importance.