Data, data everywhere, nor any byte to link

We have always lived in a society dependent on data – what has changed over the last decade or so is the sheer scale of the data being generated. We have probably passed the point where much of it will never be looked at by a human being – it just sits there. Most of this data has value; we simply need to think about how to extract that value effectively. Both this government and the last realised that they hold a lot of data themselves and that it might be valuable. Several initiatives have aimed to release that value, and they are beginning to deliver, but everyone is increasingly aware of the challenges involved in unlocking this national asset.

Many years ago, I worked for ICI Acrylics. They made the material that went into spas in the USA. Spas are a luxury market, and we spent a lot of time making the material in colours that reflected that luxury price point. One year, I had to go to the big baths and spas show in Las Vegas to see how we might introduce innovative new products into the market. I met a large number of spa makers and saw some amazing sights at the show, but my real insight came in our hospitality room. As the dealers came to see us, they all made a point of thanking our sales manager for his help. It wasn’t just one of them – it was all of them. At the end of the evening I asked him what his secret was. He told me that he had bought the US Government data on the distribution of income by zip code. Before he visited a dealer, he looked up the most affluent neighbourhood near the dealer and suggested that they focus their sales there. Needless to say, people with higher incomes were more likely to splash out on a spa (sorry, couldn’t help that one!) and the dealers got more sales. This made me realise that data only becomes useful information when someone has a use for the output.

It is all very well realising that data has value, but a major lesson of the last few years is that most data is stored without a thought to its future use. It is often in the wrong format and contains errors and the sorts of simple confusions that make it difficult to use; these can, in turn, spawn errors of interpretation that undermine its value. Many of these errors are not easily dealt with by machines. They are often caused by the people inputting the data – wrong addresses, transposed letters or numbers, misspellings and so on – and are most effectively corrected by people too. Our work with LinkedGov has shown us that finding people who can see the potential value of the data, or who are looking to develop new data applications, can provide the resource and insight needed to make sure the data they understand is “clean”.

Once the individual data sets are available, the task is to identify the insight that linking different data sets can provide, carry out that linking and extract the value. That can only come from an understanding of the market – like the need to find people with enough disposable income to buy a spa! This means you have to start with a question and seek out the data needed to answer it. Owning the data and assuming it has intrinsic value is coming at the situation the wrong way round!

One of the objections people often raise to this opening up of data is that it often ends up being about people – and people are not sure they want others to know about their habits or assets. This certainly seems to be true when the question is a theoretical one. In practice, however, opening up data more often makes people’s lives easier. How often do the suggestions Tesco or Amazon make about what you might like lead you to a new food, a new author or a new band you would not otherwise have discovered? Anonymisation of data will be necessary to protect some aspects of our lives, but there may well be knowledge about other aspects that we would gladly trade for convenience.

The Open Data Institute, announced as part of the Autumn Statement the other week, is an important step. The academic understanding of linking data, and the development of languages and methodologies to carry out that process, is strong in the United Kingdom, and the Institute will draw heavily on this expertise. Crucially, though, it is to be located in Shoreditch, home to possibly the highest concentration of young, ambitious, market-focused companies operating in this space. It will make that expertise available to the companies with insight into what has the most value, and help them develop the science and technology into products and services.

Data is a resource like any other – it has value when it answers a need.  We are rich in it, so we ought to focus on making sure it is able to answer the questions we ask of it.


Last updated on Thursday 01 March 2012 at 10:54

  • Recent Comments

  • Lee Murphy|06/08/12 at 9:23 PM

    Historical data is historical data: it has historical reasons for being stored the way it was. That can be cleaned and incorporated. When we walk forward on a path, it should be with experience of where we have been and the aspiration of where we want to go. Simply put, we've got what we've got; now, how are we going to create an easier route to market for this type of data in future? One word: enumeration.

    June Beddows|02/04/12 at 11:54 AM

    There are really two angles to this. The suppliers need to be ready, and ahead of the game, with tools to meet this emerging need; but as one whose job it is to help others use all this new technology effectively, I feel we need to do more to ensure people have got the basics right first. I suspect that is part of what Steve is thinking too. The majority of people and places are still trying to get control of the old, small-scale business processes and technology.
    Let's have more innovation to make this happen more effectively too.

    David Bott|14/12/11 at 9:04 PM


    I didn't mean to appear relaxed about personal data, I'm just aware that a blanket "it's bad" approach is as unreasonable as a universal "make it available" one.


    Steve Mallon|13/12/11 at 2:08 PM

    Much of the debate on the data explosion seems to take an industrial view of processing: efficient mechanistic crunching of vast volumes of data, albeit with quite clever algorithms at times. As more data derives from M2M activities, this must be at least a partially valid perspective. At the same time though, as you note, much data is not "clean", and even when this is not the case, useful inferences require "intelligent" (rather than mechanistic) reasoning. Most examples of "smart" that I read about appear at best "insightful". Perhaps the real revolution only comes when we start to see real machine reasoning being applied to these large messy datasets?

    I think you may be too relaxed about the risks of personal data aggregation: the "utility for privacy trade" may seem reasonable, but prophets of doom aside, don't we need to be careful to avoid losing some of the equality of information in the market which the internet has done so much to advance to date?

    Michael Kenward|13/12/11 at 9:37 AM

    ...and IBM is building a whole new business on "analytics", divining meaning in that flood of data.

    IBM Press room - 2009-12-01 IBM Launches London Analytics Solution Centre - United Kingdom

    (Recognise the guy on the right.)

    They must expect to make a load of money out of it when they tout the whole thing under the "solutions" banner.

Copyright © 2013