The National Archives Labs

Maps and geo-referencing

This week, I am going to talk about the challenges we are experiencing with map-based applications, and also say something about the work we’ve been doing on identifying places in our records – a process called geo-referencing.

Mapping

We know that some of our early efforts have not been as usable as we would hope. Obviously, the issues with IE8 are disappointing, to us as well as to our users, and we are working on solutions and alternatives. While we can’t make everything on our beta site meet our website standards of usability and accessibility, we want as many people as possible to try out these applications and to tell us what they think. That’s why we’ve recommended that users experiencing problems access applications in IE8 compatibility mode, rather than taking the applications offline.

Please let us know if we could have communicated this more effectively. Are we answering your questions? And are you seeing the questions that other users have asked, and that we have answered, further up the comment thread?

This is The National Archives’ first interactive commenting feature and is a learning process for us. Tell us what you want from it.

Geo-referencing

Our records contain millions of references to places in the UK and overseas. We want to make it easy to find these and, ideally, to plot them on a map so that you can, for example, look at a map of England, see which places are mentioned in Domesday Book, and then click through to the record.

While it is easy for anyone reading a document to recognise that ‘house in Willesden’ or ‘fields round Abergavenny’ relate to places, it is much harder for computers to figure this out. We first need to be able to identify place names in records and then we need to look them up in a gazetteer to identify their precise locations. The process of identifying place names is called geo-parsing and the process of identifying their locations is called geo-coding.

This is not as easy as it seems, because:

  • Geo-parsing systems often rely on contextual information, such as the word ‘near’ before the location or ‘city’ after the location, to identify a possible location. However, our digital archival data sometimes contains very little contextual information.
  • When it comes to geo-coding, the big problem is that there are many places with the same name (there are 14 Newports in the UK) and in many cases words are used both as place names and personal names (for example the town Wellington and the Duke of Wellington). We are using a number of techniques which allow us to use contextual and historical information to determine whether a word is a personal name or a place name and, if it is a place name, which precise place is being referred to.
  • Often, place names used to be spelled differently. For example, the modern place of Loughton, Essex was spelled Lochintuna in Domesday Book, a name that naturally does not appear in modern gazetteers. However, using historical and archival information about place names, we are able to learn spelling patterns which enable us to identify the modern spelling of old place names.
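
As a rough illustration of the geo-coding ambiguity described above (and not the method we actually use), the sketch below looks a name up in a tiny, hypothetical gazetteer and uses a county mentioned in the surrounding text to narrow down the candidates. The `Place` class, the `GAZETTEER` entries, the `geocode` function and the rounded coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    county: str
    lat: float  # rough, illustrative coordinates only
    lon: float


# Hypothetical toy gazetteer; a real one would hold many thousands of entries.
GAZETTEER = [
    Place("Newport", "Monmouthshire", 51.6, -3.0),
    Place("Newport", "Isle of Wight", 50.7, -1.3),
    Place("Newport", "Shropshire", 52.8, -2.4),
    Place("Loughton", "Essex", 51.6, 0.1),
]


def geocode(name, context=""):
    """Return every gazetteer entry matching `name`.

    If the surrounding text (`context`) mentions the county of one or more
    candidates, only those candidates are returned. Otherwise all candidates
    are returned and the result stays ambiguous.
    """
    candidates = [p for p in GAZETTEER if p.name.lower() == name.lower()]
    narrowed = [p for p in candidates if p.county.lower() in context.lower()]
    return narrowed or candidates


if __name__ == "__main__":
    print(geocode("Newport"))                                  # several candidates
    print(geocode("Newport", "house in Newport, Shropshire"))  # narrowed by county
    print(geocode("Lochintuna"))                               # old spelling: no match
```

The last call shows why old spellings are a separate problem: an exact lookup finds nothing for ‘Lochintuna’, so some form of spelling normalisation is needed before the gazetteer can help.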

The first applications that build upon our automated geo-referencing will become available here soon – we are planning to launch a map-based search of Domesday Book. As with any large-scale automated process of this kind, there will be a number of mis-identified and mis-located places. However, we are developing a series of methods for identifying such errors and allowing users to correct them.

I will be talking about other site features and new releases in later blog posts. If there is anything you want to know more about, do let me know in the comments section.

Director of Technology and Chief Information Officer – David Thomas

As a senior archivist and records specialist at The National Archives, David’s career has focused on developing access to archives and information in both government and the archive sector.

David is responsible for information technology services at The National Archives, and is leading on the major cross-government project to develop a shared service for preserving digital records.

Comments (18)

  • betty judge

    I can understand the difficulty with so many names being the same, not only in the UK but also in Canada, the USA and Australia. One must always use the county and country.

  • victor markham

    Looking forward to this development. Will it include the names of places that have been lost to the sea, like those on the Holderness coast?

  • Tony in Devon

    Fascinating and clear. Thanks.
    1. I read Royal Mail will drop County names – a problem for the future.
    2. Even modern postcodes change occasionally.
    3. In some contexts, narrowing a search by looking for initial capitals may help identify names!
    4. A place name would not normally have individual letters (first names) before it – unless it’s S for south – just one more thought.
    5. In our immediate neighbourhood some houses (cottages) have switched names a generation or so back, and a local farm’s name was attached to a mile-away neighbour on a C19th map.
    6. I hope the project avoids Wikipedia’s geo-referencing profusion. For me at least, as an occasional browser, a profusion of systems means confusion!

    Best wishes for a successful (lifetime’s!) project. – Tony

  • David Thomas

    Thanks for your comments. Yes we are certainly going to continue to use counties. The Association of British Counties has some great information on historic counties. See http://www.abcounties.co.uk/counties/map.htm

    We are also thinking about how to deal with places which have fallen into the sea or been swallowed up by reservoirs or runways. Once we start on global mapping we will need to think about all those places like Wellington in Somerset and New Zealand, and Boston in Lincolnshire and Massachusetts.

  • david swinscoe

    You mention a “map-based search of Domesday”. When?

  • Nick Bradbury

    I would like to see a user-friendly system where I can type in the name of a lane in a certain area at a certain date. For instance, I have ancestors who came from Meriden and Kersley at the turn of the century, but the M6 runs through that area and I am sure that the lane they lived in is lost or has changed names. To be able to zoom in on a map of the time and print it would be very useful.

  • Dominic Johnson

    The Labs website, which I have only just looked at, is okay for information, but the picture at the top, which changes from a set of pictures to maps without the reader doing anything, is very distracting. I suffer from migraines, and the constant changing from one to another, even when I am trying to read elsewhere on the page, has now made me feel really quite ill.

  • David Thomas

    In reply to David – we hope to launch the map-based search of Domesday Book by the end of September.
    In reply to Nick – I agree. For now, I guess the best thing you can do is to look at old Ordnance Survey maps – these should be available locally.
    In reply to Dominic – we’ll look at this and see how we can fix it.

    David

  • Miles Davenport

    Excellent website. I completely understand the issues of relating data to geographical location. Some locations may have alternative “regional/local” names.

    Associating this type of information with geographical detail, so it remains meaningful, and relevant is a real challenge; especially with the demands of mobile applications, and a “data connection” which may be slow or changeable.

    The approach of the European Library (using the SOLR/Lucene index) is interesting, as they search a number of regional libraries for any query and are “quite honest” if zero results are returned for a specific library.

    This approach could be used to query different types of geo-referencing.

    (for example http://search.theeuropeanlibrary.org/portal/en/search/%28%22london%22%29.query)

    Please offer the user an option to keep a static front page “header”, or retain the “changing version”. I like the rotating images.

    Excellent :O)

  • Archie Courtney-Wildman

    I have been “adding in” digitised photos of areas in which the people in my family tree have lived. Using Google I have been able to show areas in other countries where family members have lived.
    Of your first efforts, I have only looked at Penzance and Newlyn (Penzance), and my only comment is that a date (year) on the photographs would be a help and would save trying to guess what era one is in from the clothes worn!
    Good luck with your further efforts.

  • Chris Willis

    David and Nick

    The Genuki Gazetteer will find most places. If it leads to a gazetteer page (the lower of the two options in the bubble), it gives a choice of maps, including old maps.

    Not sure if you can print these maps, but you can certainly look at them. Maybe too early for things affected by the M6.

    Chris W

  • Thembinkosi Lehloesa

    I really like the idea; it’s great. However, I’m in South Africa and therefore can’t really make use of the lab. But it is fantastic. I only wish you could pass the same ideas on to the archive here in South Africa. Your main website is truly excellent too: I think a first in the world.

  • David Thomas

    Archie – sadly we don’t have dates for the Dixon Scott photographs – you can sometimes make approximate guesses.

    Thembinkosi – thanks for your comments. We do plan to do some work on maps relating to other countries later, so we may eventually have some material for South Africa.

  • Dick Lane

    Digitalisation is a great idea, but one does struggle at times to get to where the information will be.
    How about an ‘online tutorial’ for the less expert (like me)?

  • Chris Willis

    David Thomas

    In reply to Nick Bradbury: the Genuki gazetteer gives access to the old-maps site, amongst many others. So you can get historic 1:2500 (and six-inch) maps and zoom in.

    Do you have more detail available on geo-parsing? I am supposed to be producing name, place and occupation indexes for a fairly substantial corpus of Will abstracts. Most (but not all) have standardised place names and Christian names. There is also no sentence-driven capitalisation.

    Some of the place names in the two batches that have non-standardised place names are quite ingenious.

    I am reasonably sure these all refer to the same place (in Surrey):

    1 her.txt Ebbesham
    1 her.txt Ebesham
    2 her.txt Ebsam
    4 her.txt Ebsham
    2 her.txt Ebshame
    2 spa.txt Ebsam
    2 spa.txt Ebsham

    Chris Willis

    David Thomas reply:

    Thanks Chris. I will get one of my more expert colleagues to reply in detail.
    Best wishes
    David

    The National Archives reply:

    Chris,
    When automatically geo-parsing documents there are basically two approaches. First, one can use a set of heuristics to determine which words in a text are likely to be place-names and then retrieve their coordinates from a gazetteer. These heuristics could, for example, be capitalisation (names are frequently capitalised) or sentence features such as “town of”, which indicate that the next word is a place-name (in “town of bath” the word “bath” most likely refers to the town and not an actual bath). Negative heuristics are another possibility, as in “Simon Derby”, where the first name before the place-name indicates that the place-name is used as a surname and should therefore not be identified as a place-name.
    If the data to be geo-parsed has no distinguishing features of this kind, then the heuristics will obviously not work. In that case the only option is a brute-force approach where every word in the text is checked against the gazetteer. The problem here is places such as “Reading”, which, if not capitalised, is also a verb. Distinguishing between the two cases is very difficult if no capitalisation information is available.
    Both approaches have their advantages and drawbacks. In the first case the likelihood of “reading” (the verb) being identified as a place-name is low; however, if the lower-case “r” is the result of a typo, then the geo-parsing will not pick up on it. In the second case, while all places from the text that are mentioned in the gazetteer will be found, there will also be a higher number of non-place-name words that are identified as place-names. Depending on the desired use, a compromise needs to be found.
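
    The two approaches can be compared in a minimal sketch. It is an illustration under stated assumptions, not our production system: the three-entry `GAZETTEER`, the `FIRST_NAMES` list, the cue words and both function names are hypothetical, and the capitalisation rule ignores the fact that words at the start of a sentence are capitalised too.

```python
import re

# Hypothetical toy data, held in lower case for matching.
GAZETTEER = {"bath", "reading", "derby"}
FIRST_NAMES = {"simon"}
CUE_WORDS = {"town", "city"}  # as in "town of X" or "city of X"


def tokenize(text):
    """Split text into alphabetic words, keeping the original case."""
    return re.findall(r"[A-Za-z]+", text)


def brute_force(text):
    """Check every word against the gazetteer, ignoring case."""
    return [t for t in tokenize(text) if t.lower() in GAZETTEER]


def heuristic(text):
    """Accept a gazetteer word only if a cue or capitalisation supports it."""
    tokens = tokenize(text)
    found = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in GAZETTEER:
            continue
        prev = tokens[i - 1].lower() if i > 0 else ""
        prev2 = tokens[i - 2].lower() if i > 1 else ""
        if prev in FIRST_NAMES:
            continue  # negative heuristic: "Simon Derby" is a surname
        has_cue = prev == "of" and prev2 in CUE_WORDS
        if has_cue or tok[0].isupper():
            found.append(tok)
    return found


if __name__ == "__main__":
    text = "Simon Derby was reading in the town of bath before he moved to Reading."
    print(brute_force(text))  # ['Derby', 'reading', 'bath', 'Reading']
    print(heuristic(text))    # ['bath', 'Reading']
```

    On this sentence the brute-force pass returns the surname and the verb as well as the two real places, while the heuristic pass drops both, which is the trade-off described above.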

    Regarding the non-standardised place-names, you might want to look at Information Retrieval techniques such as Soundex (a phonetic code that groups words which sound alike) or Levenshtein edit distance (the number of single-character changes needed to turn one word into another), both of which indicate how similar two words are. These can be used to identify cases where the same word has been spelt slightly differently.
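
    Both techniques can be sketched in a few lines and tried on the spellings listed in the question. The function names are our own choice for this example, the Soundex version follows the common American Soundex rules, and both functions assume plain alphabetic, non-empty input.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Letter groups used by American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Four-character Soundex code: first letter plus up to three digits."""
    word = word.lower()
    out = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (out + "000")[:4]


if __name__ == "__main__":
    variants = ["Ebbesham", "Ebesham", "Ebsam", "Ebsham", "Ebshame"]
    for v in variants:
        print(v, soundex(v), levenshtein(v.lower(), "ebsham"))
```

    All five variants receive the same Soundex code, and none is more than two edits away from “Ebsham”, so either measure would group them together. A threshold that is too generous will also merge genuinely different places, so the cut-off needs checking against known examples.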

  • Chris Willis

    Thank you for your replies to my earlier enquiry. This leads to:

    Is your gazetteer available in the public domain (for automated use)?

    Chris Willis
