
Samples of Anonymised Records

Introduction

The Samples of Anonymised Records consist of extracts from Census records which are designed to enable researchers to carry out detailed analyses using 2001 Census data for individuals or households.

The SARs produced after the 1991 Census provided a valuable research dataset. The Census Offices have produced an individual file and a household file from the 2001 Census. There is a legal obligation to protect the confidentiality of the individual information that is released in the SARs and to ensure that the data that are released are safe from disclosure risk.

A large amount of analysis has been undertaken to assess the risk of disclosure. To meet their legal requirements the Census Offices have judged that some reductions in detail for certain highly visible or disclosive variables were required in comparison to the 1991 SARs. Consultations were held with the research community in 2002 to assess which bandings of variables were most acceptable to users.

 

Contents

The SARs extracted from the 2001 Census consist of three products derived from three separate extracts (no case in one extract appears in either of the others):

  1. The Individual Licensed SAR consisting of around 3 per cent of person records, relating to some 1.84 million people in all. For each person it contains the main demographic, health and socio-economic variables and derived variables such as social class; household information; data on the sex, economic position and social class of the individual's family head; limited information about other members of the individual's household (e.g. number of pensioners); and area identification at Government Office Region (GOR) level in England, and at country level for Wales, Scotland and Northern Ireland.

  2. A Special Licence Household SAR (SL-HSAR) consisting of a 1 per cent hierarchical sample of households and individuals in those households. It contains information for some 245,000 households and covers England and Wales only. The information is given for each individual in households of size up to and including 11 persons.

  3. A 5 per cent sample of Small Area Microdata (SAM) - a new product for 2001 - containing 2.9 million individual records with Local Authority level identified. The variables included are similar to those in the Individual SAR, though broader banding has been used to preserve individuals' confidentiality.

Full details of the content of the 2001 SARs and how to access them can be found on the CCSR website using the link above.

 

Availability

The Individual Licensed SAR is now available from CCSR (a charge may apply). To register for access and for further information on guidance and training, please go to the CCSR website and follow the link for access and registration.

The SL-HSAR is now available from the UK Data Archive. Researchers wanting access to the data should follow the link for requesting a download of the data on the Special Licence and complete the Special Licence application form, which will then be assessed by both the UKDA and ONS. You will be notified of the outcome as soon as possible. Users should also note the 'Guide to Good Practice: microdata handling and security' contained in the link above and agree to abide by its requirements.

When applying for access to the SL-HSAR users must already be registered with the Economic and Social Data Service (ESDS) or the Census Registration System (CRS) and have an Athens ID number. Users can apply for ESDS registration via the UK Data Archive website.

The SAM is now available from CCSR (a charge may apply). To register for access and for further information on guidance and training, please go to the CCSR website and follow the link for access and registration.

 

Protecting Confidentiality

The Census Offices have a clear, well-published protocol for protecting the confidentiality of individual information:

...In releasing statistics from the Census, all possible steps will be taken to prevent the inadvertent disclosure of information about identifiable individuals and households.

The Registrars General also have a legal obligation not to reveal information collected in confidence in the Census about individual people and households, and have given public assurances about what this means in practice. In presenting very detailed results from the Census, protecting individual information is of key importance. Traditionally the confidentiality of Census output is protected by a combination of disclosure control methods.

As well as the legal aspect of disclosure control ONS has also stated in the 2001 Census Disclosure Control advisory group paper AG0106 that:

"Maintaining the confidentiality of individual data underpins the trust that exists between data suppliers and any agency that acts as custodian of information about them. At ONS we are fortunate that businesses and the public have confidence that their information is securely held and that we do not release any data that could identify an individual. It is essential that this trust be maintained......".

Protecting the confidentiality of details about individual people becomes more difficult with each Census, as the amount of accessible and publicly available information about individuals increases. More information can now be matched statistically with the Census. Alongside this, for the 2001 Census a larger range of small area statistics has been released, notably because some key measures which were previously obtained from 10 per cent samples were available in 2001 for the whole population. A much wider range of small area information is being published through Neighbourhood Statistics, from public records as well as the Census.

Since 1991 the internet has transformed the potential for making census results widely accessible to citizens. Changing attitudes to the trust in which public agencies are held and concerns about the importance of privacy of personal information also place new and more onerous demands on bodies responsible for protecting such information supplied in confidence.

The general strategy for ensuring the statistical confidentiality of 2001 Census output was stated in the Government's March 1999 White Paper The 2001 Census of Population:

"Precautions will be taken so that published tabulations and abstracts of statistical data do not reveal any information about identifiable individuals or households. Special precautions may apply particularly to statistical output for small areas. Measures to ensure disclosure control will include some, or all, of the following procedures:

  • restricting the number of output categories into which a variable may be classified, such as aggregated age groups;

  • where the number of people or households in an area falls below a minimum threshold, the statistical output - except for basic headcounts - will be amalgamated with that for a sufficiently large enough neighbouring area; and/or

  • modifying the data before the statistics are released."

These considerations have led ONS to reassess how much detail could be released from the 2001 Census. Additional measures have been introduced for tabular output and some restrictions in detail have been applied to the SARs.

Disclosure Risk Assessment

The Economic and Social Research Council, through the Cathie Marsh Centre for Census and Survey Research (CCSR), made a request for 2001 SARs. They also asked ONS to consider the following enhancements to the 1991 SARs specification:

  • reduce the threshold for the Individual SAR from 120,000 to 90,000 population

  • increase the sample size for the Individual SAR from 2% to 3%

  • change the detail given for some of the variables, for example ethnic group, family type and professional qualifications, to reflect changes in the information collected in 2001

  • add extra variables (to reflect the new questions asked in the 2001 Census)

These proposals were based on the paper by Dale & Elliot, 'Proposals for the 2001 SARs: an assessment of disclosure risk'. This paper assessed the risk of disclosure from the SARs and concluded that the risk was very low. It suggested that the 1991 assessment of risk was pessimistic and that there was scope for a decrease in the threshold and an increase in the sample size of the Individual SAR.

ONS carried out further analysis to assess the risk. It recognised that a risk assessment for the country as a whole would not necessarily allow it to meet the commitments it has made to every individual who completed a Census form, because some individuals are more easily recognisable in the population than others. The Census Offices have a responsibility to protect everyone's information, not just that of the majority.

ONS also considered how an attempt could be made to identify an individual. It considered what additional information and data would be available to users of the SARs (regardless of whether it was in the public domain) and whether this information could be used to identify an individual in the SARs.

The main elements of the analysis were:

  • an analysis to determine whether or not a variable should be collapsed, similar to the analysis carried out in 1991 (see The 1991 Census User's Guide, Chapter 5.4.4).

  • an analysis of the number and proportion of unique individuals in the sample who are also unique in the population. This looked at the total population as well as groups within it.

  • an assessment of the risk that an individual within the SARs can be identified by matching the SARs against an external dataset.
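
The second element above can be illustrated with a minimal sketch. It counts the records that are unique in a sample on a set of key variables, and then how many of those are also unique in the population. The key variables, the toy records and the function names are assumptions made for illustration; they are not the variables or code used by ONS.

```python
from collections import Counter

# Hypothetical key variables chosen for illustration only.
KEYS = ("age", "sex", "ethnic_group", "occupation")


def key_of(record, keys=KEYS):
    """Return the tuple of key-variable values for one record."""
    return tuple(record[k] for k in keys)


def uniqueness_summary(population, sample, keys=KEYS):
    """Count sample uniques and how many are also population uniques."""
    pop_counts = Counter(key_of(r, keys) for r in population)
    sample_counts = Counter(key_of(r, keys) for r in sample)
    sample_uniques = [k for k, n in sample_counts.items() if n == 1]
    also_pop_unique = [k for k in sample_uniques if pop_counts[k] == 1]
    proportion = len(also_pop_unique) / len(sample_uniques) if sample_uniques else 0.0
    return {
        "sample_uniques": len(sample_uniques),
        "population_uniques_in_sample": len(also_pop_unique),
        "proportion": proportion,
    }


# Toy population: the two nurses share a key, the other two are unique.
population = [
    {"age": 34, "sex": "F", "ethnic_group": "A", "occupation": "nurse"},
    {"age": 34, "sex": "F", "ethnic_group": "A", "occupation": "nurse"},
    {"age": 71, "sex": "M", "ethnic_group": "B", "occupation": "farmer"},
    {"age": 52, "sex": "M", "ethnic_group": "A", "occupation": "teacher"},
]
sample = [population[0], population[2]]
summary = uniqueness_summary(population, sample)
```

In this toy case both sampled records are unique in the sample, but only the farmer is unique in the population, which is why sample uniqueness alone overstates the risk.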

This analysis showed that grouping of age, ethnic group and occupation substantially reduced the risk of identifying an individual from the sample. It also showed that the sample size could be increased from 2% to 3%.
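
As a rough illustration of why grouping reduces risk, the sketch below bands single years of age and counts unique combinations before and after. The ten-year band width, the open-ended top band at 90 and the toy records are assumptions for the example, not the bandings used in the 2001 SARs.

```python
from collections import Counter


def band_age(age, width=10, top=90):
    """Map a single year of age to a band label, with an open-ended top band."""
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def count_uniques(records):
    """Count the key combinations that occur exactly once."""
    counts = Counter(records)
    return sum(1 for n in counts.values() if n == 1)


# Toy (age, sex) records; every combination is unique before banding.
people = [(23, "M"), (27, "M"), (24, "F"), (91, "F"), (95, "F"), (58, "M")]
banded = [(band_age(age), sex) for age, sex in people]

uniques_before = count_uniques(people)
uniques_after = count_uniques(banded)
```

Here banding merges the two men in their twenties and the two women aged 90 or over, so fewer combinations remain unique, at the cost of less detail for analysis.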

ONS also looked at the risk of identifying individuals by matching databases against other sources and whether or not some of the variables may be able to help in confirming the identity of individuals. Variables such as the area classification, communal establishment type and family type were all found to increase the risk significantly by substantially narrowing down the location of an individual or groups of individuals in the population. These variables would either need to be excluded from the SARs or grouped into fewer bands.
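
The matching scenario can be sketched as follows. The function links a microdata record to an external record when their shared key values occur exactly once in each file. All variable names, identifiers and records are hypothetical, and this is a simplified illustration rather than the method ONS used.

```python
from collections import defaultdict


def unique_matches(sar_records, external_records, keys):
    """Return (sar_index, external_id) pairs whose key values occur exactly
    once in the SAR file and exactly once in the external file."""
    ext_index = defaultdict(list)
    for rec in external_records:
        ext_index[tuple(rec[k] for k in keys)].append(rec["id"])
    sar_index = defaultdict(list)
    for i, rec in enumerate(sar_records):
        sar_index[tuple(rec[k] for k in keys)].append(i)
    return [
        (idxs[0], ext_index[key][0])
        for key, idxs in sar_index.items()
        if len(idxs) == 1 and len(ext_index.get(key, [])) == 1
    ]


sar = [
    {"age_band": "30-39", "sex": "F", "region": "North East", "family_type": "couple"},
    {"age_band": "70-79", "sex": "M", "region": "Wales", "family_type": "lone"},
]
external = [
    {"id": "e1", "age_band": "30-39", "sex": "F", "region": "North East", "family_type": "couple"},
    {"id": "e2", "age_band": "70-79", "sex": "M", "region": "Wales", "family_type": "couple"},
    {"id": "e3", "age_band": "70-79", "sex": "M", "region": "Wales", "family_type": "lone"},
]

basic_keys = ("age_band", "sex", "region")
matches_basic = unique_matches(sar, external, basic_keys)
matches_extended = unique_matches(sar, external, basic_keys + ("family_type",))
```

With the three basic keys only the first record links uniquely; adding family type separates the two external candidates and makes the second record linkable too. A unique link is not proof of identity, since the external file may not cover the whole population, but it shows how an extra variable can narrow the candidates.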

A small number of uniques remained in the SARs sample once these checks were completed. To reduce further the risk of identifying an individual, ONS perturbed the risky records using PRAM (the post-randomisation method). This consisted of changes to certain values in these records, applied by means of record swapping or imputation.
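
For readers unfamiliar with PRAM, the sketch below shows the general textbook form of the technique: each flagged value of a categorical variable is replaced by a random draw governed by a transition probability matrix. The categories, the probabilities, the seed and the choice of flagged records are all assumptions for illustration; the actual parameters and procedure used by ONS are not described here.

```python
import random

# Hypothetical transition probabilities: outer key = original category,
# inner key = released category. Each row must sum to 1.
TRANSITION = {
    "employed":   {"employed": 0.90, "unemployed": 0.05, "inactive": 0.05},
    "unemployed": {"employed": 0.10, "unemployed": 0.80, "inactive": 0.10},
    "inactive":   {"employed": 0.05, "unemployed": 0.05, "inactive": 0.90},
}


def pram(values, risky, transition, rng):
    """Return a copy of values in which each record index in `risky` is
    replaced by a random draw from its row of the transition matrix."""
    released = list(values)
    for i in risky:
        row = transition[values[i]]
        categories = list(row)
        weights = [row[c] for c in categories]
        released[i] = rng.choices(categories, weights=weights, k=1)[0]
    return released


rng = random.Random(2001)  # fixed seed so the sketch is repeatable
original = ["employed", "inactive", "unemployed", "employed"]
risky = {1, 2}  # hypothetical indices of records flagged as risky
released = pram(original, risky, TRANSITION, rng)
```

Records that are not flagged are left untouched, and a flagged value will often stay the same because the diagonal probabilities are high. An intruder cannot then be sure that a matching record carries the true value.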

Microdata Laboratory

ONS recognises that recoding of variables limits the analysis that can be carried out using the 2001 SARs, and has therefore made both the individual and household SAR files available in much greater detail. These files are known as the Controlled Access Microdata Samples (CAMS) and are accessible in safe settings at all ONS sites for approved research projects. Applications for access to these files are assessed by the Census Research Access Board (CRAB). It is hoped that this access will be extended to sites in Edinburgh and Belfast. Once CRAB has approved an application, any outputs from analyses carried out on the CAMS will be checked for disclosiveness before they can be removed from the safe setting.

References

The 1991 Census User's Guide, edited by Dale & Marsh, HMSO
Dale, A. and Elliot, M. J. (2001) 'Proposals for the 2001 SARs: an assessment of disclosure risk', Journal of the Royal Statistical Society, Series A, 164(3), pp 1-21
The 2001 Census of Population (Cm 4253)

Content from the Office for National Statistics.
© Crown Copyright applies unless otherwise stated.