The UK Government Web Archive Search Facility is a beta release, and forms part of ongoing work to promote the contents of the web archive and make the collection more accessible to a wide range of users.
The advanced search page provides greater control over the results returned.
For most searches, the results are drawn from several domains or websites. The initial result set returned from a query is, however, limited to two results per domain or website (except when a category is selected in "Advanced Search"). Figure 1 shows an example extract from the initial results page following a search from either the "Quick Search" or "Advanced Search" page.
Figure 1: Extract from general results page
Each search result contains the following:
If the result set was generated from an Advanced Search, any filters applied (i.e. if the user uses any search option other than "All of these words") are listed at the top of each page of the results. A "Remove" link is available next to each filter. Clicking this link re-runs the search without this filter restriction.
Results pages may also contain a "View hidden results" link. This takes the user to an expanded result set without the limit of two results per domain, which can contain a very large number of hits. Each page in this result set shows an estimated result count and the current position within the results, e.g. Results 1 – 10 of 4500 results.
Figure 2: Extract from domain specific results page
Can I search for a specific URL?
Yes. If you know the full URL, enter it in the Quick Search box and select the "URLs" option in the drop-down menu. If you do not know the full URL, type in as many characters as you know and select "the collection" in the drop-down menu. Where the domain consists of two or more words joined together, you will normally obtain better results by entering the joined term, e.g. "britishmuseum" rather than "British Museum".
Does the search include hidden page content?
Yes, the search covers all page content that can be read and indexed by the search engine. This includes the content of HTML <meta> tags, but not image content (unless it is duplicated in readable text, e.g. via an alt attribute).
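As a rough illustration of what "readable" content means here, the following Python sketch collects the text that an indexer of this kind might see on a page: visible text, the content of <meta> tags, and the alt text of images. It is not the archive's actual indexing code; the class name IndexableText and the function indexable_text are hypothetical, and the choice to skip script and style content is an assumption.

```python
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Hypothetical collector of text that a search indexer could read."""

    def __init__(self):
        super().__init__()
        self.parts = []   # pieces of readable text, in document order
        self._skip = 0    # depth inside <script>/<style> (assumed not indexed)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and attributes.get("content"):
            # Hidden page content: the content attribute of <meta> tags
            self.parts.append(attributes["content"])
        elif tag == "img" and attributes.get("alt"):
            # The image itself is not readable, but its alt text is
            self.parts.append(attributes["alt"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def indexable_text(html):
    """Return the readable text of an HTML page as a single string."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

An image with no alt text contributes nothing to the output, which mirrors the point above that image content is only searchable when it is duplicated in readable text.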
Can I search for results within only one domain or department?
The simplest way to do this is to start with a general search using one or more search terms likely to bring back results for that domain. You can then locate a result from the target domain in the result set, and click the "More results from this domain" link.
Why is the same page listed more than once in the results?
This can occur because the archive often holds multiple versions of web pages harvested on different dates. In the initial summary results, only two hits from each domain are presented. Normally, these will be two different pages, but in some cases they can be two dated instances of the same page. If you click the "View hidden results" link, you may see more instances of the same page listed.
You can use the "View All Versions" link next to each result to view a listing of all versions of that page captured in the web archive. You can then navigate from there to a specific version.
I can't see an estimate of the total number of results from my search. How many results are there?
Due to the size of the web archive, it is not always technically possible to provide a reliable estimate of the initial result count returned by a search query. If you click the "View hidden results" or "More results from this domain" links, the resulting pages include an estimate of the total number of hits.
How comprehensive are these results? I've come across a page that matches the search term but it's not returned in the results.
The initial search results from "Quick Search" (and "Advanced Search" if no category restriction is applied) include a maximum of two hits per domain, and a limited set of results overall. If the page is not listed in the initial result set, it may be returned in the search results by selecting either the "View hidden results" or "More results from this domain" link. Furthermore, the current search index does not include pages captured later than January 2009.
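The effect of the two-hits-per-domain limit can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the search engine's own code: the function name limit_per_domain is hypothetical, and treating the host part of the URL as the "domain" is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlparse


def limit_per_domain(urls, per_domain=2):
    """Split ranked result URLs into those shown initially and those hidden.

    At most `per_domain` results are shown for each host; rank order is kept.
    """
    shown_count = defaultdict(int)
    shown, hidden = [], []
    for url in urls:
        host = urlparse(url).netloc.lower()  # assumption: host == "domain"
        if shown_count[host] < per_domain:
            shown.append(url)
            shown_count[host] += 1
        else:
            hidden.append(url)
    return shown, hidden
```

A page that matches the search term but ranks third or lower within its own domain ends up in the hidden list, which is why it can be missing from the initial results and still appear in the expanded result set.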
Why isn’t the site I'm looking for higher up the search results, given the importance of the search term to that site?
The search takes in over 1,000 different domains with varying numbers of pages, and the search term(s) may occur in many of these. The search has no means of ranking sites according to the relative importance of search terms on a site-specific basis.
Why do the summary snippet and/or result contain strange characters?
This can arise occasionally due to the character formats and encoding used on the original web pages at the time of capture.
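One common way such characters arise is when text stored in one encoding is read back as another. The Python sketch below reproduces the effect under the assumption that a page was written in UTF-8 and later decoded as Windows-1252; the function names are hypothetical, and the archive's own pages may be affected by other encoding combinations.

```python
def simulate_garbling(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode it with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def undo_garbling(garbled, wrong_encoding="cp1252"):
    """Reverse simulate_garbling, assuming no bytes were lost on the way."""
    return garbled.encode(wrong_encoding).decode("utf-8")


original = "£10 – café"
garbled = simulate_garbling(original)   # pound sign, dash and accent are corrupted
repaired = undo_garbling(garbled)       # identical to the original text
```

Plain ASCII letters and digits survive this kind of mismatch unchanged, so typically only currency symbols, dashes, quotation marks and accented letters appear as strange characters.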