Usage Statistics Review

Report: Final report

For collecting usage data about publications, two basic approaches are possible: one is based on web log file analysis, the other on link resolver logs. Client-side approaches such as web bugs or pixel tags, common in web page statistics, are not sufficient for distributed publication networks where documents may exist in different versions and formats and may be split across multiple files. Logs from repositories or journal sites, as well as from link resolvers, can be made accessible through standard technologies using the OAI Protocol for Metadata Harvesting and OpenURL ContextObjects. This basic architecture has been proposed by Bollen and Van de Sompel and can be expanded to include data from publisher sites, which is delivered in a different XML form via the SUSHI protocol. At the moment, however, statistical data from publisher sites conformant to the COUNTER initiative is only available at the journal title level, not at the article level, so aggregating data from these sources results in a much coarser granularity.
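The following is a minimal sketch of how such log data might be harvested when a repository exposes usage events as OpenURL ContextObjects over OAI-PMH, in the spirit of the Bollen and Van de Sompel architecture. The endpoint URL, the metadataPrefix "ctxo" and the record layout are illustrative assumptions, not a definitive interface.

    # Minimal sketch: harvest usage events exposed as OpenURL ContextObjects
    # via OAI-PMH. Endpoint URL and metadataPrefix are hypothetical examples.
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    CTX = "{info:ofi/fmt:xml:xsd:ctx}"

    def harvest_usage_events(base_url, metadata_prefix="ctxo"):
        """Yield (timestamp, referent identifier) pairs from one ListRecords page."""
        url = f"{base_url}?verb=ListRecords&metadataPrefix={metadata_prefix}"
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        for record in tree.iter(f"{OAI}record"):
            for ctx_obj in record.iter(f"{CTX}context-object"):
                timestamp = ctx_obj.get("timestamp")
                referent = ctx_obj.find(f"{CTX}referent")
                if referent is None:
                    continue
                for identifier in referent.iter(f"{CTX}identifier"):
                    # Identifiers are typically persistent IDs, e.g. info:doi/10.1000/xyz
                    yield timestamp, identifier.text

    # Example (hypothetical endpoint):
    # for ts, ident in harvest_usage_events("https://repository.example.org/oai"):
    #     print(ts, ident)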

Data aggregated from different sources has to be normalized, automated accesses by robots have to be tagged, and duplicates have to be removed. The last point refers to publications and can be handled using persistent identifiers (such as DOIs or URNs). In a broader context, the removal of duplicates will also have to rely on metadata-based heuristics, i.e. duplicates are detected by comparing ISSN numbers, article titles and publication years, or parts and combinations of these fields.
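A minimal sketch of this deduplication step follows: a persistent identifier (DOI or URN) is preferred as the match key, with a fallback heuristic key built from normalized ISSN, title and publication year. The field names of the record dictionaries are illustrative assumptions.

    # Sketch of deduplication: prefer a persistent identifier, otherwise
    # fall back to a metadata heuristic (ISSN + normalized title + year).
    import re

    def dedup_key(record):
        """Return a key intended to identify the same publication across sources."""
        pid = record.get("doi") or record.get("urn")
        if pid:
            return ("pid", pid.strip().lower())
        issn = re.sub(r"[^0-9Xx]", "", record.get("issn", "")).upper()
        title = re.sub(r"\W+", " ", record.get("title", "")).strip().lower()
        year = str(record.get("year", ""))
        return ("heuristic", issn, title, year)

    def remove_duplicates(records):
        """Keep the first record seen for each deduplication key."""
        seen = {}
        for record in records:
            seen.setdefault(dedup_key(record), record)
        return list(seen.values())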

The aim of this project is to make progress towards a position wherein item-level usage statistics are comparable across a range of sources.

Project Staff

  • Ms Christine Merk, University of Constance (Project Director)
Summary

  • Start date: 14 April 2008
  • End date: 31 August 2008
  • Funding programme: Digital Repositories programme 2007-8
  • Topic