This snapshot, taken on 02/04/2007, shows web content selected for preservation by The National Archives. External links, forms and search boxes may not work in archived websites.

Appendix 3 Role of automated evaluation

1. Overview of automated testing

As outlined in Appendix 2, automated accessibility evaluation is a process whereby:

Automated evaluation is very useful precisely because it is completely mechanical: it can assess a large number of web pages very quickly and at low cost. This allows either comprehensive evaluation of a single (large) site or, as in the current study, evaluation of a significant sample of pages from each of a large number of different sites.

2. Limitations of automated testing

However, this form of evaluation is also strictly limited by its mechanical nature. It can only detect certain very specific, and relatively narrow, accessibility barriers; accordingly, there are very many potential accessibility barriers that cannot be detected in this way.

To give just one example, consider the issue of images embedded in web pages. To make the page accessible, every image must be provided with an accompanying ‘text alternative’. This text alternative would not be presented to users who can satisfactorily perceive the image; but for users who are blind, or have otherwise impaired vision, the text alternative would be presented instead (perhaps through speech synthesis, or Braille, or magnified text etc.). There is a very specific technical mechanism that is used to associate alternative text with images in a web page (the so-called ‘alt’ attribute of the ‘img’ element). It is very simple for a computer program to automatically examine a web page and check whether each image does have alternative text associated with it. If there is no such text, then this is an accessibility barrier for many users with disabilities (rated as Priority 1 in WCAG 1.0), and can be reliably reported as such.
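As a rough illustration of how mechanical this check is, the following minimal sketch uses Python's standard html.parser module to list every ‘img’ element that has no ‘alt’ attribute at all. It is not the tool used in the study; the class and function names are hypothetical, and the sketch assumes that an empty attribute (alt="") counts as alternative text being present.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Records the position of every <img> tag that lacks an alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []  # (line, column) of each offending image

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lower-cases names.
        # An empty alt="" is treated as present (assumption of this sketch).
        if tag == "img" and "alt" not in dict(attrs):
            self.missing.append(self.getpos())


def find_images_without_alt(html):
    """Return the positions of all images in `html` with no alt attribute."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


if __name__ == "__main__":
    page = '<p><img src="crest.gif" alt="Council crest"><img src="map.gif"></p>'
    for line, column in find_images_without_alt(page):
        print(f"image without alt attribute at line {line}, column {column}")
```

A check of this kind can only report that alternative text is absent; it says nothing about whether text that is present is any good.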

However, where alternative text is actually provided, there still remains a very significant question as to whether the particular text is effective or appropriate – does it provide a genuine alternative for users who do not have access to the image? Assessing this requires understanding of the original image and the role it plays in the page, separately understanding the meaning of the alternative text, and then forming a judgement as to whether the two are functional and effective alternatives to each other. This process of perception, understanding and critical comparison is not something which can be programmed into a computer. Rather, the only way of making such a judgement is to rely on a suitably trained human being. Computer tools may be used to assist, and possibly make the evaluation process more efficient, but the work cannot be done without such manual intervention.

This is just one example; there are many other examples of WCAG 1.0 checkpoints requiring human judgement for their proper evaluation.

It follows from this that automated accessibility evaluation is a very useful technique for carrying out certain types of accessibility evaluation, but it also has significant limitations. In particular, automated evaluation can be both effective and reliable in detecting certain definite barriers or forms of inaccessibility; further, it can be a useful aid in detecting certain potential barriers where manual judgement would be required for proper assessment. On its own, however, it can never give a positive judgement of actual accessibility.
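The ‘useful aid’ role can be sketched as a filter that flags suspicious alternative text for a human evaluator to look at. The heuristics below (placeholder words, text that looks like a file name) are assumptions chosen for illustration and are not taken from the study; the function name is hypothetical.

```python
import re

# Hypothetical heuristics for illustration only, not drawn from the study.
PLACEHOLDER_WORDS = {"image", "picture", "photo", "graphic", "spacer"}
FILENAME_PATTERN = re.compile(r"\.(gif|jpe?g|png|bmp)$", re.IGNORECASE)


def needs_manual_review(alt_text):
    """Return a reason if the alt text looks suspicious, otherwise None."""
    text = alt_text.strip()
    if text.lower() in PLACEHOLDER_WORDS:
        return "placeholder word"
    if FILENAME_PATTERN.search(text):
        return "looks like a file name"
    return None


if __name__ == "__main__":
    for alt in ["logo.gif", "Image", "Map of council offices"]:
        print(repr(alt), "->", needs_manual_review(alt))
```

A result of None means only that no heuristic fired. It is not a judgement that the text is an effective alternative; that still has to be made by a person.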

3. Inflexibility of automated testing

A further criticism that is sometimes made of automated accessibility evaluation is that it is unreasonably harsh. It is said to be impractical to achieve the 100% correctness that automated tests look for, particularly in the context of a large site, which may be subject to continual revision, expansion and updating. This is more properly a criticism of the WCAG 1.0 conformance criteria, which are indeed expressed in somewhat rigid or absolute terms.

In the current study, this was addressed by the introduction of the ‘Marginal Fail’ classification. These are sites that do show failures on one or more of the fully automated tests, so they cannot be regarded as strictly conformant with WCAG 1.0; nonetheless, the failures are not pervasive or comprehensive in the context of the overall size and scope of the site. Of course there is some degree of arbitrariness in choosing particular thresholds on these checks to distinguish between ‘marginal’ and ‘comprehensive’ failure, and there will inevitably be some sites close to the boundary. Further, even limited failures may have a disproportionate effect on the accessibility of any particular site. Nonetheless, in the context of a comparative assessment, involving large numbers of diverse sites, the marginal classification does give a good indication of sites that are already making detectable progress towards more accessible design.
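The idea of a threshold separating ‘marginal’ from ‘comprehensive’ failure can be sketched as follows. The thresholds actually used in the study are not given here, so the 10% figure, the single pages-failing measure, and the function name are all hypothetical assumptions made purely for illustration.

```python
def classify_site(pages_checked, pages_failing, marginal_threshold=0.10):
    """Classify a site from the share of sampled pages failing automated checks.

    marginal_threshold is a hypothetical value, not the study's own figure.
    """
    if pages_checked <= 0:
        raise ValueError("pages_checked must be positive")
    if not 0 <= pages_failing <= pages_checked:
        raise ValueError("pages_failing must be between 0 and pages_checked")

    if pages_failing == 0:
        # Automated checks alone cannot establish that a site is accessible.
        return "No automated failures detected"
    if pages_failing / pages_checked <= marginal_threshold:
        return "Marginal Fail"
    return "Fail"


if __name__ == "__main__":
    print(classify_site(100, 0))
    print(classify_site(100, 5))
    print(classify_site(100, 50))
```

Wherever the threshold is set, a site just either side of it will be classified differently on the strength of a very small difference, which is the arbitrariness noted above.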

4. Conclusions

Overall, the negative results from the automated evaluation phase of the current study, i.e. the proportion of sites that are showing “failure” on one or more automated checks, are robust and reliable: they do genuinely indicate the presence of accessibility barriers affecting significant numbers of users with disabilities.

Finally, it is worth commenting that the majority of the sites assessed in the study are primarily ‘informational’, i.e. for the most part, they do not deliver complex, transactional, services. The particular automated checks used here do give reasonable coverage of the most common accessibility barriers that arise on such sites. However, as e-government moves towards increasing levels of sophistication in the services offered, these checks will become less satisfactory in their coverage. In particular, where sites require user ‘log-in’, or involve sequences of pages with user input (forms), or rely on executing special, site-specific, software on the user's computer (applets, scripts etc.), then automated evaluation of accessibility becomes progressively more difficult. Accordingly, manual assessment of accessibility will certainly continue to be necessary and will, if anything, need to play a stronger role in overall tracking of the achievement of accessibility objectives.
