4.3 Non-HTML file formats

Publishing date: May 2002

Most documents published on websites are produced in HTML (see section 4.1). This is the primary format used on the web. It is easy to produce and is understood by all web browsers.

There will, however, be many occasions when a document published in HTML is not enough, or is not suited to the task. Organisations produce publications that are not only large but also complex in structure, using multiple-column text, formulae, many graphics and detailed tabular information. It may be decided that these documents also need to be published in their print-ready format.

A number of other formats can be used by the web audience: PDF, RTF, plain text, Microsoft Word and CSV.

4.3.1 PDF (Portable Document Format)

PDF files are created using a proprietary application from Adobe. To read these files the user will require the Adobe Acrobat Reader program that is available free of charge on the web.

A file saved in PDF will display text and graphics, in black and white or full colour. While file sizes are sometimes larger than those using HTML and combined graphics files, there are additional benefits when using PDF. These include:

There is a single file to maintain.

Files are easy to produce from the majority of word-processing and desktop publishing packages (using the proprietary PDF-creating applications).

Pages retain almost the exact rendering of printed pages (which is not possible in HTML).

While PDF offers many conveniences, the format has historically suffered from severe limitations in terms of ease of access by people with disabilities, since it has been difficult to reconstruct the text into another format. Adobe has always had a commitment to working towards overcoming accessibility problems with the format.

Since 1995 Adobe has issued accessibility add-ons for Acrobat, some more successful than others. Accessibility support improved greatly with version 5.

Adobe Acrobat Readers - www.adobe.co.uk [External link]
Adobe online accessibility resource [External link]

While the Access plug-in is felt to represent an improvement, it is clear that disabled access to PDF documents still falls short of the ideal. Because of these current accessibility limitations, it is recommended that PDF is not used indiscriminately without alternative versions being offered and is not regarded as the ‘natural’ file format for the web.

A free online PDF to HTML or text conversion service is also available at:

4.3.2 RTF (Rich Text Format)

RTF (Rich Text Format) is a file format that lets you exchange text files between different word processors on different operating systems. For example, you can create a file using Microsoft Word 97 on Windows 95, save it as an RTF file (it will have a ‘.rtf’ three-letter file extension), and send it to someone who uses WordPerfect 6.0 on Windows 3.1, and they will be able to open the file and read it. There are many different revisions of Microsoft’s proprietary Rich Text Format, and the portability of files will depend on which version of RTF is being used.

In some cases, the RTF capability may be built into the word processor. In others, a separate reader or writer may be required.

When a file is saved in RTF, it is processed by an RTF writer that converts the word processor’s internal file format to the RTF language. When the file is read, the control words and symbols are processed by an RTF reader that converts the RTF language into formatting for the word processor that will display the document.
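As a rough illustration of what an RTF writer emits, the sketch below builds a very small RTF document by hand. The function names are hypothetical, the font table is an arbitrary choice, and only plain ASCII text is handled; a real writer supports far more of the format.

```python
def escape_rtf(text: str) -> str:
    """Escape the characters that have special meaning in RTF: backslash and braces."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def write_minimal_rtf(paragraphs: list[str]) -> str:
    """Wrap ASCII paragraphs in a minimal RTF group (hypothetical helper).

    \\rtf1 marks the document as RTF version 1, \\ansi selects the character set,
    \\fonttbl declares font 0, and \\par ends each paragraph.
    """
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n\\f0 "
    body = "".join(escape_rtf(p) + "\\par\n" for p in paragraphs)
    return header + body + "}"


document = write_minimal_rtf(["Annual report", "Figures in {brackets} are estimates."])
print(document)
```

The words beginning with a backslash are the control words that an RTF reader interprets when it rebuilds the formatting.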

4.3.3 Plain text (.TXT files)

Plain text is the simplest format for storing text files in computers and on the Internet. In a plain-text (ASCII) file, each alphabetic, numeric or special character is represented by a 7-bit binary number (a string of seven 0s or 1s). 128 characters are defined (upper and lower case letters, numbers, and a few punctuation and other characters).

A document saved as a text file will be legible but will lose all formatting apart from line and paragraph breaks.
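The 7-bit representation can be seen with a short sketch, assuming the character set in question is ASCII. The function name is hypothetical.

```python
def to_seven_bit(text: str) -> list[str]:
    """Return the 7-bit binary string for each character of an ASCII text."""
    # encode() raises UnicodeEncodeError if the text contains a non-ASCII character
    codes = text.encode("ascii")
    return [format(code, "07b") for code in codes]


print(to_seven_bit("Web"))  # ['1010111', '1100101', '1100010']
```

Characters outside the 128 defined codes, such as accented letters, cannot be stored this way, which is why the sketch rejects them rather than guessing.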

4.3.4 Microsoft Word (.DOC)

Documents can also be saved in Microsoft Word format. A range of free Word readers is available from Microsoft. This is the least desirable format as it is proprietary and it cannot be guaranteed that a reader exists for a particular user’s computer.

A number of different versions of Microsoft Word are commonly used. It should therefore be clearly shown on the web page which version has been used to create the document.

Microsoft Word reader software http://www.microsoft.com/Office/000/viewers.htm [External link]

4.3.5 Spreadsheet formats

Microsoft Excel (.XLS)/Lotus 1-2-3(.wk1)

Spreadsheet documents can be saved in Microsoft Excel format or as Lotus 1-2-3. Both formats are proprietary but can be imported into a wide range of spreadsheets and word processors.

Comma-Separated Values (.CSV)

This is the simplest way in which tabular information can be saved for importing into table-orientated applications such as Microsoft Excel or database applications such as Microsoft Access. A CSV or ‘flat file’ is a common standard among computers and is understood by a wide range of software. In CSV a comma separates each column of data and each row is shown on a separate line.

No special reader is required for this format.
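A minimal sketch of the row-and-column layout described above, using Python's standard csv module. The table contents and function names are invented for illustration.

```python
import csv
import io

# Hypothetical sample table: a header row followed by one data row
rows = [
    ["Department", "Year", "Visitors"],
    ["Library, Central", "2001", "1200"],
]


def rows_to_csv(table: list[list[str]]) -> str:
    """Write one line per row, with a comma between columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(table)
    return buffer.getvalue()


def csv_to_rows(text: str) -> list[list[str]]:
    """Read CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


text = rows_to_csv(rows)
print(text)
```

A value that itself contains a comma, such as "Library, Central", is wrapped in double quotes by the writer so that it is not split into two columns when the file is imported.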


4.3.6 Compressed file formats

.ZIP

ZIP software stores single documents or collections of files and their directory structure in a lossless compressed format. ZIP archive files have the extension .zip.

The commercial WinZip application can be downloaded for free trial but must be purchased after a fixed period. When ZIP files are prepared on MacOS systems, care should be taken to exclude the MacOS resource fork from the Zipped file. After you download a Zip file, you usually need to use a Zip-capable decompression program to ‘unzip’ it.

WinZip software http://www.winzip.com
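The sketch below shows one way a ZIP file might be prepared while leaving out MacOS resource-fork data, using Python's standard zipfile module. It assumes that resource forks appear as entries under a `__MACOSX` folder or as files whose names begin with `._`, which is how they commonly show up in archives created on MacOS. The function names and sample files are hypothetical.

```python
import io
import zipfile


def is_resource_fork_entry(name: str) -> bool:
    """Assumption: resource-fork data lives under __MACOSX/ or in files named ._*"""
    parts = name.split("/")
    return "__MACOSX" in parts or parts[-1].startswith("._")


def build_zip(files: dict[str, bytes]) -> bytes:
    """Compress the given files into an in-memory ZIP, skipping resource-fork entries."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            if not is_resource_fork_entry(name):
                archive.writestr(name, data)
    return buffer.getvalue()


def list_zip(data: bytes) -> list[str]:
    """Return the names stored in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


sample = {
    "report/index.txt": b"Annual report",
    "__MACOSX/report/._index.txt": b"resource fork data",
}
print(list_zip(build_zip(sample)))
```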

4.3.7 File download sizes

File size can be an issue because of the differing levels of formatting data contained within the various file formats. Depending upon the text-to-graphic ratio of a document, PDF or RTF files may be considerably smaller or larger than the original documents. Plain text will always save with the smallest file size as it contains the minimum of document structure and no images at all.

Whenever these options are employed they should be listed at the top of the HTML document, directly under the document summary. The file size for each should be shown next to each file to give the user an idea of the potential download time.

If a proprietary format such as PDF or Microsoft Word is used, a link to the reader software download site should be included next to the document, using a standard form of words.

Subsequent sections of a long document should have a link to these downloadable formats at the very top, as many users may not have been introduced to the document from its homepage.
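A small sketch of how a file size and a rough download time could be worked out for display next to each link. The 56,000 bits-per-second connection speed, the label wording and the function names are assumptions made for illustration, not part of this guidance.

```python
def human_size(num_bytes: int) -> str:
    """Express a byte count in bytes, KB or MB for display beside a link."""
    if num_bytes < 1024:
        return f"{num_bytes} bytes"
    if num_bytes < 1024 * 1024:
        return f"{num_bytes / 1024:.0f} KB"
    return f"{num_bytes / (1024 * 1024):.1f} MB"


def download_seconds(num_bytes: int, bits_per_second: int = 56_000) -> float:
    """Estimate the download time, ignoring protocol overhead (assumed modem speed)."""
    return num_bytes * 8 / bits_per_second


def format_link_label(title: str, file_format: str, num_bytes: int) -> str:
    """Build a label such as 'Annual report (PDF, 342 KB)'; the wording is hypothetical."""
    return f"{title} ({file_format}, {human_size(num_bytes)})"


print(format_link_label("Annual report", "PDF", 350_000))  # Annual report (PDF, 342 KB)
print(download_seconds(350_000))  # 50.0
```

Real download times vary with the connection, so an estimate of this kind is best treated as a guide for the user rather than a promise.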

e-Government Interoperability Framework (e-GIF) http://www.govtalk.gov.uk [External link]

4.3.8 Audio files

To be suitable for publishing on your website, recorded sound has to be prepared to professional standards. DAT (Digital Audio Tape) and audio CD-ROMs are satisfactory for web broadcasting. Audio captured via professional-quality sound cards from professional-standard recordings on analogue compact cassettes will also be satisfactory for web broadcasting. Amateur-quality recordings are unacceptable.

4.3.8.1 TV and cinema commercials

Web managers asked to stream TV or cinema commercials should check that they do not have a BACC viewing ‘watershed restriction’ or a similar restriction. The web page carrying the direct link to such a commercial should carry a warning, for example:

This television advertisement carries images that some viewers may find disturbing.
