Automatically finding inaccessible alternative texts in web pages

September 24, 2010

A paper on Automatic Checking of Alternative Texts on Web Pages (Olsen, Snaprud, Nietzio) was recently published.

Alternative texts for images, maps or audio files are often generated by web publishing software or not properly provided by editors. For humans it is relatively straightforward to see which alternative texts have been generated automatically, as such texts in no way describe the corresponding image. Examples include texts such as “Image1”, texts which resemble filenames such as “image12.png”, and placeholders such as “insert alternative text here”.
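The three kinds of examples above can be caught with a few simple rules. The following is only a minimal sketch: the regular expressions and the function name `looks_generated` are assumptions made for illustration, and they are not the classifier described in the paper.

```python
import re

# Hypothetical patterns chosen for illustration; not taken from the paper.
# Text that looks like an image filename, e.g. "image12.png".
FILENAME_RE = re.compile(r"^[\w\-. ]+\.(png|jpe?g|gif|bmp|svg|webp)$", re.IGNORECASE)
# Generic word optionally followed by a number, e.g. "Image1".
GENERIC_RE = re.compile(r"^(image|img|picture|photo|graphic)[\s_\-]*\d*$", re.IGNORECASE)
# Leftover template text, e.g. "insert alternative text here".
PLACEHOLDER_RE = re.compile(r"insert .*alternative text", re.IGNORECASE)


def looks_generated(alt: str) -> bool:
    """Return True if the alternative text matches one of the patterns above."""
    text = alt.strip()
    return bool(
        FILENAME_RE.match(text)
        or GENERIC_RE.match(text)
        or PLACEHOLDER_RE.search(text)
    )
```

With these rules, “Image1”, “image12.png” and “insert alternative text here” are flagged, while “Golden Retriever Eating” is not. Hand-written rules like these only cover the patterns someone thought of in advance.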

The proper method for adding images to a document is that an editor uploads an image for an article and can (or must) provide an alternative text in the CMS.

There are, however, several improper methods which result in inaccessible, automatically generated alternative texts:

  • The editor uploads an image and uses the default alternative text.
  • The editor uploads an image for an article and the CMS generates some (often strange) alternative text.
  • The editor uploads an image but has no possibility to write an alternative text.

The following are some examples of correct and automatically generated alternative texts (image source: Wikipedia):

A picture of a dog eating with a correct alternative text: Golden Retriever Eating

Correct alternative text "Golden Retriever Eating". HTML: <img alt="Golden Retriever Eating" ... />

A picture of a dog eating with a wrong alternative text: image12.png

Wrong alternative text "image12.png". HTML: <img alt="image12.png" ... />

For people who cannot see non-textual content, alternative texts are crucial for understanding and using the content, so automatically generated alternative texts may create web accessibility barriers. Most automatic accessibility checkers only check whether an alternative text exists. Texts like the ones mentioned above, which do not describe the corresponding image well and are thus not considered accessible, will not be detected.
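To illustrate the limitation, here is a rough sketch of an existence-only check written with Python's standard `html.parser` module. The names `AltCollector` and `missing_alt_count` are made up for this example and do not correspond to any particular accessibility checker.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the alt attribute of every <img> tag (None when it is absent)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            self.images.append(dict(attrs).get("alt"))


def missing_alt_count(html: str) -> int:
    """Count images without an alt attribute; the alt text itself is not inspected."""
    parser = AltCollector()
    parser.feed(html)
    parser.close()
    return sum(1 for alt in parser.images if alt is None)
```

A check like this reports an image with no alt attribute, but an image with alt="image12.png" passes, because only the presence of the attribute is tested.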

The paper introduces a pattern recognition approach for automatically detecting alternative texts that may pose a barrier. The introduced algorithms reach an accuracy of more than 90%, which should hopefully be a step towards improving the usefulness of automatic accessibility checking. Additionally, the results could be useful input for manual accessibility checking.

(Full disclosure: I’m a co-author of the paper)
