Automatically finding inaccessible alternative texts in web pages

A paper on Automatic Checking of Alternative Texts on Web Pages (Olsen, Snaprud, Nietzio) was recently published.

Alternative texts for images, maps or audio files are often generated by web publishing software or not properly provided by the editors. For humans it is relatively straightforward to see which alternative texts have been generated automatically, as the texts in no way describe the corresponding image. Examples include texts such as "Image1", texts which resemble filenames such as "image12.png", and placeholders such as "insert alternative text here".
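
As a rough illustration, texts like the examples above can often be caught with a few hand-written rules. The sketch below is a simple heuristic and not the pattern recognition method from the paper; the function name `looks_generated`, the placeholder list and the regular expressions are assumptions made for this post.

```python
import re

# Hypothetical placeholder phrases; a real list would have to be
# collected from the output of actual publishing software.
PLACEHOLDERS = {"insert alternative text here", "image", "picture", "photo"}

# Text that ends in a common image file extension, e.g. "image12.png".
FILENAME_RE = re.compile(r"^\S+\.(png|jpe?g|gif|bmp|svg|webp)$", re.IGNORECASE)

# A generic word followed by a number, e.g. "Image1" or "img 12".
GENERIC_NUMBERED_RE = re.compile(
    r"^(image|img|picture|pic|photo)[\s_-]*\d+$", re.IGNORECASE
)


def looks_generated(alt_text: str) -> bool:
    """Return True if the text matches one of the assumed patterns."""
    text = alt_text.strip()
    if text.lower() in PLACEHOLDERS:
        return True
    if FILENAME_RE.match(text):
        return True
    if GENERIC_NUMBERED_RE.match(text):
        return True
    return False
```

With these rules, `looks_generated("image12.png")` is true while `looks_generated("Golden Retriever Eating")` is false. Rules like these only cover patterns someone has thought of in advance, which is one reason a learned approach is attractive.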

The proper way to add an image to a document is for an editor to upload the image for an article and provide an alternative text in the CMS (which the CMS either allows or requires).

There are, however, several improper methods which result in inaccessible, automatically generated alternative texts:

  • The editor uploads an image and uses the default alternative text.
  • The editor uploads an image for an article and the CMS generates some (often strange) alternative text.
  • The editor uploads an image but has no possibility to write an alternative text.

The following are some examples of correct and automatically generated alternative texts (image source: Wikipedia):

A picture of a dog eating with a correct alternative text: Golden Retriever Eating

Correct alternative text "Golden Retriever Eating". HTML: <img alt="Golden Retriever Eating" ... />

A picture of a dog eating with a wrong alternative text: image12.png

Wrong alternative text "image12.png". HTML: <img alt="image12.png" ... />

For people who cannot see non-textual content, alternative texts are crucial for understanding and using the content, and automatically generated alternative texts may impose web accessibility barriers. Most automatic accessibility checkers only check for the existence of alternative texts. The texts mentioned above, which do not describe the corresponding image well and are thus not considered accessible, will not be detected.
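
To make the difference concrete, the sketch below contrasts an existence-only check with a check that also looks at the content of the text. It uses Python's standard `html.parser` module; the class `AltCollector`, the function `check_alt_texts` and the pattern `SUSPICIOUS_RE` are hypothetical names for this illustration and are not taken from the paper or from any existing checker.

```python
import re
from html.parser import HTMLParser

# Assumed heuristic for "suspicious" texts: filename-like or a known placeholder.
SUSPICIOUS_RE = re.compile(
    r"^(\S+\.(png|jpe?g|gif)|insert alternative text here)$", re.IGNORECASE
)


class AltCollector(HTMLParser):
    """Collects the alt attribute (None if absent) of every <img> element."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            self.alts.append(dict(attrs).get("alt"))


def check_alt_texts(html: str):
    """Return (number of images without alt, list of suspicious alt texts)."""
    parser = AltCollector()
    parser.feed(html)
    parser.close()
    missing = sum(1 for alt in parser.alts if alt is None)
    suspicious = [
        alt for alt in parser.alts if alt and SUSPICIOUS_RE.match(alt.strip())
    ]
    return missing, suspicious


page = (
    '<img src="a.png" alt="Golden Retriever Eating" />'
    '<img src="b.png" alt="image12.png" />'
    '<img src="c.png" />'
)
missing, suspicious = check_alt_texts(page)
```

An existence-only checker would report just the third image here (`missing` is 1). The second image passes that check even though its text says nothing about the picture; it only shows up in `suspicious`.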

The paper introduces a pattern recognition approach for automatic detection of alternative texts that may impose a barrier. The introduced algorithms reach an accuracy of more than 90%, which will hopefully be a step towards improving the usefulness of automatic accessibility checking. Additionally, the results could be useful input for manual accessibility checking.

(Full disclosure: I’m a co-author of the paper)

2 Responses to Automatically finding inaccessible alternative texts in web pages

  1. Deniz says:

    Interesting post, thank you for sharing.

    I wonder if there is a technology to create meaningful alternative text based on the image selected?

  2. That would certainly be interesting, but very challenging. I have not seen a working approach in the literature. Many CMSs do generate alternative texts for images automatically, but with poor quality.

    This is briefly discussed in the paper, but was omitted from the blog post. In addition to the examples mentioned in the post, an inaccessible alternative text could be descriptive in itself yet not describe the corresponding image well. An example would be an image of a dog with the incorrect alternative text "cat". Since the problem is that the text does not describe the image well, not that the word "cat" is inaccessible by itself, such inaccessible uses of alternative texts are hard to detect automatically.
    Similar limitations would exist if we tried to automatically generate alternative texts from images: it is hard for an algorithm to know that an image is a picture of a dog, and it would thus be hard to suggest an appropriate text.

    A possible approach could be to compare the image with images which already have alternative texts. If an image is similar to an existing image, it should have a similar alternative text as well. This would require intensive computation and a large data set for comparison, and it would not be 100% accurate. An approach could therefore be to suggest alternative texts to the editors. The editors would then have to accept the suggestion if it is correct, or provide a better alternative.
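
    The idea can be sketched as a nearest-neighbour lookup. This is only a toy illustration under strong assumptions: each image is assumed to be already reduced to a numeric feature vector by some step not shown here, the function names `cosine_similarity` and `suggest_alt_text` are hypothetical, and the threshold of 0.9 is an arbitrary choice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_alt_text(features, labelled, threshold=0.9):
    """Return the alt text of the most similar labelled image.

    Returns None when no labelled image reaches the threshold, so that
    the editor is not shown a poor suggestion.
    """
    best_text, best_score = None, threshold
    for known_features, alt_text in labelled:
        score = cosine_similarity(features, known_features)
        if score >= best_score:
            best_text, best_score = alt_text, score
    return best_text


# Made-up feature vectors, purely for illustration.
labelled_images = [
    ([1.0, 0.0, 0.2], "Golden Retriever Eating"),
    ([0.0, 1.0, 0.1], "Cat sleeping on a sofa"),
]
suggestion = suggest_alt_text([0.9, 0.1, 0.2], labelled_images)
no_suggestion = suggest_alt_text([0.5, 0.5, 0.0], labelled_images)
```

    The result is only ever a suggestion: the editor still has to confirm or replace it, and an image that resembles nothing in the data set gets no suggestion at all.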

    Such a working approach would certainly be very useful.

    Morten
