Towards Automated eGovernment Monitoring

September 26, 2011

Morten Goodwin’s Ph.D. thesis, with the title Towards Automated eGovernment Monitoring, is now available online.

eGovernment solutions promise to deliver a number of benefits, including increased citizen participation. To make sure that these services work as intended, better measurements are needed. However, finding suitable approaches to distinguish good eGovernment services from those that need improvement is difficult. To this end, many surveys measuring the availability and quality of eGovernment services are carried out today at the local, national and international levels.

Because the majority of the methodologies and corresponding tests rely on human judgment, eGovernment benchmarking is mostly carried out manually by expert testers. These tasks are error-prone and time-consuming, which in practice means that most eGovernment surveys focus on a specific topic, cover a small geographical area, or evaluate a small sample, such as a few web pages per country. Due to the substantial resources needed, large-scale surveys assessing government web sites are predominantly carried out by big organizations. Further, for most surveys neither the methodologies nor the detailed results are publicly available, which prevents efficient use of the survey results for practical improvements.

This thesis focuses on automatic and open approaches to measure government web sites.

The thesis uses the collaboratively developed eGovMon application as a basis for testing, and presents corresponding methods and reference implementations for deterministic accessibility testing based on the Unified Web Evaluation Methodology (UWEM). It addresses the extent to which web sites are accessible for people with special needs and disabilities. This enables large-scale web accessibility testing, on-demand testing of single web sites and web pages, as well as testing for accessibility barriers in PDF documents.

Further, the thesis extends the accessibility testing framework by introducing classification algorithms to detect accessibility barriers. This method supplements and partly replaces tests that are typically carried out manually. Based on training data from municipality web sites, the reference implementation suggests whether alternative texts, which are intended to describe image content to people who are unable to see the images, are inaccessible. The introduced classification algorithms reach an accuracy of 90%.

Most eGovernment surveys include whether governments have specific services and information available online. This thesis presents service location as an information retrieval problem that can be addressed by automatic algorithms. It solves the problem with an innovative colony-inspired classification algorithm called the lost sheep. The lost sheep automatically locates services on web sites and indicates whether they can be found by a real user. The algorithm is both extensively tested in synthetic environments and shown to perform well on realistic tasks of locating services related to transparency. It outperforms all comparable algorithms, with both increased accuracy and a reduced number of downloaded pages.

The results from the automatic testing approaches in this thesis can be used directly. Alternatively, for more in-depth accessibility analysis, the automatic approaches can be used to prioritize which web sites and tests should be part of a manual evaluation.

This thesis also analyses and compares results from automatic and manual accessibility evaluations. It shows that when the aim of the accessibility benchmarking is to produce a representative accessibility score of a web site, for example for comparing or ranking web sites, automatic testing is in most cases sufficient.

The thesis further presents results gathered by the reference implementations and correlates the results with social factors. The results indicate that national government web sites are much more accessible than regional and local government web sites in Norway. They further show that countries with established accessibility laws and regulations have much more accessible web sites. In contrast, countries that have signed the UN Convention on the Rights of Persons with Disabilities do not reach the same increased accessibility. The results also indicate that even though countries with financial wealth have the most accessible web sites, it is possible to make web sites accessible for all in countries with smaller financial resources as well.

Full disclosure: I am the author of the thesis.

Accessibility of PDF documents

July 16, 2009

Like (x)HTML and CSS, which are the technologies used for traditional web pages, PDF documents may be designed with accessibility in mind. Today, most public PDF documents are inaccessible.

Scanned PDFs

A typical example of an inaccessible PDF is a scanned document originally consisting mostly of text. Assistive technologies, such as the screen readers used by visually impaired users, are designed to read normal text. However, if a PDF is scanned, even when it consists only of text, it is perceived as an image. These images cannot easily be interpreted as text by a screen reader, so the document is completely inaccessible to users who require the text to be read out loud.
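As a rough illustration of how such a document might be spotted automatically, the sketch below flags a PDF that references image objects but no fonts. It is a heuristic under stated assumptions, not the method used by any particular checker: the function name `looks_scanned` is hypothetical, and the check only inspects the raw bytes, so it can be wrong for files that store their objects in compressed object streams or that carry an OCR text layer.

```python
import re

# Assumption: image XObjects are declared as "/Subtype /Image" in uncompressed
# parts of the file. Objects inside compressed streams are not seen.
_IMAGE = re.compile(rb"/Subtype\s*/Image")


def looks_scanned(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the file references images but no fonts at all."""
    has_font = b"/Font" in pdf_bytes
    has_image = _IMAGE.search(pdf_bytes) is not None
    return has_image and not has_font


if __name__ == "__main__":
    # Tiny synthetic fragments, not complete PDF files.
    scanned = b"%PDF-1.4\n1 0 obj << /Type /XObject /Subtype /Image >> endobj"
    print(looks_scanned(scanned))
```

A file flagged this way would still need a manual look, since a page made of images is not necessarily a scan.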

Structured elements

PDF documents may contain structured elements, often referred to as “tags”. Tags in PDFs are used to denote the semantics of the content, such as headings, images and links. As an example, when headings are properly tagged, users can navigate through the document using the chapter and section headings. In contrast, if the PDF is untagged, headings and paragraph text are indistinguishable and navigating by headings is not possible. For large untagged documents, navigation is therefore challenging, if not close to impossible.

It should be noted that, as with (x)HTML, tags can be used improperly. Tagging a PDF does not by itself guarantee that the document is barrier-free. Tagging is rather a prerequisite for accessible PDF documents.
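Whether a PDF declares itself as tagged can be approximated with a quick check of the raw file. In the PDF format, a tagged document carries a structure tree (`/StructTreeRoot`) and a `/MarkInfo` dictionary with `/Marked true` in its document catalog. The sketch below looks for both markers; the name `looks_tagged` is hypothetical, and the byte-level search is an assumption that fails when the catalog is stored in a compressed object stream. In line with the caveat above, a positive result says nothing about whether the tags are used properly.

```python
import re

# Assumption: the catalog entries are visible in the raw, uncompressed bytes.
_MARKED = re.compile(rb"/Marked\s+true")


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the file declares a structure tree and /Marked true."""
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = _MARKED.search(pdf_bytes) is not None
    return has_struct_tree and is_marked


if __name__ == "__main__":
    # Synthetic catalog fragment, not a complete PDF file.
    catalog = b"<< /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>"
    print(looks_tagged(catalog))
```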

Access to content

Occasionally, PDF documents prevent users from copying the text. Unfortunately, enabling this restriction also affects assistive technologies in a negative manner. These technologies, such as screen readers, are restricted from extracting the content.
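To make the restriction concrete: in a PDF protected with the standard security handler, the encryption dictionary holds an integer `/P` whose bits encode the user's permissions. In the PDF specification, bit 5 (counting from 1) allows copying or extracting content, and bit 10 allows extraction for accessibility purposes. The sketch below decodes those two bits from a `/P` value that has already been read from the file; the function names are hypothetical, and how a given screen reader reacts to these flags is outside what the code can show.

```python
# Bit positions follow the PDF specification, which counts bits from 1.
COPY_BIT = 1 << 4            # bit 5: copy or extract text and graphics
ACCESSIBILITY_BIT = 1 << 9   # bit 10: extract text for accessibility


def _as_unsigned(p: int) -> int:
    """/P is stored as a signed 32-bit integer; view it as unsigned bits."""
    return p & 0xFFFFFFFF


def can_copy(p: int) -> bool:
    """True if the permission value allows general copying of content."""
    return bool(_as_unsigned(p) & COPY_BIT)


def can_extract_for_accessibility(p: int) -> bool:
    """True if the permission value allows extraction for assistive technology."""
    return bool(_as_unsigned(p) & ACCESSIBILITY_BIT)


if __name__ == "__main__":
    # Example /P values chosen for illustration only.
    for p in (-1, -3904, -3392):
        print(p, can_copy(p), can_extract_for_accessibility(p))
```

A document that clears both bits leaves assistive technologies without permitted access to the text, which is the situation described above.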

Language specified

When a text is read out loud, it is essential that it is read in the same language and dialect in which it is written. As an example, synthetic reading of an English text with German pronunciation will be impossible to understand. It is therefore important that the language is specified in the PDF so that screen readers can apply the proper pronunciation. Most PDFs are missing a language specification.
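In the PDF format, the document language is declared with a `/Lang` entry in the document catalog, for example `/Lang (en-US)`. The sketch below searches the raw bytes for the first such entry and returns its value. The name `declared_language` is hypothetical, and the approach rests on assumptions: it only sees uncompressed objects, it does not handle hex-encoded strings, and it does not distinguish the catalog entry from `/Lang` entries on individual structure elements.

```python
import re
from typing import Optional

# Assumption: the entry is written as a literal string, e.g. /Lang (en-US).
_LANG = re.compile(rb"/Lang\s*\(([^)]*)\)")


def declared_language(pdf_bytes: bytes) -> Optional[str]:
    """Return the first declared /Lang value, or None if none is found."""
    match = _LANG.search(pdf_bytes)
    if match is None:
        return None
    value = match.group(1).decode("latin-1").strip()
    return value or None


if __name__ == "__main__":
    # Synthetic catalog fragment, not a complete PDF file.
    print(declared_language(b"<< /Type /Catalog /Lang (en-US) >>"))
```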

Test PDF accessibility

To test a PDF for the issues above, provide the URL of the PDF document to the eAccessibility Checker. Note that the eAccessibility Checker is experimental and may at times be unavailable.