Remaining challenges of measuring the accessibility of web sites according to WCAG 2.0

August 11, 2010

Version 1.0 of the Web Content Accessibility Guidelines (WCAG 1.0) was published in 1999 and was followed by WCAG 2.0 in 2008. These guidelines have been the de facto standard for how to make web sites accessible to all people, including people with special needs.

During the nine-year period from 1999 to 2008, many measurement methodologies for WCAG 1.0 were created. Furthermore, many national and international surveys have benchmarked the accessibility of public web sites according to WCAG 1.0. Since WCAG 2.0 differs from WCAG 1.0 in significant ways, the existing measurement methodologies cannot easily be translated to WCAG 2.0. Thus, very few applications for evaluation according to WCAG 2.0 have been produced. Only two tools claiming to be WCAG 2.0 compliant are known to the authors: AChecker and TAW. The details of these tools are not known.

A paper titled Evaluating Conformance to WCAG 2.0: Open Challenges (Alonso, Fuertes, Gonzalez, Martínez) presents the remaining challenges of measuring the accessibility of public web sites according to WCAG 2.0. The lessons were learned from university students applying WCAG 2.0 tests in practice.

The paper identifies the following challenges. In the authors' experience, these are unclear parts of WCAG 2.0, which often means that testers need to interpret the text and decide how it should be understood. This can easily lead to inconsistency, as different testers may understand the text differently.

Accessibility Supported Technologies

WCAG 2.0 states that only accessibility-supported technologies can be relied upon for accessibility. It further states that a technology is accessibility supported only when users' assistive technology works with it. Since neither a list of supported technologies nor a formal way to measure whether a technology is supported is provided, this poses a challenge. There is no established method for saying that one technology is accessibility supported while another is not.

Testability of Success Criteria

WCAG 2.0 is built around testable techniques. A technique is testable if it can be tested either by machine or by human judgment. It is believed that around 80% of the criteria are testable by humans. However, the authors show that some of the descriptions of the techniques for testing cause confusion. For example, in the sentence “the test sequence of elements should be meaningful”, it is not evident what is meant by the word meaningful. What one person understands as a “meaningful sequence of elements” may not be meaningful to others. This is likely to cause confusion, which leads to inconsistent test results.

Openness of Techniques and Failures

WCAG 2.0 is divided into separate documents: the guidelines and the techniques. The guidelines are stable and technology independent. In contrast, the techniques document is a living document that is updated as technology evolves. This makes it possible to update WCAG 2.0 with hands-on techniques as the technologies used on the web evolve. One challenge is that W3C updates the techniques document for non-proprietary software only. This means that W3C will not collect techniques for proprietary software such as Adobe Flash. Thus, there will be no techniques from W3C on how to make Adobe Flash accessible.

Aggregation of Partial Results

W3C has not specified how to present data from successful techniques and common failures. WCAG 2.0 identifies two types of criteria an element can match:

  • Positive: Elements which meet the criteria of successful techniques. Any element which uses a successful technique is known to be accessible.
  • Negative: Elements which match a common failure. Any element which matches a common failure is known to be inaccessible.

Successful techniques and common failures are not opposite measures. Thus, not following a successful technique does not mean that a barrier exists. Similarly, avoiding a common failure does not necessarily mean that the element is accessible. Therefore, elements which match neither a successful technique nor a common failure fall into an unknown state and cannot be claimed to be either accessible or inaccessible.

How to present data from a web page containing both common failures and successful techniques is not clear.
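
The three states described above can be sketched in code. This is only a minimal illustration, not a method defined by W3C or by the paper. The names `Outcome`, `classify`, and `tally` are hypothetical, and the rule that a common failure overrides a successful technique when both apply to the same element is an assumption made for the sake of the example.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    PASS = "pass"        # element uses a successful technique
    FAIL = "fail"        # element matches a common failure
    UNKNOWN = "unknown"  # element matches neither


def classify(uses_successful_technique: bool, matches_common_failure: bool) -> Outcome:
    """Map the two independent checks onto one of three states.

    Assumption: a common failure takes priority if both checks are true.
    """
    if matches_common_failure:
        return Outcome.FAIL
    if uses_successful_technique:
        return Outcome.PASS
    return Outcome.UNKNOWN


def tally(elements):
    """Count outcomes for (uses_successful_technique, matches_common_failure) pairs."""
    return Counter(classify(technique, failure) for technique, failure in elements)


# Hypothetical page with four tested elements.
page = [(True, False), (False, True), (False, False), (True, False)]
counts = tally(page)
for outcome in Outcome:
    print(outcome.value, counts[outcome])
```

Reporting the three counts side by side, rather than collapsing them into one number, avoids having to decide how the unknown state should be weighted, which is the question the paper says is still open.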

Recommendations

The authors further present some recommendations for measuring web accessibility according to WCAG 2.0. The recommendations are as follows:

  • Accessibility-supported techniques should be clearly defined, and a methodology to identify whether a technique is accessibility supported should be established.
  • More experiments are needed on the testability of the techniques, failures and success criteria. This should be a step towards creating a common understanding of how the tests should be interpreted.
  • W3C should define how test results from successful use of techniques, common failures, and not applicable should be aggregated and presented as a single result.

Does financial wealth lead to high-quality government services?

August 6, 2010

It is natural to assume that financial wealth leads to better government. It is further reasonable to expect that wealthy countries have higher-quality e-government services than countries with less financial wealth. But how much do finances alone influence the quality of e-government services? This short study gives a peek at how finances affect e-government services.

In this study, the data used for the quality of e-government services is the E–Government Development Index (E-readiness score) from the United Nations E-Government Survey 2010. Thus, it is assumed that a government with high-quality e-government services receives a high score, and vice versa. The remaining data is from the World Bank Data Catalog.

The following figure presents a box plot of the differences between the E–Government Development Index of developing and developed countries. The plot shows that developing countries have an average score of about 0.4, while developed countries have an average score of about 0.7. Furthermore, all developing countries have scores lower than 0.7, while all developed countries have scores higher than 0.5. Thus, based on the United Nations E–Government Development Index, there is, not surprisingly, a significant difference between e-government services in developing and developed countries.

E-readiness score for developing versus developed countries.

Thus, quality clearly depends on finances, but how much of the quality of e-government services is determined by finances alone?

The development of government services is a complex process shaped by many factors. There is no general conclusion as to which factors influence the quality of government services. It is, however, possible to determine to what extent data on a country's financial situation can be used to predict the e-readiness score.

The following graph plots the E–Government Development Index against GNI per capita. The graph also includes a regression, which can be used to calculate the E–Government Development Index from the GNI per capita alone.

E-readiness versus GNI per capita

The trends in the data are clearly visible. The regression is shown as the black line, the mean response interval as the green dashed line, and the prediction interval as the blue dashed line.

The regression line (black line) shows the relationship between the E–Government Development Index and GNI per capita. If no correlation existed between the two data sets, the line would be completely horizontal. The regression line can be used to predict the E–Government Development Index using only the GNI per capita. The graph shows us that the relationship is not linear, but more complex.
The mean response interval (green dashed line) shows where the estimated mean of the data lies.
The prediction interval (blue dashed line) shows where future data is expected to be located (similar to a confidence interval).

The data shows that the mean response interval and the prediction interval change as the GNI per capita increases. Generally, we are more certain of the prediction when these intervals are narrow. From this we can draw the following conclusion. It is relatively easy to predict the E-readiness score when a country has a low GNI per capita. In contrast, predicting the E-readiness score from the GNI per capita alone is a lot less precise for wealthy countries. In other words, lack of finances generally means low-quality services, while wealth alone is not sufficient to ensure quality in e-government.
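
To make the difference between the mean response interval and the prediction interval concrete, here is a small sketch using the textbook formulas for a straight-line least-squares fit. It is not the model behind the graph in this post: the data points are made up, the choice of log10 of GNI per capita as the x axis is an assumption, and the function names `fit_line` and `half_widths` are hypothetical. The critical value 2.447 is the 97.5% quantile of Student's t distribution with 6 degrees of freedom, which gives 95% intervals for eight data points.

```python
import math

# Made-up illustration data, NOT values from the UN survey or the World Bank:
# log10 of GNI per capita and an e-readiness-like score for eight countries.
log_gni = [2.5, 3.0, 3.2, 3.6, 4.0, 4.3, 4.5, 4.7]
score = [0.20, 0.28, 0.35, 0.40, 0.52, 0.60, 0.62, 0.75]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(residual_ss / (n - 2))
    return {"slope": slope, "intercept": intercept, "s": s,
            "mean_x": mean_x, "sxx": sxx, "n": n}


def half_widths(x0, fit, t_crit):
    """Half-widths of the mean response and prediction intervals at x0."""
    leverage = 1.0 / fit["n"] + (x0 - fit["mean_x"]) ** 2 / fit["sxx"]
    mean_hw = t_crit * fit["s"] * math.sqrt(leverage)
    # The prediction interval adds the scatter of a single new observation.
    pred_hw = t_crit * fit["s"] * math.sqrt(1.0 + leverage)
    return mean_hw, pred_hw


fit = fit_line(log_gni, score)
T_CRIT = 2.447  # 97.5% quantile of Student's t with n - 2 = 6 degrees of freedom

for x0 in (2.5, fit["mean_x"], 4.7):
    predicted = fit["intercept"] + fit["slope"] * x0
    mean_hw, pred_hw = half_widths(x0, fit, T_CRIT)
    print(f"x0={x0:.2f} predicted={predicted:.3f} "
          f"mean response +/-{mean_hw:.3f} prediction +/-{pred_hw:.3f}")
```

In this simple model the prediction interval is always wider than the mean response interval, and both are narrowest near the mean of the x values. The graph above is described as non-linear, so its intervals need not follow the same pattern.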