Accessibility requirements for pdf files

Mathematicians tend to supply lecture notes and problem sheets to their students in the form of pdf files created by LaTeX. This post is about what requirements the Public Sector Bodies (Websites and Mobile Applications) Accessibility Regulations 2018 legislation (“the regulations”) imposes on this sort of document when it is supplied to students via the web, and to what extent we can meet those requirements. It will not cover requirements for other content, for example web pages, video, or audio. All of this is my legally-uneducated opinion, not that of UCL or its maths department.

The impact of these regulations is that any content we give to students has to be accessible by default. Previously, the Equality Act 2010 only required us to make reasonable adjustments when a student requested them - for example, this could mean supplying course texts in the student’s chosen format.

Unfortunately the LaTeX world is a long way behind the Microsoft world on accessibility. It is probably possible (see below) to beat automated accessibility tests for pdf files solely by installing some packages and adding a few things to the preamble, but this is neither necessary nor sufficient for actual accessibility or for complying with the regulations.

Timings

If you publish documents before 23rd September 2019 on a non-public site, they don’t have to meet the requirements of the regulations until the website “undergoes a substantial revision” (Part 1.3.(2).(g)). If you publish on or after 23rd September 2019, the requirements apply straight away.

What is the accessibility requirement?

The regulations require that content fulfils “the relevant requirements of the European standard on the accessibility requirements suitable for public procurement of ICT products and services in Europe.” This standard is EN 301 549, the part relevant to pdf files being section 10 on documents.

EN 301 549 section 10

This section lists “success criteria” for accessible documents. They are intended to harmonize with the WCAG (Web Content Accessibility Guidelines) 2.0 produced by the W3C. Here is a short summary of only those success criteria relevant to static pdf files:

10.2.1 Non-text content

Non-text content (like images or equations displayed as images) must have a text alternative.
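
As a rough illustration of what a text alternative for an image might look like in LaTeX, here is a minimal sketch using a descriptive caption. Whether a caption alone satisfies 10.2.1 is only my tentative reading (discussed further under “Meeting the requirements”). The file name example-image is a placeholder supplied by the mwe package, and the caption wording is invented for the example.

```latex
\documentclass{article}
\usepackage{graphicx}

\begin{document}

\begin{figure}
  \centering
  % "example-image" is a placeholder picture from the mwe package;
  % replace it with your own file.
  \includegraphics[width=0.6\textwidth]{example-image}
  % The caption describes what the picture shows, not just what it is called.
  \caption{Graph of $y = x^2$ for $-2 \le x \le 2$: a parabola with its
    minimum at the origin, symmetric about the $y$-axis.}
\end{figure}

\end{document}
```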

10.2.7 Info and relationships

“Information, structure, and relationships conveyed through presentation can be programmatically determined or are available in text.” For example, a pdf viewer should be able to figure out whether a particular bit of text is a header or section title or page number or body text, how the document splits into sections, what order the sections are intended to be read, which caption belongs to which image, and so on.

10.2.8 Meaningful sequence

“When the sequence in which content is presented affects its meaning, a correct reading sequence can be programmatically determined.”

10.2.10 Use of colour

Colour isn’t used as the only way of conveying a piece of information.

10.2.12 Contrast (minimum)

Specifies particular contrast ratios for text and images.
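
For reference, the underlying WCAG 2.0 criterion asks for a contrast ratio of at least 4.5:1 for ordinary text (3:1 for large text). The sketch below writes out the WCAG definition of the ratio and works it through for black text on a white background. The symbols $C$, $L_1$ and $L_2$ are my own notation.

```latex
% Contrast ratio C as defined in WCAG 2.0, where L_1 and L_2 are the
% relative luminances of the lighter and darker colour respectively
% (each a number between 0 and 1).
\[
  C = \frac{L_1 + 0.05}{L_2 + 0.05}
\]
% Black text (L_2 = 0) on a white background (L_1 = 1):
\[
  C = \frac{1 + 0.05}{0 + 0.05} = 21
\]
```

So the default black-on-white output is at 21:1, the largest value the formula can give.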

10.2.13 Resize text

It must be possible to resize text up to 200% without extra technology. The guidance says that formats for which there are viewers with a 200% zoom function automatically meet this criterion, so pdf files are OK.

10.2.14 Images of text

Text is used to convey information, not images of text.

10.2.21 Document titled

The document must have a title describing its topic or purpose.

10.2.23 Link purpose

Link purpose can be determined from the link text (except where the link purpose would be unclear to any user). The text of a hyperlink shouldn’t be something like “click here”, but rather “click here to go to the Google homepage”.

10.2.25 Headings and labels

Headings and labels describe topic and purpose.

10.2.27 Language of page

The language of the document can be determined programmatically.

Meeting the requirements

Some of the criteria are automatically met by the kind of pdf files we usually produce. They are by default black text on a white background (meeting 10.2.12 and 10.2.10), and pdf viewers generally allow zooming (meeting 10.2.13). Sections are usually given descriptive headings or labels (meeting 10.2.25). Hyperlinks are rare, but the hyperref package allows descriptive link titles (10.2.23).
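
A minimal sketch of descriptive link text with hyperref, using its \href command. The URL and the wording are just examples.

```latex
\documentclass{article}
\usepackage{hyperref}

\begin{document}

% Unhelpful: the link text says nothing about the destination.
Click \href{https://www.google.com}{here}.

% Better: the link text describes where the link goes.
Go to \href{https://www.google.com}{the Google homepage}.

\end{document}
```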

The requirements for programmatic determination of language and title (10.2.27, 10.2.21) can be met by including

\usepackage[pdftex,
            pdftitle={Producing accessible documents with LaTeX},
            pdflang={en-GB}]{hyperref}

in the document preamble, so long as you have the hyperref package installed - it should be available by default on most systems.

The two requirements (10.2.1) and (10.2.14) are the most problematic. For example, people often provide scanned handwritten notes to students to save time preparing LaTeX documents. Producing the required text alternatives defeats this object, so scanned handwritten notes either fail the requirements or are pointless.

It appears that the requirement of (10.2.1) for an image or graph could be met by having an appropriate descriptive caption, which is easily done in LaTeX. The requirements of (10.2.1) and (10.2.14) for equations are harder to meet. By default there is no adequate text representation of LaTeX equations in the pdf output (try copying and pasting a formula from a LaTeX pdf). This can be partially met with the axessibility package, which is included with

\usepackage{axessibility}

It probably won’t be available on your system by default, but Linux users who have installed texlive-full will have it, and the Windows software MiKTeX can install it. The package works by adding to each equation in the pdf file a hidden comment containing the LaTeX code that produced it. In principle this provides a text alternative available to screen readers with minimal effort from document authors when combined with their screen reader dictionaries.

There are a couple of problems with this package. It doesn’t work for many LaTeX environments, most importantly math surrounded by dollar signs (though it’s ok with \begin{equation}...\end{equation} and \( ... \)), and it doesn’t expand user-defined macros. The authors supply a script to fix these issues, but it adds an extra (and not exactly user-friendly) step to document preparation. Also, while this might meet the letter of the “text alternative” condition, it is only really useful to people who understand LaTeX.
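
To make the limitations concrete, here is a small sketch of how one might write a document so that axessibility has the best chance of picking up the mathematics. It assumes the behaviour described above (equation environments are handled, dollar-sign math and user macros are not), which may differ between versions of the package. The macro \half is invented for the example.

```latex
\documentclass{article}
\usepackage{axessibility}

% A user-defined macro. Since macros are not expanded in the hidden
% text, a reader would be presented with "\half", which means nothing
% outside this document.
\newcommand{\half}{\frac{1}{2}}

\begin{document}

% Handled by the package: math in an equation environment.
\begin{equation}
  y = \sqrt{x^2 + 1}
\end{equation}

% Not handled, per the limitation above: math in dollar signs.
The same curve written inline is $y = \sqrt{x^2 + 1}$.

% The macro written out by hand, so that the hidden LaTeX code is
% self-explanatory.
\begin{equation}
  E = \frac{1}{2} m v^2
\end{equation}

\end{document}
```

Writing macros out by hand is obviously tedious in a real set of notes, which is why the authors’ script exists.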

This leaves 10.2.7 and 10.2.8, which map to WCAG 2.0 success criteria 1.3.1 and 1.3.2. These are a reference to “tagged pdf”, which encodes structure information into the pdf file. The most up-to-date question on TeX.se I found about this is from 2015, and it suggests that we are a long way off having a straightforward method to produce tagged pdf with LaTeX. The pdfx package is relevant. It’s difficult to know the extent of the problem without access to Adobe Acrobat Pro and its accessibility tools.

An online accessibility checker

Institutions are required by the regulations to produce an accessibility report. Rather than examine documents by hand they are likely to try and assess this automatically - UCL will install Blackboard Ally into its Moodle system, and presumably it will insist all content is supplied to students via Moodle. We will therefore need to produce documents that pass automatic tests.

The European Internet Inclusion Initiative has an online pdf accessibility checker which is designed to test the WCAG 2.0 success criteria. It is still in development, and it is necessarily limited in what it can do - for example one of the success criteria is that the document contains textual information that can be presented instead of images, but the checker cannot tell if the textual information is actually useful. In particular, you can pass all the tests (with the exception of the experimental header and footer check) just by using the hyperref package as described above. A pdf file created this way isn’t in any real way accessible, because the equations have no text representation that a screen reader can use. Try typesetting $y= \sqrt{x^2+1}$, then copying the result from the pdf file and pasting it into a text editor: you’ll get something like y = x 2 + 1, making it clear that the meaning of the equation is completely lost. This is what a screen reader would see, so the document is useless to someone using one.
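
If you want to repeat the experiment, a minimal test file along the following lines should do. It is only a sketch: the title and the surrounding sentence are placeholders, and I have left out the pdftex driver option because hyperref normally detects the driver itself.

```latex
\documentclass{article}
% Title and language metadata, as in the preamble snippet earlier.
\usepackage[pdftitle={Copy and paste test},
            pdflang={en-GB}]{hyperref}

\begin{document}

% Compile with pdflatex, open the pdf, then select the formula,
% copy it and paste it into a plain text editor.
Consider the curve $y = \sqrt{x^2 + 1}$.

\end{document}
```

What you get back depends on the pdf viewer, but it is unlikely to include anything recognisable as a square root.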

I couldn’t get LaTeX pdf files to pass the header/footer check even when they had appropriate headers, probably because the documents weren’t properly tagged for structure - even though they pass the “structure elements (tags)” test criterion.

How visually impaired people access mathematical texts

The document Good Practice on Inclusive Curricula in Mathematical Sciences (produced by the Mathematical Sciences HE Curriculum Innovation Project, edited by Emma Cliffe and Peter Rowlett, supported by the HEA MSOR network and the National HE STEM programme) contains several relevant articles - it’s really worth reading the whole thing. Unfortunately it’s from January 2012 and things have changed since then, for example the extremely popular Google Chrome browser no longer supports MathML. There is some more information about that on Peter Krautzberger’s blog and in the comments there. There is some third-party work going on to restore it. Stephen Webb’s article Accessibility of University Mathematics from 2011 in MSOR Connections is also relevant.

One thing that stands out from these and other sources is the diversity of ways in which people with visual impairments access mathematical texts. Here is a non-exhaustive list of examples:

screen readers built into document viewers
screen readers which run as separate programs
Braille displays
printed Braille texts
large print hard copy
document viewers with a high zoom setting
.tex files (this obviously requires some LaTeX knowledge, which students who have been to a specialist school and focussed on maths may have but non-math students needing to access mathematical content are less likely to)
.tex or other source files processed to remove non-mathematical formatting
web content marked up in MathML, which is accessible to certain screen readers, the approach taken by http://math.stackexchange.com for example
web content with equations displayed as images with alt-text containing some kind of description of the equation, e.g. LaTeX code. This is the approach taken by https://en.wikipedia.org.

Disproportionate burden

Part 2, section 6 of the regulations allows a public sector body not to comply with the accessibility requirements if doing so would “impose a disproportionate burden.” If it’s going to do this, the public sector body has to “perform an assessment” of the burden, taking into account its size and resources and the costs and benefits for the public sector body and for people with disabilities. Non-accessible content has to be listed in the body’s accessibility statement along with the reasons it is not accessible.

Conclusions

It is relatively straightforward to make pdf files that pass the EIII tests mentioned above, but although hyperref and axessibility may improve things somewhat there’s no low-effort way to produce genuinely accessible pdf files from LaTeX. It may not even be possible to produce pdf files with LaTeX that meet the new regulations because of the issues with tagged pdf.

Currently the easiest way for someone who wants to publish genuinely accessible mathematics is probably compiling to html and MathML. There are various methods of doing this which I have tried:

lwarp
TeX4ht
pandoc
LaTeXML (this was the most successful)
HeVeA
latex2html
the internet docclass

For simple tex documents (including images, macros, the AMS packages, hyperref, but no complicated packages) this can work fairly well. The results are much more accessible than anything we can produce with pdflatex at the moment.
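
As an indication of what “simple” means here, the sketch below is the sort of file I have in mind, with two possible conversion commands given in the comments at the top. The file name notes.tex, the title and the macro \R are made up for the example, and the commands are only a starting point: both tools have many more options, and I make no claim about how good the output is for any particular document.

```latex
% notes.tex (hypothetical file name)
%
% Possible conversions to html with MathML, run from a shell:
%   latexmlc --dest=notes.html notes.tex
%   pandoc notes.tex -s --mathml -o notes.html

\documentclass{article}
\usepackage{amsmath,amssymb}
\usepackage{hyperref}

% A simple user-defined macro.
\newcommand{\R}{\mathbb{R}}

\title{Problem sheet 1}
\author{A. Lecturer}

\begin{document}
\maketitle

\section{Continuity}

A function $f \colon \R \to \R$ is continuous at $a \in \R$ if
\begin{equation}
  \lim_{x \to a} f(x) = f(a).
\end{equation}

\end{document}
```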

Other web resources

The accessibility tag on tex.stackexchange contains a great deal of discussion on making accessible documents with LaTeX.
I asked a question there specifically about the regulations, but it didn’t get much of a response.
The TeX Users Group has a page on PDF accessibility and standards with links to various packages.