Accessibility in PDFs – Lawrence Kwok CS467 Capstone Blog

Accessibility is a topic I hadn’t thought about at all until last month, when I was handed a task during my current internship: making the agency’s fillable forms accessible. In a nutshell, accessibility is an easy concept to understand: you’re making a document that can be read or filled out by users who aren’t able to access it through conventional means. However, even after a month of reading tutorials and blogs while trying to document the process, I would still consider myself a novice on the subject.

In practice, though, accessibility is quite an abstract concept to grasp. There are defined standards, such as Section 508 (which incorporates WCAG 2.0, the Web Content Accessibility Guidelines) and PDF/UA-1 (Universal Accessibility), that set out the minimum needed to achieve accessibility, but the methods, tools, and execution can vary greatly. That adds a huge layer of complexity I hadn’t considered until now. Normally, people think of a PDF as a document with several blocks of text, some images, and maybe a few interactive areas in the form of fillable fields, signature areas, checkboxes, and buttons.

Once accessibility is required, the very things we take in at a glance need extra work: they need additional properties, they must be tagged to identify exactly what type of structure each one is, and they must be organized into a structure tree, all while ensuring that the result is cohesive and easy to navigate for users who access the forms or documents through non-traditional means. In a nutshell, it’s very similar to building a class with many children, each with its own specific properties, and ensuring that the logic flows sensibly. Add the fact that a PDF can be constructed in many ways, and that there is no single strict method for structuring a document for accessibility as long as it meets the guidelines, and the process involves a lot of creativity and ambiguity.
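To make the class analogy a little more concrete, here is a minimal sketch of a tag tree as nested objects. The `StructElem` class and `reading_order` function are hypothetical illustrations, not how Acrobat or any PDF library represents tags internally; only the tag names (`Document`, `H1`, `P`, `Figure`, `Sect`) are standard PDF structure types. The point is that walking the tree depth-first gives the logical order in which a screen reader encounters the content, so getting the nesting right is what gets the reading order right.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class StructElem:
    """Hypothetical, simplified stand-in for one element in a tagged PDF's structure tree."""
    tag: str                                   # standard structure type, e.g. "H1", "P", "Figure"
    text: str = ""                             # content the element stands for (simplified)
    alt: Optional[str] = None                  # alternate text, e.g. a figure description
    children: List["StructElem"] = field(default_factory=list)


def reading_order(elem: StructElem) -> Iterator[StructElem]:
    """Depth-first, pre-order walk: a parent comes first, then its children in order."""
    yield elem
    for child in elem.children:
        yield from reading_order(child)


# A small, made-up form page: a heading, instructions, a logo, and a nested section.
doc = StructElem("Document", children=[
    StructElem("H1", "Application Form"),
    StructElem("P", "Please complete every section."),
    StructElem("Figure", alt="Agency logo"),
    StructElem("Sect", children=[
        StructElem("H2", "Applicant Information"),
        StructElem("P", "Enter your legal name."),
    ]),
])

for e in reading_order(doc):
    print(e.tag, e.text or e.alt or "")
```

Moving the `Sect` element ahead of the `H1` in `children` would change what a screen reader announces first, even though nothing on the visible page moves.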

To start the process, you usually need to identify what you are working with. Is it a text document purely for reading? Is it a fillable PDF form (these are the hardest to work with because of the additional requirements)? Does it have figures and tables? And so forth. While all of these PDFs require a sequential reading order (left to right, top to bottom), the types of elements on the document heavily change how you remediate the PDF to make it accessible. For example, if there are fillable form fields, I not only have to tag each object but also have to add tooltips (hover text) that explain what each field is used for. While most people can see the field and its associated question or label, a screen reader needs that tooltip to explain the field to, for example, a visually impaired user.
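In the PDF format, a form field’s tooltip is stored in its field dictionary under the `TU` key (the “alternate field name”), next to `T`, the field’s name. The sketch below checks a list of fields for missing or blank tooltips. It assumes the fields have already been pulled out of the PDF into plain dictionaries; the `missing_tooltips` helper and the sample field names are hypothetical, and actually reading them from a file would require a PDF library, which is left out here.

```python
from typing import Dict, List

# Hypothetical field records modeled on PDF form-field dictionaries:
# "T" is the field's name, "TU" its alternate name, which Acrobat shows
# as the tooltip and which assistive technology can announce.
fields: List[Dict[str, str]] = [
    {"T": "name_first", "TU": "First name"},
    {"T": "name_last", "TU": ""},
    {"T": "ssn"},
]


def missing_tooltips(fields: List[Dict[str, str]]) -> List[str]:
    """Return the names of fields whose tooltip (TU) is absent or only whitespace."""
    return [f["T"] for f in fields if not f.get("TU", "").strip()]


print(missing_tooltips(fields))  # ['name_last', 'ssn']
```

A check like this only catches empty tooltips; whether “First name” is actually a helpful description still takes a human read-through.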

Furthermore, you also need to structure the document so that every item is sequential and grouped appropriately. This can range from manually checking the tab order (does pressing Tab move to the next item over instead of jumping around?) to nesting items that belong to the same section or category, such as lists or subcategories. These decisions heavily change how the document’s tag tree looks and how you plan its layout. On top of that, a page can contain dozens or sometimes hundreds of individual elements, and pages range from simple blocks of text to complex layouts with interactive form elements, figures, tables, and graphs scattered throughout, which lengthens both the work itself and the planning needed to remediate the document well.
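The manual tab-order check can be approximated in code: sort the fields by position (top to bottom, then left to right) and compare that against the actual tab sequence. This is a minimal sketch under stated assumptions: the field names and coordinates are made up, positions use PDF page coordinates where y grows upward (so a larger `top` is higher on the page), and fields in the same row are assumed to share exactly the same `top` value. The `visual_order` and `tab_order_problems` helpers are hypothetical.

```python
from typing import List, Tuple

# Hypothetical field positions: (name, left, top) in PDF page coordinates,
# where y grows upward, so a larger "top" means higher on the page.
Field = Tuple[str, float, float]


def visual_order(fields: List[Field]) -> List[str]:
    """Top-to-bottom, then left-to-right. Assumes fields in one row share the same top."""
    return [name for name, left, top in sorted(fields, key=lambda f: (-f[2], f[1]))]


def tab_order_problems(tab_order: List[str], fields: List[Field]) -> List[Tuple[int, str, str]]:
    """List (position, actual, expected) wherever the tab order departs from visual order."""
    expected = visual_order(fields)
    return [(i, a, e) for i, (a, e) in enumerate(zip(tab_order, expected)) if a != e]


fields = [
    ("first_name", 72, 700), ("last_name", 300, 700),
    ("street", 72, 650),
    ("city", 72, 600), ("zip", 300, 600),
]
tab_order = ["first_name", "last_name", "city", "street", "zip"]
print(tab_order_problems(tab_order, fields))  # [(2, 'city', 'street'), (3, 'street', 'city')]
```

A mismatch is only a flag for review: a two-column form, for instance, may intentionally tab down one column before moving to the next.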

While Adobe tries to make its UI as intuitive as possible, some things still fall through the cracks in both documentation and execution. Tables, for instance, require their cells to be defined before the table itself can be built, which Adobe’s current tools don’t explain well; yet you can still select text and tag it as a Table, which results in a huge mess when the table’s internal structure can’t be reconciled. Plenty of trial and error does help you learn by immersing you in the actual issues, but it also brings a huge amount of frustration and can make the process arduous. Still, figuring out how a feature works or how an issue should be resolved makes success all the sweeter, as with learning any hard concept.
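The table problem comes down to structure: in a tagged PDF, a `Table` is expected to contain rows (`TR`), optionally grouped under `THead`, `TBody`, and `TFoot`, with an optional `Caption`, and each row is expected to contain header (`TH`) or data (`TD`) cells. Text tagged straight as a Table skips that inner layer. Below is a simplified sketch of a checker for that rule. The `(tag, children)` tuple tree and the `table_errors` function are hypothetical; only the tag names are standard PDF structure types, and the checker does not enforce the ordering rules (such as a caption being first or last) that the full specification adds.

```python
from typing import Dict, List, Set, Tuple

# Simplified tag tree: (tag, children). Tag names follow the PDF standard
# structure types for tables; the tuple model itself is hypothetical.
Node = Tuple[str, list]

ALLOWED_CHILDREN: Dict[str, Set[str]] = {
    "Table": {"TR", "THead", "TBody", "TFoot", "Caption"},
    "THead": {"TR"}, "TBody": {"TR"}, "TFoot": {"TR"},
    "TR": {"TH", "TD"},
}


def table_errors(node: Node, path: str = "") -> List[str]:
    """Recursively report children that a table-related parent should not contain."""
    tag, children = node
    here = f"{path}/{tag}"
    errors: List[str] = []
    allowed = ALLOWED_CHILDREN.get(tag)  # None means this tag's children are not checked
    for child in children:
        if allowed is not None and child[0] not in allowed:
            errors.append(f"{here}: unexpected child <{child[0]}>")
        errors.extend(table_errors(child, here))
    return errors


good: Node = ("Table", [
    ("TR", [("TH", []), ("TH", [])]),
    ("TR", [("TD", []), ("TD", [])]),
])
# Roughly what happens when plain text is tagged as a Table without defining cells.
bad: Node = ("Table", [("P", []), ("TR", [("TD", []), ("P", [])])])

print(table_errors(good))  # []
print(table_errors(bad))
```

For `bad`, the checker reports both the stray `P` directly under the `Table` and the one sitting inside a row where a cell should be.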

However, given that most current documentation only explains how to solve each issue individually, it’s very easy for newcomers to be overwhelmed by the number of tasks, the non-linear nature of error remediation, and the challenge of figuring out how best to make the information accessible and navigable for users. Working on these forms has given me a deeper appreciation for the people who do this on a daily basis, since their work serves an entire section of the population that cannot use these documents through normal means. Even agencies such as the IRS and USDA provide accessible PDF forms for this very purpose.

As I continue this work, I realize that accessibility is not limited to documents; it extends to other electronic media such as websites. With this in mind, I plan to consider accessibility as a factor in software development whenever possible, since technology isn’t supposed to be limited to a select few but should be made as widely available as possible to enrich users’ lives.