Google Summer of Code 2022. Improving Accessibility of PDF Renderer — Alex Velez

Alex Vélez Llaque
Published in Learning Equality
7 min read · Nov 17, 2022


This past summer, Learning Equality once again had the privilege of being a Google Summer of Code mentoring organization. We couldn't be more grateful for the volunteer contributions we've received and the people we've met, like Alex Vélez, our collaborator who worked on improving the accessibility (A11Y) mechanisms of our PDF renderer. In this guest blog post, he shares his experience of working with us on this project. Enjoy!

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Hi! This blog post is about my experience contributing to GSoC 2022, working on a project aimed at improving the accessibility of the PDF renderer in Learning Equality's offline-first Kolibri Learning Platform, with the mentorship of Marcella and Radina. Most of my contributions can be found here, and you can read more about my experience below.

About the problem

This project set out to implement a more accessible version of the PDF renderer in Kolibri. Before this project, it lacked basic PDF-reader functionality such as text selection and a list of bookmarks. This was because the PDF was rendered only as canvas elements, so it was almost like having just an image of the PDF: you couldn't access the content or select text, for example. All of this made it impossible for screen readers to read the PDF, so blind users simply couldn't use this tool within the Kolibri Learning Platform.

A computer screen split in two. The left side depicts a PDF file called "Home Learning Activities" and the right side shows the browser Inspector with the markup for that PDF renderer.
PDF files rendered as a Canvas element are not accessible to assistive technologies like screen readers.

My project involved the following:

  • Update to the latest version of PDF.js, the library Kolibri uses to render PDFs. Until then, the library had been outdated by a whole major version.
  • Render the PDF text layer, enabling text selection.
  • Render the struct tree layer. Even with the text exposed, properly tagged PDFs carry the semantic meaning of what is written in them, and exposing it made it possible for screen readers to recognize headings, tables, images, and more.
  • Render the PDF's list of bookmarks, and make it accessible to blind users by allowing navigation between the bookmarks and the pages they reference.
  • Proper testing and documentation.

My journey

Google Summer of Code is a program I found out about a couple of years ago, thanks to a conference at my university encouraging students to apply. They mentioned that only a few students from my university had ever been selected, which seemed both interesting and challenging to me.

A year ago I had an unsuccessful attempt, so this year I spent a lot more time reviewing the projects from all the organizations and filtering out the ones that seemed most interesting to me. Out of many, I ended up choosing this one because I have always been interested in teaching and learning, and in how technology can make learning free, more engaging, and more fun, which is totally in line with Kolibri. I also liked the idea of digging into how something as everyday as a PDF reader works behind the scenes.

To render PDFs, Kolibri uses Mozilla's PDF.js library, the same library used to render PDFs in the Firefox browser. So my first task was to update this library in Kolibri, since it was quite outdated: Kolibri was on version ^1.9.426 while PDF.js was already at 2.14.305. Like almost every major-version update, it was a somewhat complicated process, but I got it done.

After that, the good part was coming. The first surprise (which wasn't much of a surprise, because I had seen it while writing my proposal) was that the library has almost no public documentation. No matter how hard I searched, I couldn't find, for example, how they made the text selectable in Firefox.

So what I suspected was coming true: I'd have to dig into the code of Mozilla's web PDF renderer to understand how things fit into place. After a lot of browsing through their code and following the execution flows, I was able to understand it and replicate the behaviour to make the text layer available in the Kolibri PDF renderer! It was fascinating to discover how selectable text actually works: the PDF canvas stays in the background, keeping the original styles of the PDF, while an additional div on top of the canvas renders the PDF's text with an opacity of 0. When you "select" text on the canvas, what you're actually selecting is the invisible text sitting above it.
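The invisible-overlay idea can be sketched as a small helper. This is illustrative code, not Kolibri's actual implementation: the item shape loosely mirrors what pdf.js's `page.getTextContent()` returns (a string plus a transform matrix), but the function name and details here are my own.

```javascript
// Sketch of the invisible-text-overlay technique (illustrative, not
// Kolibri's code). Each text item becomes an absolutely-positioned span
// over the canvas with transparent text: selectable, but invisible, so
// the eye sees the canvas rendering underneath.
function buildTextLayerSpans(textItems, viewportHeight) {
  return textItems.map((item) => {
    // transform is a 6-entry PDF matrix [a, b, c, d, e, f]; the last two
    // entries carry the glyph origin in PDF user space (y grows upward).
    const [, , , fontHeight, x, y] = item.transform;
    return {
      text: item.str,
      style: {
        position: 'absolute',
        left: `${x}px`,
        // Flip the y axis: CSS measures from the top of the page.
        top: `${viewportHeight - y - fontHeight}px`,
        fontSize: `${fontHeight}px`,
        whiteSpace: 'pre',
        color: 'transparent', // selectable but invisible
      },
    };
  });
}
```

A renderer would then create one `<span>` per descriptor inside the overlay div.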

Adding the selectable text as an additional layer was a very nice step forward: you could finally interact with the content of the PDF, and screen readers could read it. But when Radina reviewed my PR, she tested it with a tagged PDF that, in addition to exposing the text, also exposed its semantic structure (i.e., what was a heading, a paragraph, etc.). My implementation didn't recognize those semantics, but the Firefox PDF reader did. That's when I discovered in their code that, in addition to the text layer, there was another layer called the struct tree layer, which linked the semantic structure to the text layer very elegantly with a property called "aria-owns". Once this was added, screen readers could recognize headings, paragraphs, tables, lists, and even images and their alt texts, making the document much easier to navigate and understand for blind users.
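The linking works roughly like this minimal sketch. The node shape loosely mirrors what pdf.js's `page.getStructTree()` returns (inner nodes with a `role` and `children`, leaves with `type: 'content'` and an `id`), but the role table and function are hypothetical simplifications of mine.

```javascript
// Sketch of linking a PDF struct tree to the text layer with aria-owns.
// ROLE_MAP is an illustrative subset of PDF structure roles → ARIA roles.
const ROLE_MAP = { H1: 'heading', P: 'paragraph', Table: 'table', Figure: 'figure' };

function structNodeToElement(node) {
  if (node.type === 'content') {
    // Leaf: it references a marked-content id; the matching text-layer
    // span carries that id, and this hidden span claims it via aria-owns,
    // so the semantics and the visible text are tied together.
    return { tag: 'span', ariaOwns: node.id };
  }
  // Inner node: a hidden element exposing the PDF tag's semantics.
  return {
    tag: 'span',
    role: ROLE_MAP[node.role],
    children: (node.children || []).map(structNodeToElement),
  };
}
```

Rendering this hidden tree next to the text layer is what lets a screen reader announce "heading" before reading the spans that an `H1` node owns.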

A computer screen split in two. The left side depicts a PDF file called "Personal Development" and the right side shows the browser Inspector with the markup for that PDF renderer.
Exposing the document text in a separate layer from the rendered canvas.

Finally, I implemented one of the features that most caught my attention: the PDF bookmarks. To show them, an additional sidebar would be added to the PDF renderer, which in my (objective) opinion turned out quite nice. I have to admit that, even though I had already browsed the Mozilla PDF renderer code, this feature took me a bit longer, following the flow to understand how they managed to find, and scroll to, the specific position a bookmark was pointing to.

At first, I thought I could only get to the page, but eventually I also managed to get the scroll offset to the exact position the bookmark referenced. On the other hand, while showing the hierarchical list of bookmarks, I learned about the tabindex attribute, which makes HTML elements focusable and allows keyboard navigation.
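A sidebar like this needs the nested outline as a flat, ordered list it can render and navigate with the keyboard. The input shape mirrors what pdf.js's `doc.getOutline()` returns (`{ title, dest, items }` entries, with children in `items`); the helper itself is a sketch of mine, not Kolibri's code.

```javascript
// Sketch: flatten a nested PDF outline into a list for the sidebar,
// keeping the nesting depth so each entry can be indented and each dest
// later resolved to a page and scroll offset.
function flattenOutline(items, depth = 0) {
  return (items || []).flatMap((item) => [
    { title: item.title, dest: item.dest, depth },
    ...flattenOutline(item.items, depth + 1),
  ]);
}
```

Each flattened entry can then be rendered as a focusable list item, with `depth` driving the indentation.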

The bookmarks held another small challenge. For sighted users they worked quite well: the bookmark list opened, and if the user clicked on one, the view scrolled until the position in the PDF the bookmark pointed to was visible. But for blind users things were not that easy: even if the user activated a bookmark, the screen reader would not jump to read the content the bookmark pointed to, since the PDF renderer just scrolled the page, and that makes no difference to blind users. I tested bookmarks rendered in Firefox, and even there, the screen reader did not jump to start reading the position the bookmark pointed to.

A computer screen split in two. The left side depicts a PDF file containing text and a photo of learners engaging in learning in a computer lab. The right side shows the browser Inspector with the markup for that PDF renderer.
Adding the semantic structure layer extracted from the tagged PDF.

So what we did was implement a way so that when the user presses a special key combination on the selected bookmark, the renderer gets the page it references, makes the page focusable, focuses it, and then removes the focusable attribute from the page. That way, the screen reader jumps to read the page the bookmark points to immediately after the user selects it. If the user later wants to return to their previous position in the bookmark list, they hit the same key combination and they're back!
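The "focus hop" itself is a small, standard DOM trick, sketched below as an illustrative helper (the function name is mine, not Kolibri's): `tabindex="-1"` makes an element focusable by script without adding it to the tab order, and removing the attribute after focusing restores the original state.

```javascript
// Sketch of making a screen reader jump to the page a bookmark points at:
// temporarily make the page element focusable, focus it so the reader's
// virtual cursor lands there, then clean up the attribute.
function announcePageToScreenReader(pageEl) {
  pageEl.setAttribute('tabindex', '-1'); // focusable via script only
  pageEl.focus();                        // screen reader starts reading here
  pageEl.removeAttribute('tabindex');    // restore the original tab order
}
```

Focus survives removing the attribute, so the page is read without permanently polluting keyboard navigation.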

Conclusion

Working on a project like this, focused 100% on accessibility, has been eye-opening for me. It has made me realize many things that, as a developer, are easy to overlook day to day. I now have a much better understanding of how users with disabilities can feel when using a system that is not accessible, or even totally unusable, for them.

Before this project, I hadn't touched much on accessibility; it wasn't something usually discussed in my previous environments, or something a professor had mentioned at university. I had never used a screen reader, for example, and doing so made me realize the great importance of semantic tags in a web page. It's something people often tell you "this is how it should be done because it's the correct way", but without showing you one of the real needs it solves.
I am very grateful to Learning Equality for giving me this opportunity, and to Marcella and Radina who have always been there and have taught me a lot of things that I didn’t even know existed before :).
