Building accessibility standards for 647 documentation pages
At a glance
What I did: Founded and led an accessibility program that gave a team of 8 writers the standards, tools, and workflows to make API documentation usable for people navigating with keyboards and screen readers.
Scope: 647 pages across 6 documentation sets on 2 hosting platforms (ReadMe, XWiki), serving hundreds of thousands of developers globally.
Timeline: November 2023 – January 2026 (concept to scaled monitoring)
Outcome: Minimum accessibility score lifted from 65% to 73%. Score consistency improved across the entire documentation set. Content standards adopted team-wide and embedded into release-day workflows. Automated monitoring built into the weekly on-call rotation.
The problem
Amazon’s Selling Partner API documentation had no systematic way to measure or improve accessibility. There was no tooling in place, no content standards for accessible writing, and no feedback loop to tell us whether our docs were getting better or worse.
That meant we had no way of knowing whether the 16% of the global population living with a disability could actually navigate our content. Writers had good instincts but no shared definition of what “accessible documentation” looked like in practice — and no process for checking their work.
I wanted to change that. Not as a one-time audit, but as a lasting shift in how the team thinks about content.
Phase 1: Making the case (Nov 2023 – Feb 2025)
Before anything could change, I needed to understand the landscape and get buy-in.
I evaluated two tooling options — Amazon’s internal Accessibility Evaluator and IBM’s open-source Equal Access Checker — and tested both against our documentation. The comparison came down to practical tradeoffs:
- Amazon’s tool required internal permissions, had limited scalability for large sites, and produced highly technical output that needed manual filtering.
- IBM’s tool was free, worked across all browsers and operating systems, categorized issues by severity, and exported clean reports. It lacked navigation testing, but it was the stronger starting point.
I wrote a proposal documenting the problem, the tooling evaluation, and a measurement plan: use IBM to scan every page, calculate a baseline score, track improvements over time, and collaborate with Amazon’s People with Disabilities affinity group for qualitative feedback.
The proposal was approved, and we moved to a pilot.
Phase 2: Running the pilot (Mar – Apr 2025)
I scoped a 6-week pilot targeting 91 pages — the highest-traffic onboarding and API content. I chose these pages deliberately: if we could improve accessibility where the most users would notice, we’d learn the most.
What I built for the team
The team had never done accessibility work before. To bring 8 writers (6 external, 2 internal) up to speed, I created:
An accessibility guide for technical writers. This wasn’t a compliance checklist — it was designed to build empathy first. The opening instructions:
- Navigate the site using only your keyboard. Does it flow the way it would if you were reading it?
- Use the VoiceOver feature on your device. Close your eyes and listen. Where do you get stuck, lost, or confused? What parts of the content are frustrating to get through? What content updates can you make to make the experience smoother?
Then I covered the practical fixes writers could make in Markdown: alt text standards, heading hierarchy, visual position language, and how to handle the tool’s interface (sorting by Requirements or Rules to cut through the noise of hundreds of flagged issues).
A false-positive guide. The IBM tool flags everything, including issues writers can’t fix — platform-level contrast problems, ARIA labeling on ReadMe’s navigation chrome, blockquote behavior in ReadMe’s custom scripts. I tested every recurring issue type, documented which ones were real content problems versus platform constraints, and told writers exactly what to ignore and why.
This was a critical content design decision. Without this guide, writers would have either wasted hours chasing unfixable issues or given up entirely because the tool felt overwhelming.
A list of platform-level issues to escalate to ReadMe. I separated what we could fix (content) from what we needed our hosting platform partner to fix (HTML5 compliance, ARIA labeling, glossary hover behavior), and brought a documented list to our ReadMe contact.
Content decisions I made
The specific content fixes fell into four categories:
- Alt text. I wrote guidance on what makes alt text genuinely useful versus perfunctory. Not just “add alt text to images,” but how to describe a diagram so a screen reader user gets the same understanding a sighted user does.
- Heading uniqueness. Our API docs had repeated heading labels — multiple sections called “Request examples” or “Response” on the same page. Screen reader users navigating by heading would hear the same label multiple times with no way to distinguish sections. I standardized unique, descriptive heading conventions.
- Visual position language. Instructions like “click the button in the top right corner” are meaningless to someone using a screen reader. I created guidance for writers to pair visual references with element labels: “Select the Delete button (top right)” rather than relying on position alone.
- Empty and skipped headings. I identified patterns where writers were skipping heading levels (jumping from H2 to H4) or leaving empty headings in the Markdown. These break the navigational structure for assistive technology users.
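Taken together, the four categories look like this in a writer's Markdown. This is an illustrative before/after sketch — the page content, image name, and heading labels here are invented for the example, not taken from the actual docs:

```markdown
<!-- Before: empty alt text, skipped heading level, duplicate label,
     position-only instruction -->
![](auth-flow.png)

#### Response
Click the button in the top right corner.

<!-- After: descriptive alt text, sequential and unique heading,
     element label paired with position -->
![Diagram of the three-step authorization flow: the app requests a
token, the seller grants consent, and the API returns an access
token.](auth-flow.png)

### Response: authorization errors
Select the **Delete** button (top right).
```

The same pattern applies to every fix category: each change is small on its own, but together they give assistive technology users a navigable heading outline and instructions they can actually follow.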
Phase 3: Measuring results honestly (Apr 2025)
The pilot ended. The results were mixed — and I said so.
What improved:
- SP-API scores rose from 81.5% to 82.6%
- Builder Docs rose from 80.5% to 81.1%
- Standard deviation across SP-API docs decreased from 4.38% to 4.06%, meaning the experience became more consistent regardless of which page a user landed on
- The minimum score across all 647 SP-API pages rose from 65% to 73%
What didn’t work:
- We didn’t hit our 85% target
- Platform-level changes from ReadMe, which we hypothesized would add 3% or more, actually produced a -0.1% change
- The IBM tool’s overall score proved to be an incomplete metric: some pages showed only a 1.4% score improvement but achieved a 32% reduction in total issues
I wrote a lessons learned document that laid this out transparently, including a “Start / Stop / Continue” framework for what to do next. The key insight: our content quality was already strong. The remaining barriers were almost entirely platform-level, which meant we needed different tools and different partnerships to keep improving.
Phase 4: Scaling beyond the pilot (May 2025 – Jan 2026)
Based on the pilot results, I did three things:
Evaluated additional tooling. I tested WAVE (used by ReadMe) alongside IBM and the AWS Level Access Crawler. I mapped each tool’s capabilities against every WCAG principle and found that IBM + WAVE together covered the most ground — IBM for severity-based reporting and exportable data, WAVE for interactive keyboard navigation testing and component-level analysis.
Deployed automated monitoring. I set up the AWS Level Access Crawler to generate daily accessibility reports across all ReadMe pages. On the first full report, it identified 1,877 issues — all of which I confirmed were platform-level false positives, meaning the content itself had zero actionable accessibility issues.
Built it into the team’s workflow. I integrated accessibility monitoring into the weekly on-call rotation: each week, the on-call writer reviews the crawler report, creates tickets for any new issues, updates the false-positive list, and reports on overall status. This means the program continues without depending on any single person.
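The weekly rotation described above could be captured as a short runbook checklist — the wording and step names here are illustrative, not the team's actual runbook:

```markdown
## Weekly accessibility on-call

- [ ] Open the latest Level Access Crawler report
- [ ] Compare new findings against the false-positive list
- [ ] File a ticket for each new content-level issue
- [ ] Add confirmed platform-level findings to the false-positive list
- [ ] Post a status summary for the team
```

Keeping the checklist in the docs repo alongside the false-positive list means each on-call writer inherits the same definition of “done.”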
What I’d do differently
If I started this over, I’d push for more granular metrics from day one. The overall accessibility score was useful for getting buy-in and showing directional progress, but it masked the real improvements we were making. Total issue count, broken down by type and severity, turned out to be a better indicator of actual accessibility impact.
I’d also start the ReadMe partnership conversations earlier. We spent pilot time discovering that platform changes didn’t move scores the way we expected — and that gap between expectation and result cost us momentum. Starting the technical collaboration with the hosting platform in parallel with the content work would have been more efficient.
What colleagues said
“She went above and beyond in improving accessibility for our documentation, making thorough case studies and implementing lasting process improvements that helped our whole team create better work.” — Jack Evoniuk, Technical Writer, Amazon
“Andie has consistently demonstrated exceptional skill as a Technical Writer, combining strong technical knowledge with a deep empathy for users of all backgrounds and abilities.” — Gibran Waldron, Business Analyst, Amazon
“She took ownership of projects that had meaningful impact on user experience, inclusivity, and content quality. Andie is the kind of professional who leads through action and creates lasting positive change.” — Wendy Kurko Giberson, Senior Technical Writer, Amazon