Accessible-First Document Format

Working Draft: Last Updated 2023-07-29

Objective: to specify and provide reference implementations for a general-purpose document data format that treats accessibility as its first priority.

Motivation

Most existing document formats heavily (or even exclusively) prioritize the creation of written text, usually with a visual-presentation emphasis (formatting, styling, layouts, and so on). This severely affects a wide range of people, for whom such documents are less accessible or even inaccessible.
Conversion from visual-first documents into more accessible formats is often limited, difficult, cumbersome, or even outright impractical. Such conversions almost always require a considerable degree of technical proficiency, meaning that even when they are possible, the work is rarely actually performed.
These factors combine to create a de facto reality wherein huge swaths of knowledge and information, as digitally encoded in various documents, are essentially unavailable to a large portion of the world's population.

We hold that this status quo of digital document creation is unacceptable and must be addressed.

Definitions

Document - a package of information, stored in digital form. In the scope of this specification, mostly refers to documents in the Accessible-First Document Format.
Author - a person (or group of persons) who is intentionally sharing their knowledge or information in the form of a document.
Audience - a person (or group of persons) who is intentionally receiving the knowledge or information provided in a document.

Considerations

The Accessible-First Document (AFD) Format is designed, as the name suggests, to prioritize accessibility as a primary concern.
Other concerns in the digital ecosystem (such as privacy and security) are also very serious issues, but we posit that they are orthogonal to accessibility: it is more effective to develop privacy and security technologies that work on any format of data than to attempt to address such concerns directly within the specification of a data format itself. This is not to devalue such considerations in any way - it is merely a practical choice to help maintain focus, and to defer the creation of privacy and security technologies to more qualified projects.
We cannot anticipate every form of access need in the specification of this format, nor can we predict what new kinds of access needs may emerge for people in the future. As such, the AFD Format must support additions, extensions, and evolutions that are not directly laid out or accounted for here.
Our primary goal is to facilitate the storage of documents in a way that practically maximizes accessibility. We are less concerned with defining the specifics of presenting these documents, since by definition that presentation must account for the audience of the document (which we cannot predict) and must allow for that audience to customize the presentation to meet their own particular needs and wishes. As such, the specification prefers to err on the side of leaving room for many different presentation possibilities, even if that means potential redundancy or a bit of overhead in the data itself. In particular, we choose to emphasize semantic meaning as the primary thing to encode in the document itself. Those semantics can then be presented in a variety of ways (including adjustments for access needs, cultural expectations, and localization conventions) without necessarily placing the burden of that flexibility entirely on the author.

Affordances

In general, digital documents make use of a large number of different mechanisms for presenting and organizing information. We've broken these down into groups by their overall primary purpose, and listed a number of key features that the AFD Format should support for each.

Organization

Basic grouping of related or contiguous information (e.g. sections and paragraphs)
Call-outs for changes of section, topic, etc. ("headings")
Support for various levels of nesting, as fluidly as possible (e.g. not all documents will have chapters but will still want some organization, and some documents might be aggregates of multiple documents, each with its own internal structure)
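One way to picture fluid nesting is a recursive tree in which any section may hold content and further sections, with the heading level derived from depth rather than stored in the data. The sketch below is illustrative only: the `Section` class and the `outline` function are hypothetical names and are not part of the AFD Format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Section:
    """Hypothetical node: a heading, some paragraphs, and nested sections."""
    heading: str
    paragraphs: List[str] = field(default_factory=list)
    children: List["Section"] = field(default_factory=list)


def outline(section: Section, depth: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (depth, heading) pairs in document order."""
    yield depth, section.heading
    for child in section.children:
        yield from outline(child, depth + 1)


# An aggregate document: each part keeps its own internal structure.
anthology = Section("Anthology", children=[
    Section("First Report", children=[Section("Findings")]),
    Section("Second Report", paragraphs=["No subsections here."]),
])

for depth, heading in outline(anthology):
    print("  " * (depth - 1) + heading)
```

Because depth is computed during traversal, a document with no chapters and an aggregate of several documents use the same structure.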

Semantics

Emphasis and strong-emphasis (drawing on semantic HTML's <em> and <strong> markup)
Options to expand acronyms, explain abbreviations, provide jargon definitions, etc.

Cross-Media Representations

All non-textual media such as (but not limited to) images, audio, and video should be accompanied by semantic descriptions in text form (e.g. "alt text", transcriptions, subtitles/captions, etc.)
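A tool could check this requirement mechanically. The following is a minimal sketch, assuming an XML encoding in which media elements carry a nested text description; the element names `Image`, `Audio`, `Video`, and `Description` and the `Source` attribute are invented for illustration, since this draft does not yet define media markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the draft does not define media markup yet.
MEDIA_TAGS = {"Image", "Audio", "Video"}

SAMPLE = """
<AccessibleDoc>
  <Image Source="chart.png">
    <Description>Bar chart of monthly rainfall.</Description>
  </Image>
  <Audio Source="interview.ogg" />
</AccessibleDoc>
"""


def missing_descriptions(xml_text: str) -> list:
    """Return the Source of every media element lacking a non-empty Description."""
    root = ET.fromstring(xml_text)
    missing = []
    for element in root.iter():
        if element.tag in MEDIA_TAGS:
            description = element.findtext("Description", default="")
            if not description.strip():
                missing.append(element.get("Source"))
    return missing


print(missing_descriptions(SAMPLE))
```

A check like this can only confirm that a description is present, not that it is a good one; the quality of the text remains the author's responsibility.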

Assorted notes on affordances

In keeping with our overall principles, we want to avoid stipulating how affordances are presented. These should defer to personal preference on the part of the audience of the document wherever possible (e.g. someone may wish to have acronyms expanded every time, or only the first time each is encountered, or to have definitions offered on demand).
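To make the split between stored semantics and audience preference concrete, here is a small sketch in which the document supplies the acronym expansions and the audience supplies the policy. The policy names and the `render` function are assumptions made for this example; the draft names the behaviours but not how they would be configured.

```python
from typing import Dict, List, Set

# Hypothetical preference values; the draft names the behaviours, not the keys.
ALWAYS, FIRST_ONLY, NEVER = "always", "first-only", "never"


def render(tokens: List[str], expansions: Dict[str, str], policy: str) -> str:
    """Join tokens, expanding known acronyms according to the audience's policy."""
    seen: Set[str] = set()
    out = []
    for token in tokens:
        expansion = expansions.get(token)
        expand = expansion is not None and (
            policy == ALWAYS or (policy == FIRST_ONLY and token not in seen)
        )
        out.append(f"{token} ({expansion})" if expand else token)
        seen.add(token)
    return " ".join(out)


tokens = ["AFD", "documents", "store", "meaning;", "AFD", "readers", "present", "it."]
expansions = {"AFD": "Accessible-First Document"}

print(render(tokens, expansions, FIRST_ONLY))
```

The author records the expansion once; whether and how often it is shown is decided entirely on the audience's side.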

Care must be taken here to be inclusive of non-visual means of interaction. For example, how would someone request more information via a screen reader? How can someone familiar with the jargon avoid wasting time on constant re-definitions of terminology?

We generally prefer to store an abundance of semantic data in the document itself, which may or may not be presented to the audience, and may or may not afford certain experiences in all cases. This maximizes the possibilities for access without overly burdening the document format (or document authors!) with the need to micro-manage the experience of someone interacting with the document itself. This also serves to maximize the agency of the audience, which is a key objective of genuine accessibility.

Concept Scratch Space

Group text semantically, and provide annotations/markup out of band? e.g. can we use XML-style tagging for this? Can we just base the format on XML?

<AccessibleDoc>
  <Title>Pretend Document</Title>
  <Summary>
    This is a made-up document to illustrate the
    Accessible-First Document Format.
  </Summary>
  <Section Heading="Example">
    <Paragraph>This is some example text.</Paragraph>
    <Annotations>
      <Emphasis Start="14" End="21" /> <!-- emphasize "example" -->
    </Annotations>
  </Section>
</AccessibleDoc>
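As a rough check that out-of-band annotations like these can be resolved against plain text, the sketch below parses a trimmed copy of the example with Python's standard `xml.etree.ElementTree` module. It rests on two assumptions that the scratch example does not settle: that `Start` is a 1-based character position with `End` being the position just past the span (one reading under which 14 and 21 cover "example"), and that annotations apply to the first `Paragraph` of their section.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<AccessibleDoc>
  <Title>Pretend Document</Title>
  <Section Heading="Example">
    <Paragraph>This is some example text.</Paragraph>
    <Annotations>
      <Emphasis Start="14" End="21" />
    </Annotations>
  </Section>
</AccessibleDoc>"""


def emphasized_spans(xml_text: str) -> list:
    """Return the text covered by each Emphasis annotation, section by section."""
    root = ET.fromstring(xml_text)
    spans = []
    for section in root.iter("Section"):
        # Assumption: annotations refer to the section's first Paragraph.
        text = section.findtext("Paragraph", default="")
        for mark in section.iterfind("Annotations/Emphasis"):
            # Assumption: offsets are 1-based and End is exclusive.
            start = int(mark.get("Start")) - 1
            end = int(mark.get("End")) - 1
            spans.append(text[start:end])
    return spans


print(emphasized_spans(SAMPLE))
```

Whatever offset convention is eventually chosen would need to be stated explicitly, and a section with several paragraphs would need some way for an annotation to name its target.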

The order in which things appear in the document file itself should match the order in which the content is presented, unless explicitly overridden by the audience's wishes. In other words, when reading "from start to finish", a screen reader should traverse the document in the same order that content is provided in the file itself; but we want to leave room for navigation options such as skipping to a specific chapter or section.
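The ordering rule can be sketched as a traversal that emits content in file order by default and, on request, begins at a named section. This assumes the XML-style encoding from the scratch example; the `reading_order` function and its `start_heading` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SAMPLE = """<AccessibleDoc>
  <Section Heading="Introduction">
    <Paragraph>Welcome.</Paragraph>
  </Section>
  <Section Heading="Details">
    <Paragraph>First detail.</Paragraph>
    <Paragraph>Second detail.</Paragraph>
  </Section>
</AccessibleDoc>"""


def reading_order(xml_text: str, start_heading: Optional[str] = None) -> list:
    """List headings and paragraphs in file order, optionally from a given section."""
    root = ET.fromstring(xml_text)
    items = []
    reading = start_heading is None
    for element in root.iter():  # iter() walks elements in file order
        if element.tag == "Section":
            if element.get("Heading") == start_heading:
                reading = True
            if reading:
                items.append(element.get("Heading"))
        elif element.tag == "Paragraph" and reading:
            items.append(element.text)
    return items


print(reading_order(SAMPLE))
print(reading_order(SAMPLE, start_heading="Details"))
```

Skipping ahead changes only where reading starts; the relative order of everything after that point still follows the file.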

Non-Visual Styling

Consider ways to bridge dynamic text-to-speech configuration to allow for spoken emphasis (and other semantics) instead of merely visual font changes. The aim is to convey the author's intent without stipulating presentation choices.
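One existing bridge worth evaluating is SSML (Speech Synthesis Markup Language), a W3C standard whose `<emphasis>` element takes a `level` attribute. The sketch below converts emphasis spans into SSML, assuming the same 1-based, end-exclusive offsets read from the scratch example; the `to_ssml` function is hypothetical, and the choice of "moderate" as the level is an arbitrary default that a presentation layer could override.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str, spans: list) -> str:
    """Wrap each (start, end, level) span in an SSML <emphasis> element.

    Assumptions: offsets are 1-based with an exclusive end, and spans are
    sorted and do not overlap.
    """
    parts = []
    cursor = 0
    for start, end, level in spans:
        begin, stop = start - 1, end - 1  # convert to Python slice indices
        parts.append(escape(text[cursor:begin]))
        parts.append(f'<emphasis level="{level}">{escape(text[begin:stop])}</emphasis>')
        cursor = stop
    parts.append(escape(text[cursor:]))
    # A bare <speak> wrapper; a conforming standalone SSML document would
    # also carry version and namespace attributes.
    return "<speak>" + "".join(parts) + "</speak>"


print(to_ssml("This is some example text.", [(14, 21, "moderate")]))
```

The document still records only that a span is emphasized; the speech markup is generated at presentation time, so the same annotation could equally drive a font change or a braille indicator.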