Objective: to specify and provide reference implementations for a general-purpose document data format that prioritizes accessibility first and foremost.
Most existing document formats heavily (or even exclusively) prioritize the creation of written text, usually with a visual-presentation emphasis (formatting, styling, layouts, and so on). This excludes a large range of people, for whom such documents are less accessible or entirely inaccessible.
Conversion from visual-first documents into more accessible formats is often limited, difficult, cumbersome, or even outright impractical. Such conversions almost always require a considerable degree of technical proficiency, meaning that even when they are possible, the work is rarely actually performed.
These factors combine to create a de facto reality wherein huge swaths of knowledge and information, as digitally encoded in various documents, are essentially unavailable to a large portion of the world's population.
We hold that this status quo of digital document creation is unacceptable and must be addressed.
Document - a package of information, stored in digital form. In the scope of this specification, mostly refers to documents in the Accessible-First Document Format.
Author - a person (or group of persons) who is intentionally sharing their knowledge or information in the form of a document.
Audience - a person (or group of persons) who is intentionally receiving the knowledge or information provided in a document.
The Accessible-First Document (AFD) Format is designed, as the name suggests, to prioritize accessibility as a primary concern.
Other concerns in the digital ecosystem (such as privacy and security) are also very serious issues, but we posit that they are orthogonal to accessibility: it is more effective to develop privacy and security technologies that work on any format of data than to attempt to address such concerns directly within the specification of a data format itself. This is not to devalue such considerations in any way; it is merely a practical choice to help maintain focus, and to defer the creation of privacy and security technologies to more qualified projects.
We cannot anticipate every form of access need in the specification of this format, nor can we predict what new kinds of access needs may emerge for people in the future. As such, the AFD Format must support additions, extensions, and evolutions that are not directly laid out or accounted for here.
Our primary goal is to facilitate the storage of documents in a way that practically maximizes accessibility. We are less concerned with defining the specifics of presenting these documents, since by definition that presentation must account for the audience of the document (which we cannot predict) and must allow that audience to customize the presentation to meet their own particular needs and wishes. As such, the specification prefers to err on the side of leaving room for many different presentation possibilities, even if that means potential redundancy or a bit of overhead in the data itself. In particular, we choose to emphasize semantic meaning as the primary thing to encode in the document itself. Those semantics can then be presented in a variety of ways (including adjustments for access needs, cultural expectations, and localization conventions) without placing the burden of that flexibility entirely on the author.
In general, digital documents make use of a large number of different mechanisms for presenting and organizing information. We've broken these down into groups by their overall primary purpose, and listed a number of key features that the AFD Format should support for each.
Basic grouping of related or contiguous information (e.g. sections and paragraphs)
Call-outs for changes of section, topic, etc. ("headings")
Support various levels of nesting as fluidly as possible (e.g. not all documents have chapters, yet they still need organization; and some documents might be aggregates of multiple documents, each with its own internal structure)
Emphasis and strong-emphasis (drawing on semantic HTML's <em> and <strong> markup)
Options to expand acronyms, explain abbreviations, provide jargon definitions, etc.
In keeping with our overall principles, we want to avoid stipulating how affordances are presented. These should defer to personal preference on the part of the audience of the document wherever possible (e.g. someone may wish to always expand acronyms or only the first time encountered, offer definitions on-demand, etc.).
Care must be taken here to be inclusive of non-visual means of interaction. For example, how would someone request more information via a screen reader? How can someone familiar with jargon avoid wasting time on constant re-definitions of terminology?
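As an illustration of deferring to audience preference, the following is a minimal sketch of how a presentation layer might apply an acronym-expansion preference. The function name `expand_acronyms`, the `mode` values ("always", "first", "never"), and the parenthetical output style are all hypothetical; none of them is defined by this specification, and a non-visual presenter would likely surface expansions in a different way.

```python
def expand_acronyms(words, definitions, mode="first"):
    """Join words into a string, expanding known acronyms per preference.

    `mode` is a hypothetical audience preference (not part of the AFD spec):
      "always" - expand every occurrence of a known acronym
      "first"  - expand only the first occurrence of each acronym
      "never"  - leave acronyms as written (expansion stays available on demand)
    """
    if mode not in ("always", "first", "never"):
        raise ValueError(f"unknown expansion mode: {mode!r}")

    seen = set()       # acronyms that have already appeared
    rendered = []
    for word in words:
        expansion = definitions.get(word)
        should_expand = expansion is not None and (
            mode == "always" or (mode == "first" and word not in seen)
        )
        rendered.append(f"{word} ({expansion})" if should_expand else word)
        if expansion is not None:
            seen.add(word)
    return " ".join(rendered)


if __name__ == "__main__":
    words = ["The", "AFD", "Format", "stores", "AFD", "documents"]
    definitions = {"AFD": "Accessible-First Document"}
    for mode in ("always", "first", "never"):
        print(mode, "->", expand_acronyms(words, definitions, mode))
```

The point of the sketch is that the document only needs to carry the definition once; when and how it is surfaced is decided by the audience's settings rather than by the author.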
We generally prefer to store an abundance of semantic data in the document itself, which may or may not be presented to the audience, and may or may not afford certain experiences in all cases. This maximizes the possibilities for access without overly burdening the document format (or document authors!) with the need to micro-manage the experience of someone interacting with the document itself. This also serves to maximize the agency of the audience, which is a key objective of genuine accessibility.
<AccessibleDoc>
  <Title>Pretend Document</Title>
  <Summary>
    This is a made-up document to illustrate the
    Accessible-First Document Format.
  </Summary>
  <Section Heading="Example">
    <Paragraph>This is some example text.</Paragraph>
    <Annotations>
      <Emphasis Start="14" End="21" /> <!-- emphasize "example" -->
    </Annotations>
  </Section>
</AccessibleDoc>
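
To show how a consumer might resolve the annotations in the example document above, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It rests on two assumptions that the example itself does not state: that an `<Annotations>` element applies to the `<Paragraph>` immediately preceding it, and that `Start` is a 1-based character position while `End` is exclusive. That reading is the one under which offsets 14 and 21 select the word "example"; the actual offset convention is for the specification to define. The function name `emphasized_spans` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

AFD_SAMPLE = """\
<AccessibleDoc>
  <Title>Pretend Document</Title>
  <Section Heading="Example">
    <Paragraph>This is some example text.</Paragraph>
    <Annotations>
      <Emphasis Start="14" End="21" />
    </Annotations>
  </Section>
</AccessibleDoc>
"""


def emphasized_spans(xml_text):
    """Return a list of (paragraph_text, [emphasized substrings]) pairs.

    Assumptions (hypothetical, not confirmed by the specification):
      * <Annotations> applies to the <Paragraph> that precedes it;
      * Start is 1-based and inclusive, End is exclusive.
    """
    root = ET.fromstring(xml_text)
    results = []
    for section in root.iter("Section"):
        paragraph_text = None
        for child in section:
            if child.tag == "Paragraph":
                paragraph_text = child.text or ""
                results.append((paragraph_text, []))
            elif child.tag == "Annotations" and paragraph_text is not None:
                for mark in child.iter("Emphasis"):
                    start = int(mark.get("Start")) - 1  # 1-based to 0-based
                    end = int(mark.get("End")) - 1      # exclusive bound
                    results[-1][1].append(paragraph_text[start:end])
    return results


if __name__ == "__main__":
    for text, spans in emphasized_spans(AFD_SAMPLE):
        print(repr(text), "->", spans)
```

Because the emphasis is stored as data alongside the text rather than as inline styling, a presenter is free to render it as italics, a change in speech prosody, or nothing at all, depending on the audience's preferences.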