text (OHCO)

The essential parts of any document form what we call ‘content objects,’ and are of many types, such as paragraphs, quotations, emphatic phrases, and attributions. Each type of content object usually has its own appearance when a document is printed or displayed, but that appearance is superficial and transient rather than essential — it is the content elements themselves, along with their content, which form the essence of a document. When mnemonic names for these objects are specified, a document is said to include ‘descriptive markup.’ Most content objects are contained in larger content objects, such as subsections, sections, and chapters. […] Generally, smaller content objects do not cross the boundaries of larger ones; thus a paragraph will not begin in one chapter and end in the next. For this reason, the structure of a document is a hierarchical one, like a tree or taxonomy. Smaller content objects that occur within a larger one, such as the sections within a chapter, or the paragraphs, block quotes, and other objects within a section, occur in a certain order. This ordering is essential information, and must be part of any model of text structure. Combining these essential elements, we can describe a text as an “ordered hierarchy of content objects,” or “OHCO”.

Contributed by Caroline.
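The model described in the quotation above can be sketched in Python: an ordered tree in which smaller content objects never cross the boundaries of larger ones. This is a minimal illustration under stated assumptions, not anyone's published implementation. It assumes that each content object is identified by a half-open span of character offsets, and the names `ContentObject` and `build_ohco` are hypothetical. The function sorts the spans into document order and nests each one inside the innermost open object that still contains it. It rejects any span that would begin in one object and end in the next.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """A node in an ordered hierarchy of content objects (hypothetical model)."""
    kind: str   # e.g. "chapter", "paragraph", "quotation"
    start: int  # offset of the first character covered (inclusive)
    end: int    # offset just past the last character covered (exclusive)
    children: list[ContentObject] = field(default_factory=list)


def build_ohco(kind: str, length: int,
               spans: list[tuple[str, int, int]]) -> ContentObject:
    """Nest (kind, start, end) spans into an ordered tree under a root object.

    Raises ValueError if a span crosses the end of the object that contains it.
    """
    root = ContentObject(kind, 0, length)
    # Document order: earlier start first; on equal starts, the longer (outer) span first.
    ordered = sorted(spans, key=lambda span: (span[1], -span[2]))
    stack = [root]  # the chain of objects still open at the current position
    for span_kind, start, end in ordered:
        # Close every open object (except the root) that ends at or before this span begins.
        while len(stack) > 1 and stack[-1].end <= start:
            stack.pop()
        parent = stack[-1]
        if end > parent.end:
            raise ValueError(
                f"{span_kind} [{start}, {end}) crosses the end of "
                f"{parent.kind} [{parent.start}, {parent.end})"
            )
        node = ContentObject(span_kind, start, end)
        parent.children.append(node)  # appended in document order, so siblings stay ordered
        stack.append(node)
    return root


if __name__ == "__main__":
    book = build_ohco("book", 100, [
        ("chapter", 0, 100), ("paragraph", 0, 40),
        ("quotation", 10, 30), ("paragraph", 40, 100),
    ])
    chapter = book.children[0]
    print([child.kind for child in chapter.children])  # ['paragraph', 'paragraph']
    try:
        # A paragraph that begins in one chapter and ends in the next.
        build_ohco("book", 100, [
            ("chapter", 0, 50), ("chapter", 50, 100), ("paragraph", 40, 60),
        ])
    except ValueError as err:
        print(err)  # paragraph [40, 60) crosses the end of chapter [0, 50)
```

A stack is enough here because of the OHCO assumption: the objects still open at any point form a single chain from the root. A span that overlaps two siblings, such as a paragraph running from one chapter into the next, is exactly the case the model rules out.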

A few years later, also in support of SGML, Renear, Mylonas and Durand presented a famous paper entitled ‘Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies’ at Christ Church College, Oxford in 1992. Their ‘thesis’ stated that text is an ‘Ordered Hierarchy of Content Objects’, or ‘OHCO’ for short. In OHCO the ‘ordering’ comes from the fact that texts are linear: the objects of which they are composed succeed one another, and the objects themselves are hierarchical because structures like chapters, paragraphs, sentences and prose quotations ‘nest inside one another like Chinese boxes’. The key argument that lies behind the thesis, however, appears to be flawed: ‘if you treat texts as ordered hierarchies of content objects many practical advantages follow, but not otherwise. Therefore texts are ordered hierarchies of content objects.’ All this says about texts is that hierarchical structures are easily processed by computer. This follows in any case from the hierarchical nature of computable formal languages, but is also implied by Church’s model of computation from 1936: if all calculation can be modelled on recursion it follows that hierarchically organized data will be readily computable.

(Schmidt 2010, 341)

Contributed by Caroline.
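Schmidt's closing point is that hierarchically organized data is readily computable because calculation can be modelled on recursion. A few recursive functions over a small tree can illustrate it. The sketch below is illustrative only. It assumes a hypothetical representation in which a content object is a `(kind, children)` pair and running text is a plain string leaf. The names `text_of`, `depth`, and `count` are likewise hypothetical. Each function handles a leaf directly and otherwise combines the results for its children, which are visited in document order.

```python
from typing import Union

# Hypothetical representation: a content object is a (kind, children) pair,
# and a run of text is a plain string leaf.
Node = Union[str, tuple[str, list["Node"]]]


def text_of(node: Node) -> str:
    """Running text of a node: its string leaves concatenated in document order."""
    if isinstance(node, str):
        return node
    _kind, children = node
    return "".join(text_of(child) for child in children)


def depth(node: Node) -> int:
    """Number of content objects on the longest chain of nesting, counting node itself."""
    if isinstance(node, str):
        return 0
    _kind, children = node
    return 1 + max((depth(child) for child in children), default=0)


def count(node: Node, kind: str) -> int:
    """Number of content objects of the given kind, at any level of nesting."""
    if isinstance(node, str):
        return 0
    node_kind, children = node
    return int(node_kind == kind) + sum(count(child, kind) for child in children)


if __name__ == "__main__":
    chapter = ("chapter", [
        ("paragraph", ["Call me ", ("emph", ["Ishmael"]), "."]),
        ("paragraph", ["Some years ago."]),
    ])
    print(text_of(chapter[1][0]))                       # Call me Ishmael.
    print(depth(chapter), count(chapter, "paragraph"))  # 3 2
```

Each function has one case for leaves and one for content objects, so the recursion follows the nesting of the data directly.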