Golder Encoding Conventions
The ‘Golder Project’ involves publishing William Golder’s poetry in electronic form, together with contextual material that will assist a reader to interpret this early New Zealand poetry.
There are four volumes of poetry, whose short titles are: NZ Minstrelsy (1852); Pigeons’ Parliament (1854); NZ Survey (1867); and Philosophy of Love (1871).
The electronic version of these volumes is in the form of XML files created in accordance with the TEI guidelines (TEI, 2003).
These specifications have been developed through the initial work on NZ Minstrelsy and then revised in the light of issues raised by NZ Survey. They are intended to provide a consistent framework for the encoding of all the volumes.
Because typographical practice has changed since these texts were published, the punctuation has been modified to conform with modern practice. The notable change is the exclusion of a thin space before each semicolon, colon, exclamation mark or question mark, as well as after opening quotation marks. Where the original text has an obvious typographical error this is marked up in the electronic text.
The <TEI.2 ...> tag specifies the main language for the text (TEI, 2003: para 3.5).
Each volume is divided into three major blocks, 'front', 'body', and 'back'.
The 'front' contains the material from all pages in front of the poetry. Typically, this includes the title page(s), preface, and contents. Some volumes have other 'front' material; for example, NZ Survey has a dedication and a prospectus.
The 'back' contains the material from the pages after the poetry. For example, NZ Minstrelsy has a list of subscribers, an errata, and a prospectus.
Each major block is divided into 'divisions'. For example:
<titlePage type="full" id=""></titlePage>
Other major divisions:
The content of the 'id' attributes is described in 'ID attributes' below.
The front and back matter each have several major divisions, such as 'preface' or 'contents'. The body has one or more major divisions, depending on the structure of the volume. For example, the body of NZ Minstrelsy has two major divisions, identified as 'New Zealand Minstrelsy' and 'Appendix', whereas the body of NZ Survey has a single major division.
Subdivisions (Poems, Cantos)
Within the 'body', each poem or song is placed in a separate subdivision using the <div2> tag. For example:
- poems that don’t have a ‘tune’:
<div2 type="poem" n="" id=""></div2>
- poems that have a ‘tune’:
<div2 type="song" n="" id=""></div2>
Some poems are divided into cantos, for example, the title poem in NZ Survey. These subdivisions are identified with:
<div3 type="canto" n="1" id=""></div3>
In general, the major divisions of the ‘front’ and ‘back’ blocks are not subdivided unless they contain poetry.
Line Groups (Stanzas)
The <lg> tag identifies stanzas and line groups within stanzas.
<lg type="stanza" n=""></lg>
<lg type="chorus" n=""></lg>
When a stanza consists of identifiable blocks these are marked up as ‘nested’ line groups. In the following example, the mark-up has been simplified to show the line group structure:
<lg type="chorus" n="1">
  <l n="1">Let's go a bushranging, thou fairest of lassies ;</l>
  <l n="2">Let's go a bushranging, and visit each scene,</l>
  <l n="3">Whose beauties unchanging which nought o'er surpasses,</l>
  <l n="4">While clad in mantles of gay evergreen.</l>
</lg>
<lg type="stanza" n="2">
  <lg>
    <l n="5">The morning delights us, all nature invites us,</l>
    <l n="6">To taste her enjoyments wherever we rove ;</l>
    <l n="7">Then, come, let us wander where streamlets meander,</l>
    <l n="8">Or through the dark forest, or pine shady grove.</l>
  </lg>
  <lg>
    <l n="9"><seg>Chorus—</seg>Let's go a bushranging, &c.</l>
  </lg>
</lg>
(Golder, 1852: 'A Bushranging')
Each line of poetry is identified using the line tag, <l>.
A ‘line’ is considered to begin at the first letter or punctuation mark, not at the page margin. This ensures that search facilities will return only the text of the poetry itself.
<l n="1">Through Hutt's vale the Erratonga</l>
(Golder, 1852, 'Erratonga')
Some poems have text which is not properly part of the poetry. To ensure that the search facilities can remove this additional text, it is identified using the <seg> element, for example (mark-up simplified):
<l n="9"><seg>Chorus—</seg>Let's go a bushranging, &c.</l>
(Golder, 1852, 'A Bushranging')
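As a minimal sketch of how a search facility might skip the <seg> text, the following Python uses the standard library's xml.etree.ElementTree. The function name searchable_text is hypothetical, and the ampersand in the sample is written as &amp; so that the fragment parses. The project's actual search tooling is not described here, so this is an illustration only.

```python
import xml.etree.ElementTree as ET


def searchable_text(line_xml):
    """Return the text of an <l> element, leaving out the content of any <seg> child."""
    line = ET.fromstring(line_xml)
    parts = [line.text or ""]
    for child in line:
        if child.tag != "seg":
            # Ordinary child elements contribute their own text.
            parts.append("".join(child.itertext()))
        # Text following a child (its tail) belongs to the line itself.
        parts.append(child.tail or "")
    return "".join(parts)


# Sample taken from the example above, with the ampersand escaped.
example = "<l n=\"9\"><seg>Chorus—</seg>Let's go a bushranging, &amp;c.</l>"
print(searchable_text(example))
```

Only the words of the poem remain; the 'Chorus' label held in <seg> is dropped.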
Page and Line breaks
Page break tags are inserted wherever there is a page break in
the original text. When a new poem begins on the next page, the
<pb> tag is placed between the end of the previous
division and the beginning of the next:
</div2> <pb id="pgGOLMIN027" n="23"/> <div2 type="song" n="15" id="">
(Golder, 1852: pg. 23)
The page break number and id refer to the page following the <pb> tag. <pb> tags may be inserted anywhere in the XML file; if the page break occurs within a poem, the tag is inserted between the appropriate lines and/or line groups, for example:
<l n="16">The night-bird croaks its song.</l>
<pb id="pgGOLMIN017" n="13"/>
<l n="17">The clearing, filled with golden grain,</l>
(Golder, 1852: 'The Bushman’s Harvest Home')
Line Breaks and Hyphenation
Many output formats do not use the original line breaks and hyphenation. To keep the option of providing outputs which do, line breaks and hyphenation are marked up so that style sheets can be programmed to either respect or ignore the original layout. This is achieved using the <orig> element. Example:
employment, when I used to sit in my lonely bush<orig reg=" " rend="line_break"><lb/></orig> cottage musing over the fire in the long winter <orig reg="evenings" rend="line_break">even-<lb/>ings</orig>. As the composing of the several pieces then<orig reg=" " rend="line_break"><lb/></orig>
(Golder, 1852, 'Preface')
To distinguish between this and other uses of the <orig> tag (see below), the rend attribute is used to instruct style sheets that these tags contain information about line breaks.
Similarly, original line breaks in poetry are identified as follows:
<l n="1">Thwack, thwack, bounds the flail now on ev'ry thrashing<orig reg=" " rend="line_break"><lb/></orig>floor,</l>
(Golder, 1852, 'The Thrashing Floor')
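The choice between respecting and ignoring the original layout can be sketched in Python. This is an illustration under stated assumptions, not the project's style sheet: the function name render and the wrapping <p> element are hypothetical, and a newline stands in for whatever a real output format would use for a line break.

```python
import xml.etree.ElementTree as ET


def render(element, keep_line_breaks):
    """Flatten an element to plain text, respecting or ignoring original line breaks."""
    out = [element.text or ""]
    for child in element:
        if child.tag == "orig" and child.get("rend") == "line_break":
            if keep_line_breaks:
                # Keep the printed form; the <lb/> inside becomes a newline.
                out.append(render(child, True))
            else:
                # Substitute the regularised form held in the reg attribute.
                out.append(child.get("reg", ""))
        elif child.tag == "lb":
            out.append("\n" if keep_line_breaks else "")
        else:
            out.append(render(child, keep_line_breaks))
        out.append(child.tail or "")
    return "".join(out)


# Fragment of the Preface example, wrapped in a hypothetical <p> element.
prose = ET.fromstring(
    '<p>long winter <orig reg="evenings" rend="line_break">even-<lb/>ings</orig>. As</p>'
)
# The verse example from 'The Thrashing Floor'.
verse = ET.fromstring(
    """<l n="1">Thwack, thwack, bounds the flail now on ev'ry thrashing<orig reg=" " rend="line_break"><lb/></orig>floor,</l>"""
)

print(render(prose, keep_line_breaks=False))
print(render(verse, keep_line_breaks=False))
```

Because the test is on the rend attribute, <orig> elements used for spelling regularisation pass through this function with their original text unchanged.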
In the Golder texts, epigraphs are usually quotations from the work of other poets. These are marked up as follows:
<epigraph rend="lindent(1.5)">
  <lg>
    <l><seg>“</seg>Ah ! who can tell how hard it is to climb</l>
    <l>The steep, where fame’s proud temple shines afar.<seg>”</seg></l>
  </lg>
  <ab><hi rend="right"><name type="person">B<hi rend="smallcaps">EATTIE.</hi></name></hi></ab>
</epigraph>
(Golder, 1852: Appendix pg. iii)
Note that the quotation marks are distinguished from the text of the poetry itself using the <seg> element.
ID attributes
The id attribute is used for several purposes, including identifying the location of footnotes and figure references and providing points to which hyperlinks can take the reader.
To satisfy the requirement for id attribute values to be unique, the following system is used throughout the Golder texts.
The first part of each id attribute value consists of a prefix followed by the code for the text. The following types of id attributes are used in NZ Minstrelsy:
|Location of footnote||fntgGOLMIN..|
A numerical suffix completes the id attribute value. For example, a page break in NZ Minstrelsy is marked up like this:
<pb id="pgGOLMIN027" n="23"/>
(Golder, 1852: pg. 23)
ID attribute values for NZ Survey use 'GOLNZS'. Values for the other two volumes are not yet defined.
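The naming scheme (prefix, text code, numerical suffix) can be sketched as a small Python helper. The function names make_id and duplicate_ids are hypothetical, and the suffix widths (three digits for page breaks, two for footnotes) are only inferred from the examples in this document; they are assumptions rather than stated rules.

```python
from collections import Counter

# Text codes given in this document; the other two volumes are not yet defined.
TEXT_CODES = {"NZ Minstrelsy": "GOLMIN", "NZ Survey": "GOLNZS"}


def make_id(prefix, text_code, number, width):
    """Build an id value: prefix, then text code, then a zero-padded number."""
    return f"{prefix}{text_code}{number:0{width}d}"


def duplicate_ids(ids):
    """Return the id values that occur more than once, in sorted order."""
    return sorted(value for value, count in Counter(ids).items() if count > 1)


page_id = make_id("pg", TEXT_CODES["NZ Minstrelsy"], 27, 3)
footnote_id = make_id("fn", TEXT_CODES["NZ Minstrelsy"], 2, 2)
print(page_id, footnote_id)
```

Running duplicate_ids over every id value collected from a text gives a quick check of the uniqueness requirement.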
Footnotes & Figure References
The TEI poetry tags place severe limitations on the type of material which can appear within a block of poetry. Nineteenth-century typesetters faced no such limitations, and could mix poetry and prose at will. For example, in NZ Minstrelsy, some pages contain footnotes which from an etext perspective appear in the middle of a poem.
Because of the difficulties of mixing poetry and prose in an XML file, items such as footnotes and illustrations (including scanned images of the original text) are separated from the poetry, and placed in a separate division at the end of the electronic text.
Figure references are linked to the page breaks so that the style sheet can insert a thumbnail of the figure at the top of each ‘page’ of the output file.
To do this, each figure reference is placed inside a note block which is linked to the appropriate page break:
<note type="illustration" target="pgGOLMIN027">
  <figure entity="GOLMIN027">
    <figDesc>"New Zealand Minstrelsy": Page 23.</figDesc>
  </figure>
</note>
The target attribute matches the id attribute of the corresponding page break.
Footnotes are inserted at the end of the division to which they belong. For example:
<note type="footnote" target="fntgGOLMIN02" id="fnGOLMIN02"> * Alluding to... in former years. </note>
</div2>
(Golder, 1852: 'Stanzas, Written while on the voyage...') (text has been abbreviated)
The target attribute value matches the id attribute of an element showing where the footnote should be inserted in the output.
Where the text has a reference to a footnote, a hyperlink is provided.
The reference from the text to the footnote is marked using the <ref> tag. The target attribute of the <ref> matches the id attribute of the appropriate footnote. Example:
<l n="73">Oh happy plan!<ref type="footnote" target="fnGOLMIN02">*</ref>—ingenuously devised!—</l>
(Golder, 1852: 'Stanzas, Written while on the voyage...')
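A simple consistency check follows from this linking rule: every footnote <ref> should point at the id of some footnote <note>. The Python below is a sketch only. The function name unresolved_footnote_refs is hypothetical, and the sample division is a simplified combination of the two examples above, not a real extract from the file.

```python
import xml.etree.ElementTree as ET


def unresolved_footnote_refs(division_xml):
    """List the targets of footnote <ref> elements that match no footnote <note> id."""
    root = ET.fromstring(division_xml)
    note_ids = {
        note.get("id") for note in root.iter("note") if note.get("type") == "footnote"
    }
    return [
        ref.get("target")
        for ref in root.iter("ref")
        if ref.get("type") == "footnote" and ref.get("target") not in note_ids
    ]


# Simplified division: one line with a reference, followed by its footnote.
GOOD = """<div2>
<l n="73">Oh happy plan!<ref type="footnote" target="fnGOLMIN02">*</ref>—ingenuously devised!—</l>
<note type="footnote" target="fntgGOLMIN02" id="fnGOLMIN02"> * Alluding to... in former years. </note>
</div2>"""

# Deliberately broken copy: the reference now points at a footnote that does not exist.
BROKEN = GOOD.replace('target="fnGOLMIN02"', 'target="fnGOLMIN03"')

print(unresolved_footnote_refs(GOOD))
print(unresolved_footnote_refs(BROKEN))
```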
Footnote id attribute values are numbered sequentially from 01, in the order in which
the footnotes appear in the original text. Where possible, the
numeric suffix of the
id attributes are
Words and Phrases
There are at least two ways in which regularised spelling can be used. First, it allows a word search to return words with abbreviated or archaic spellings. Second, it makes it possible to provide a ‘glossed’ version of the text for readers unfamiliar with nineteenth-century English, or with the dialect used by the author.
A variety of methods are used to regularise spelling, depending on the word involved. 'Abbreviations' and 'Linguistically Distinct' words are covered in separate sections below.
The <sic>, <abbr> and <orig> tags are used in preference to their respective mirrors, <corr>, <expan> and <reg>. This ensures that if the XML tags are removed, the etext remains faithful to the original.
When a word has unusual or obsolete spelling (with respect to ?Oxford, 2001?), and is not otherwise marked up, the <orig reg="regularised spelling"></orig> construction is used:
(Golder, 1852: 'Come to the Bush')
If the original text was capitalised, the regularised spelling is capitalised so that if the regularised spelling is substituted for the original in a specific output text, the capitalisation is retained.
<abbr expan="To escape" type="elision">T'escape</abbr>
<abbr expan="to enjoy" type="elision">t'enjoy</abbr>
Sometimes, a composite mark up is necessary, for example:
<orig reg="showed"><abbr expan="shewed" type="elision">shew'd</abbr></orig>
(Golder, 1867: 'New Zealand Survey')
Archaic pronouns such as 'thee', 'thy', 'ye', etc. are not marked up.
When the marked word is a possessive, the entire word is marked up, for example:
(Golder, 1867: 'New Zealand Survey')
William Golder’s poetry contains a large number of elisions. These are marked up as in the following examples:
<abbr expan="beneath" type="elision">’neath</abbr>
<abbr expan="it is" type="elision">’tis</abbr>
<abbr expan="passed" type="elision">pass’d</abbr>
<abbr expan="dolorous" type="elision">dol’rous</abbr>
<abbr expan="to engulf" type="elision">t’engulf</abbr>
(Golder, 1852: 'Stanzas Extemporaneously written on a stormy night... ')
Elisions which involve modern personal pronouns and auxiliary verbs (he’ll; I’ll) and modern elisions (can’t, that’s) are not marked up.
One of the ways that abbreviation mark up is used is for word searching. When two words have been elided, the mark up always includes both words, irrespective of whether there is a space between them in the original text or not. This ensures that the search device can identify them as separate words. For example:
<abbr expan="The effect" type="elision">Th’ effect</abbr>
(Golder, 1852: 'A Desperate Case.')
<abbr expan="to engulf" type="elision">t’engulf</abbr>
(Golder, 1852: 'Stanzas Extemporaneously written on a stormy night... ')
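To illustrate why both elided words are recorded, here is a hedged Python sketch of how a search index might take its words from the expan attribute rather than from the printed form. The function name search_words, the wrapping <l> elements and the punctuation list are assumptions made for this example; the actual search device is not specified in this document.

```python
import xml.etree.ElementTree as ET


def search_words(line_xml):
    """Return lower-cased search words for a line, expanding <abbr> elements."""
    line = ET.fromstring(line_xml)
    words = (line.text or "").split()
    for child in line:
        if child.tag == "abbr" and child.get("expan"):
            # Index the expansion, so that each elided word is found separately.
            words.extend(child.get("expan").split())
        else:
            words.extend("".join(child.itertext()).split())
        words.extend((child.tail or "").split())
    # Strip surrounding punctuation (including the em dash) and drop empty results.
    cleaned = (word.strip(".,;:!?\u2014").lower() for word in words)
    return [word for word in cleaned if word]


with_space = '<l><abbr expan="The effect" type="elision">Th’ effect</abbr></l>'
without_space = '<l><abbr expan="to engulf" type="elision">t’engulf</abbr></l>'
print(search_words(with_space))
print(search_words(without_space))
```

Both samples yield two separate words, whether or not the printed text has a space between them.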
Spelling and Typesetting errors and quirks
In general, unusual spellings are assumed to be intentional, and are marked up with the <orig> tag.
When it is reasonably certain that the text has an error, the <sic> tag is used. Examples:
(Golder, 1867: 'Preface')
<sic corr="scribbling" cert="low">scribling</sic>
(Golder, 1867: 'Preface')
<sic corr="message" cert="high">messsge</sic>
(Golder, 1867: 'New Zealand Survey, Canto Fifth')
The cert attribute records the degree of certainty. In Golder etexts, the possible values are low, high and certain, with certain indicating that the poet has published a correction:
<sic corr="as" cert="certain">so</sic>
(Golder, 1852: 'Sweet Home'. Correction in Golder, 1852: 'Errata')
Where the word is sometimes spelt correctly, and occasionally incorrectly according to modern conventions, then it is marked <sic corr="" cert="high">.
Linguistically Distinct English Words and Phrases
The <distinct> tag is used for English words that are 'linguistically distinct'. This includes archaic words and words from the Scots dialect, but excludes words from languages such as Maori or Latin (see below).
The <distinct> tag has no attribute for providing a gloss in modern NZ English. Where a gloss or regularised equivalent can be found, it is tagged using the <orig> tag. This means that linguistically distinct features are usually tagged with nested <distinct> and <orig> tags. Example:
<distinct type="Scots"><orig reg="from">frae</orig></distinct>
(Golder, 1852: 'Donald's Return')
<distinct type="archaic"><orig reg="gladly">fainly</orig></distinct>
(Golder, 1852: 'The Christian's March')
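A glossed output can be sketched by substituting the reg value wherever an <orig> element occurs, however deeply it is nested. The Python below is illustrative only: the function name glossed is hypothetical, and the first sample line simply places the two examples above side by side inside an <l> element, so it is not a line from the poems.

```python
import xml.etree.ElementTree as ET


def glossed(element):
    """Flatten an element to text, replacing each <orig> with its reg value."""
    if element.tag == "orig" and element.get("reg") is not None:
        return element.get("reg")
    out = [element.text or ""]
    for child in element:
        out.append(glossed(child))
        out.append(child.tail or "")
    return "".join(out)


# Artificial line combining the two <distinct> examples above.
sample = ET.fromstring(
    '<l><distinct type="Scots"><orig reg="from">frae</orig></distinct> '
    '<distinct type="archaic"><orig reg="gladly">fainly</orig></distinct></l>'
)
# The composite mark-up shown earlier: the outer <orig> wins over the inner <abbr>.
composite = ET.fromstring(
    """<orig reg="showed"><abbr expan="shewed" type="elision">shew'd</abbr></orig>"""
)

print(glossed(sample))
print(glossed(composite))
```

Because the substitution is driven by <orig> alone, the enclosing <distinct> element needs no gloss attribute of its own.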
Obsolete spellings such as 'pourtray', 'engulph', 'shew' are not considered linguistically distinct, since they are merely old spellings of current English words.
Values for the type attribute in the Golder edition are: archaic; dialect; literary; Scots. Where possible, 'archaic' and 'Scots' are preferred, since glosses from dictionaries giving 'literary' or 'dialect' forms may less accurately reflect the poet's native vocabulary.
Languages other than English
Words from languages other than English are preferably identified using the 'lang' attribute which may be applied to any tag. In cases where the electronic text will not provide an English gloss, the <foreign> tag is available. For example:
<foreign lang="LA">ad hoc</foreign>
But the <orig> tag is preferred, as follows:
<orig reg="tummy" lang="MI">puku</orig>
The lang attribute is global and may be applied to any tag; however, its contents must match the id attribute of one of the languages declared in the TEI header (see TEI, 2003: 5.4.2).
<langUsage>
  <language id="EN">English</language>
  <language id="ES">Spanish</language>
  <language id="MI">Maori</language>
  <language id="LA">Latin</language>
</langUsage>
The id attributes follow the codes defined in ISO 639.
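The rule that every lang value must match a declared language can be checked mechanically. The following Python is a sketch under assumptions: the function name undeclared_languages is hypothetical, and the sample text includes an invented element with lang="FR" purely to show a failure.

```python
import xml.etree.ElementTree as ET

LANG_USAGE = """<langUsage>
  <language id="EN">English</language>
  <language id="ES">Spanish</language>
  <language id="MI">Maori</language>
  <language id="LA">Latin</language>
</langUsage>"""

# The first two elements come from the examples above; the third is invented.
TEXT = """<l><orig reg="tummy" lang="MI">puku</orig>
<foreign lang="LA">ad hoc</foreign>
<foreign lang="FR">hypothetical</foreign></l>"""


def undeclared_languages(lang_usage_xml, text_xml):
    """Return lang values used in the text that have no matching <language id>."""
    declared = {el.get("id") for el in ET.fromstring(lang_usage_xml).iter("language")}
    used = {el.get("lang") for el in ET.fromstring(text_xml).iter() if el.get("lang")}
    return used - declared


print(undeclared_languages(LANG_USAGE, TEXT))
```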
<name type="" reg=""></name>
Values for the type attribute in the Golder edition are:
Where appropriate, the language is also identified, using the codes declared in the TEI header (see 'Languages other than English' above).
<name type="place" lang="MI" reg="Heretaunga">Erratonga</name>
<name type="place" lang="LA" reg="Scotland">Scotia</name>
<name type="place">Criterion Hotel</name>
(Golder, 1867: pg. 77)
Certain characters are specified using entity references. There are two reasons for this. First, some characters such as '&' are reserved for use by the computer system, and cannot appear explicitly in the body of the text. Second, some characters are not readily available on regular computer keyboards.
The reserved characters are the ampersand and the angle brackets.
The codes are:
|&amp;||&||ampersand|
|&lt;||<||left angle bracket|
|&gt;||>||right angle bracket|
Single and double quotation marks are also reserved; however, these characters are not used in the body of the Golder etexts (see below).
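For anyone preparing text programmatically, the reserved characters can be converted to entity references with Python's standard xml.sax.saxutils module. This is a small illustration, not part of the Golder workflow, and the sample strings are taken from text quoted elsewhere in this document.

```python
from xml.sax.saxutils import escape, unescape

# escape() replaces the ampersand and the two angle brackets by default.
line = escape("Let's go a bushranging, &c.")
publisher = escape("W. Oliphant & Son")
tag_as_text = escape("<pb>")

print(line)
print(publisher)
print(tag_as_text)

# unescape() reverses the conversion.
print(unescape(publisher))
```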
Characters not available on regular keyboards
Quotation Marks and Apostrophes
The quotation marks available on the regular computer keyboard are ambiguous because the same character is used for opening and closing a quotation. The Golder etexts use the following unambiguous characters:
|‘||‘||single turned comma, left single quotation mark|
|’||’||apostrophe, right single quotation mark|
|“||“||left double quotation mark|
|”||”||right double quotation mark|
<l>The wisdom of its nature we’ll discuss</l>
(Golder, 1871: The Philosophy of Love, Canto First)
The Golder texts make copious use of the em dash, that is, a dash which is one em in width. The regular computer keyboard does not have such a character, and so it is specified using an entity reference.
<l n="15">Might he yet return? Ah! never!—</l>
(Golder, 1852, "The Penitent's Prayer")
Appendix — Interim mark-up
NZ Minstrelsy contains some interim mark-up.
Dummy spaces inserted in front of the line of poetry simulate proper indentation. These are marked as padding characters. Example:
<l n="4"><seg rend="padding"> </seg>Clothes with grandeur both its sides ;</l>
These will need to be replaced with proper formatting facilities before significant interactivity is introduced.
Space between tags
Most HTML systems assume that white space between tags is irrelevant, and thus, where two adjacent words have been tagged, the system tends to lose the space between them. To solve this problem, non-breaking space characters have been inserted between adjacent tags. When the style sheets have been properly programmed, these characters may be removed. Example:
<l>Him, who <abbr expan="never" type="elision">ne’er</abbr> <abbr expan="listened" type="elision">listen’d</abbr> to the voice of praise,</l>
For the Golder Editorial Group
21 January 2004
These specifications differ from those employed in other NZETC texts in the following ways:
Values of id attributes should be unique. In NZ Survey, the attribute values are unique within the text, but are not distinguished from other texts. To ensure consistent linking within and between Golder texts, the naming convention described above incorporates the text code (eg 'GOLMIN') into the id attribute value.
In previous NZETC documents, original line breaks are marked up using the <orig> tag. We follow the same practice but, because the <orig> tag is also required for regularisation of spelling which will support search facilities and other critical apparatus, we have had to introduce a 'rend' attribute to allow style sheets to treat <orig> tags with line break information independently of others. We have investigated alternatives to this method, but these do not appear to be feasible in XML.
The NZETC usually puts scanned image thumbnails below the appropriate page breaks; however, with the TEI poetry tag set, this became complex. Rather than attempt to document all the rules for inserting figure references, it seems simpler to put them in their own section and have the style sheet move them into the appropriate locations. Similar issues apply to footnotes.
Existing etexts have a mixture of keyboard quotation marks and the preferred 'rounded' marks specified above. Techbooks are using the rounded double quotation marks but the 'keyboard' single marks.
Some elisions include highlighted text, and so it is difficult to mark up these elisions as specified above. Some testing may be required to confirm the final mark-up standard for these. This first arose with Philosophy of Love.
- Abrams, 1981
- Abrams, M.H.: A Glossary of Literary Terms (4th edition). Japan: CBS Publishing Japan, 1981.
- Golder, 1838
- Golder, William: Recreations for Solitary Hours, consisting of Poems, Songs and Tales, with Notes. Glasgow: George Gallie; Edinburgh: W. Oliphant & Son; London: Simpkins, Marshall & Co; Dublin: J. Robertson; 1838. A microfilm copy is held at the Alexander Turnbull Library, Wellington.
- Golder, 1852
- Golder, William: The New Zealand Minstrelsy: Containing Songs and Poems on Colonial Subjects. Wellington, NZ: R. Stokes and W. Lyon, 1852. The digitised copy is from the Hocken Library, Dunedin.
- Golder, 1854
- Golder, William: The Pigeon's Parliament; A Poem of the Year 1845. In Four Cantos with Notes. To which is added, Thoughts on the Wairarapa, and Other Stanzas. Wellington, NZ: W. Lyon, 1854. The digitised copy is from the Alexander Turnbull Library, Wellington.
- Golder, 1867
- Golder, William: The New Zealand Survey: A Poem in Five Cantoes. With Notes Illustrative of New Zealand Progress and Future Prospects. Also the Crystal Palace of 1851; A Poem in Two Cantoes. With other Poems and Lyrics. Wellington, NZ: J. Stoddard and Co, 1867. The digitised copy is from the Alexander Turnbull Library, Wellington.
- Golder, 1871
- Golder, William: The Philosophy of Love. [A Plea in Defence of Virtue and Truth!] A Poem in Six Cantos, with Other Poems. Wellington, NZ: W. Golder, 1871. The digitised copy is from the Alexander Turnbull Library, Wellington.
- Southward, 1980
- Southward, J.: Practical Printing: A Handbook of the Art of Typography. New York, NY: Garland Publishing, Inc., 1980. Facsimile edition, originally published 1882.
- TEI, 2003
- TEI P4 - Guidelines for Electronic Text Encoding and Interchange (XML-compatible edition). Edited by Sperberg-McQueen, C.M.; & Burnard, L. XML conversion by Bauman, S.; Burnard, L.; DeRose, S.; & Rahtz, S. TEI Consortium, 2003: http://www.tei-c.org.uk/P4X/index.html. Visited 23 November 2003.
- Unicode, 2003
- Unicode Home Page. Mountain View, CA: Unicode, Inc.; 2003. http://www.unicode.org/. Visited 26 November 2003.
- Unicode, 2003a
- The Unicode Consortium. The Unicode Standard, Version 4.0.0, defined by: The Unicode Standard, Version 4.0. Boston, MA: Addison-Wesley; 2003.
- Unicode, 2003b
- Online version of Unicode, 2003a. http://www.unicode.org/. Visited 26 November 2003.
- XML, 1.0
- Bray, T.; Paoli, J.; Sperberg-McQueen, C. M.; & Maler, E.: Extensible Markup Language (XML) 1.0 (Second Edition) W3C Working Draft. Cambridge, MA: World Wide Web Consortium, 2003. http://www.w3.org/TR/WD-html40-970708/cover.html, visited 20 Dec 2003.