About the New Zealand Electronic Text Collection

Mission

Introduction

The New Zealand Electronic Text Collection comprises significant New Zealand and Pacific Island texts and materials held by Victoria University of Wellington Library. This encompasses both digitised heritage material and born-digital resources. The NZETC supports the teaching, learning and research activities at Victoria University of Wellington through:

  • the digitisation of historical works held uniquely by the Victoria University Library, with an emphasis on works created by Victoria;
  • support for born-digital resources created by Victoria.

The texts made available on the NZETC are freely accessible to all researchers regardless of their affiliation with Victoria University of Wellington.

New resources are added to the NZETC according to a collection development policy which can be accessed on the Victoria University of Wellington Library website.

History

The NZETC, formerly known as the New Zealand Electronic Text Centre, was created in 2002 as part of the School of English at Victoria University of Wellington. In 2004 the NZETC became part of the Library at Victoria University, working on a range of digital initiatives alongside the Digital Services team, who managed the library management system, web presence, intranet and specialist application support. In 2010 a new Library Technology Services team was established, which took on strategic and operational responsibilities for all Library technology services and projects, including the NZETC collections.

The Collection

The New Zealand Electronic Text Collection is a large and ever-growing resource comprising:

These texts are presented as they were published, making no attempt to correct or amend texts for reasons of spelling, factual correctness or otherwise. The aim for digitised texts is a character accuracy of 99.95%. Scanned images of the pages are available with some texts, so these can be checked alongside the digital text. Please contact us if you notice any errors.
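To give a sense of what a 99.95% character accuracy target means in practice, the arithmetic can be sketched as below. The page size of 2,000 characters and the function names are assumptions made for illustration; they are not NZETC figures or tools.

```python
from fractions import Fraction
from math import floor

# Target stated in the text above: 99.95% of characters correct.
TARGET = Fraction("0.9995")


def character_accuracy(total_chars: int, wrong_chars: int) -> Fraction:
    """Return the exact fraction of characters transcribed correctly."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return Fraction(total_chars - wrong_chars, total_chars)


def max_errors(total_chars: int, target: Fraction = TARGET) -> int:
    """Largest number of wrong characters that still meets the target."""
    return floor(total_chars * (1 - target))


# Assumed example: a page of about 2,000 characters.
page_chars = 2000
allowed = max_errors(page_chars)
print(f"{allowed} error(s) allowed in {page_chars} characters")
```

Under these assumptions the target works out to at most one wrong character in every 2,000, which is roughly one error per densely printed page.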

New texts are added to the collection regularly, often as part of collaborations with partners within the University. These partners include:

Library Technology Services no longer seeks contract projects with external partners, but we continue to work as a partner with other libraries and digital content projects including Matapihi, Digital NZ, Creative Commons NZ, and the Kiwi Research Information Service.

Access and Downloads

Accessing NZETC texts

We have endeavoured to provide a collection which is easy to access, search and navigate.

  • Our collection is indexed by search engines, meaning you can access our texts using an appropriate search strategy on your favourite engine.
  • Each entity in our topic map has a directory page that lists mentions of this entity in other works throughout the collection.
  • Each hyperlinked name takes you to this directory page.
  • Our Solr search engine allows you to construct your own searches and refine them using facets. Facets include people, places, organisations, projects and language.
  • All our works are listed in corpora which are displayed on the projects page.
  • The NZETC uses a selection of subject headings to group texts together.
  • The Author and Works pages list all the authors and works in our collection.
  • DigitalNZ harvests our collection periodically. Searching the DigitalNZ website will often return NZETC texts.

The majority of visitors to our collection find material on our site via a search engine. We therefore try to provide information about a given text in the side-bar appearing next to the digital text.

Downloadable NZETC Texts

NZETC texts can be downloaded in four different formats: Epub, PDF, TEI-XML and DAISY (Digital Accessible Information System) audio books. Unfortunately we can't offer all formats for each text.

  • Epub. Epub is a free and open standard for e-books and is accepted by a wide range of devices, such as ereaders and tablet computers. If your device does not support Epub, the files can be converted into other formats by installing software like Calibre on your PC.
  • PDFs. Many NZETC texts have downloadable PDF files, for example the Typo collection, Nineteenth Century Novel Collection and JC Beaglehole Letters. Some PDFs contain page images of the original printed works where it is deemed important to see the original.
  • TEI-XML. TEI-XML is the language that we use to encode our texts. Each TEI-XML file is then transformed into HTML for display on our website.
  • DAISY. The NZETC has made available audio books in the DAISY format for some of our collection.

Epub files can also be read in your browser by installing plugins such as EPUBReader. Some collections on the NZETC (such as Sport Literary Journal) do not have downloadable epubs because we do not have permission to make the entire document available.

Technology

Introduction

XML and TEI are the document mark-up standards which underpin the work of the NZETC. Information on TEI can be found through the Text Encoding Initiative. Other key technologies used at the NZETC include:

  • XTM. XTM (XML Topic Maps) is the framework into which topics in our texts are harvested.
  • EATs. Entity Authority Toolsets or EATs is the toolset that we use to create entities (or topics). EATs also allows us to express relationships between entities. At present we have five different entity types: people, organisations, works, places and ships.
  • Apache Cocoon. Cocoon is the XML publishing framework that we use to publish this website.
  • Apache Tomcat. Tomcat is a Java servlet container that runs Cocoon.
  • Apache Solr. Solr is the platform we use to allow faceted searching of the NZETC collection.

More information is given below.

XML Topic Maps

Books, images, and collections are navigable through a dynamically-generated semantic framework, which represents the first release of a large-scale XML Topic Map (XTM) site in New Zealand. Users are able to move around the resources on the site tracking topics of interest rather than merely browsing the material linearly or through text searching. In a topic map, web-based resources are grouped around items called "topics", each of which represents some subject of interest.

Topics in a topic map are linked together with hyperlinks called "associations". There can be different types of association in a topic map, representing the different kinds of relationship in the real world. For instance, in the NZETC topic map, the topic which represents a particular person may be linked to a topic which represents a chapter of a book which mentions that person. This association is labelled to indicate that it represents a "mention". Similarly, the same person's topic might be linked to a particular photograph topic, via a "depiction" association.
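The relationship between topics and typed associations described above can be sketched as a small data structure. This is an illustrative model only, not the XTM format or the NZETC's own code; the class names, identifiers and sample topics are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Topic:
    topic_id: str
    name: str
    kind: str  # e.g. "person", "chapter", "photograph"


@dataclass(frozen=True)
class Association:
    kind: str  # e.g. "mention" or "depiction"
    source: Topic
    target: Topic


class TopicMap:
    """A minimal in-memory topic map: topics plus typed links between them."""

    def __init__(self) -> None:
        self.topics: Dict[str, Topic] = {}
        self.associations: List[Association] = []

    def add(self, topic: Topic) -> Topic:
        self.topics[topic.topic_id] = topic
        return topic

    def associate(self, kind: str, source: Topic, target: Topic) -> None:
        self.associations.append(Association(kind, source, target))

    def related(self, topic: Topic, kind: Optional[str] = None) -> List[Topic]:
        """Topics linked to `topic`, optionally limited to one association type."""
        found = []
        for assoc in self.associations:
            if kind is not None and assoc.kind != kind:
                continue
            if assoc.source == topic:
                found.append(assoc.target)
            elif assoc.target == topic:
                found.append(assoc.source)
        return found


# Hypothetical sample data: one person, one chapter, one photograph.
tm = TopicMap()
person = tm.add(Topic("person-001", "A. Example", "person"))
chapter = tm.add(Topic("chapter-007", "Chapter 7", "chapter"))
photo = tm.add(Topic("photo-042", "Group portrait", "photograph"))
tm.associate("mention", chapter, person)
tm.associate("depiction", photo, person)

print([t.name for t in tm.related(person, "mention")])
```

Because each association carries a type, a page for the person can list the chapters that mention them separately from the photographs that depict them.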

To construct our topic map, we use XSLT stylesheets to extract metadata from each of our XML text files, and express it in the XTM format. In this way we automatically create hundreds of topic maps that describe our texts. We also harvest information about the entities contained in EATs. Finally we merge the harvested topic maps together to create a unified topic map which describes our entire website.
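The harvest-then-merge process described above can be sketched roughly as follows. This is not the NZETC pipeline: Python's standard `xml.etree.ElementTree` stands in for the XSLT stylesheets, plain dictionaries stand in for XTM, and the sample document, the TEI P5 namespace, the `key` values and the function names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the texts use the TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature TEI document; {title} is filled in per text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>{title}</title></titleStmt></fileDesc></teiHeader>
  <text><body><p>A letter from <name type="person" key="person-001">A. Example</name>.</p></body></text>
</TEI>"""


def harvest(text_id, tei_xml):
    """Build a small per-text topic map: the work itself plus each keyed name."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    topics = {text_id: {"name": title.strip(), "type": "work", "mentioned_in": set()}}
    body = root.find("tei:text", TEI_NS)
    if body is not None:
        for el in body.iterfind(".//tei:name[@key]", TEI_NS):
            topics[el.get("key")] = {
                "name": "".join(el.itertext()).strip(),
                "type": el.get("type", "unknown"),
                "mentioned_in": {text_id},
            }
    return topics


def merge(topic_maps):
    """Merge per-text maps: topics sharing an id collapse into one entry."""
    unified = {}
    for topic_map in topic_maps:
        for topic_id, topic in topic_map.items():
            if topic_id not in unified:
                unified[topic_id] = {**topic, "mentioned_in": set(topic["mentioned_in"])}
            else:
                unified[topic_id]["mentioned_in"] |= topic["mentioned_in"]
    return unified


unified = merge([
    harvest("text-a", SAMPLE.format(title="First Text")),
    harvest("text-b", SAMPLE.format(title="Second Text")),
])
print(sorted(unified["person-001"]["mentioned_in"]))
```

The point of the merge step is that a person mentioned in two separately harvested texts ends up as a single topic whose mentions span both.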

Each page on the website represents one of these topics, along with any associated topics. We use the open source TM4J Topic Map engine for merging and querying our topic map.

The Topic Map framework for the NZETC website was presented at the launch of the new information architecture on 5 May 2005. PowerPoint slides from the presentation are available. Papers on the NZETC technical infrastructure are available through the Victoria University ResearchArchive.

Apache Cocoon and Tomcat

We use an XML publishing framework called Apache Cocoon to publish the NZETC website.

Cocoon is a Java servlet and hence it can be deployed on a wide variety of systems. We run Cocoon inside the Apache Tomcat servlet container (the official reference implementation for the Java Servlet specification), using JVM version 1.6 from Sun Microsystems.

Cocoon offers a flexible environment based on the separation of concerns between content, logic and style. Cocoon can deliver documents in a variety of formats, including HTML, PDF, RTF, SVG, JPEG, PNG, and any other XML-based format. We use Cocoon to transform our XML texts into readable documents using XSLT stylesheets.

Cocoon can perform these transformations on demand; i.e. when a request is received from a web browser. Each request is handled by reading the appropriate XML document or documents, and processing the XML data in a succession of stages, first applying logical, then presentational transformations. Each stage is distinct and can be effectively managed by different people. Our web designer can edit the look of the site, the web developer can edit the structure of the site, and the text-editors can edit the content of the site (the e-texts), all independently of each other.
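The idea of handling a request as a succession of distinct stages can be sketched in a few lines. This is a Python stand-in for illustration only: Cocoon itself is configured through its own sitemap and XSLT stylesheets, and the element names, stage functions and sample document below are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

Stage = Callable[[ET.Element], ET.Element]

# Hypothetical source document with two chapters.
SOURCE = (
    "<text>"
    '<div id="ch1"><head>Chapter One</head><p>First.</p></div>'
    '<div id="ch2"><head>Chapter Two</head><p>Second.</p></div>'
    "</text>"
)


def select_division(div_id: str) -> Stage:
    """Logical stage: pick out the part of the document that was requested."""
    def stage(doc: ET.Element) -> ET.Element:
        for div in doc.iter("div"):
            if div.get("id") == div_id:
                return div
        raise LookupError(f"no division with id {div_id!r}")
    return stage


def to_html(div: ET.Element) -> ET.Element:
    """Presentational stage: turn the selected division into HTML."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    head = div.find("head")
    if head is not None:
        ET.SubElement(body, "h1").text = head.text
    for para in div.findall("p"):
        ET.SubElement(body, "p").text = para.text
    return html


def run_pipeline(source: str, stages: Iterable[Stage]) -> str:
    """Read the XML, then pass it through each stage in order."""
    node = ET.fromstring(source)
    for stage in stages:
        node = stage(node)
    return ET.tostring(node, encoding="unicode")


page = run_pipeline(SOURCE, [select_division("ch2"), to_html])
print(page)
```

Keeping the stages separate is what lets the look of the output change (by editing `to_html`) without touching either the selection logic or the source text.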

Apache Solr

We use Solr for faceted searching of our collection. The Solr search engine is a Java-based engine and runs inside our Tomcat servlet container.
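As a rough illustration of what a faceted search request looks like, the sketch below builds a query URL using Solr's standard `q`, `fq`, `facet` and `facet.field` parameters. The base URL, core name and field names are assumptions for the example and do not describe the NZETC's actual Solr configuration; the code only constructs the URL and does not contact any server.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, filters=None, rows=10):
    """Return a Solr select URL asking for results plus facet counts."""
    params = [
        ("q", text),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    # Each chosen facet value becomes a filter query that narrows the results.
    # Note: values are quoted but not otherwise escaped in this sketch.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and field names.
url = build_facet_query(
    "http://localhost:8983/solr/texts/select",
    "whaling",
    facet_fields=["people", "places"],
    filters={"language": "English"},
)
print(url)
```

Refining a search by a facet then amounts to adding another `fq` filter and re-running the same query.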

Contact Information

Reporting an error

Max Sullivan, Digital Projects Officer, Victoria University of Wellington Library
Email: Library-TechnologyServices@vuw.ac.nz
Phone: +64 04 463 9734

General inquiries

Michael Parry, Digital Initiatives Co-ordinator, Victoria University of Wellington Library
Email: Library-TechnologyServices@vuw.ac.nz
Phone: +64 04 463 9734

Digital Initiatives team
The Library
Victoria University of Wellington
P O Box 3438
Wellington 6140
New Zealand

NZETC Privacy Policy

Library Technology Services at Victoria University of Wellington uses Google Analytics to evaluate the usage of our site. This information allows us to:

  • determine which resources are heavily used, and so indicate areas that we should consider focusing future digitisation efforts upon;
  • determine which resources are lightly used, and so indicate areas where we should consider improving navigation and promotion of these resources;
  • measure the usage of particular resources so that we can provide feedback to those parties that are assisting us in making these resources available through financial or other support.

Google Analytics is a web analytics service provided by Google, Inc. ("Google"). Google Analytics uses "cookies", which are text files placed on your computer, to help the website analyze how users use the site. The information generated by the cookie about your use of the website (including your IP address) will be transmitted to and stored by Google on servers in the United States. Google will use this information for the purpose of evaluating your use of the website, compiling reports on website activity for website operators and providing other services relating to website activity and internet usage. Google may also transfer this information to third parties where required to do so by law, or where such third parties process the information on Google's behalf. Google will not associate your IP address with any other data held by Google. You may refuse the use of cookies by selecting the appropriate settings on your browser. By using this website, you consent to the processing of data about you by Google in the manner and for the purposes set out above.

If you wish to opt out of cookies from Google, you can do so on the Google site.

Should you like further information about this privacy policy please contact us.