eBot

Background
The following extracts were taken from Sowing Seeds in the Digital Garden, a paper prepared for the 2006 conference: Sustainable Data from Digital Fieldwork: From creation to archive and back. The paper was authored by Murray Henwood, Su Hanfling, Rowan Brownlee, Belinda Pellow and Tristan Gutsche and may be read in its entirety at http://ses.library.usyd.edu.au/handle/2123/1298

Visual media have long been used in the plant sciences to support research and learning. Plant taxonomic decisions, for example, are often accompanied by a description, an illustration of the plant, and a photograph of a representative herbarium specimen. eBot will contain digital objects ranging from scanned dried herbarium specimens through to complex digital files such as those produced using cutting-edge technology like tomography.

The project team sought advice from a range of imaging and preservation agencies on standards for image conversion. eBot colour slide conversion specifications are based on requirements for digital preservation, high quality print reproduction, and accurate representation of colour. Slides will be scanned at 4000 dpi to create 24 bit master files in uncompressed TIFF format. Copies in JPEG format will be created for transmission over networks for use within services such as eFlora.
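
As a rough illustration of what the master-file specification implies for storage, the following sketch estimates the pixel dimensions and raw data size of one scan. It assumes a standard 35 mm slide frame of 36 mm x 24 mm, which the extract does not state, and it ignores TIFF header overhead; the function names are illustrative only.

```python
# Assumption: a 35 mm slide frame of 36 mm x 24 mm. The source does not
# state the slide format, so treat these figures as an estimate only.
MM_PER_INCH = 25.4


def scan_dimensions(width_mm, height_mm, dpi):
    """Return (width_px, height_px) for a scan at the given resolution."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def uncompressed_bytes(width_px, height_px, bits_per_pixel=24):
    """Size of the raw pixel data, ignoring TIFF header and tag overhead."""
    return width_px * height_px * bits_per_pixel // 8


width_px, height_px = scan_dimensions(36, 24, 4000)
size = uncompressed_bytes(width_px, height_px)
print(width_px, height_px, round(size / 1e6, 1), "MB")
```

Under these assumptions each master file holds about 5669 x 3780 pixels and roughly 64 MB of pixel data, which is why smaller JPEG copies are made for network delivery.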

As eBot is intended to support archiving of high-value content, storing images in a preservation format such as uncompressed TIFF is optimal. However, it is understood that this requirement cannot always be met, for some contributors might possess extensive collections of images of great research value in other formats. As these types of collections also require a secure and managed environment, eBot currently supports TIFF, PNG and JPEG image formats. If an image is accepted in a non-supported format, then the risks of format conversion will be evaluated before proceeding. If accurate conversion cannot be guaranteed, the original object will be retained and a converted copy will be made available for access. eBot will contain other media including sound, video and text. The project team are currently investigating support for audio and video standards.
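
The acceptance policy described above can be sketched as a small decision function. This is a minimal illustration, not eBot's actual code: the function name, the extension table and the returned messages are hypothetical, and the branch for a conversion that is judged accurate is an assumption, since the extract only says what happens when accuracy cannot be guaranteed.

```python
from pathlib import Path

# Formats the extract lists as currently supported, keyed by common file
# extensions (the extension table itself is an assumption).
SUPPORTED = {
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
}


def triage(filename, conversion_is_accurate=None):
    """Return the handling step for a contributed image file.

    conversion_is_accurate is None until the conversion risk has been
    evaluated, then True or False.
    """
    extension = Path(filename).suffix.lower()
    if extension in SUPPORTED:
        return "store as " + SUPPORTED[extension]
    if conversion_is_accurate is None:
        return "evaluate conversion risk"
    if conversion_is_accurate:
        # Assumption: the source does not describe this branch explicitly.
        return "convert to a supported format"
    return "retain original and provide a converted access copy"


print(triage("banksia_leaf.TIF"))
print(triage("specimen.bmp"))
print(triage("specimen.bmp", conversion_is_accurate=False))
```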

Since metadata standards for botanical applications are still developing, the project team reviewed a range of general and subject-specific schema before finalising the core transitional metadata set for eBot. The resulting list of tags, which are suitable for current eBot functions and mappable to other schemas, will be refined over time as standards develop and the scope of the project expands. One development currently in progress is the mapping of the eBot descriptive metadata to ensure it aligns with the standard used by Australia's Virtual Herbarium (AVH).

The project team considered a range of international technical and preservation metadata initiatives undertaken by organisations such as the National Library of Australia (NLA), Library of Congress (LC) and the National Library of New Zealand (NLNZ). Elements of the NLNZ preservation metadata schema were selected because it focuses on automatic collection of metadata and was designed to capture the most important information for digital preservation (Searle & Thompson, 2003). Schema development was informed by the work of other preservation agencies and its elements can be mapped to other technical schema. In addition to the NLNZ fields, all other technical metadata that can be extracted from the digital objects will be stored. This is an interim strategy while internationally agreed standards are developed.

References included in the extracts

Searle, S., & Thompson, D. (2003). Preservation Metadata: Pragmatic First Steps at the National Library of New Zealand [Electronic Version]. D-Lib Magazine, 9. Retrieved October 2006 from http://www.dlib.org/dlib/april03/thompson/04thompson.html

eXtensible Text Framework development
eBot was originally developed as a MySQL database and PHP application. During implementation, the project team raised concerns about the sustainability of the application and about the speed of retrieval and the accuracy of indexing. I investigated the use of the eXtensible Text Framework (XTF) as an alternative indexing and web presentation platform. As a mature open source application supported by the California Digital Library, XTF has proven suitable for presenting digital objects and metadata. Live access to the XTF version of eBot is available.

Relational-to-XML migration
eBot metadata is extensive and includes general description, copyright, taxonomy, location, morphology and technical data covering file specifications. I formulated an SQL query that retrieves a complete set of data elements for each item from the relational database, and I wrote an XSLT stylesheet to render each record as a separate XML file.
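
The general shape of this migration, one joined query producing one XML document per item, can be sketched as follows. This is an illustration under stated assumptions rather than the code used for eBot: it substitutes an in-memory SQLite database for MySQL and Python's ElementTree for the XSLT stylesheet, and the table names, column names, element names and sample row are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def records_to_xml(conn, query):
    """Run the query and return {filename: xml_string}, one entry per row."""
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    documents = {}
    for row in conn.execute(query):
        record = ET.Element("record")
        for column in row.keys():
            if row[column] is not None:  # skip empty fields
                ET.SubElement(record, column).text = str(row[column])
        documents[f"{row['id']}.xml"] = ET.tostring(record, encoding="unicode")
    return documents


# Hypothetical two-table schema standing in for the eBot database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (id INTEGER PRIMARY KEY, family TEXT, genus TEXT);
CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, taxon_id INTEGER);
INSERT INTO taxon VALUES (1, 'Myrtaceae', 'Eucalyptus');
INSERT INTO item VALUES (10, 'Leaf, adaxial surface', 1);
""")

# The join flattens every data element for an item into a single row.
QUERY = """
SELECT item.id AS id, item.title AS title,
       taxon.family AS family, taxon.genus AS genus
FROM item JOIN taxon ON taxon.id = item.taxon_id
"""

documents = records_to_xml(conn, QUERY)
for filename, xml_text in documents.items():
    print(filename, xml_text)
```

Writing each string in `documents` to its own file would give the one-file-per-record layout described above.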

XTF configuration
Transfer of the XML records to XTF proved straightforward. XTF ships with a set of example stylesheets and configuration files, which I edited to accommodate the needs of eBot. Because presentation is controlled through XSLT, the web interface is highly flexible.

Faceted browsing
XTF supports faceted browsing. Any metadata element may be identified as a facet and used to progressively refine a result set. eBot is underpinned by an extensive taxonomic structure which, when rendered by XTF, provides faceted pathways through the collection.
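
The idea of progressively refining a result set by facet can be illustrated with a few lines of Python. This is a conceptual sketch only and does not reflect how XTF implements facets internally; the records, field names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical metadata records; in eBot these would come from the index.
RECORDS = [
    {"family": "Myrtaceae", "genus": "Eucalyptus", "format": "TIFF"},
    {"family": "Myrtaceae", "genus": "Melaleuca", "format": "JPEG"},
    {"family": "Proteaceae", "genus": "Banksia", "format": "TIFF"},
]


def facet_counts(records, facet):
    """Count how many records fall under each value of the facet."""
    return Counter(r[facet] for r in records if facet in r)


def refine(records, facet, value):
    """Keep only the records whose facet matches the chosen value."""
    return [r for r in records if r.get(facet) == value]


# Start with the whole collection, then narrow by family and look at genus.
print(facet_counts(RECORDS, "family"))
myrtaceae = refine(RECORDS, "family", "Myrtaceae")
print(facet_counts(myrtaceae, "genus"))
```

Each refinement step returns a smaller list that can be counted again under a different facet, which is how a taxonomic hierarchy such as family and genus becomes a browsing pathway.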