Technical Information
Texas Heritage Online is a federated search portal for Texas libraries, archives, and museums with digital collections of cultural heritage materials. Most statewide digitization projects have created one or more centralized repositories of digital objects. In contrast, THDI will provide a single interface to decentralized repositories across the state hosted by libraries, archives, museums, and state agencies. The project has three components: the Texas Heritage Online Z39.50/SRU federated search application; an OAI harvester operated by the University of North Texas Libraries; and custom indexing options for collections that cannot be accessed by other means. This multi-component approach reduces the retooling that institutions need in order to participate in the initiative, while giving historians, researchers, and students access to Texas heritage materials from a single interface.
Currently, there are three ways to provide a search interface. The first is to develop software routines, commonly called "bots" or "spiders," to index Web-accessible materials. This index can then be searched and links provided to the original resources. This approach is the one currently taken by most major Web search engines. The second approach is most common in the library community and involves the use of a common query language and a standard interoperability profile developed according to the Z39.50 protocol to search disparate resources in real time. The third approach is the newest and involves the use of a metadata harvester that accesses pre-identified collections and collects their metadata, building a single index to the metadata with links to the original objects. Unfortunately, Web resources, databases, and digital asset management systems do not support all of these interoperability options. Our approach is to build an interface that uses the appropriate method for each type of collection and presents the results to the user in a seamless fashion.
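The idea of choosing a search method per collection can be illustrated with a minimal Python sketch. It is not the Texas Heritage Online implementation; the names AccessMethod, Collection, Hit, search_all, and stub_searcher are hypothetical, and the stub searchers stand in for real spider-index, Z39.50/SRU, and harvester back ends.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class AccessMethod(Enum):
    """The three approaches described above (labels are illustrative)."""
    CRAWLED = "spider-built index"
    LIVE_QUERY = "Z39.50/SRU real-time search"
    HARVESTED = "harvested metadata index"


@dataclass(frozen=True)
class Collection:
    name: str
    method: AccessMethod


@dataclass(frozen=True)
class Hit:
    collection: str
    title: str
    url: str


# A searcher takes a collection and a query and returns hits for it.
Searcher = Callable[[Collection, str], List[Hit]]


def search_all(
    collections: List[Collection],
    searchers: Dict[AccessMethod, Searcher],
    query: str,
) -> List[Hit]:
    """Send the query to each collection through the method it supports
    and return all hits as one combined list."""
    hits: List[Hit] = []
    for collection in collections:
        searcher = searchers[collection.method]
        hits.extend(searcher(collection, query))
    return hits


def stub_searcher(label: str) -> Searcher:
    """Hypothetical stand-in for a real back end; returns one fake hit."""
    def search(collection: Collection, query: str) -> List[Hit]:
        title = f"{label} hit for {query!r}"
        return [Hit(collection.name, title, "https://example.org/item")]
    return search
```

In this sketch the user-facing code only ever calls search_all; which protocol serves a given collection is a detail of the registry, which is one way to present results from mixed sources as a single list.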
The software used in Texas Heritage Online has been developed by Index Data of Denmark, based on their Keystone Retriever product, which was also used for the Texas State Library and Archives Commission's Library of Texas project. Key features of this product include:
- Full support for Z39.50 and SRU/SRW search, based on the Index Data YAZ toolkit
- Metadata normalization in simple Dublin Core or any other desired format
- Dynamic, almost immediate display of broadcast searching results
- Database probing support for identification of problems with search targets via Z-Spy software
- Support for open source and open standards
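To make the SRU search in the feature list concrete, the following Python sketch builds a searchRetrieve request URL using the parameter names defined by the SRU 1.1 standard. It does not use the YAZ toolkit or Keystone Retriever; the function name build_sru_url, the example.org endpoint, and the default record schema name "dc" are assumptions for illustration, and a real target may expect a different schema identifier.

```python
from urllib.parse import urlencode


def build_sru_url(
    base_url: str,
    cql_query: str,
    start_record: int = 1,
    maximum_records: int = 10,
    record_schema: str = "dc",  # assumed short name; servers define their own
) -> str:
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    # urlencode percent-encodes the CQL query so it is safe in a URL.
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; substitute the address of a real SRU target.
example_url = build_sru_url(
    "https://sru.example.org/heritage",
    'dc.title = "Alamo"',
    maximum_records=5,
)
```

The server's response is an XML document of records, which a client would then parse and normalize for display.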
Custom modifications to the Keystone Retriever product will allow the creation of browse lists, overcoming some of the difficulties caused by the lack of controlled vocabularies shared across repositories. Metadata is normalized during the retrieval process: each target collection is identified as using a particular metadata schema, and that schema is transformed into Dublin Core for display purposes. In the next phase of development, we plan to add ranking and merging of search results along with custom collections. Future support for microformats is possible, along with the creation of a search widget that can be placed on any site to access the collections available in Texas Heritage Online.
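The normalization step described above can be sketched as a per-schema crosswalk into simple Dublin Core elements. This Python example is an assumption-laden illustration, not the Keystone Retriever code: the names DC_CROSSWALKS and to_dublin_core are hypothetical, the "marc" mapping is a heavily simplified subset, and the "local" schema is invented for the example.

```python
from typing import Dict, List

# Hypothetical crosswalks: source field -> simple Dublin Core element.
DC_CROSSWALKS: Dict[str, Dict[str, str]] = {
    # Simplified subset of MARC tags, for illustration only.
    "marc": {
        "245": "title",
        "100": "creator",
        "650": "subject",
        "520": "description",
    },
    # Invented in-house schema, for illustration only.
    "local": {
        "name": "title",
        "maker": "creator",
        "topic": "subject",
    },
}


def to_dublin_core(
    schema: str, record: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Map a record from the named source schema to Dublin Core elements.

    Fields without a crosswalk entry are dropped. Values that map to the
    same element are combined in the order the fields appear.
    """
    crosswalk = DC_CROSSWALKS[schema]
    normalized: Dict[str, List[str]] = {}
    for field, values in record.items():
        element = crosswalk.get(field)
        if element is None:
            continue  # no Dublin Core equivalent defined in this sketch
        normalized.setdefault(element, []).extend(values)
    return normalized
```

Because every collection is tagged with its schema once, adding a new repository in this sketch means adding one crosswalk entry rather than changing the display code.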
