Using Workflow to Build an Information Management System for a Geographically Distributed Genome Sequencing Initiative

Document Type: Book Chapter

Genome projects are very demanding undertakings and are often carried out collaboratively by multiple research centers. Many different types of tasks must be performed, both by researchers and by automated tools. These include shotgun sequencing, sequence finishing, sequence similarity searches, data annotation, oligonucleotide synthesis, library construction, and data submission. These individual tasks are organized into workflows that carry out a particular function, such as providing an annotated sequence of a cosmid clone. Individual tasks may be carried out at one or more of the participating institutions, so a single workflow may span multiple research centers. Creating software systems to support such distributed workflows presents developers with a number of challenges: coordinating the execution of applications running on different systems, transporting data between systems, integrating legacy applications, providing recovery mechanisms, and creating user interfaces. In addition, organizational procedures are likely to change frequently, especially in the early stages of a genome project. This paper discusses using a general-purpose workflow management system (WfMS) to address these challenges when implementing an information system that manages a geographically distributed genome project. It describes a prototype application, built with the METEOR WfMS, that is running on several systems at the University of Georgia.