Current biological sequence comparison tools search the full database to find approximate matches to a query. This paper takes a different approach: the database is first indexed using a novel indexing scheme, so that highly mismatched sequences can be eliminated immediately, improving performance and accuracy. iBlast is proposed as an indexed version of BLAST. In its initial implementation, iBlast uses a sequence-based index to catalog genomic databases in an NCR Teradata RDBMS. Several types of indexes and querying methods are explored to determine the most efficient solution that exploits the parallel nature of the Teradata system. Significant speedups were obtained and are described in detail in the paper. Future indexing methods based on prokaryotic and eukaryotic genome structures are also proposed.
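To illustrate the general idea of index-based filtering, the following is a minimal sketch and not the iBlast implementation. The abstract does not specify the structure of the sequence-based index, so this example assumes a simple in-memory inverted index over fixed-length substrings (k-mers). The function names, the choice of k, the `min_shared` threshold, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database, k):
    """Map each length-k substring to the set of sequence ids containing it."""
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def candidate_ids(index, query, k, min_shared=1):
    """Return ids of sequences sharing at least min_shared distinct query k-mers."""
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    counts = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids creating empty entries in the defaultdict index
        for seq_id in index.get(kmer, ()):
            counts[seq_id] += 1
    return {seq_id for seq_id, n in counts.items() if n >= min_shared}


# Toy database (hypothetical sequences, for illustration only)
database = {
    "seq1": "ACGTACGTGA",
    "seq2": "TTTTTTTTTT",
    "seq3": "GGACGTACCA",
}
index = build_kmer_index(database, k=4)

# Sequences that share fewer than two k-mers with the query are dropped
# before any expensive alignment step would run.
candidates = candidate_ids(index, "ACGTAC", k=4, min_shared=2)
print(sorted(candidates))
```

In this sketch, only the surviving candidates would be passed on to a full alignment step, which is where an indexed search would save work relative to scanning every sequence in the database.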
Raymer, M. L., Doom, T. E., Krane, D. E., & Futamura, N. (2004). Indexing Genomic Databases. Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering, 587-591.