Document Type

Conference Proceeding

Publication Date



Nuclear magnetic resonance (NMR) spectroscopy is a non-invasive method for acquiring a metabolic profile from biofluids. This metabolic information may aid the early detection of exposure to a toxin. A typical NMR toxicology data set has a small sample size and high dimensionality, so traditional pattern recognition techniques are not always feasible. In this paper, we evaluate several common alternatives for isolating candidate biomarkers. The fold test, the unpaired t-test, and the paired t-test were applied to an NMR-derived toxicological data set, and their results were compared. The paired t-test was preferred because it attaches statistical significance to its results, accounts for the consistency of each subject over a time course, and mitigates the small-sample, high-dimensionality problem. We then grouped the resulting statistically salient candidate biomarkers by their significance patterns and compared the groups with several metabolites known to be affected by the tested toxin. Based on these results, we present a statistical protocol of sequential t-tests and clustering techniques for identifying putative biomarkers, together with the results of applying this protocol to a specific real-world toxicological data set.
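The abstract does not spell out the protocol's details, so the following is only a minimal Python sketch under stated assumptions. It assumes each NMR spectral bin is measured in the same subjects at a pre-dose baseline and at later time points; it reads "sequential t-tests" as a paired t-test of each later time point against baseline; it reads the fold test as a ratio of group means; and it replaces the clustering step with the simplest possible grouping, collecting bins whose significance patterns match exactly. The function names, the 0.05 cutoff, and the toy data are all hypothetical. To stay within the standard library, the two-sided p-value uses the identity p = I_{df/(df+t^2)}(df/2, 1/2), where I is the regularized incomplete beta function.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_pvalue(t, df):
    """Two-sided p-value of a Student t statistic with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def _t_from(diff, se):
    """t = diff / se, with a zero standard error mapped to 0 or +/-inf."""
    if se == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def paired_t(before, after):
    """Paired t-test on each subject's change (after - before)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = _t_from(statistics.mean(diffs), statistics.stdev(diffs) / math.sqrt(n))
    return t, t_pvalue(t, n - 1)


def unpaired_t(a, b):
    """Pooled-variance two-sample t-test; ignores which subject each value came from."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = _t_from(statistics.mean(b) - statistics.mean(a), math.sqrt(sp2 * (1 / na + 1 / nb)))
    return t, t_pvalue(t, na + nb - 2)


def significance_pattern(series, alpha=0.05):
    """series[0] is the baseline; each later time point is paired-tested against it.
    Returns +1 (significant rise), -1 (significant fall) or 0 for each time point."""
    pattern = []
    for values in series[1:]:
        t, p = paired_t(series[0], values)
        pattern.append(0 if p >= alpha else (1 if t > 0 else -1))
    return tuple(pattern)


def group_by_pattern(bins, alpha=0.05):
    """Stand-in for clustering: bins with identical significance patterns share a group."""
    groups = {}
    for name, series in bins.items():
        groups.setdefault(significance_pattern(series, alpha), []).append(name)
    return groups


# Hypothetical toy data: 4 subjects, baseline plus two post-dose time points per bin.
# Baselines differ widely between subjects, but in bin_A each subject rises about 10%.
bins = {
    "bin_A": [[10.0, 20.0, 30.0, 40.0], [11.0, 22.1, 33.0, 44.2], [10.1, 19.9, 30.2, 39.8]],
    "bin_B": [[5.0, 6.0, 7.0, 8.0], [4.9, 6.2, 6.8, 8.1], [5.1, 5.9, 7.1, 8.0]],
}
base, t1 = bins["bin_A"][0], bins["bin_A"][1]
fold = statistics.mean(t1) / statistics.mean(base)  # fold test: a ratio, no p-value
print(round(fold, 3), round(unpaired_t(base, t1)[1], 3), round(paired_t(base, t1)[1], 3))
print(group_by_pattern(bins))
```

With this toy data the fold change for bin_A is only about 1.10 and the unpaired test is far from significant, because the between-subject spread at baseline swamps the shift. The paired test, which works on each subject's own change, falls below 0.05 at the first time point, so bin_A and bin_B land in different pattern groups. This mirrors the abstract's argument for the paired design, but it is an illustration, not a reproduction of the paper's analysis.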


Presented at the 7th IEEE International Conference on Bioinformatics and Bioengineering, Boston, MA, October 14-17, 2007.

Posted with permission from IEEE.