Towards Comprehensive Longitudinal Healthcare Data Capture

Document Type

Conference Proceeding

Publication Date


Find in a Library

Catalog Record


The ability to connect the dots in structured background knowledge and also across scientific literature has been demonstrated as a critical aspect of knowledge discovery. It is not unreasonable therefore to expect that connecting-the-dots across massive amounts of healthcare data may also lead to new insights that could impact diagnosis, treatment and overall patient care. Of critical importance is the observation that while structured Electronic Medical Records (EMR) are useful sources of health information, it is often the unstructured clinical texts such as progress notes and discharge summaries that contain rich, updated and granular information. Hence, by coupling structured EMR data with data from unstructured clinical texts, more holistic patient records, needed for connecting the dots, can be obtained. Unfortunately, free-text progress notes are fraught with a lack of proper grammatical structure, and contain liberal use of jargon and abbreviations, together with frequent misspellings. While these notes still serve their intended purpose for medical care, automatically extracting semantic information from them is a complex task. Overcoming this complexity could mean that evidence-based support for structured EMR data using unstructured clinical texts, can be provided. In this work therefore, we explore a pattern-based approach for extracting Smoker Semantic Types (SST) from unstructured clinical notes, in order to enable evidence-based resolution of SSTs asserted in structured EMRs using SSTs extracted from unstructured clinical notes. Our findings support the notion that information present in unstructured clinical text can be used to complement structured healthcare data. This is a crucial observation towards creating comprehensive longitudinal patient models for connecting-the-dots and providing better overall patient care.


Presented at the IEEE International Conference on Bioinformatics and Biomedicine Workshops, Philadelphia, PA, October 4-7, 2012.



Catalog Record