Development and Application of a High Throughput Natural Language Processing Architecture to Convert All Clinical Documents in a Clinical Data Warehouse into Standardized Medical Vocabularies
This on-demand webinar does not offer CE credit.
Afshar M, Dligach D, Sharma B, et al. Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies. J Am Med Inform Assoc. 2019 May 30. pii: ocz068. doi: 10.1093/jamia/ocz068.
Statement of Purpose
Information in the clinical narrative of the electronic health record (EHR) is a rich source of data and comprises a large majority of patient data, but its unstructured format renders it complex and difficult to utilize. Clinical data warehouses (CDWs) of health systems are becoming larger and more efficient in today’s health data ecosystem; therefore, high throughput architectures to manage and process the data are needed. Large-scale efforts at de-identification of clinical notes and curation of the data for research purposes are underway in the National Center for Advancing Translational Sciences (NCATS). Methods in natural language processing (NLP) have proven effective in automatic semantic analyses of clinical documents with concept mapping to standardized medical vocabularies. Several centers have demonstrated success in high throughput NLP, but little guidance exists on optimizing its performance for an entire health system. First, we aim to develop a high throughput NLP architecture using the cTAKES engine to concept map over ten years of clinical documents from our CDW using the Unified Medical Language System (UMLS). Second, we aim to examine the application of our architecture in the context of a hospital 30-day readmission prediction task.
Our high throughput NLP architecture converted our health system’s data corpus of over 84 million unstructured clinical notes into a completely de-identified data repository of nearly 40 billion structured and standardized data elements. This task was accomplished at a rate of over 500,000 documents per hour through our on-premises data center. The results for predicting 30-day hospital readmission demonstrate that mapped concepts from UMLS performed similarly to n-grams. The processed data are a new addition to our clinical research database for researchers and administrators interested in data mining and analytics from any note or report. This resource may be more appealing to end-users and researchers interested in using clinical notes from their CDW, and our results suggest that concept unique identifier (CUI) features from standardized medical vocabularies are one option for large-scale clinical research in data analytics.
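To put the reported figures in perspective, the arithmetic below is a rough sketch using only the rounded bounds quoted above ("over 84 million" notes, "over 500,000 documents per hour", "nearly 40 billion" data elements). It assumes a single continuous run at the quoted rate, which the abstract does not state, so the results are order-of-magnitude estimates only.

```python
# Rounded bounds quoted in the abstract; the true values differ slightly.
NOTES = 84_000_000           # "over 84 million" clinical notes
DOCS_PER_HOUR = 500_000      # "over 500,000 documents per hour"
ELEMENTS = 40_000_000_000    # "nearly 40 billion" data elements

# Assumption: the whole corpus is processed in one continuous run.
hours = NOTES / DOCS_PER_HOUR
days = hours / 24

# Average number of structured data elements produced per note.
elements_per_note = ELEMENTS / NOTES

print(f"{hours:.0f} hours (~{days:.0f} days)")          # 168 hours (~7 days)
print(f"~{elements_per_note:.0f} elements per note")    # ~476 elements per note
```

Under these assumptions, the corpus would take roughly a week of processing and yield a few hundred data elements per note on average.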
Target Audience
The target audience for this activity is professionals and students interested in biomedical and health informatics.
Learning Objectives
The general learning objective for all of the JAMIA Journal Club webinars is that participants will
- Use a critical appraisal process to assess article validity and to gauge article findings' relevance to practice
After this live activity, the participant should be better able to:
- Understand how to design a high throughput NLP architecture to produce a de-identified clinical data warehouse of a health system’s corpus of notes converted into standardized medical vocabularies, and
- Apply concept unique identifiers (CUIs) from a big database/clinical data warehouse to perform data analytics such as applied predictive modelling or phenotyping tasks.
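As a rough illustration of the second objective, the sketch below contrasts word n-gram counts with CUI counts as features for a downstream model. It is not the article's pipeline: the concept-mapping output is hard-coded and hypothetical, standing in for what an NLP engine such as cTAKES might emit, and the identifier `C0018801` is only a placeholder in CUI format, not verified against UMLS.

```python
from collections import Counter

def ngram_features(text, n=1):
    """Count word n-grams in a note (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)

def cui_features(cuis):
    """Count concept identifiers already extracted from a note."""
    return Counter(cuis)

# Two hypothetical notes that describe the same condition in different words.
note_a_text = "history of heart failure"
note_b_text = "known chf"

# Hypothetical concept-mapping output: both notes map to one placeholder CUI.
note_a_cuis = ["C0018801"]
note_b_cuis = ["C0018801"]

# Unigram features share nothing, while the CUI features overlap.
shared_words = set(ngram_features(note_a_text)) & set(ngram_features(note_b_text))
shared_cuis = set(cui_features(note_a_cuis)) & set(cui_features(note_b_cuis))

print(shared_words)  # set()
print(shared_cuis)   # {'C0018801'}
```

Either kind of count vector could then be passed to a classifier for a task such as readmission prediction. The example only shows why a standardized vocabulary can align notes that use different wording.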
This JAMIA Journal Club does not offer continuing education credit.
In our dedication to providing unbiased education even when no CE credit is associated with it, we provide planners’ and presenters’ disclosures of relevant financial relationships with commercial interests that have the potential to introduce bias in the presentation:
Disclosures for this Activity
These faculty, planners, and staff who are in a position to control the content of this activity disclose that they and their life partners have no relevant financial relationships with commercial interests:
JAMIA Journal Club presenters: Majid Afshar, Ron Price, Jr.
JAMIA Journal Club planners: Michael Chiang, Kelson Zawack, Tiffany J. Callahan, Daniel Feller
AMIA staff: Susanne Arnold, Pesha Rubinstein