CILVR Seminar: Leveraging Large Datasets and LLMs to Improve Healthcare and Health Equity

Speaker: Irene Chen

Location: 60 Fifth Avenue, 7th floor open space

Date: Wednesday, October 16, 2024

The proliferation of medical data and advances in large language models (LLMs) promise to revolutionize healthcare; however, ensuring accurate, robust, and equitable healthcare remains a significant challenge. In this talk, I will present recent work on two critical aspects of this evolving landscape. First, I will examine the unexpected consequences of multi-source data scaling. Counter to intuition, adding training data can sometimes reduce overall accuracy, produce uncertain fairness outcomes, and diminish worst-subgroup performance. These findings underscore the complexity of working with disparate data sources in healthcare AI. Next, I will showcase applications of LLMs to improve healthcare. Through participatory design with healthcare workers and patients, we developed guiding principles for LLM use in maternal health. Additionally, we demonstrate how LLMs can help develop dynamic treatment protocols by extracting the rationales for those protocols from clinical notes. The talk concludes by emphasizing the need for vigilance and ethical consideration as we advance toward more data-driven and AI-assisted healthcare.