Reconstructing Human Speech from Recorded Brain Activity Using Deep Learning

Speaker: Yao Wang, PhD

Location: 2 MetroTech Center, Room 911

Date: Tuesday, February 11, 2025

Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. It remains a highly challenging task, however, compounded by the scarcity of neural recordings paired with corresponding speech and by variability in where neural signals are sampled across participants. In collaboration with Prof. Adeen Flinker from the NYU Grossman School of Medicine, Prof. Wang and her research team have been developing approaches for reconstructing human speech from cortical signals recorded with intracranial electrodes.

This talk will present a novel deep-learning framework that combines a neural decoder, which translates cortical signals into interpretable speech parameters, with a differentiable speech synthesizer, which maps those parameters to spectrograms. While the team's earlier approach, like most prior work, could only handle electrodes on a dense 2D grid (an electrocorticographic, or ECoG, array) and data from a single patient, their more recent neural decoder architecture accommodates both surface ECoG and depth (stereotactic EEG, or sEEG) electrodes and can be trained on data from multiple participants. The framework generates natural-sounding speech and achieves high decoding correlation with the ground-truth spectrogram across a large cohort of participants with ECoG electrodes, sEEG electrodes, or both. Moreover, models trained on multiple participants generalize to unseen participants.

The model can use temporal operations that are causal (drawing on current and past neural signals), anticausal (current and future signals), or noncausal (combining both), and it achieves high decoding performance even when limited to causal operations, a requirement for real-time neural prostheses. In addition, contribution analysis of the causal and anticausal models makes it possible to disentangle feedforward motor control from auditory feedback processing in speech production, revealing a surprisingly mixed recruitment of feedforward and feedback cortical processing during speech. Prof. Wang's presentation will highlight the technical advances, neuroscientific insights, and translational potential of this research.
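For readers who want a concrete picture of the two-stage design described above, the following is a minimal, hypothetical PyTorch-style sketch, not the team's actual architecture: a neural decoder maps windowed electrode signals to a small set of speech parameters, and a differentiable synthesizer maps those parameters to a spectrogram, so gradients from a spectrogram loss can flow back through the synthesizer into the decoder. All layer sizes, parameter counts, and names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NeuralDecoder(nn.Module):
    """Hypothetical decoder: electrode signals -> interpretable speech parameters."""
    def __init__(self, n_electrodes: int, n_params: int = 4, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_electrodes, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, n_params, kernel_size=1),  # e.g. pitch-, loudness-, formant-like parameters
        )

    def forward(self, neural):               # neural: (batch, electrodes, time)
        return self.net(neural)               # (batch, n_params, time)

class DifferentiableSynthesizer(nn.Module):
    """Toy differentiable synthesizer: speech parameters -> spectrogram.

    A stand-in for a physically motivated synthesizer; a learned linear map
    keeps the example short while preserving differentiability.
    """
    def __init__(self, n_params: int = 4, n_mels: int = 80):
        super().__init__()
        self.proj = nn.Conv1d(n_params, n_mels, kernel_size=1)

    def forward(self, params):                # params: (batch, n_params, time)
        return self.proj(params)               # (batch, n_mels, time)

# One end-to-end training step against a paired ground-truth spectrogram.
decoder, synth = NeuralDecoder(n_electrodes=64), DifferentiableSynthesizer()
optimizer = torch.optim.Adam(list(decoder.parameters()) + list(synth.parameters()), lr=1e-3)

neural = torch.randn(8, 64, 200)              # placeholder ECoG/sEEG windows
target_spec = torch.randn(8, 80, 200)         # placeholder paired speech spectrogram

optimizer.zero_grad()
pred_spec = synth(decoder(neural))
loss = nn.functional.mse_loss(pred_spec, target_spec)
loss.backward()
optimizer.step()
```

The key design point, as described in the abstract, is that the synthesizer is differentiable, so the decoder can be supervised through it while the intermediate speech parameters remain interpretable.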
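The causal/anticausal/noncausal distinction can likewise be illustrated with temporal convolutions whose padding determines which time steps a layer may see. The sketch below is a generic illustration of that idea under assumed layer shapes, not the decoder layers used in the work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalConv(nn.Module):
    """1-D convolution whose padding controls temporal access.

    mode='causal'     -> pad on the left; each output sees current and past samples only
    mode='anticausal' -> pad on the right; each output sees current and future samples only
    mode='noncausal'  -> pad on both sides; each output sees past and future samples
    """
    def __init__(self, channels: int, kernel_size: int = 5, mode: str = "causal"):
        super().__init__()
        self.mode = mode
        self.kernel_size = kernel_size
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                     # x: (batch, channels, time)
        k = self.kernel_size - 1
        if self.mode == "causal":
            x = F.pad(x, (k, 0))              # left padding: no look-ahead
        elif self.mode == "anticausal":
            x = F.pad(x, (0, k))              # right padding: no look-back
        else:                                  # noncausal
            x = F.pad(x, (k // 2, k - k // 2))
        return self.conv(x)

# A causal stack can run in real time: the output at time t never depends on future samples.
x = torch.randn(1, 32, 100)
y = TemporalConv(32, mode="causal")(x)
print(y.shape)                                 # torch.Size([1, 32, 100])
```

Comparing decoders restricted to causal versus anticausal access is what underlies the feedforward-versus-feedback contribution analysis described in the abstract.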