Dissecting neural computations in the human auditory pathway using deep neural networks for speech

Cited by: 28
Authors
Li, Yuanning [1, 4, 5]
Anumanchipalli, Gopala K. [2, 3]
Mohamed, Abdelrahman
Chen, Peili [4, 5]
Carney, Laurel H. [6]
Lu, Junfeng [7, 8]
Wu, Jinsong [7, 8]
Chang, Edward F. [1, 2]
Affiliations
[1] Univ Calif San Francisco, Dept Neurol Surg, San Francisco, CA 94115 USA
[2] Univ Calif San Francisco, Weill Inst Neurosci, San Francisco, CA 94143 USA
[3] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA USA
[4] ShanghaiTech Univ, Sch Biomed Engn, Shanghai, Peoples R China
[5] ShanghaiTech Univ, State Key Lab Adv Med Mat & Devices, Shanghai, Peoples R China
[6] Univ Rochester, Dept Biomed Engn, Rochester, NY USA
[7] Fudan Univ, Huashan Hosp, Shanghai Med Coll, Neurol Surg Dept, Shanghai, Peoples R China
[8] Fudan Univ, Neurosurg Inst, Brain Funct Lab, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MODELS; CORTEX; REPRESENTATIONS; ORGANIZATION; CONNECTIONS; PERCEPTION; RESPONSES; MEANINGS; NEURONS; OBJECTS;
DOI
10.1038/s41593-023-01468-4
Chinese Library Classification (CLC) number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
The human auditory system extracts rich linguistic abstractions from speech signals. Traditional approaches to understanding this complex process have used linear feature-encoding models, with limited success. Artificial neural networks excel in speech recognition tasks and offer promising computational models of speech processing. We used speech representations in state-of-the-art deep neural network (DNN) models to investigate neural coding from the auditory nerve to the speech cortex. Representations in hierarchical layers of the DNN correlated well with the neural activity throughout the ascending auditory system. Unsupervised speech models performed at least as well as other purely supervised or fine-tuned models. Deeper DNN layers were better correlated with the neural activity in the higher-order auditory cortex, with computations aligned with phonemic and syllabic structures in speech. Accordingly, DNN models trained on either English or Mandarin predicted cortical responses in native speakers of each language. These results reveal convergence between DNN model representations and the biological auditory pathway, offering new approaches for modeling neural coding in the auditory cortex.
Using direct intracranial recordings and modern speech AI models, Li and colleagues show representational and computational similarities between deep neural networks for self-supervised speech learning and the human auditory pathway.
Pages: 2213-2225
Page count: 30