Look who's talking: A comparison of automated and human-generated speaker tags in naturalistic day-long recordings

Cited by: 30
Authors
Bulgarelli, Federica [1]
Bergelson, Elika [1]
Affiliation
[1] Duke Univ, 417 Chapel Dr, Box 90086, Durham, NC 27708 USA
Keywords
LENA system; Talker variability; LENA system reliability; Speech; Variability; Words; Cues
DOI
10.3758/s13428-019-01265-7
Chinese Library Classification
B841 [Research methods in psychology]
Subject classification code
040201
Abstract
The LENA system has revolutionized research on language acquisition, providing both a wearable device to collect day-long recordings of children's environments and a set of automated outputs that process, identify, and classify speech using proprietary algorithms. This output includes information about input sources (e.g., adult male, electronics). While this system has been tested across a variety of settings, here we delve deeper into validating the accuracy and reliability of LENA's automated diarization, i.e., tags of who is talking. Specifically, we compare LENA's output with a gold-standard set of manually generated talker tags from a dataset of 88 day-long recordings, taken from 44 infants at 6 and 7 months, which includes 57,983 utterances. We compare accuracy across a range of classifications from the original LENA Technical Report, alongside a set of analyses examining classification accuracy by utterance type (e.g., declarative, singing). Consistent with previous validations, we find overall high agreement between the human and LENA-generated speaker tags, for adult speech in particular, with poorer performance identifying child, overlap, noise, and electronic speech (accuracy range across all measures: 0-92%). We discuss several clear benefits of using this automated system alongside potential caveats based on the error patterns we observe, concluding with implications for research using LENA-generated speaker tags.
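The comparison summarized above rests on scoring agreement between two sets of labels for the same utterances. The following is a minimal Python sketch of one common way to do that; it is not the authors' analysis code. It assumes each utterance has already been aligned to exactly one human tag and one automated tag, and the category names (`female_adult`, `child`, etc.) and helper names (`per_category_accuracy`, `confusion_counts`) are hypothetical. "Accuracy" here means the share of human-tagged utterances in a category that received the same automated tag, which may differ from the definitions used in the paper or in the LENA Technical Report.

```python
from collections import Counter, defaultdict


def per_category_accuracy(pairs):
    """Return {category: share of human-tagged utterances whose automated tag matches}.

    `pairs` is an iterable of (human_tag, automated_tag) tuples, one per
    utterance (hypothetical input format). Categories that never occur as a
    human tag do not appear in the result.
    """
    totals = Counter()  # utterances per human-assigned category
    hits = Counter()    # utterances where the automated tag agrees
    for human, auto in pairs:
        totals[human] += 1
        if human == auto:
            hits[human] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}


def confusion_counts(pairs):
    """Cross-tabulate human tags (outer keys) against automated tags (inner keys)."""
    table = defaultdict(Counter)
    for human, auto in pairs:
        table[human][auto] += 1
    return table


if __name__ == "__main__":
    # Toy, made-up tags for illustration only; not data from the study.
    pairs = [
        ("female_adult", "female_adult"),
        ("female_adult", "female_adult"),
        ("female_adult", "male_adult"),
        ("male_adult", "male_adult"),
        ("child", "female_adult"),
        ("child", "child"),
        ("electronic", "noise"),
    ]
    print(per_category_accuracy(pairs))
    # Shows which automated tags the human-tagged "child" utterances received.
    print(dict(confusion_counts(pairs)["child"]))
```

The per-category figure ignores false alarms (automated tags of a category with no matching human tag), which is why the sketch also builds a confusion table: it makes visible where misclassified utterances end up, for example child speech tagged as adult speech.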
Pages: 641-653
Number of pages: 13