Speaker Identification for Business-Card-Type Sensors

Cited by: 4
Authors
Yamaguchi, Shunpei [1]
Oshima, Ritsuko [2]
Oshima, Jun [2]
Shiina, Ryota [3]
Fujihashi, Takuya [1]
Saruwatari, Shunsuke [1]
Watanabe, Takashi [1]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Suita, Osaka 5650871, Japan
[2] Shizuoka Univ, Grad Sch Integrated Sci & Technol, Hamamatsu, Shizuoka 4328011, Japan
[3] NTT Corp, NTT Access Network Serv Syst Labs, Musashino, Tokyo 1808585, Japan
Source
IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY | 2021 / Vol. 2
Keywords
Human activity recognition; sensor networks; speaker identification; speaker recognition; time synchronization; LOCALIZATION; RECOGNITION; NOISY;
DOI
10.1109/OJCS.2021.3075469
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Human collaboration has a great impact on the performance of multi-person activities. Analyzing who speaks and when makes it possible to extract detailed human collaboration data. Some studies have extracted such data by identifying the speaker with business-card-type sensors. However, low-cost, high-accuracy speaker identification is difficult to achieve with business-card-type sensors because of spikes in the measured sound pressure data, ambient noise picked up by non-speaker sensors, and synchronization errors between sensors. This study proposes a novel sound pressure sensor and a speaker identification algorithm to realize speaker identification for business-card-type sensors. The sensor extracts the user's speech at low cost and with high accuracy by employing a peak hold circuit for spike mitigation and a time synchronization module for precise time synchronization. The algorithm identifies the speaker with high accuracy by removing ambient noise. The evaluations show that the algorithm accurately identifies the speaker in a multi-person activity across varying numbers of users, environmental noises, and reverberation conditions, for both long and short utterances. In addition, the peak hold circuit enables accurate extraction of speech, and the synchronization error between the sensors is always within ±30 μs, which is negligible.
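The abstract does not give the algorithm's details, so the following is only a rough illustrative sketch of the general idea, not the circuit or method from the paper. It assumes that each sensor yields a time-synchronized list of sound pressure samples, that a software peak-hold envelope with a hypothetical `decay` factor stands in for the hardware peak hold circuit, and that ambient noise can be rejected with a single hypothetical `noise_floor` threshold. The function names are invented for this example.

```python
from typing import List, Optional, Sequence


def peak_hold(samples: Sequence[float], decay: float = 0.9) -> List[float]:
    """Software analogue of a peak hold stage (assumption, not the paper's circuit).

    Tracks the envelope of |x|: the held value follows a new peak immediately
    and otherwise decays by the factor `decay` at every step.
    """
    held = 0.0
    envelope = []
    for x in samples:
        held = max(abs(x), held * decay)
        envelope.append(held)
    return envelope


def identify_speaker(
    envelopes: Sequence[Sequence[float]], noise_floor: float
) -> List[Optional[int]]:
    """Pick the loudest sensor at each synchronized time step.

    Returns the index of the sensor with the highest envelope value, or None
    when even the loudest sensor does not exceed `noise_floor` (treated here
    as ambient noise only). Ties go to the lowest sensor index.
    """
    speakers: List[Optional[int]] = []
    for frame in zip(*envelopes):
        loudest = max(range(len(frame)), key=lambda i: frame[i])
        speakers.append(loudest if frame[loudest] > noise_floor else None)
    return speakers


if __name__ == "__main__":
    # Two hypothetical sensors, three synchronized samples each.
    sensor_a = peak_hold([0.1, 0.9, 0.2], decay=0.0)
    sensor_b = peak_hold([0.1, 0.3, 0.8], decay=0.0)
    print(identify_speaker([sensor_a, sensor_b], noise_floor=0.25))
```

The loudest-sensor rule only makes sense if the samples from different sensors are aligned in time, which is why the abstract's tight synchronization bound matters for this kind of comparison.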
Pages: 216-226
Page count: 11