Autonomous Framework for Person Identification by Analyzing Vocal Sounds and Speech Patterns

Times cited: 0

Authors:

Hassan, Bilal [1]

Ahmed, Ramsha [2]

Li, Bo [3]

Hassan, Omar [4]

Hassan, Taimur [5]

Affiliations:

[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing, Peoples R China

[2] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing, Peoples R China

[3] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China

[4] Sir Syed CASE Inst Technol SSCIT, Dept Elect & Comp Engn, Islamabad, Pakistan

[5] Natl Univ Sci & Technol NUST, Dept Comp & Software Engn, Islamabad, Pakistan

Source:

CONFERENCE PROCEEDINGS OF 2019 5TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTICS (ICCAR) | 2019

Funding:

National Key R&D Program of China;

Keywords:

speech processing; cepstrum; Support Vector Machines (SVM); SPEAKER IDENTIFICATION;

DOI:

10.1109/iccar.2019.8813463

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Speech processing has emerged as an important domain over the past decade. Many researchers have worked on voice recognition and verification, and some of the reported work has been done in the field of biometrics. This paper proposes an autonomous algorithm for person identification that analyzes a speaker's vocal sounds and speech patterns. First, the input voice signal is fed to the proposed system, which extracts its low-frequency content using a finite impulse response (FIR) low-pass filter designed with a Hamming window. The system then performs a cepstral analysis and extracts two distinct features from the signal spectrum: the maximum pitch frequency and the maximum cepstrum value. This two-dimensional feature set is passed to a multi-level classification system built on supervised Support Vector Machines (SVMs), which first determines the speaker's gender and then identifies the speaker within that gender. A total of 120 samples were used to train the classification system, and the proposed system identifies the speaker with an accuracy, specificity, and sensitivity of 83.33%, 86.67%, and 80%, respectively.
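The feature-extraction steps summarized in the abstract can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the cutoff frequency (1 kHz), filter length (101 taps), Hamming analysis window, and the 70-400 Hz pitch search range are assumed values, and the function names are hypothetical. The filter is designed by the window method (ideal sinc response truncated by a Hamming window); the pitch is taken as the sampling rate divided by the quefrency of the largest real-cepstrum peak in the search range, and "maximum cepstrum value" is interpreted here as the height of that peak.

```python
import numpy as np


def hamming_lowpass_fir(num_taps: int, cutoff_hz: float, fs: float) -> np.ndarray:
    """Linear-phase FIR low-pass filter designed by the window method
    (ideal sinc impulse response truncated with a Hamming window)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0   # taps centred on zero
    fc = cutoff_hz / fs                              # cutoff in cycles/sample
    h = 2.0 * fc * np.sinc(2.0 * fc * n) * np.hamming(num_taps)
    return h / h.sum()                               # unity gain at DC


def cepstral_features(frame: np.ndarray, fs: float,
                      fmin: float = 70.0, fmax: float = 400.0) -> tuple[float, float]:
    """Return (pitch_hz, max_cepstrum) from the real cepstrum of one frame.

    The peak is searched only over quefrencies that correspond to pitch
    frequencies between fmin and fmax (an assumed voice range)."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)  # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=len(frame))           # real cepstrum
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)              # quefrency bounds in samples
    q_peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return fs / q_peak, float(cepstrum[q_peak])


def extract_features(signal: np.ndarray, fs: float,
                     cutoff_hz: float = 1000.0, num_taps: int = 101) -> tuple[float, float]:
    """Low-pass filter the signal, then compute the two cepstral features."""
    h = hamming_lowpass_fir(num_taps, cutoff_hz, fs)
    # mode="same" keeps the input length and, for an odd-length symmetric
    # filter, removes the (num_taps - 1) / 2 sample group delay.
    low_passed = np.convolve(signal, h, mode="same")
    return cepstral_features(low_passed, fs)
```

For a synthetic 1024-sample frame made of equal-amplitude harmonics of 125 Hz sampled at 8 kHz, `cepstral_features` places the peak at a quefrency of 64 samples, i.e. a pitch of 125 Hz. Restricting the search range keeps low-quefrency spectral-envelope terms and higher rahmonics out of the peak search.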
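The multi-level classifier described in the abstract (gender first, then speaker within the predicted gender) can be sketched with scikit-learn. The abstract does not state the kernel, hyperparameters, or preprocessing, so the RBF kernel, `C=1.0`, and feature standardization below are assumptions, and `TwoStageSpeakerSVM` is a hypothetical name. Standardizing matters here because one feature is in hertz and the other is a dimensionless cepstral amplitude.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def _svm(C: float):
    # Assumed setup: standardize features, then an RBF-kernel SVM.
    return make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C))


class TwoStageSpeakerSVM:
    """Stage 1 predicts gender; stage 2 predicts the speaker with an SVM
    trained only on speakers of the predicted gender."""

    def __init__(self, C: float = 1.0):
        self.C = C

    def fit(self, X, genders, speakers):
        X = np.asarray(X, dtype=float)
        genders, speakers = np.asarray(genders), np.asarray(speakers)
        self.gender_model_ = _svm(self.C).fit(X, genders)
        self.speaker_models_ = {}
        for g in np.unique(genders):
            mask = genders == g
            labels = np.unique(speakers[mask])
            if len(labels) == 1:
                # SVC needs at least two classes; a lone speaker is stored directly.
                self.speaker_models_[g] = labels[0]
            else:
                self.speaker_models_[g] = _svm(self.C).fit(X[mask], speakers[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        gender_pred = self.gender_model_.predict(X)
        speaker_pred = np.empty(len(X), dtype=object)
        for g, model in self.speaker_models_.items():
            mask = gender_pred == g
            if not mask.any():
                continue
            if hasattr(model, "predict"):
                speaker_pred[mask] = model.predict(X[mask])
            else:
                speaker_pred[mask] = model
        return gender_pred, speaker_pred
```

A quick check on synthetic, well-separated (pitch, cepstral peak) clusters for two hypothetical male and two hypothetical female speakers should route each cluster centre to the right gender and then to the right speaker. Splitting by gender first means each second-stage SVM only has to separate speakers whose pitch ranges already overlap.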
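The three reported figures follow the standard binary definitions: accuracy = (TP + TN) / (TP + FP + TN + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The abstract does not say how multi-class speaker identification is mapped onto positives and negatives, nor does it give the confusion matrix, so the counts below are hypothetical. They merely show that the reported percentages are arithmetically consistent with, for example, a balanced 30-trial evaluation; this is not a claim about the paper's actual test set.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = binary_metrics(tp=12, fp=2, tn=13, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 83.33%
# sensitivity: 80.00%
# specificity: 86.67%
```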


Pages: 649-653

Number of pages: 5

Number of references: 16
