Defining properties of speech spectrogram images to allow effective pre-processing prior to pattern recognition

Cited by: 0

Authors:

Mohammed, Aldarkazali [1]

Rupert, Young [1]

Chris, Chatwin [1]

Philip, Birch [1]

Affiliation:

[1] Univ Sussex, Sch Engn & Design, Ind Informat Res Grp, Brighton BN1 9QT, E Sussex, England

Source:

OPTICAL PATTERN RECOGNITION XXIV | 2013 / Vol. 8748

DOI:

10.1117/12.2014511

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The speech signal of a word is a combination of frequencies that can produce specific transition frequency shapes. These can be regarded as written text in some unknown 'script'. Before attempting to read the speech spectrogram image using image processing techniques, we first need to define the properties of the spectrogram image, reduce the clutter in it, and select the methods to be employed for image matching. Thus the speech signal is first converted to a spectrogram image, and the noise in the signal is then reduced by capturing the energy associated with the formants of the speech signal. This is followed by normalisation of the size of the image and of its resolution in both the frequency and time axes. Finally, template matching methods are employed to recognise portions of text and isolated words. The paper describes the pre-processing methods employed and outlines the use of normalised grey-level correlation for the recognition of words.
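As an illustration of the last two steps named in the abstract (size normalisation followed by matching with normalised grey-level correlation), here is a minimal Python sketch. It is not the authors' implementation. It assumes the zero-mean form of the correlation (the abstract does not say which variant is used), nearest-neighbour resampling for the size normalisation, and small lists of rows standing in for spectrogram images. The function names `resize_nearest`, `normalised_correlation` and `best_match`, and the 4x4 target size, are hypothetical choices made for this example.

```python
import math


def resize_nearest(image, out_rows, out_cols):
    """Resample a 2-D grey-level image (list of rows) to a fixed size.

    Nearest-neighbour lookup is an assumption made for this sketch.
    """
    in_rows, in_cols = len(image), len(image[0])
    return [
        [image[r * in_rows // out_rows][c * in_cols // out_cols]
         for c in range(out_cols)]
        for r in range(out_rows)
    ]


def normalised_correlation(a, b):
    """Zero-mean normalised correlation of two equal-sized images.

    The result lies in [-1, 1] and is unchanged by a positive gain
    or a constant offset applied to either image.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    # A constant image has no structure to correlate with; report 0.
    return num / den if den > 0 else 0.0


def best_match(image, templates, size=(4, 4)):
    """Return (label, score) of the template that best matches the image.

    Both the image and every template are first brought to the same size.
    """
    probe = resize_nearest(image, *size)
    scores = {
        label: normalised_correlation(probe, resize_nearest(t, *size))
        for label, t in templates.items()
    }
    label = max(scores, key=scores.get)
    return label, scores[label]
```

Because the mean is subtracted and the result is divided by the energy of both images, a spectrogram recorded at a different overall loudness would still score close to 1 against its own template, which is one reason a normalised measure suits this kind of matching.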

Pages: 11
