Sparse Sinusoidal Signal Representation for Speech and Music Signals

Cited by: 0

Authors:

Mowlaee, Pejman [1]

Froghani, Amirhossein [1]

Sayadiyan, Abolghasem [1]

Affiliations:

[1] Amir Kabir Univ Technol, Dept Elect Engn, Tehran, Iran

Source:

ADVANCES IN COMPUTER SCIENCE AND ENGINEERING | 2008 / Vol. 6

Keywords:

Sinusoidal subspace; STFT; Principal Component Analysis; Sparse Representation; SSNR

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a sparse representation called the Fixed Dimension Modified Sinusoid Model (FD-MSM) for parametric analysis of audio signals, including speech, music, and mixtures. Compared with other analysis models, the proposed scheme is both pitch-independent and suited to sparse signal representation, which is commonly favored for speech enhancement and sound separation. Using Principal Component Analysis (PCA), it is demonstrated that the FD-MSM signal representation is equivalent to a non-linear mapping into a sinusoidal subspace that preserves the components with the largest eigenvalues by projecting the signal components onto the corresponding eigenvectors. In subjective experiments, we observed that the resulting signals are perceptually indistinguishable from the originals.


Pages: 469-476

Page count: 8

References (7 in total):

[1] FURUI S, 1992, ADV SPEECH SIGNAL PR

[2] George, EB; Smith, MJT. Speech analysis synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1997, 5 (05): 389-406

[3] Macon, MW; Clements, MA. Sinusoidal modeling and modification of unvoiced speech [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1997, 5 (06): 557-560

[4] MCAULAY, RJ; QUATIERI, TF. SPEECH ANALYSIS SYNTHESIS BASED ON A SINUSOIDAL REPRESENTATION [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1986, 34 (04): 744-754

[5] MOWLAEE P, 2007, INT C SIGN PROC COMM, P1183

[6] O'Shaughnessy D., 2000, SPEECH COMMUN

[7] Radfar, Mohammad H.; Dansereau, Richard M.; Sayadiyan, Abolghasem. Monaural speech segregation based on fusion of source-driven with model-driven techniques [J]. SPEECH COMMUNICATION, 2007, 49 (06): 464-476
