Graph based feature selection investigating boundary region of rough set for language identification

Times cited: 17

Authors:

Yasmin, Ghazaala [1]

Das, Asit Kumar [2]

Nayak, Janmenjoy [3]

Pelusi, Danilo [4]

Ding, Weiping [5]

Affiliations:

[1] St Thomas Coll Engn & Technol, Kolkata, W Bengal, India

[2] Indian Inst Engn Sci & Technol, Sibpur, Howrah, India

[3] Aditya Inst Technol & Management AITAM, Tekkali, India

[4] Univ Teramo, Dept Commun Sci, Teramo, Italy

[5] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2020 / Vol. 158

Funding:

National Natural Science Foundation of China

Keywords:

Language identification; Feature selection; Relative indiscernibility relation; Attribute dependency; Boundary region exploration; FEATURE-EXTRACTION; COMMUNITY STRUCTURE; NETWORK;

DOI:

10.1016/j.eswa.2020.113575

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Language can be regarded as a medium from which a great deal of information can be extracted. Many countries in the world comprise numerous regions that differ in the types and flavours of the languages spoken there. The challenge is to automate spoken language recognition through machine learning. The proposed language identification system extracts various features from speech in different languages and constructs a complete weighted graph whose nodes are the extracted features and whose edge weights are the similarities between features. Similarity values are computed using the positive region and boundary region of rough set theory, and a graph-based feature selection algorithm is devised to select only a minimal subset of features relevant to language identification. Investigating the boundary region together with the positive region is observed to extract more valuable information, which helps select more relevant features for language identification. The complete weighted graph is then made sparse using a Gini index based sparsity measure, so that it retains only the edges whose end nodes are highly similar. Next, a maximal spanning tree of the graph is generated using Prim's algorithm; this tree connects all the nodes while maximising the total similarity along its edges. Finally, a score is computed for each node from the weights of the edges in the tree, and the node with the highest score is selected and removed from the spanning tree. This selection and removal of nodes is repeated until the graph becomes null. The resulting set of selected nodes is taken as the important feature subset of the speech audio used for language identification. Experimental results show the effectiveness of the proposed rough set based feature selection method and demonstrate the usefulness of investigating the boundary region of rough sets. (C) 2020 Elsevier Ltd. All rights reserved.
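The pipeline in the abstract (rough-set similarity, Gini-based sparsification, a maximal spanning tree via Prim's algorithm, then iterative node selection) can be illustrated with the Python sketch below. It is not the authors' implementation: the abstract gives no exact formulas, so several choices here are hypothetical and labelled as such in the code. It assumes the speech features have already been discretised into columns of labels; `boundary_weight` (partial credit for boundary-region classes), the rule that prunes edges lighter than the G-quantile of all weights, the weighted-degree node score, and removing a selected node's tree neighbours as redundant (needed so the loop yields a small subset rather than every node) are all assumptions. The Gini function is the standard sparsity form of the Gini index.

```python
from collections import defaultdict
from itertools import combinations
import heapq

def partition(column):
    """Indiscernibility classes U/a: sample indices grouped by equal feature value."""
    groups = defaultdict(set)
    for i, value in enumerate(column):
        groups[value].add(i)
    return list(groups.values())

def dependency(a, b, boundary_weight=0.5):
    """Degree to which feature b depends on feature a (hypothetical weighting)."""
    b_classes = partition(b)
    pos = bnd = 0
    for e in partition(a):
        best = max(len(e & x) for x in b_classes)
        if best == len(e):      # e lies inside one b-class: positive region
            pos += len(e)
        else:                   # e straddles b-classes: boundary region, credited
            bnd += best         # with its largest single-class overlap (assumption)
    return (pos + boundary_weight * bnd) / len(a)

def similarity_graph(features, boundary_weight=0.5):
    """Complete weighted graph: one edge per feature pair, symmetric similarity."""
    return {
        (u, v): (dependency(features[u], features[v], boundary_weight)
                 + dependency(features[v], features[u], boundary_weight)) / 2
        for u, v in combinations(sorted(features), 2)
    }

def gini_index(values):
    """Gini sparsity index: 0 for equal weights, close to 1 for very sparse ones."""
    c = sorted(values)
    n, total = len(c), sum(c)
    if n == 0 or total == 0:
        return 0.0
    return 1 - 2 * sum((ck / total) * (n - k + 0.5) / n
                       for k, ck in enumerate(c, start=1))

def sparsify(edges):
    """Hypothetical rule: drop edges lighter than the G-quantile of all weights."""
    if not edges:
        return {}
    weights = sorted(edges.values())
    g = gini_index(weights)
    cutoff = weights[min(int(g * len(weights)), len(weights) - 1)]
    return {e: w for e, w in edges.items() if w >= cutoff}

def max_spanning_forest(nodes, edges):
    """Prim's algorithm on negated weights (max-heap), restarted per component."""
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((w, v))
        adj[v].append((w, u))
    visited, tree = set(), {}
    for root in sorted(nodes):
        if root in visited:
            continue
        visited.add(root)
        heap = [(-w, root, v) for w, v in adj[root]]
        heapq.heapify(heap)
        while heap:
            neg_w, u, v = heapq.heappop(heap)
            if v in visited:
                continue
            visited.add(v)
            tree[(u, v)] = -neg_w
            for w, x in adj[v]:
                if x not in visited:
                    heapq.heappush(heap, (-w, v, x))
    return tree

def select_features(nodes, tree):
    """Hypothetical score: sum of incident tree-edge weights. The best node is
    kept, then removed together with its tree neighbours (treated as redundant)."""
    remaining, selected = set(nodes), []
    while remaining:
        score = {n: 0.0 for n in remaining}
        nbrs = defaultdict(set)
        for (u, v), w in tree.items():
            if u in remaining and v in remaining:
                score[u] += w
                score[v] += w
                nbrs[u].add(v)
                nbrs[v].add(u)
        best = max(sorted(remaining), key=lambda n: score[n])
        selected.append(best)
        remaining -= {best} | nbrs[best]
    return selected
```

A usage example on made-up data, where `f2` duplicates `f1`:

```python
features = {  # made-up, already-discretised columns for six speech samples
    "f1": [0, 0, 1, 1, 2, 2],
    "f2": [0, 0, 1, 1, 2, 2],  # exact duplicate of f1
    "f3": [1, 0, 1, 0, 1, 0],
    "f4": [0, 0, 0, 1, 1, 1],
}
tree = max_spanning_forest(features, sparsify(similarity_graph(features)))
print(select_features(features, tree))  # ['f1', 'f3']
```

The duplicate `f2` and the fairly similar `f4` are both tree neighbours of `f1`, so they are discarded once `f1` is selected; `f3`, only weakly similar to the rest, survives as a separate choice.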

Pages: 17

Number of references: 69
