Graph based feature selection investigating boundary region of rough set for language identification

Cited by: 17
Authors
Yasmin, Ghazaala [1]
Das, Asit Kumar [2]
Nayak, Janmenjoy [3]
Pelusi, Danilo [4]
Ding, Weiping [5]
Affiliations
[1] St Thomas Coll Engn & Technol, Kolkata, W Bengal, India
[2] Indian Inst Engn Sci & Technol, Sibpur, Howrah, India
[3] Aditya Inst Technol & Management AITAM, Tekkali, India
[4] Univ Teramo, Dept Commun Sci, Teramo, Italy
[5] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Language identification; Feature selection; Relative indiscernibility relation; Attribute dependency; Boundary region exploration; FEATURE-EXTRACTION; COMMUNITY STRUCTURE; NETWORK;
DOI
10.1016/j.eswa.2020.113575
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Language can be chosen as a medium from which maximum information can be extracted. Many countries in the world are divided into numerous regions of different types and flavours based on the languages spoken in them. The challenge is to automate spoken language recognition through machine learning. The proposed language identification system extracts various features from speech in different languages and constructs a complete weighted graph with the extracted features as nodes and the similarities among the features as edge weights. Similarity values are computed using the concepts of the positive region and the boundary region of rough set theory, and a graph-based feature selection algorithm is devised to select only the minimal subset of features relevant to language identification. It is observed that investigating the boundary region together with the positive region extracts more valuable information, which helps in selecting more relevant features for language identification. The constructed complete weighted graph is made sparse using a Gini index based sparsity measure, so that the graph retains only the edges whose terminal nodes are highly similar. Next, a maximal spanning tree of the graph is generated using Prim's algorithm. This tree is a basic structure that captures the maximal similarity among the nodes in the graph. Finally, a score is computed for each node based on the weights of the edges in the tree, and the node with the highest score is selected and removed from the spanning tree. This process of selecting and removing nodes continues until the graph becomes null. The resulting set of selected nodes is taken as the important feature subset of the speech audio used for language identification. Experimental results show the effectiveness of the proposed rough set theory based feature selection method and demonstrate the usefulness of investigating the boundary region of rough sets. © 2020 Elsevier Ltd. All rights reserved.
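The abstract does not give the exact similarity formula, so the following is only a sketch, assuming every speech feature has already been discretized into a few symbolic values (rough set theory operates on discrete attributes). It shows how the positive and boundary regions of one feature's partition can be computed with respect to another feature, together with the classical dependency degree γ = |POS| / |U|. The `boundary_credit` term and the `similarity` function with its mixing weight `lam` are hypothetical illustrations of how boundary-region information might enter a feature-to-feature similarity; they are not the authors' definitions.

```python
from collections import Counter, defaultdict


def partition(values):
    """Equivalence classes of object indices that share the same attribute value."""
    blocks = defaultdict(set)
    for i, v in enumerate(values):
        blocks[v].add(i)
    return list(blocks.values())


def regions(cond, dec):
    """Positive and boundary regions of the partition induced by `dec`,
    approximated by the equivalence classes of the attribute `cond`."""
    cond_blocks, dec_blocks = partition(cond), partition(dec)
    pos, bnd = set(), set()
    for d in dec_blocks:
        lower = set().union(*[c for c in cond_blocks if c <= d])  # blocks inside d
        upper = set().union(*[c for c in cond_blocks if c & d])   # blocks touching d
        pos |= lower
        bnd |= upper - lower
    return pos, bnd


def dependency(cond, dec):
    """Classical rough-set dependency degree gamma = |POS| / |U|."""
    pos, _ = regions(cond, dec)
    return len(pos) / len(cond)


def boundary_credit(cond, dec):
    """HYPOTHETICAL term: for every `cond` block lying in the boundary region,
    count the objects that fall in that block's majority `dec` class."""
    _, bnd = regions(cond, dec)
    credit = 0
    for block in partition(cond):
        if block <= bnd:
            credit += max(Counter(dec[i] for i in block).values())
    return credit / len(cond)


def similarity(a, b, lam=0.5):
    """HYPOTHETICAL symmetric similarity between two discretized features
    (not the paper's formula): dependency in both directions plus a
    `lam`-weighted share of boundary-region information."""
    s_ab = dependency(a, b) + lam * boundary_credit(a, b)
    s_ba = dependency(b, a) + lam * boundary_credit(b, a)
    return (s_ab + s_ba) / 2
```

For example, with `a = [0, 0, 1, 1, 2, 2]` and `b = [0, 0, 1, 1, 1, 0]`, `dependency(a, b)` is 4/6 because the block `{4, 5}` of `a` straddles both classes of `b`. A purely positive-region measure would ignore that block, while the boundary term still extracts some information from it.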
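The graph stages described in the abstract can be sketched as follows, assuming a precomputed symmetric similarity matrix `W` between features. Several details here are assumptions, not taken from the paper: the sparsification rule (zero out the weakest edges until the Gini index of the edge weights, in the Hurley and Rickard form, reaches a hypothetical `target`), the node score (sum of incident tree-edge weights), and the reading of "until the graph becomes null" as "until no tree edges remain", so nodes left isolated are never selected. The spanning tree is built with Prim's algorithm on negated weights using `heapq`, which yields a maximum spanning forest if the sparsified graph is disconnected.

```python
import heapq
from collections import defaultdict


def gini(values):
    """Gini index of a non-negative vector as a sparsity measure
    (Hurley and Rickard form): 0 for uniform weights, near 1 when sparse."""
    c = sorted(values)
    n, total = len(c), sum(c)
    if total == 0:
        return 0.0
    return 1 - 2 * sum((ck / total) * (n - k + 0.5) / n
                       for k, ck in enumerate(c, start=1))


def sparsify(W, target=0.5):
    """ASSUMED rule: zero out the weakest edges of the complete graph until the
    Gini index of the edge weights reaches `target`; keep the rest."""
    n = len(W)
    edges = sorted((W[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    weights = [w for w, _, _ in edges]
    k = 0
    while k < len(weights) and gini(weights) < target:
        weights[k] = 0.0
        k += 1
    return {(i, j): w for (_, i, j), w in zip(edges, weights) if w > 0}


def max_spanning_forest(n, edges):
    """Prim's algorithm on negated weights; restarts from every unvisited node,
    so a disconnected sparse graph yields a maximum spanning forest."""
    adj = defaultdict(list)
    for (i, j), w in edges.items():
        adj[i].append((w, j))
        adj[j].append((w, i))
    visited, tree = set(), {}
    for root in range(n):
        if root in visited:
            continue
        visited.add(root)
        heap = [(-w, root, v) for w, v in adj[root]]
        heapq.heapify(heap)
        while heap:
            neg_w, u, v = heapq.heappop(heap)
            if v in visited:
                continue
            visited.add(v)
            tree[(u, v)] = -neg_w
            for w, x in adj[v]:
                if x not in visited:
                    heapq.heappush(heap, (-w, v, x))
    return tree


def select_features(W, target=0.5):
    """Greedy selection: score = sum of incident tree-edge weights (ASSUMED);
    pick the top node, delete its edges, repeat until no edges remain."""
    tree = max_spanning_forest(len(W), sparsify(W, target))
    selected = []
    while tree:
        score = defaultdict(float)
        for (u, v), w in tree.items():
            score[u] += w
            score[v] += w
        best = max(score, key=lambda v: (score[v], -v))  # ties -> smaller index
        selected.append(best)
        tree = {e: w for e, w in tree.items() if best not in e}
    return selected
```

On a small 4-feature matrix with edge weights 0.9 for (0, 1), 0.8 for (1, 3), 0.7 for (2, 3) and 0.1 to 0.3 elsewhere, a `target` of 0.5 removes the three weakest edges, and `select_features` returns `[1, 2]`. Because removing the top-scoring node deletes all of its tree edges at once, this greedy loop stops long before every node has been picked, which is one way to obtain a small subset.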
Pages: 17