Optimizing Natural Language Processing Pipelines: Opinion Mining Case Study

Cited by: 0
Authors
Estevez-Velarde, Suilan [1]
Gutierrez, Yoan [2]
Montoyo, Andres [3]
Almeida-Cruz, Yudivian [1]
Affiliations
[1] Univ Habana, Sch Math & Comp Sci, Havana, Cuba
[2] Univ Alicante, Univ Inst Comp Res IUII, St Vicent Del Raspeig, Spain
[3] Univ Alicante, Dept Languages & Comp Syst, St Vicent Del Raspeig, Spain
Source
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS (CIARP 2019) | 2019, Vol. 11896
Keywords
Natural Language Processing; Pipeline optimization; Metaheuristics; Opinion mining; SELECTION
DOI
10.1007/978-3-030-33904-3_15
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This research presents NLP-Opt, an Auto-ML technique for optimizing pipelines of machine learning algorithms that can be applied to different Natural Language Processing tasks. The process of selecting the algorithms and their parameters is modelled as an optimization problem and a technique was proposed to find an optimal combination based on the metaheuristic Population-Based Incremental Learning (PBIL). For validation purposes, this approach is applied to a standard opinion mining problem. NLP-Opt effectively optimizes the algorithms and parameters of pipelines. Additionally, NLP-Opt outputs probabilistic information about the optimization process, revealing the most relevant components of pipelines. The proposed technique can be applied to different Natural Language Processing problems, and the information provided by NLP-Opt can be used by researchers to gain insights on the characteristics of the best-performing pipelines. The source code is made available for other researchers. In contrast with other Auto-ML approaches, NLP-Opt provides a flexible mechanism for designing generic pipelines that can be applied to NLP problems. Furthermore, the use of the probabilistic model provides a more comprehensive approach to the Auto-ML problem that enriches researcher understanding of the possible solutions.
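The abstract describes selecting pipeline algorithms and parameters with Population-Based Incremental Learning (PBIL), which keeps a probability distribution over choices and shifts it toward the best sampled candidates. The following is a minimal sketch of that general idea, not the NLP-Opt implementation: the search space, the stage and option names, the `toy_fitness` function, and all hyperparameter values are hypothetical stand-ins chosen for illustration. In a real setting the fitness would come from training and evaluating each sampled pipeline.

```python
import random

# Hypothetical search space: each pipeline stage has a list of candidate options.
SEARCH_SPACE = {
    "tokenizer": ["whitespace", "regex", "ngram"],
    "vectorizer": ["bow", "tfidf"],
    "classifier": ["svm", "logreg", "naive_bayes"],
}


def toy_fitness(pipeline):
    """Stand-in score (assumption): counts stages that match an arbitrary target.

    A real run would train the pipeline and return a validation metric.
    """
    target = {"tokenizer": "regex", "vectorizer": "tfidf", "classifier": "svm"}
    return sum(pipeline[stage] == choice for stage, choice in target.items())


def pbil(space, fitness, generations=50, pop_size=20, n_best=5, lr=0.2, seed=0):
    """Categorical PBIL: returns the best pipeline, its score, and the model."""
    rng = random.Random(seed)
    # Start from a uniform distribution over the options of every stage.
    probs = {s: [1.0 / len(opts)] * len(opts) for s, opts in space.items()}
    best, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample a population of pipelines from the current probabilistic model.
        population = [
            {s: rng.choices(opts, weights=probs[s])[0] for s, opts in space.items()}
            for _ in range(pop_size)
        ]
        population.sort(key=fitness, reverse=True)
        elite = population[:n_best]
        top_score = fitness(elite[0])
        if top_score > best_score:
            best, best_score = elite[0], top_score
        # Move each stage's distribution toward the option frequencies in the elite.
        for s, opts in space.items():
            for i, opt in enumerate(opts):
                freq = sum(p[s] == opt for p in elite) / n_best
                probs[s][i] = (1 - lr) * probs[s][i] + lr * freq
    return best, best_score, probs


if __name__ == "__main__":
    best, score, probs = pbil(SEARCH_SPACE, toy_fitness)
    print(best, score)
    for stage, dist in probs.items():
        print(stage, [round(p, 3) for p in dist])
```

The returned `probs` dictionary plays the role of the probabilistic information mentioned in the abstract: options whose probability grows during the search are the ones that appear most often in well-scoring pipelines under this sketch's assumptions.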
Pages: 163-173
Page count: 11