UMUCorpusClassifier: Compilation and evaluation of linguistic corpus for Natural Language Processing tasks

Cited by: 18
Authors
Antonio Garcia-Diaz, Jose [1]
Almela, Angela [2]
Alcaraz-Marmol, Gema [3]
Valencia-Garcia, Rafael [1]
Affiliations
[1] Univ Murcia, Fac Informat, Murcia, Spain
[2] Univ Murcia, Fac Letras, Murcia, Spain
[3] Univ Castilla La Mancha, Dept Filol Moderna, Ciudad Real, Spain
Source
PROCESAMIENTO DEL LENGUAJE NATURAL | 2020 / Issue 65
Keywords
Corpus compilation; Document classification;
DOI
10.26342/2020-65-22
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The development of an annotated corpus is a very time-consuming task. Although some researchers have proposed annotating corpora automatically using ad-hoc heuristics, valid hypotheses cannot always be made. Even when the annotation is performed by human annotators, the quality of the corpus is heavily influenced by disagreements between annotators, or by inconsistencies within a single annotator's own judgments. Therefore, a lack of supervision of the annotation process can lead to a poor-quality corpus. In this work, we present a demonstration of UMUCorpusClassifier, an NLP tool that helps researchers compile corpora and coordinate and supervise the annotation process. The tool eases daily supervision and makes it possible to detect deviations and inconsistencies during the early stages of the annotation process.
Pages: 139-142
Number of pages: 4
Related papers
11 in total
[1]   Opinion Mining for Measuring the Social Perception of Infectious Diseases. An Infodemiology Approach [J].
Antonio Garcia-Diaz, Jose ;
Apolinario-Arzube, Oscar ;
Medina-Moreira, Jose ;
Omar Salavarria-Melo, Jose ;
Lagos-Ortiz, Katty ;
Luna-Aveiga, Harry ;
Valencia-Garcia, Rafael .
TECHNOLOGIES AND INNOVATION (CITI 2018), 2018, 883 :229-239
[2]   Evaluating Information-Retrieval Models and Machine-Learning Classifiers for Measuring the Social Perception towards Infectious Diseases [J].
Apolinardo-Arzube, Oscar ;
Antonio Garcia-Diaz, Jose ;
Medina-Moreira, Jose ;
Luna-Aveiga, Harry ;
Valencia-Garcia, Rafael .
APPLIED SCIENCES-BASEL, 2019, 9 (14)
[3]  
Canovas-Garca M., 2020, FUTURE GENERATION CO, V112, P614657
[4]   Automatic detection of satire in Twitter: A psycholinguistic-based approach [J].
del Pilar Salas-Zarate, Maria ;
Andres Paredes-Valverde, Mario ;
Rodriguez-Garcia, Miguel Angel ;
Valencia-Garcia, Rafael ;
Alor-Hernandez, Giner .
KNOWLEDGE-BASED SYSTEMS, 2017, 128 :20-33
[5]  
Go A., 2009, Processing, V1, DOI 10.1109/COMSNETS.2017.7945451
[6]  
Grave E, 2018, PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), P3483
[7]  
Krippendorff K., 2018, Content analysis: An introduction to its methodology
[8]   Mining Twitter for Measuring Social Perception Towards Diabetes and Obesity in Central America [J].
Medina-Moreira, Jose ;
Antonio Garcia-Diaz, Jose ;
Apolinardo-Arzube, Oscar ;
Luna-Aveiga, Harry ;
Valencia-Garcia, Rafael .
TECHNOLOGIES AND INNOVATION (CITI 2019), 2019, 1124 :81-94
[9]   Multilingual Twitter Sentiment Classification: The Role of Human Annotators [J].
Mozetic, Igor ;
Grcar, Miha ;
Smailovic, Jasmina .
PLOS ONE, 2016, 11 (05)
[10]  
Pak Alexander, 2010, LREC 2010 7 INT C LA