Visual Classifier Training for Text Document Retrieval

Times cited: 108
Authors
Heimerl, Florian [1]
Koch, Steffen [1]
Bosch, Harald [1]
Ertl, Thomas [1]
Affiliation
[1] University of Stuttgart, Institute for Visualization and Interactive Systems, D-7000 Stuttgart, Germany
Keywords
Visual analytics; human computer interaction; information retrieval; active learning; classification; user evaluation; generation; support
DOI
10.1109/TVCG.2012.277
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Performing exhaustive searches over a large number of text documents can be tedious, because it is very hard to formulate search queries or define filter criteria that adequately capture an analyst's information need. Classification through machine learning has the potential to improve search and filter tasks that involve complex or very specific individual information needs. Unfortunately, analysts who are knowledgeable in their field are typically not machine learning specialists, and most classification methods require some expertise in parametrization to achieve good results. Supervised machine learning algorithms, in contrast, rely on labeled data, which analysts can provide. However, the labeling effort can be very high, which shifts the problem from composing complex queries or defining accurate filters to another laborious task, in addition to the need to judge the trained classifier's quality. We therefore compare three approaches for interactive classifier training in a user study. All three are potential candidates for integration into a larger retrieval system. They incorporate active learning to varying degrees in order to reduce the labeling effort and to increase effectiveness. Two of them use interactive visualization to let users explore the state of the classifier in the context of the labeled documents and to judge the classifier's quality in iterative feedback loops. We see our work as a step towards introducing user-controlled classification methods, in addition to text search and filtering, for increasing recall in analytics scenarios involving large corpora.
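The active-learning idea mentioned in the abstract (letting the system pick which documents the analyst should label next, to cut labeling effort) can be illustrated with a minimal pool-based sketch. It assumes documents are sparse term-weight dictionaries (for example tf-idf values) and trains a linear classifier with the simple perceptron rule only to keep the example dependency-free; the paper's own classifiers and query strategies may differ. The names `Doc`, `decision_value`, `train_perceptron`, and `most_uncertain` are hypothetical. The query strategy shown is uncertainty sampling: propose the unlabeled document whose score lies closest to the decision boundary.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse document representation: term -> weight (e.g. tf-idf).
Doc = Dict[str, float]


def decision_value(weights: Dict[str, float], bias: float, doc: Doc) -> float:
    """Signed score of a linear classifier; the sign is the predicted class."""
    return bias + sum(weights.get(term, 0.0) * value for term, value in doc.items())


def train_perceptron(labeled: List[Tuple[Doc, int]],
                     epochs: int = 10) -> Tuple[Dict[str, float], float]:
    """Fit a linear classifier with the perceptron rule (a stand-in, not the paper's method).

    Labels are +1 (relevant) or -1 (irrelevant).
    """
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for doc, label in labeled:
            # Update only on misclassified (or boundary) examples.
            if label * decision_value(weights, bias, doc) <= 0:
                for term, value in doc.items():
                    weights[term] = weights.get(term, 0.0) + label * value
                bias += label
    return weights, bias


def most_uncertain(weights: Dict[str, float], bias: float, pool: List[Doc]) -> int:
    """Uncertainty sampling: index of the unlabeled doc closest to the boundary."""
    return min(range(len(pool)), key=lambda i: abs(decision_value(weights, bias, pool[i])))


# Tiny usage example with made-up documents.
labeled = [({"svm": 1.0, "kernel": 1.0}, +1), ({"recipe": 1.0, "pasta": 1.0}, -1)]
pool = [{"svm": 1.0}, {"pasta": 1.0}, {"kernel": 0.5, "pasta": 0.5}]
w, b = train_perceptron(labeled)
print(most_uncertain(w, b, pool))  # prints 2: the mixed document is the least certain
```

Querying near the boundary targets the documents the current model is least sure about, which is why strategies of this kind can reduce the number of labels needed. In an interactive loop, the analyst's answer would be appended to `labeled` and the classifier retrained before the next query.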
Pages: 2839-2848
Page count: 10