Exploratory Visual Analysis and Interactive Pattern Extraction from Semi-Structured Data

Cited by: 9
Authors
Soto, Axel J. [1]
Kiros, Ryan [2]
Keselj, Vlado [1]
Milios, Evangelos [1]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, 6050 Univ Ave,POB 15000, Halifax, NS B3H 4R2, Canada
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Visual text analytics; dimensionality reduction; interactive clustering
DOI
10.1145/2812115
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semi-structured documents are a common type of data containing free text in natural language (unstructured data) as well as additional information about the document, or metadata, typically following a schema or controlled vocabulary (structured data). Simultaneous analysis of unstructured and structured data enables the discovery of hidden relationships that cannot be identified from either of these sources when analyzed independently of each other. In this work, we present a visual text analytics tool for semi-structured documents (ViTA-SSD) that aims to support the user in exploring a semi-structured collection of documents and finding insightful patterns in a visual and interactive manner. It achieves this goal by presenting to the user a set of coordinated visualizations that allow the linking of the metadata with interactively generated clusters of documents in such a way that relevant patterns can be easily spotted. The system contains two novel approaches in its back end: a feature-learning method to learn a compact representation of the corpus and a fast-clustering approach that has been redesigned to allow user supervision. These novel contributions make it possible for the user to interact with a large and dynamic document collection and to perform several text analytical tasks more efficiently. Finally, we present two use cases that illustrate the suitability of the system for in-depth interactive exploration of semi-structured document collections, two user studies, and results of several evaluations of our text-mining components.
Pages: 36