Mining Social Media Data Using Topological Data Analysis

Cited by: 8
Authors
Almgren, Khaled [1 ]
Kim, Minkyu [2 ]
Lee, Jeongkyu [1 ]
Affiliations
[1] Univ Bridgeport, Comp Sci & Engn Dept, Bridgeport, CT 06614 USA
[2] ASML, Wilton, CT 06897 USA
Source
2017 IEEE 18TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IEEE IRI 2017) | 2017
Keywords
topological data analysis; social network analysis and mining; machine learning; clustering;
DOI
10.1109/IRI.2017.41
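Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract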
Topological data analysis is a novel method for analyzing high-dimensional qualitative data using a set of properties from topology. In this paper, we explore the feasibility of topological data analysis for mining social media data by investigating the problem of image popularity. We randomly crawl images from Instagram, convert their captions to 300-dimensional numerical vectors using Word2vec, calculate cosine distances to evaluate the similarity of the caption vectors, and then feed the distances to a topological data analysis algorithm called Mapper. Using the caption vectors, the results show that topological data analysis is able to cluster the images in a way that relates to their popularity. Moreover, the results show relationships between the clusters that are represented as a monotonic increase in popularity. This approach is compared with traditional clustering algorithms, including k-means and hierarchical clustering, and the results show that topological data analysis outperforms both.
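The pipeline described in the abstract (caption vectors, cosine distances, Mapper) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic vectors stand in for Word2vec caption embeddings, and the filter function (mean distance to all other points), the interval cover, the single-linkage clustering threshold `eps`, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distance (1 - cosine similarity) between rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)
    sim = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - sim

def connected_components(D, idx, eps):
    """Single-linkage clusters: points closer than eps are linked (assumed clusterer)."""
    unseen = set(int(i) for i in idx)
    clusters = []
    while unseen:
        start = unseen.pop()
        comp, stack = {start}, [start]
        while stack:
            p = stack.pop()
            near = [q for q in unseen if D[p, q] <= eps]
            for q in near:
                unseen.remove(q)
                comp.add(q)
                stack.append(q)
        clusters.append(frozenset(comp))
    return clusters

def mapper(D, filter_values, n_intervals=5, overlap=0.3, eps=0.5):
    """Toy Mapper: cover the filter range, cluster each preimage, link overlaps."""
    lo, hi = float(filter_values.min()), float(filter_values.max())
    length = (hi - lo) / n_intervals or 1.0  # avoid a zero-width cover
    nodes = []
    for i in range(n_intervals):
        a = lo + i * length - overlap * length
        b = lo + (i + 1) * length + overlap * length
        members = np.where((filter_values >= a) & (filter_values <= b))[0]
        nodes.extend(connected_components(D, members, eps))
    nodes = list(dict.fromkeys(nodes))  # drop identical clusters, keep order
    # Two nodes are connected when their clusters share at least one point.
    edges = [(i, j) for i in range(len(nodes))
             for j in range(i + 1, len(nodes)) if nodes[i] & nodes[j]]
    return nodes, edges

# Synthetic stand-in for 300-dimensional caption vectors (two loose groups).
rng = np.random.default_rng(0)
c1, c2 = rng.normal(size=300), rng.normal(size=300)
X = np.vstack([c1 + 0.1 * rng.normal(size=(30, 300)),
               c2 + 0.1 * rng.normal(size=(30, 300))])

D = cosine_distance_matrix(X)
filter_values = D.mean(axis=1)  # assumed filter: mean distance to all points
nodes, edges = mapper(D, filter_values)
print(len(nodes), "nodes,", len(edges), "edges")
```

In a real setting the rows of `X` would come from a trained Word2vec model, and a maintained Mapper implementation would normally replace the toy `mapper` function above.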
Pages: 144-153
Number of pages: 10