Data modelling for large-scale social media analytics: design challenges and lessons learned

Cited by: 3
Authors
Aydin, Ahmet Arif [1]
Anderson, Kenneth M. [2]
Affiliations
[1] Inonu Univ, Dept Comp Sci, TR-44280 Malatya, Turkey
[2] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
Keywords
data modelling; social media analytics; big data analytics; NoSQL
DOI
10.1504/IJDMMM.2020.111409
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We live in a world of big data; organisations collect, store, and analyse large volumes of data for various purposes. The five V's of big data introduce new challenges for developers to handle when performing data processing and analysis. Indeed, data modelling is one of the most challenging and critical aspects of big data because it determines how data will be structured and stored; these decisions then impact how that data can be processed and analysed. In this paper, we report on designing a data model for storing and analysing Twitter data in support of crisis informatics. In this work, we leverage the data model provided by columnar NoSQL data stores to design column families that can efficiently index, sort, store and analyse large Twitter datasets. In particular, our column families are designed to achieve efficient batch data processing. We evaluate these claims and discuss our future work.
Pages: 386-414
Number of pages: 29