Apply clustering to analyze categorical data in longitudinal studies

Cited by: 0
Authors
Hassan, Mohammad Mahdi [1]
Blom, Martin [2]
Ansari, Gufran Ahmad [3]
Affiliations
[1] Qassim University, Computer Science Department, Al Qassim, Saudi Arabia
[2] Karlstad University, Computer Science Department, Karlstad, Sweden
[3] BSARC Institute of Science and Technology, Department of Computer Applications, Chennai, Tamil Nadu, India
Source
International Journal of Computer Science and Network Security | 2019 / Vol. 19 / No. 4
Keywords
Empirical Survey; Longitudinal Study; Clustering; Partitioning; Grouping; Data Mining; Expert Opinion; Diversity
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
It is common to collect data from practitioners in the software engineering field using surveys and questionnaires. This data is usually analyzed using descriptive statistics in which the entire population is treated as a single undivided group, sometimes complemented by sampling methods to capture variation within the sample. In many cases, the survey population is partitioned into smaller groups using available background knowledge about the participants. These techniques are valid, but they can reveal diversity of opinion only if it correlates with the background variables, and they fail to identify sub-groups that span multiple background variables. Existing approaches can thus capture general trends but might miss the opinions of minority sub-groups. The problem becomes more complex in longitudinal studies, where minority opinions might fade or grow more resolute over time. Data from longitudinal studies may contain patterns that can be extracted through clustering. These patterns may unveil supplementary information and draw attention to viewpoints other than those exhibited by the sample population as a whole. This approach may reveal the range of opinion variation between diverse groups over time and make it possible to identify minorities. In our research, we have investigated the suitability of clustering techniques for analyzing categorical data from longitudinal studies.
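The abstract does not say which clustering algorithm the authors used. As an illustration only, the sketch below applies k-modes, a standard partitioning method for categorical data that uses simple matching dissimilarity (the number of items on which two respondents disagree), to two hypothetical survey waves. The survey items, response values, the farthest-first seeding, and all function names (`k_modes`, `farthest_first_modes`, `cluster_sizes`, and so on) are assumptions made for this example, not the paper's method.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]  # one respondent's categorical answers


def matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of items on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records: List[Record]) -> Record:
    """Most frequent category per item; ties go to the category seen first."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def farthest_first_modes(records: List[Record], k: int) -> List[Record]:
    """Assumed deterministic seeding: start from the first record, then
    repeatedly add the record farthest from all modes chosen so far."""
    modes = [records[0]]
    while len(modes) < k:
        gap, candidate = max(
            ((min(matching_dissimilarity(r, m) for m in modes), r) for r in records),
            key=lambda pair: pair[0],
        )
        if gap == 0:
            raise ValueError("fewer than k distinct records")
        modes.append(candidate)
    return modes


def k_modes(records: List[Record], k: int, max_iter: int = 100) -> Tuple[List[int], List[Record]]:
    """Partition categorical records into k clusters with k-modes."""
    modes = farthest_first_modes(records, k)
    labels = [-1] * len(records)
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest mode (lowest index on ties).
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:  # converged: no record changed cluster
            break
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the previous mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


def cluster_sizes(labels: List[int], k: int) -> List[int]:
    """Number of respondents assigned to each cluster."""
    return [labels.count(j) for j in range(k)]


# Hypothetical answers of six respondents to three survey items, in two waves.
wave_1 = [
    ("agree", "agree", "agree"),
    ("agree", "agree", "neutral"),
    ("agree", "agree", "agree"),
    ("disagree", "disagree", "disagree"),
    ("agree", "neutral", "agree"),
    ("disagree", "disagree", "neutral"),
]
wave_2 = [
    ("agree", "agree", "agree"),
    ("agree", "agree", "agree"),
    ("agree", "agree", "neutral"),
    ("agree", "neutral", "agree"),
    ("agree", "agree", "agree"),
    ("disagree", "disagree", "neutral"),
]

for name, wave in (("wave 1", wave_1), ("wave 2", wave_2)):
    labels, modes = k_modes(wave, k=2)
    print(name, cluster_sizes(labels, 2), modes)
# wave 1 [4, 2] [('agree', 'agree', 'agree'), ('disagree', 'disagree', 'disagree')]
# wave 2 [5, 1] [('agree', 'agree', 'agree'), ('disagree', 'disagree', 'neutral')]
```

In this toy data the dissenting cluster shrinks from two respondents to one between waves, the kind of fading minority opinion the abstract describes. Matching clusters across waves by inspecting their modes, as done here, is a simplification: cluster indices from separate runs are not guaranteed to correspond.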
Pages: 10-19
Page count: 10