Productivity Reanalysis for Unbalanced Datasets with Mixed-Effects Models

Times cited: 0

Authors:

Amasaki, Sousuke [1]

Affiliation:

[1] Okayama Prefectural Univ, Dept Syst Engn, Okayama 7191197, Japan

Source:

PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT | 2010 / Vol. 6156

Keywords:

productivity analysis; mixed-effects models; unbalanced datasets; estimation; data analysis

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Data analysis is a major activity in software engineering research. For example, productivity analysis and the evaluation of new technologies almost always involve statistical analysis of collected data. Software data are usually unbalanced because they are collected from actual projects rather than from formal experiments, so their population is biased. Fixed-effects models have often been used for data analysis, although they are designed for balanced datasets. This misuse leads to insufficient analyses and wrong conclusions. A previous study [1] proposed an iterative procedure for handling unbalanced datasets in productivity analysis. However, this procedure sometimes failed to identify partially confounded factors, and its estimated effects were not easy to interpret. This study examined mixed-effects models for productivity analysis. Mixed-effects models work the same way for unbalanced datasets as for balanced ones. Furthermore, they are straightforward to apply and their estimated effects are easy to interpret. Experiments with four datasets clearly showed the advantages of mixed-effects models.
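To make the contrast between fixed and mixed effects on unbalanced data concrete, here is a minimal sketch, not the author's analysis. It fits a one-way random-intercept model y_ij = mu + u_i + e_ij, where i indexes a hypothetical grouping factor (for example, a development team) and j a project within it. As a simpler stand-in for the likelihood-based (ML/REML) fitting that mixed-effects packages typically use, it estimates the variance components with the classical ANOVA method-of-moments estimator for unbalanced one-way layouts. The overall mean mu is a weighted (GLS) mean of the group means, and each group effect u_i is a BLUP that is shrunk toward zero more strongly when the group has few observations. The function name `fit_random_intercept` and all data are hypothetical.

```python
from collections import defaultdict


def fit_random_intercept(groups, values):
    """Fit y_ij = mu + u_i + e_ij for an unbalanced one-way layout.

    Variance components use the ANOVA (method-of-moments) estimator.
    Returns (mu, tau2, sigma2, effects), where effects maps each group
    to the BLUP of its random intercept u_i.
    """
    data = defaultdict(list)
    for g, y in zip(groups, values):
        data[g].append(float(y))
    k = len(data)
    n_total = sum(len(ys) for ys in data.values())
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")

    means = {g: sum(ys) / len(ys) for g, ys in data.items()}
    grand = sum(sum(ys) for ys in data.values()) / n_total

    # Within-group (error) and between-group mean squares.
    ss_within = sum((y - means[g]) ** 2 for g, ys in data.items() for y in ys)
    ss_between = sum(len(ys) * (means[g] - grand) ** 2 for g, ys in data.items())
    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)

    # n0 plays the role of the common group size in the balanced case.
    n0 = (n_total - sum(len(ys) ** 2 for ys in data.values()) / n_total) / (k - 1)
    sigma2 = ms_within
    tau2 = max(0.0, (ms_between - ms_within) / n0)  # truncate negative estimates
    if tau2 + sigma2 == 0.0:
        raise ValueError("data have no variance")

    # GLS estimate of mu: weight each group mean by 1 / Var(group mean).
    weights = {g: 1.0 / (tau2 + sigma2 / len(ys)) for g, ys in data.items()}
    mu = sum(weights[g] * means[g] for g in data) / sum(weights.values())

    # BLUPs: groups with fewer observations are shrunk more toward mu.
    effects = {
        g: tau2 / (tau2 + sigma2 / len(ys)) * (means[g] - mu)
        for g, ys in data.items()
    }
    return mu, tau2, sigma2, effects
```

For example, `fit_random_intercept(["A", "A", "B", "B", "B", "B", "C", "C", "C"], [10, 12, 20, 22, 24, 26, 14, 16, 15])` gives unequal group sizes. Each estimated effect is pulled toward zero relative to the raw difference between its group mean and mu, whereas a fixed-effects model with one dummy variable per group would report the raw group means unchanged.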


Pages: 276-290

Page count: 15

References (8 in total):

[1] [Anonymous], 2011, Data analysis using regression and multilevel/hierarchical models

[2] [Anonymous], 2005, School of Information Technology and Engineering, University of Ottawa, Canada

[3] [Anonymous], 1981, Software Engineering Economics

[4] Bazeghi C, 2005, INT SYMP MICROARCH, P209

[5] Kemerer, Chris F.; Paulk, Mark C. The Impact of Design and Code Reviews on Software Quality: An Empirical Study Based on PSP Data [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2009, 35(04): 534-550

[6] Kitchenham, B. A procedure for analyzing unbalanced datasets [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1998, 24(04): 278-301

[7] Lawrie, Dawn; Feild, Henry; Binkley, David. Quantifying identifier quality: an analysis of trends [J]. EMPIRICAL SOFTWARE ENGINEERING, 2007, 12(04): 359-388

[8] Reinhartz-Berger, I; Dori, D. OPM vs. UML - Experimenting with comprehension and construction of web application models [J]. EMPIRICAL SOFTWARE ENGINEERING, 2005, 10(01): 57-79
