Learning to classify software defects from crowds: a novel approach
In software engineering, associating each reported defect with a category allows, among many other things, for the appropriate allocation of resources. Although this classification task can be automated using standard machine learning techniques, the categorization of defects for model training requires expert knowledge, which is not always available. To circumvent this dependency, we propose to apply the learning from crowds paradigm, where training categories are obtained from multiple non-expert annotators (and so may be incomplete, noisy or erroneous) and classifiers are efficiently learnt from this subjective class information. To illustrate our proposal, we present two real applications of IBM’s orthogonal defect classification to the issue tracking systems of two different real domains. Bayesian network classifiers learnt using two state-of-the-art methodologies from data labeled by a crowd of annotators are used to predict the category (impact) of reported software defects. The considered methodologies show enhanced performance compared with the straightforward solution (majority voting) according to different metrics. This shows the potential of non-expert knowledge aggregation techniques when expert knowledge is unavailable.
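
For readers unfamiliar with the majority-voting baseline mentioned above, the following is a minimal sketch of how crowd labels might be aggregated into a single training label per defect. It is not the authors' implementation: the function name `majority_vote`, the defect identifiers, the sample labels, and the tie-breaking rule (first label seen wins) are all assumptions made for illustration. Missing annotations are represented as `None`, reflecting the abstract's remark that crowd labels may be incomplete.

```python
from collections import Counter


def majority_vote(annotations):
    """Return the most frequent label among one defect's annotations.

    Missing annotations (None) are ignored. Returns None when no
    annotator labeled the defect. Ties go to the label seen first
    (an illustrative choice, not taken from the paper).
    """
    votes = [label for label in annotations if label is not None]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]


# Hypothetical crowd annotations: one list of labels per reported defect.
crowd_labels = {
    "defect-1": ["Reliability", "Reliability", "Capability"],
    "defect-2": ["Usability", None, "Usability"],
    "defect-3": [None, None],
}

# Aggregate each defect's annotations into a single consensus label.
consensus = {defect: majority_vote(labels) for defect, labels in crowd_labels.items()}
```

A classifier trained on `consensus` treats the aggregated label as if it were ground truth and discards how strongly the annotators agreed, which is the information that learning-from-crowds methods aim to exploit.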