A machine learning approach for text categorization of fixing-issue commits on CVS
Murgia, Alessandro; Marchesi, Michele; Tonelli, Roberto
2010-01-01
Abstract
We studied data mining of the CVS repositories of two large object-oriented (OO) projects, Eclipse and Netbeans, focusing on "fixing-issue" commits. We highlight common characteristics of issue reporting and the problems involved in identifying these messages, and we compare traditional static approaches, such as Knowledge Engineering, with dynamic approaches based on Machine Learning techniques. For the first time, we compare the performance of Machine Learning (ML) techniques in automatically classifying "fixing-issue" commits among commit messages. We compute the precision and recall of different ML classifiers in correctly classifying issue-reporting commits. Our results show that some ML classifiers can correctly classify up to 99.9% of such commits.
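The abstract does not state which classifiers or features were used. As a minimal sketch, assuming a bag-of-words representation of commit messages, the Python code below contrasts a hand-written keyword rule (in the spirit of a Knowledge Engineering approach) with a multinomial Naive Bayes classifier, a common ML text classifier chosen here purely for illustration, and computes precision and recall for each. The training and test messages, the keyword set, and the names `keyword_rule`, `NaiveBayes`, and `precision_recall` are all hypothetical; the numbers printed come from toy data and say nothing about the paper's results.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: (commit message, is_fixing_issue). Not taken from the paper.
TRAIN = [
    ("fix bug 1234 null pointer in editor", True),
    ("fixed issue 5678 crash on startup", True),
    ("bug 42: wrong label in dialog", True),
    ("add new preference page", False),
    ("update copyright headers", False),
    ("refactor build scripts", False),
]
TEST = [
    ("fix bug 999 in parser", True),
    ("add javadoc to builder", False),
    ("issue 77 fixed", True),
    ("refactor editor code", False),
    ("add issue template to docs", False),
]

def tokenize(text):
    """Lowercase and keep alphabetic tokens only (digits such as issue IDs are dropped)."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_rule(message):
    """Hypothetical knowledge-engineering baseline: flag messages containing fix-related keywords."""
    return bool(set(tokenize(message)) & {"fix", "fixed", "bug", "issue"})

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing over bag-of-words counts."""

    def fit(self, data):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        for text, label in data:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in (True, False):
            # Log prior plus sum of smoothed log likelihoods of known tokens.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def precision_recall(predicted, actual):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN), for the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

if __name__ == "__main__":
    actual = [label for _, label in TEST]
    model = NaiveBayes().fit(TRAIN)
    for name, clf in [("keyword rule", keyword_rule), ("naive Bayes", model.predict)]:
        predicted = [clf(text) for text, _ in TEST]
        p, r = precision_recall(predicted, actual)
        print(f"{name}: precision={p:.3f} recall={r:.3f}")
```

On this toy set the keyword rule prints `precision=0.667 recall=1.000`, because "add issue template to docs" contains a keyword but is not a fix, while the Naive Bayes model prints `precision=1.000 recall=1.000`. The point is only to show how a fixed rule and a learned model can be evaluated with the same precision and recall measures.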