This paper presents a statistical analysis of 20 opens ource object-oriented systems with the purpose of detecting differences in metrics distribution between Java and Python projects. We selected ten Java projects from the Java Qualitas Corpus and ten projects written in Python. For each system, we considered 10 class-level software metrics. We performed a best fit procedure on the empirical distributions through the log-normal distribution and the double Pareto distribution to identify differences between the two languages. Even though the statistical distributions for projects written in Java and Python may appear the same for lower values of the metric, performing the procedure with the double Pareto distribution for the Number of Local Methods metric reveals that major differences can be noticed along the queue of the distributions. On the contrary, the same analysis performed with the Number of Statements metric reveals that only the initial portion of the double Pareto distribution shows differences between the two languages. In addition, the dispersion parameter associated to the log-normal distribution fit for the total Number Of Methods can be used for distinguishing Java projects from Python projects.

A Statistical Comparison of Java and Python Software Metric Properties

Ortu, M;Marchesi, M
2016-01-01

Abstract

This paper presents a statistical analysis of 20 opens ource object-oriented systems with the purpose of detecting differences in metrics distribution between Java and Python projects. We selected ten Java projects from the Java Qualitas Corpus and ten projects written in Python. For each system, we considered 10 class-level software metrics. We performed a best fit procedure on the empirical distributions through the log-normal distribution and the double Pareto distribution to identify differences between the two languages. Even though the statistical distributions for projects written in Java and Python may appear the same for lower values of the metric, performing the procedure with the double Pareto distribution for the Number of Local Methods metric reveals that major differences can be noticed along the queue of the distributions. On the contrary, the same analysis performed with the Number of Statements metric reveals that only the initial portion of the double Pareto distribution shows differences between the two languages. In addition, the dispersion parameter associated to the log-normal distribution fit for the total Number Of Methods can be used for distinguishing Java projects from Python projects.
File in questo prodotto:
File Dimensione Formato  
a_Statistiscal_comparison.pdf

Solo gestori archivio

Tipologia: versione post-print (AAM)
Dimensione 1.43 MB
Formato Adobe PDF
1.43 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/308550
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 10
  • ???jsp.display-item.citation.isi??? 12
social impact