The amount of scientific literature continuously grows, which poses an increasing challenge for researchers to manage, find and explore research results. Therefore, the classification of scientific work is widely applied to enable the retrieval, support the search of suitable reviewers during the reviewing process, and in general to organize the existing literature according to a given schema. The automation of this classification process not only simplifies the submission process for authors, but also ensures the coherent assignment of classes. However, especially fine-grained classes and new research fields do not provide sufficient training data to automatize the process. Additionally, given the large number of not mutual exclusive classes, it is often difficult and computationally expensive to train models able to deal with multi-class multi-label settings. To overcome these issues, this work presents a preliminary Deep Learning framework as a solution for multi-label text classification for scholarly papers about Computer Science. The proposed model addresses the issue of insufficient data by utilizing the semantics of classes, which is explicitly provided by latent representations of class labels. This study uses Knowledge Graphs as a source of these required external class definitions by identifying corresponding entities in DBpedia to improve the overall classification.
Deep Learning meets Knowledge Graphs for Scholarly Data Classification
Danilo Dessi';
2021-01-01
Abstract
The amount of scientific literature continuously grows, which poses an increasing challenge for researchers to manage, find and explore research results. Therefore, the classification of scientific work is widely applied to enable the retrieval, support the search of suitable reviewers during the reviewing process, and in general to organize the existing literature according to a given schema. The automation of this classification process not only simplifies the submission process for authors, but also ensures the coherent assignment of classes. However, especially fine-grained classes and new research fields do not provide sufficient training data to automatize the process. Additionally, given the large number of not mutual exclusive classes, it is often difficult and computationally expensive to train models able to deal with multi-class multi-label settings. To overcome these issues, this work presents a preliminary Deep Learning framework as a solution for multi-label text classification for scholarly papers about Computer Science. The proposed model addresses the issue of insufficient data by utilizing the semantics of classes, which is explicitly provided by latent representations of class labels. This study uses Knowledge Graphs as a source of these required external class definitions by identifying corresponding entities in DBpedia to improve the overall classification.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.