Exploring Digital Health Trends in the Headlines via Knowledge Graph Analysis
Zavarella V.; Reforgiato Recupero Diego; Fenu G.
2025-01-01
Abstract
We introduce a method for analyzing digital transformation in the health domain by constructing a Knowledge Graph from a corpus of 7.8 million English news articles published between 1987 and 2023 on the Dow Jones Data, News, and Analytics platform. We first selected around 97k articles relevant to Digital Health using a Deep Learning binary classifier obtained by fine-tuning BERT. We then applied Natural Language Processing techniques to extract triples from the selected articles, forming a Digital Health News Knowledge Graph of 431k distinct triples that connect 186k entities through 1,866 relations. This graph provides insights into the evolution of Digital Health in news media and serves as a resource for further research in the field. Our analysis reveals significant trends in Digital Health as reflected in the news, with notable peaks coinciding with key events such as the COVID-19 pandemic. We split the analysis geographically between the United States and European countries and, for each macro-region, tracked the predominant entities and relations over time. The classifier, the knowledge graph, and the data analytics visualizations are made publicly available for future work.
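The graph statistics reported in the abstract (distinct triples, entities, relations) and the per-region tracking of predominant entities can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy triples, the `(subject, relation, object, region, year)` row layout, and the function names `graph_stats` and `top_entities` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (subject, relation, object, region, year).
# Real triples would come from an extraction step over the news articles.
TRIPLES = [
    ("hospital", "adopts", "telemedicine", "US", 2020),
    ("hospital", "adopts", "telemedicine", "US", 2020),  # repeated mention
    ("hospital", "uses", "wearable", "US", 2020),
    ("startup", "develops", "health app", "EU", 2021),
    ("hospital", "uses", "health app", "EU", 2021),
    ("regulator", "approves", "health app", "EU", 2021),
]


def graph_stats(rows):
    """Return (distinct triples, distinct entities, distinct relations)."""
    distinct = {(s, r, o) for s, r, o, _, _ in rows}
    entities = {e for s, _, o in distinct for e in (s, o)}
    relations = {r for _, r, _ in distinct}
    return len(distinct), len(entities), len(relations)


def top_entities(rows, k=1):
    """Return the k most frequently mentioned entities per (region, year)."""
    counts = defaultdict(Counter)
    for s, _, o, region, year in rows:
        # Every mention counts, so repeated triples raise an entity's weight.
        counts[(region, year)].update([s, o])
    return {key: [e for e, _ in c.most_common(k)] for key, c in counts.items()}


if __name__ == "__main__":
    print(graph_stats(TRIPLES))
    print(top_entities(TRIPLES))
```

Deduplicating before counting entities and relations mirrors the abstract's wording of "distinct triples", while the per-region tally deliberately keeps repeated mentions, on the assumption that frequency in the news is what makes an entity predominant.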


