Twin (Green and Digital) Patents Identification: an Automated Patent Landscaping Method

Francesca Ghinami
2026-01-01

Abstract

Identifying green, digital, and twin-transition patents is essential for tracking innovation and assessing policy impact, yet existing code-based and machine-learning approaches often yield non-overlapping results, undermining comparability and reproducibility. This study introduces a scalable framework that combines configurable keyword and technology rules for candidate identification, a rule-guided seed and antiseed definition, and bidirectional citation expansion. Patent texts are encoded with a domain-specific transformer, and final selection is achieved through topic-guided pruning based on a contrastive cosine rule applied to topic-level representations. Validation against proxy labels on a held-out split indicates high precision under a conservative threshold and balanced performance under a data-driven threshold. The workflow is automated, largely unsupervised, and tractable at the scale of millions of patent families, with results robust to sensible hyperparameter choices and threshold selection, thereby improving transparency and comparability for patent landscaping in the green and digital domains.
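As an illustration of the final pruning step, the following is a minimal sketch of one possible reading of a "contrastive cosine rule applied to topic-level representations". It is not the study's implementation. It assumes that each topic is summarised by a single vector, that seeds and antiseeds are summarised by their centroids, and that a topic is retained when its cosine similarity to the seed centroid exceeds its similarity to the antiseed centroid by more than a margin. The function names, the `margin` parameter, and the toy two-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def prune_topics(topic_vectors, seed_vectors, antiseed_vectors, margin=0.0):
    """Keep topics that are closer to the seed centroid than to the antiseed centroid.

    Assumed rule (hypothetical): keep a topic when
        cos(topic, seed_centroid) - cos(topic, antiseed_centroid) > margin
    """
    seed_c = centroid(seed_vectors)
    anti_c = centroid(antiseed_vectors)
    kept = []
    for topic_id, vec in topic_vectors.items():
        score = cosine(vec, seed_c) - cosine(vec, anti_c)
        if score > margin:
            kept.append(topic_id)
    return sorted(kept)


# Toy data: two-dimensional stand-ins for topic-level embeddings.
topics = {
    "t_solar": [0.9, 0.1],    # close to the seeds
    "t_generic": [0.1, 0.9],  # close to the antiseeds
    "t_mixed": [0.7, 0.7],    # equidistant, so its score is about zero
}
seeds = [[1.0, 0.0], [0.8, 0.2]]
antiseeds = [[0.0, 1.0], [0.2, 0.8]]

kept = prune_topics(topics, seeds, antiseeds, margin=0.1)
print(kept)
```

In this sketch a larger `margin` corresponds to a more conservative selection that favours precision, while a margin tuned on held-out data would correspond to the data-driven threshold mentioned above.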
Keywords: Patent Landscaping; Rule-based; Topic-guided pruning

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/470225