Twin (Green and Digital) Patents Identification: an Automated Patent Landscaping Method
Francesca Ghinami
2026-01-01
Abstract
Identifying green, digital, and twin-transition patents is essential for tracking innovation and assessing policy impact, yet existing code-based and machine-learning approaches often yield non-overlapping results, undermining comparability and reproducibility. This study introduces a scalable framework that combines configurable keyword and technology rules for candidate identification, a rule-guided seed and antiseed definition, and bidirectional citation expansion. Patent texts are encoded with a domain-specific transformer, and final selection is achieved through topic-guided pruning based on a contrastive cosine rule applied to topic-level representations. Validation against proxy labels on a held-out split indicates high precision under a conservative threshold and balanced performance under a data-driven threshold. The workflow is automated, largely unsupervised, and tractable at the scale of millions of patent families, with results robust to sensible hyperparameter choices and threshold selection, thereby improving transparency and comparability for patent landscaping in the green and digital domains.

| File | Size | Format |
|---|---|---|
| paper85(3).pdf (open access; type: publisher's version, VoR) | 1.43 MB | Adobe PDF |
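
The abstract names a "contrastive cosine rule applied to topic-level representations" but does not give its exact form. The following is only a minimal sketch of one plausible reading, assuming that each topic is summarized by a single vector and is kept when it is closer (in cosine similarity) to the seed centroid than to the antiseed centroid by some margin. The function names `cosine` and `prune_topics`, the `margin` parameter, and the toy 2-D vectors are all hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def prune_topics(topic_vecs, seed_vecs, antiseed_vecs, margin=0.0):
    """Return {topic_id: keep?} under an assumed contrastive cosine rule.

    A topic is kept when cos(topic, seed centroid) minus
    cos(topic, antiseed centroid) exceeds `margin`.
    This rule is an illustrative assumption, not the paper's definition.
    """
    seed_centroid = np.asarray(seed_vecs).mean(axis=0)
    antiseed_centroid = np.asarray(antiseed_vecs).mean(axis=0)
    decisions = {}
    for topic_id, vec in topic_vecs.items():
        score = cosine(vec, seed_centroid) - cosine(vec, antiseed_centroid)
        decisions[topic_id] = score > margin
    return decisions


# Toy 2-D embeddings standing in for transformer outputs (hypothetical data).
seeds = np.array([[1.0, 0.0], [0.9, 0.1]])
antiseeds = np.array([[0.0, 1.0], [0.1, 0.9]])
topics = {0: np.array([1.0, 0.2]), 1: np.array([0.1, 1.0])}

decisions = prune_topics(topics, seeds, antiseeds)
```

In this toy setup, topic 0 points toward the seeds and is kept, while topic 1 points toward the antiseeds and is pruned. Raising `margin` would correspond to a more conservative threshold, in the spirit of the precision-oriented setting the abstract mentions.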


