Unsupervised Singleton Expansion from Free Text
Maurizio Atzori
2018-01-01
Abstract
Set expansion is the problem of automatically extending a given set of seed elements with objects of the same class, for instance {red, green, white} → {red, green, white, gray, yellow, ...}. In this paper we address the problem in the challenging scenario of extending singletons, that is, sets with only one seed element. Unlike existing work, we assume neither the presence of markup, such as HTML lists, nor any ontology, relying only on free (unstructured, plain) text. Despite the difficulty of the problem, we show that singleton expansion can be accomplished in an unsupervised manner by means of nearest neighbor search (NNS) over word embeddings. We further propose an algorithm that significantly improves the performance of NNS for both small and large (long tail) expansions, while maintaining the important quality of being language independent.
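To make the baseline named in the abstract concrete, the following is a minimal sketch of singleton expansion by nearest neighbor search over word embeddings. It is not the paper's improved algorithm, which the abstract does not describe. The vectors in `TOY_EMBEDDINGS`, the function names `cosine` and `expand_singleton`, and the choice of cosine similarity are all assumptions made for illustration; a real system would load embeddings trained on free text.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings would be learned from a large plain-text corpus.
TOY_EMBEDDINGS = {
    "red":    [0.9, 0.1, 0.0],
    "green":  [0.8, 0.2, 0.1],
    "yellow": [0.85, 0.15, 0.05],
    "dog":    [0.1, 0.9, 0.2],
    "cat":    [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_singleton(seed, embeddings, k):
    """Return the k words most similar to the single seed word."""
    seed_vec = embeddings[seed]
    scored = [
        (cosine(seed_vec, vec), word)
        for word, vec in embeddings.items()
        if word != seed  # the seed itself is not part of the expansion
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


# Expand the singleton {red} with its two nearest neighbors.
expansion = expand_singleton("red", TOY_EMBEDDINGS, k=2)
print(expansion)
```

With these toy vectors the two color words rank above the animal words, which is the behavior the baseline relies on: words of the same class tend to lie close together in the embedding space. The exhaustive scan used here is only practical for small vocabularies.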

File | Type | Size | Format | Access
---|---|---|---|---
icsc18_singleton_expansion.pdf | Publisher's version (VoR) | 183.21 kB | Adobe PDF | Archive administrators only
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.