Synthetic pattern generation for imbalanced learning in image retrieval

Piras, Luca; Giacinto, Giorgio

doi:10.1016/j.patrec.2012.08.003

Very large archives of digital images are now easily produced thanks to the wide availability of digital cameras, which are often embedded in portable devices. One way of exploring an image archive is to search for similar images. Relevance feedback mechanisms can be employed to refine the search, because the images that are most similar according to a set of visual features may not contain the semantic concepts that match the users’ needs. Relevance feedback allows users to label the images returned by the system as relevant or non-relevant. This labelled set is then used to learn the characteristics of relevant images. Because the number of images shown to users for feedback is usually quite small, and relevant images typically represent a tiny fraction, the learning problem is heavily imbalanced. To reduce this imbalance, this paper proposes techniques that artificially increase the number of examples of the relevant class. The new examples are generated as new points in the feature space that agree with the local distribution of the available relevant examples. The locality of the proposed approach makes it well suited to relevance feedback techniques based on the Nearest-Neighbor (NN) paradigm. The effectiveness of the proposed approach is assessed on two image datasets, and comparisons with editing techniques that eliminate redundancies in the non-relevant examples are also reported.
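
The idea of generating relevant examples that follow the local distribution of the labelled ones can be illustrated with a minimal sketch. The abstract does not specify the generation rule, so the code below assumes a SMOTE-style interpolation between each relevant example and one of its nearest relevant neighbours; the function name `synthesize_relevant` and the parameters `n_new`, `k` and `seed` are hypothetical and are not taken from the paper.

```python
import numpy as np


def synthesize_relevant(relevant, n_new, k=3, seed=0):
    """Create n_new synthetic relevant points by local interpolation.

    Assumption: SMOTE-style interpolation stands in for the paper's
    generation rule, which the abstract does not describe in detail.
    """
    relevant = np.asarray(relevant, dtype=float)
    n = len(relevant)
    if n < 2:
        raise ValueError("need at least two relevant examples")
    k = min(k, n - 1)  # cannot have more neighbours than other points
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances between relevant examples.
    diffs = relevant[:, None, :] - relevant[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest relevant neighbours of each example.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base example and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, k, size=n_new)
    partner = neighbours[base, pick]

    # Place each new point at a random position on the connecting segment.
    t = rng.random((n_new, 1))
    return relevant[base] + t * (relevant[partner] - relevant[base])


# Hypothetical feature vectors of three images labelled as relevant.
relevant_feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
synthetic = synthesize_relevant(relevant_feats, n_new=5)
```

Because every synthetic point lies on a segment between two nearby relevant examples, it stays inside the region they already occupy, which is the kind of locality a nearest-neighbour relevance score can exploit.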