WO2015032670A1 - Method of classification of images and corresponding device - Google Patents

Method of classification of images and corresponding device

Info

Publication number
WO2015032670A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
positive
attribute
image feature
visual
Application number
PCT/EP2014/068166
Other languages
French (fr)
Inventor
Praveen Anil KULKARNI
Gaurav Sharma
Joaquin Zepeda
Louis Chevallier
Original Assignee
Thomson Licensing
Application filed by Thomson Licensing
Publication of WO2015032670A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24578Query processing with adaptation to user needs using ranking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting



Abstract

A user-specified textual query is fed to an image search engine. The returned images are used as training data to learn a classifier for the class category specified by the textual query. The method employs attribute classifiers to improve the on-the-fly image classification system.

Description

Method of classification of images and corresponding device
1. Field
The present disclosure relates to the field of classification of images.
2. Background
Image classification aims to determine whether a given image contains a specific visual concept (e.g. car, cow, sea, forest). To make the task suitable for a computer, images are represented by extracting from them a feature vector. The feature vector is a high-dimensional vector of numerical features that represent an image. A widely used feature vector is the bag-of-words vector, consisting of a histogram of quantized image patches centered at corners detected in the image. Throughout this document, if not explicitly mentioned, it is implicitly assumed that computations that use images in fact use the corresponding feature vectors. In image classification a training set is obtained by manual annotation of a set of images. For example, for the concept "cat", a person goes through, e.g., 1000 images of a set of images, and assigns to each image a label of "1" if a cat is present in the image and a "0" otherwise. The resulting set of images and labels is referred to as an "annotated training data set". This annotated training data set is used to learn a classifier, and this classifier can be used to assign a label to an un-annotated set of images without manual intervention. The classifier can be thought of as an algorithm which classifies a given image and assigns it to a corresponding visual concept. The assignment is based on the visual content present in the image. Although there are many classifiers, the most widely used, and also simplest to use, classifiers are support vector machines (SVM) [1] and K-nearest neighbors [2]. Given a set of positive training examples and negative training examples, an SVM learns the separating hyperplane between the positive and negative examples. An image to classify is then classified based on which side of the hyperplane it is situated. In K-nearest neighbors, classification is based on class membership: again given the set of training examples (positive examples and negative examples), an image is classified based on a majority vote of its K nearest neighbors among the training examples. For example, if K=10 and the image to classify is closest to 7 positive examples and 3 negative examples, then the image is assigned the visual concept corresponding to the positive examples. A traditional approach to image classification, according to a specified visual concept (e.g. specified by a user), of an un-annotated image data set (e.g. a user collection of images) consists of 3 steps:
(i) Collecting an annotated training set consisting of positive images (i.e. a "positive data set") representing a specified visual concept and negative images (i.e. a "negative data set") representing the universe of visual concepts that do not match the specified visual concept.
(ii) Using the collected annotated training set to train a visual classifier for the specified visual concept. The SVM (Support Vector Machine) training algorithm is the most widely used approach in this step. It produces a vector in the image feature space, referred to as the classifier vector.
(iii) Applying the resulting classifier vector to the un-annotated image data set, in order to classify the images in this data set. In the case of an SVM classifier vector, this amounts to computing an inner product between the classifier vector and each of the image feature vectors of the images in the set of un-annotated images. The inner product can be used to rank the images in the un-annotated data set. The images with the highest rank (with the largest value of the inner product) are more likely to belong to the class; a minimal sketch of this ranking step is given below.
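By way of illustration, the following is a minimal sketch of the ranking of step (iii) in Python with NumPy. It assumes a linear (SVM-style) classifier vector; the names rank_by_classifier, w and features are illustrative and not from the patent.

    import numpy as np

    def rank_by_classifier(w, features):
        # w: classifier vector of shape (d,)
        # features: feature vectors of the un-annotated set, shape (N, d)
        scores = features @ w          # one inner product per image
        return np.argsort(-scores)     # image indices, highest score first

    # toy example: 5 images with 3-dimensional feature vectors
    w = np.array([1.0, -0.5, 0.2])
    X = np.random.default_rng(0).normal(size=(5, 3))
    print(rank_by_classifier(w, X))

Images whose feature vectors have a large inner product with the classifier vector are ranked first, which is exactly the ordering used to decide class membership.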
Some of the disadvantages of this traditional approach are:
1. In order to be able to classify images in the un-annotated data set, training sets prepared offline are needed. Annotated training sets are collected for a specific visual concept. It would thus be interesting to have some information about the visual concepts present in the un-annotated data set, in order to be able to prepare offline annotated training sets of positive images that correspond to those visual concepts. In practice, this is difficult. For example, if the un-annotated data set is a collection of images from a movie, the task is almost impossible, as there may be thousands of visual concepts in these images.
2. The annotation of the training set is a manual task, and is laborious and expensive.
3. The resulting classifier vectors are fixed and thus inflexible for learning new visual concepts. Yet the un-annotated dataset will likely vary, e.g. when the un-annotated data set is a user collection of images, images are deleted, replaced, and added to the user collection.
To address these problems of the traditional approach to image classification, prior art, e.g. Chatfield et al., "VISOR: Towards On-The-Fly Large-Scale Object Category Retrieval", published on the Internet on November 16, 2012, proposes that a visual concept, specified as a text query, is to be searched for in an un-annotated dataset; the textual query is fed to the Google Image Search engine (GIS). The images resulting from the query then form a positive data set. A fixed pool of very diverse images is used as a negative data set (this corresponds to the previously discussed step (i)). With these positive and negative data sets, a classifier for the visual concept specified by the text query can then be learned (corresponding to the previously mentioned step (ii)), and the resulting classifier vector is used as in step (iii) for the classification of an un-annotated data set. However, two problems can be observed with this prior-art on-the-fly image classification:
(i) Due to the universality of the web, many images retrieved from the image search engine are not relevant to the queried visual concept; they are incorrect representatives of the queried visual concept. For example, a search for the visual concept "can" produces images of aluminium cans along with posters of Obama's political slogan "Yes, we can", and logos of an African organization called "Coupe d'Afrique de Nations". As these images are in the positive data set, which is used as a training set, the incorrect training data significantly biases the classifier training process.
(ii) The images retrieved from the image search engine that are relevant to the queried visual concept, as well as the images in the fixed negative image data set, are too heterogeneous. This means that the result of classification will not be as good as when using a manually annotated training set, as in the traditional classification system.
It would therefore be desirable to address some of the above drawbacks and to improve the prior-art on-the-fly image classification methods in order to get better classification results.
3. Summary
According to different embodiments, there is proposed a method and device for classification of image data that solves at least some of the problems of the prior art.
To this end, the disclosure comprises a method of classification of images, the method comprising: receiving a query for a visual concept; executing a first image search for the query visual concept in a first set of images and extracting first image feature descriptors from images returned by the first image search that correspond to said queried visual concept, forming a positive set of positive image feature descriptors of images; determining a set of attribute visual concepts that are related to the queried visual concept; executing a second image search in the first set of images for the set of attribute visual concepts and extracting second image feature descriptors from images returned by the second image search, forming an attribute set of attribute image feature descriptors of images; computing a first score for each of the positive image feature descriptors in the positive set by calculating inner products between each of the positive image feature descriptors in the positive set and the attribute image feature descriptors in the attribute set, and removing positive image feature descriptors from the positive set having a first score that is under a determined threshold, forming a pruned positive set; feeding the pruned positive set and a negative set with feature vectors of images not corresponding to the queried visual concept to a vector classifier, thereby defining a partition of the image feature space in a positive region and in a negative region; classifying images in an un-annotated set of images by computing an inner product between the vector classifier and each of the image feature descriptors of the un-annotated set thereby obtaining a second score for each image in the un-annotated set, and classifying of the images in the un-annotated set according to the second score.
According to a variant embodiment of the method, it further comprises a step of computing a third score for each of the images in the un-annotated data set by calculating inner products between image feature descriptors of each of the images in the un-annotated set and the image feature descriptors in the attribute set, and combining the third score with the second score obtained in the classifying images step to obtain a fourth score for each of the images in the un-annotated set, and classifying of the images in the un-annotated set according to the fourth score.
According to a variant embodiment of the method, the determining of said set of attribute visual concepts that are related to said queried visual concept comprises a step of retrieving a list of attributes that are related to the queried visual concept from a relational image set.
According to a variant embodiment of the method, the determining of said set of attribute visual concepts that are related to said queried visual concept comprises a step of retrieving a list of attributes that are related to the queried visual concept from information comprised in the query.
According to a variant embodiment of the method, the determining of said set of attribute visual concepts that are related to said queried visual concept comprises a step of retrieving a list of attributes that are related to the queried visual concept through language processing of the query.
According to a variant embodiment of the method, the determining of said set of attribute visual concepts that are related to said queried visual concept comprises a step of retrieving a list of attributes that are related to the queried visual concept through a word search in textual data, whereby words that have a high frequency of occurrence relative to other words are retained for inclusion in the list of attributes.
The present disclosure also comprises a device for classification of images, the device comprising: an interface for receiving a query for a visual concept; a processing module for executing a first image search for the query visual concept in a first set of images and extracting first image feature descriptors from images returned by the first image search that correspond to said queried visual concept, forming a positive set of positive image feature descriptors of images; a processing module for determining a set of attribute visual concepts that are related to the queried visual concept; a processing module for executing a second image search in the first set of images for the set of attribute visual concepts and for extracting second image feature descriptors from images returned by the second image search, forming an attribute set of attribute image feature descriptors of images; a processing module for computing a first score for each of the positive image feature descriptors in the positive set by calculating inner products between each of the positive image feature descriptors in the positive set and the attribute image feature descriptors in the attribute set, and for removing positive image feature descriptors from the positive set having a first score that is under a determined threshold, forming a pruned positive set; a processing module for feeding the pruned positive set and a negative set with feature vectors of images not corresponding to the queried visual concept to a vector classifier, thereby defining a partition of the image feature space in a positive region and in a negative region; a processing module for classifying images in an un-annotated set of images by computing an inner product between the vector classifier and each of the image feature descriptors of the un-annotated set thereby obtaining a second score for each image in the un-annotated set, and classifying of the images in the un-annotated set according to the second score.
The discussed advantages and other advantages not mentioned in this document will become clear upon the reading of the detailed description of the present disclosure.
4. Brief description of drawings
Other characteristics and advantages of the different described example embodiments will appear when reading the following description and the annexed drawings. The embodiments described hereafter are merely provided as examples and are not meant to be restrictive.
The embodiments will be described with reference to the following figures:
Figure 1 is a traditional prior-art classification system.
Figure 2 is an on-the-fly prior-art classification system that addresses some of the problems of the traditional prior-art classification system of figure 1.
Figure 3 is an attribute-based on-the-fly classification system according to a particular example embodiment.
Figure 4 is a variant embodiment of the on-the-fly classification system of figure 3.
Figure 5 is a flow chart of a particular embodiment of the disclosed on-the-fly classification system.
Figure 6 is a device implementing a particular embodiment of the disclosed on-the-fly classification system.
5. Description of embodiments
Figure 1 is a traditional prior-art classification system which has briefly been discussed in the first part of the background section.
Element 10 represents a positive data set X+ that is collected in a first step (i) of collecting an annotated training set consisting of positive images representing the visual concept. Element 11 is a negative data set X- of negative images representing the universe of opposite, non-matching concepts. The annotated data is then used to train a visual classifier for the specified visual concept, using an SVM training algorithm. This produces a vector, or classifier, W in the image feature space. The resulting classifier W is then applied to all un-annotated images of an un-annotated data set (element 14) (e.g. a 'raw' user collection of images that has not been classified), which amounts to computing the inner product between the classifier vector and the image feature vector of each un-annotated image. The computed inner product is then associated with the un-annotated image, thereby resulting in an annotated image. The computing of inner products is repeated for every un-annotated image. Finally the images are ranked according to their inner product (element 13). As mentioned, this prior-art method has the drawbacks discussed in the background part.
Figure 2 is a prior-art on-the-fly classification system (Chatfield) that has briefly been discussed in the second part of the background section and that addresses some problems of the traditional prior art of figure 1. The classification system of figure 2 is referred to as an "on-the-fly" classification system and is entirely machine processed, in contrast to the traditional classification system, which comprises a manual annotation step of the training set. This machine processing makes it possible to classify with a delay that is much shorter than is possible with the traditional classification system. Processing is as follows. A visual concept, specified as a text query (element 17), is to be searched for in an un-annotated data set (element 14), e.g. a user collection of unclassified images. The textual query is fed to an image search engine (element 18). The images returned by the image search engine form a positive data set (element 20). Image features (see the background section for information on feature vectors) are extracted by an image feature vector extractor (element 19) from the images returned by the image search engine and are converted into vector form. The obtained vectors are associated with the images of the positive data set. A fixed pool of very diverse images is used as a negative data set (element 21). With these positive and negative data sets, a classifier for the visual concept specified by the text query is learned using the process that has been described for figure 1. Disadvantages of this prior-art on-the-fly classification system have already briefly been discussed: (i) among the images retrieved from the image search engine are many incorrect representatives of the queried visual concept. This results in incorrect training data (the positive annotated data set) and significantly biases the result of the classification process; (ii) the relevant images retrieved from the image search engine, as well as the images in the fixed negative image data set, are too heterogeneous. This means that though the prior-art on-the-fly classification system of figure 2 is faster and much less laborious than the traditional classification system of figure 1, its classification is not as good as that of the traditional classification system, which uses a manually annotated positive data set.
Figure 3 is an attribute-based on-the-fly classification system according to an example embodiment. Attributes are high-level visual characteristics that the human brain uses when classifying images into categories. For example, a class of animals can also be represented by the attributes four-legged, furry, and striped. Attribute classifiers are built based on the same system setup as shown in figure 2, with positive images being representative of the attribute, and negative images that are not representative of the attribute. For example, to train an image classifier for the attribute "furry", positive images containing furry animals (e.g., cats, bears, ...) and negative images containing animals without fur (e.g., lizards, birds) are used. In the on-the-fly image classification system described here, a_k is used to denote the attribute classifier thus learned for attribute number k. A database of attribute classifiers is used. This attribute database can be built offline and/or built/extended on-the-fly. The classification system of figure 3 shows the following elements that are additional to the prior-art on-the-fly image classification system of figure 2: attribute engine 34, attribute database 35, positive dataset pruning function 36, and optionally a hybrid attribute/SVM ranking function 33. Elements in figure 3 that have been discussed previously in the context of figure 2 have the same numbers and are not discussed here further. As an alternative to SVM classification, other classification methods can be used, such as the K-nearest neighbor classifier. The latter does not produce a vector W but a partition of feature space, akin to Voronoi cells. In general, it can be said that the positive set, with feature vectors of images corresponding to the query visual concept, and the negative set, with feature vectors of images not corresponding to the query, are fed to a vector classifier. This defines a partition of the image feature space that consists of positive and negative regions of the feature space.
The function of elements 34, 35, 36 and 33 is as follows: (1) Attribute engine (element 34): The attribute engine maps a query visual concept (e.g., "sheep") into a list of related attributes (e.g. "sheep"-related attributes are {furry, hooves}). This function can be implemented according to several variant embodiments:
a. Using relational image data sets like ImageNet. Such relational image data sets are organized into a network of semantic/conceptual relationships, with a representative image set per concept. Given a query visual concept, neighboring concepts serve as attributes, and the corresponding representative images of the neighboring concepts are used to extend the attribute database block as described hereafter.
Annotating a dataset like ImageNet has been a crowd-sourced task that began in the 1980s.
b. The user provides query-related attributes as part of the query specification. This is done, for example, using a pre-defined syntax to enter the concept and related concept(s), e.g. <visual concept>: {attribute 1, attribute 2}, as in "sheep: {furry, hooves}", or different text boxes are used to enter the query, thereby clarifying which elements of the query correspond to the visual concept and which elements correspond to the query-related attributes. According to a variant embodiment, natural language processing techniques are used to extract visual concepts and related attributes from the user query text (e.g., nouns represent the visual concept, adjectives in the query represent the query attributes).
c. Using web mining. According to this variant embodiment, the text corresponding to the user query is fed to a textual search engine to retrieve attributes that are related to the query. Words that have a relatively high frequency of appearance in the retrieved documents are used as attributes; a minimal sketch of this frequency filter is given below.
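By way of illustration only, the following is a minimal Python sketch of the frequency filter of variant c, with the document retrieval step stubbed out; the stop-word list, the value of top_k and the name mine_attributes are assumptions made for the example.

    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "with"}

    def mine_attributes(documents, top_k=5):
        # count words over all retrieved documents, ignoring stop words
        counts = Counter(
            word
            for doc in documents
            for word in doc.lower().split()
            if word.isalpha() and word not in STOPWORDS
        )
        # keep the words with the highest relative frequency as attributes
        return [word for word, _ in counts.most_common(top_k)]

    docs = ["sheep are furry animals with hooves",
            "a furry sheep grazing",
            "the hooves of a sheep"]
    print(mine_attributes(docs))  # ['sheep', 'furry', 'hooves', 'animals', 'grazing']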
(2) Attribute Database (element 35): The list of attributes (e.g., {furry, hooves}) at the output of the attribute engine 34 is fed to this function. The output of this function is a set of corresponding attribute classifiers in matrix form, where e.g. a_furry and a_hooves are attribute classifiers for the visual concepts "furry" and "hooves": A = [a_furry, a_hooves]
The attribute database 35 contains a large storage of such attribute classifiers. Since attributes can be shared by a wide range of visual concepts, it is conceivable to build a useful, generic set of attributes. For example, these are color attributes ("red", "blue", "green"), shape attributes ("round", "straight", "curved"), and texture attributes ("rough", "smooth", "wavy"). These can also be application-specific attributes, such as "furry" and "hooves", for the case of animal databases.
A variant embodiment for creating the attribute database is to extend it while using the on-the-fly system: each attribute in the input list of attributes is fed independently to the on-the-fly system in Figure 3. The resulting attribute classifier is stored in the attribute database.
(3) Positive dataset pruner (element 36): The matrix A, which is the output of the attribute database 35, is used to prune (= remove) irrelevant images retrieved from the image search engine (18). "Irrelevant" means here not relevant to the query visual concept. Mathematical steps for realizing this function are for example:
a. All positive images retrieved from the image search engine, whose feature vectors form the columns of a matrix X+, are classified (their inner products are computed) using all attributes in matrix A, giving a matrix A^T X+ with one row per attribute and one column per image:
A^T X+ = [ a_furry^T x_+1 ... a_furry^T x_+N ; a_hooves^T x_+1 ... a_hooves^T x_+N ]
b. The matrix A^T X+ is normalized along rows, to make the different attribute scores comparable across images, by removing each row's mean and dividing by the row's standard deviation, thereby obtaining a normalized matrix A_norm.
c. The elements of each column of the normalized matrix A_norm are added to obtain a score s per image:
s = 1^T A_norm
The resulting attribute scores s are sorted in descending order and the last "n" images are discarded. The remaining images form the pruned positive data set 30. A minimal sketch of steps (a) to (c) is given below.
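The following is a minimal Python/NumPy sketch of steps (a) to (c), assuming, as in the notation above, that the attribute classifiers are the columns of A and the positive feature vectors are the columns of X+; the small constant added to the standard deviation is an assumption that guards against division by zero and is not part of the patent.

    import numpy as np

    def prune_positive_set(A, X_pos, n):
        # A: (d, K) attribute classifiers as columns (e.g. a_furry, a_hooves)
        # X_pos: (d, N) positive image feature vectors as columns
        scores = A.T @ X_pos                            # A^T X+, shape (K, N)
        mu = scores.mean(axis=1, keepdims=True)         # row-wise mean
        sd = scores.std(axis=1, keepdims=True) + 1e-12  # row-wise standard deviation
        A_norm = (scores - mu) / sd                     # normalized matrix A_norm
        s = A_norm.sum(axis=0)                          # s = 1^T A_norm, one score per image
        keep = np.argsort(-s)[: X_pos.shape[1] - n]     # discard the n lowest-scoring images
        return X_pos[:, keep]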
Then, the pruned positive data set 30 of image feature vectors is used as an input to the classification process of the un-annotated data set 14, as described for example for figure 2, using an SVM classifier 12 that also receives as input the image feature vectors of a set of negative images representing visual concepts that do not correspond to the query visual concept. The SVM classifier produces a vector W. This vector is used in an SVM ranker 33 to rank the images in the un-annotated data set 14 according to their relevance to the query visual concept; a sketch of this stage is given below.
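A possible sketch of this training and ranking stage is the following, using scikit-learn's LinearSVC as one concrete SVM implementation; this library choice is an assumption, the patent only requires an SVM classifier. Note that, following the scikit-learn convention, images are rows here rather than columns.

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_and_rank(X_pos, X_neg, X_un):
        # X_pos: pruned positive set, X_neg: negative set, X_un: un-annotated set,
        # each of shape (num_images, d)
        X = np.vstack([X_pos, X_neg])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
        svm = LinearSVC().fit(X, y)           # learns the classifier vector W
        scores = svm.decision_function(X_un)  # inner product with W (plus bias) per image
        return np.argsort(-scores)            # most relevant images first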
Figure 4 is a variant embodiment of the on-the-fly classification system of figure 3. The classification system described above comprises the previously discussed functions (1) to (3), and further comprises an optional additional function:
(4) Hybrid attribute/SVM ranking function (element 33): The matrix A, which is the output of the attribute database block, is additionally used to rank the un-annotated dataset X_T (element 14). A hybrid approach is thus used that mixes both the SVM classifier score and the attribute classifier score:
a. Similar to steps (a) to (c) of the above step (3) (positive data set pruning), an attribute score is obtained for the un-annotated data X_T: the elements in matrix X_T are classified using the attributes in matrix A, giving a matrix A^T X_T; this matrix is normalized along rows, giving a matrix T_norm; the elements of each column of the normalized matrix T_norm are added to obtain a score s_t per un-annotated image:
s_t = 1^T T_norm
b. Finally, the hybrid score used to carry out the hybrid ranking is given by:
score = alpha * norm(w^T X_T) + (1 - alpha) * norm(s_t)
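A minimal Python sketch of this hybrid ranking follows. Since the patent does not specify the norm(.) operation, min-max normalization to [0, 1] is assumed here so that the two scores are blended on a comparable scale.

    import numpy as np

    def _minmax(v):
        # assumed realization of norm(.): map scores to the [0, 1] range
        return (v - v.min()) / (v.max() - v.min() + 1e-12)

    def hybrid_scores(w, A, X_T, alpha=0.5):
        # w: (d,) SVM classifier vector; A: (d, K) attribute classifiers;
        # X_T: (d, N) un-annotated feature vectors as columns
        svm_scores = w @ X_T                   # w^T X_T, one SVM score per image
        T = A.T @ X_T                          # A^T X_T, shape (K, N)
        T_norm = (T - T.mean(axis=1, keepdims=True)) / \
                 (T.std(axis=1, keepdims=True) + 1e-12)
        s_t = T_norm.sum(axis=0)               # s_t = 1^T T_norm
        return alpha * _minmax(svm_scores) + (1 - alpha) * _minmax(s_t)

Images of the un-annotated set are then ranked by this hybrid score in descending order.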
Figure 5 is a flow chart of a particular embodiment of the method for classification of images. In a step 50, a query for a visual concept is received. In a step 51, a first image search for the query visual concept in a first set of images is executed, and image feature descriptors are extracted from images returned by the first image search, thereby forming a positive set of positive image feature descriptors of images corresponding to the query visual concept. In a step 52, a set of attribute visual concepts is determined that are related to the query visual concept. In a step 53, a second image search is executed in the first set of images for the attribute visual concepts, and image feature descriptors are extracted from images returned by the second image search, thereby forming an attribute set of attribute image feature descriptors of images corresponding to the attribute visual concepts. In a step 54, a first score is computed for each of the image feature descriptors in the positive set by calculating inner products between each of the positive image feature descriptors and the attribute image feature descriptors, and positive image feature descriptors having a first score that is under a determined threshold are removed from the positive set, thereby forming a pruned positive set. In a step 55, the pruned positive set and a negative set with feature vectors of images not corresponding to the query visual concept are fed to a vector classifier, thereby partitioning the image feature space in a positive feature space region and in a negative feature space region. In a step 56, images in an un-annotated set of images are classified by computing an inner product between vector W and each of the image feature descriptors of the un-annotated set, thereby obtaining a second score for each image in the un-annotated set, and classifying the images in the un-annotated set according to the second score.
Figure 6 is a device 600 implementing a particular embodiment of the described method. The device comprises: an interface (60) for receiving a query for a visual concept; a processing module (62) for executing a first image search for the query visual concept in a first set of images and extracting first image feature descriptors from images returned by the first image search that correspond to the queried visual concept, forming a positive set of positive image feature descriptors of images; a processing module (62) for determining a set of attribute visual concepts that are related to the query visual concept; a processing module (62) for executing a second image search in the first set of images for the set of attribute visual concepts and for extracting second image feature descriptors from images returned by the second image search, forming an attribute set of attribute image feature descriptors of images; a processing module (62) for computing a first score for each of the positive image feature descriptors in the positive set by calculating inner products between each of the positive image feature descriptors in the positive set and the attribute image feature descriptors in the attribute set, and for removing positive image feature descriptors from the positive set having a first score that is under a determined threshold, forming a pruned positive set; a processing module (62) for feeding the pruned positive set and a negative set with feature vectors of images not corresponding to the query visual concept to a vector classifier, thereby defining a partition of the image feature space in a positive region and in a negative region; a processing module (62) for classifying images in an un-annotated set of images by computing an inner product between the vector classifier and each of the image feature descriptors of the un-annotated set thereby obtaining a second score for each image in the un-annotated set, and classifying of the images in the un-annotated set according to the second score. The device further comprises a memory 61 for storage of variables, such as vectors and image sets. Processing module 62, interface 60 and memory 61 are interconnected through a data communication bus 63. The device 600 is connected to the outside through a network connection 64.
As will be appreciated by one skilled in the art, aspects of the present principles can be embodied as a system, method or computer readable medium. Accordingly, aspects of the present principles can take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code and so forth), or an embodiment combining hardware and software aspects that can all generally be referred to herein as a "circuit", "module" or "system". Furthermore, aspects of the present principles can take the form of a computer readable storage medium. Any combination of one or more computer readable storage medium(s) can be utilized.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative system components and/or circuitry embodying the principles of the present disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable storage media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown. A computer readable storage medium can take the form of a computer readable program product embodied in one or more computer readable medium(s) and having computer readable program code embodied thereon that is executable by a computer. A computer readable storage medium as used herein is considered a non-transitory storage medium given the inherent capability to store the information therein as well as the inherent capability to provide retrieval of the information therefrom. A computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. It is to be appreciated that the following, while providing more specific examples of computer readable storage mediums to which the present principles can be applied, is merely an illustrative and not exhaustive listing as is readily appreciated by one of ordinary skill in the art: a portable computer diskette; a hard disk; a read-only memory (ROM); an erasable programmable read-only memory (EPROM or Flash memory); a portable compact disc read-only memory (CD-ROM); an optical storage device; a magnetic storage device; or any suitable combination of the foregoing.

Claims

1. A method of classification of images, the method being characterized in that it comprises:
receiving (50) a query for a visual concept;
executing (51) a first image search for said queried visual concept in a first set of images and extracting first image feature descriptors from images returned by said first image search that correspond to said queried visual concept, forming a positive set of positive image feature descriptors;
determining (52) a set of attribute visual concepts that are related to said queried visual concept;
executing (53) a second image search in said first set of images for said set of attribute visual concepts and extracting second image feature descriptors from images returned by said second image search, forming an attribute set of attribute image feature descriptors;
computing (54) a first score for each of the positive image feature descriptors in the positive set by calculating inner products between each of the positive image feature descriptors in the positive set and the attribute image feature descriptors in the attribute set, and removing from the positive set the positive image feature descriptors having a first score under a determined threshold, forming a pruned positive set;
feeding (55) said pruned positive set and a negative set of feature vectors of images not corresponding to the queried visual concept to a vector classifier, thereby defining a partition of the image feature space into a positive region and a negative region;
classifying (56) images in an un-annotated set of images by computing an inner product between the vector classifier and each of the image feature descriptors of the un-annotated set, thereby obtaining a second score for each image in the un-annotated set, and classifying the images in the un-annotated set according to said second score.
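Purely as an illustrative sketch of the feeding (55) and classifying (56) steps of claim 1, and not as a reference implementation, the training and scoring can be written as follows. The use of scikit-learn's LinearSVC as the vector classifier and the inclusion of a bias term are assumptions; the claim leaves the choice of vector classifier open.

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_and_classify(pruned_positive, negative_set, unannotated):
        """Train a linear vector classifier on the pruned positive set and
        a negative set, then score the un-annotated descriptors by inner
        product with the classifier. All inputs are (n, d) arrays."""
        X = np.vstack([pruned_positive, negative_set])
        y = np.hstack([np.ones(len(pruned_positive)),
                       np.zeros(len(negative_set))])
        # The learned weight vector partitions the image feature space
        # into a positive region and a negative region.
        clf = LinearSVC().fit(X, y)
        w, b = clf.coef_.ravel(), clf.intercept_[0]
        # Second score: inner product between the classifier and each
        # un-annotated image feature descriptor.
        second_scores = unannotated @ w + b
        # Classify the un-annotated images according to the second score.
        return second_scores, second_scores >= 0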
2. The method according to claim 1, wherein the method further comprises a step of computing a third score for each of the images in the un-annotated set by calculating inner products between image feature descriptors of each of the images in the un-annotated set and the image feature descriptors in the attribute set, and combining the third score with the second score obtained in the classifying step to obtain a fourth score for each of the images in the un-annotated set, and classifying the images in the un-annotated set according to said fourth score.
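A minimal sketch of claim 2, under the assumption that the third score is the mean inner product with the attribute set and that the combination is a weighted sum; neither the aggregation nor the combination rule is prescribed by the claim.

    import numpy as np

    def fourth_scores(unannotated, attribute_set, second_scores, alpha=0.5):
        """Combine the classifier-based second score with an
        attribute-similarity third score into a fourth score."""
        # Third score: mean inner product between each un-annotated
        # descriptor and the attribute descriptors (aggregation assumed).
        third = (unannotated @ attribute_set.T).mean(axis=1)
        # Fourth score: weighted combination, with the weight alpha assumed.
        return alpha * second_scores + (1.0 - alpha) * third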
3. The method according to claim 1, wherein said determining of said set of attribute visual concepts that are related to said queried visual concept comprises a step of retrieving a list of attributes that are related to the queried visual concept from a relational image set.
4. The method according to claim 1, wherein said determining of said set of attribute visual concepts that are related to said queried visual concept comprises a step of retrieving a list of attributes that are related to the queried visual concept from information comprised in the query.
5. The method according to claim 1, wherein said determining of said set of attribute visual concepts that are related to said queried visual concept comprises a step of retrieving a list of attributes that are related to the queried visual concept through language processing of the query.
6. The method according to claim 1, wherein said determining of said set of attribute visual concepts that are related to said queried visual concept comprises a step of retrieving a list of attributes that are related to the queried visual concept through a word search in textual data, whereby words that have a high frequency of occurrence relative to other words are retained for inclusion in said list of attributes.
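A minimal sketch of the word-frequency retrieval of claim 6. The tokenization, the stop-word list and the top_k cut-off are assumptions; the claim only requires that words with a high frequency of occurrence relative to other words be retained.

    from collections import Counter

    def attribute_list(textual_data, query, top_k=10):
        """Return the words occurring most frequently, relative to the
        other words, in text retrieved for the queried visual concept."""
        # A small stop-word list, assumed for illustration only.
        stop_words = {"the", "a", "an", "of", "and", "in", "is", "to"}
        words = [w for w in textual_data.lower().split()
                 if w.isalpha() and w not in stop_words
                 and w != query.lower()]
        # Retain the top_k most frequent words as attribute concepts.
        return [w for w, _ in Counter(words).most_common(top_k)]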
7. A device (600) for classification of images, the device being characterized in that it comprises:
an interface (60) for receiving a query for a visual concept;
a processing module (62) for executing a first image search for said queried visual concept in a first set of images and extracting first image feature descriptors from images returned by said first image search that correspond to said queried visual concept, forming a positive set of positive image feature descriptors;
a processing module for determining a set of attribute visual concepts that are related to said queried visual concept;
a processing module for executing a second image search in said first set of images for said set of attribute visual concepts and for extracting second image feature descriptors from images returned by said second image search, forming an attribute set of attribute image feature descriptors;
a processing module for computing a first score for each of the positive image feature descriptors in the positive set by calculating inner products between each of the positive image feature descriptors in the positive set and the attribute image feature descriptors in the attribute set, and for removing from the positive set the positive image feature descriptors having a first score under a determined threshold, forming a pruned positive set;
a processing module for feeding said pruned positive set and a negative set of feature vectors of images not corresponding to the queried visual concept to a vector classifier, thereby defining a partition of the image feature space into a positive region and a negative region;
a processing module for classifying images in an un-annotated set of images by computing an inner product between the vector classifier and each of the image feature descriptors of the un-annotated set, thereby obtaining a second score for each image in the un-annotated set, and classifying the images in the un-annotated set according to said second score.
PCT/EP2014/068166 2013-09-06 2014-08-27 Method of classification of images and corresponding device WO2015032670A1 (en)

Applications Claiming Priority (4)

Application Number   Priority Date
EP13306226.5         2013-09-06
EP13306226           2013-09-06
EP14305745           2014-05-20
EP14305745.3         2014-05-20

Publications (1)

Publication Number Publication Date
WO2015032670A1 (en)  2015-03-12

Family

ID=51399668

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2014/068166 WO2015032670A1 (en) 2013-09-06 2014-08-27 Method of classification of images and corresponding device

Country Status (1)

Country Link
WO (1) WO2015032670A1 (en)

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Cosine similarity - Wikipedia, the free encyclopedia", 10 August 2013 (2013-08-10), XP055159814, Retrieved from the Internet <URL:http://en.wikipedia.org/w/index.php?title=Cosine_similarity&oldid=567987930> [retrieved on 20141222] *
BISWAS ARIJIT ET AL: "Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback", IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. PROCEEDINGS, IEEE COMPUTER SOCIETY, US, 23 June 2013 (2013-06-23), pages 644 - 651, XP032492884, ISSN: 1063-6919, [retrieved on 20131002], DOI: 10.1109/CVPR.2013.89 *
CLAUDE SAMMUT; GEOFFREY I. WEBB: "Encyclopedia of Machine Learning", 1 January 2011, SPRINGER SCIENCE AND BUSINESS MEDIA, New York, US, ISBN: 978-0-387-30768-8, article CHRIS DRUMMOND; JANEZ BRANK; DUNJA MLADENIC; MARCO GROBELNIK; HUAN LIU, pages: 52-54, 397-410, XP002734003 *
DHRUV MAHAJAN ET AL: "A joint learning framework for attribute models and object descriptions", COMPUTER VISION (ICCV), 2011 IEEE INTERNATIONAL CONFERENCE ON, IEEE, 6 November 2011 (2011-11-06), pages 1227 - 1234, XP032093762, ISBN: 978-1-4577-1101-5, DOI: 10.1109/ICCV.2011.6126373 *
KEN CHATFIELD ET AL: "VISOR: Towards On-the-Fly Large-Scale Object Category Retrieval", 5 November 2012, COMPUTER VISION ACCV 2012, SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, PAGE(S) 432 - 446, ISBN: 978-3-642-37443-2, XP047027140 *
KOVASHKA A ET AL: "WhittleSearch: Image search with relative attribute feedback", COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2012 IEEE CONFERENCE ON, IEEE, 16 June 2012 (2012-06-16), pages 2973 - 2980, XP032232425, ISBN: 978-1-4673-1226-4, DOI: 10.1109/CVPR.2012.6248026 *
YU SU ET AL: "Improving Image Classification Using Semantic Attributes", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, BO, vol. 100, no. 1, 8 May 2012 (2012-05-08), pages 59 - 77, XP035075183, ISSN: 1573-1405, DOI: 10.1007/S11263-012-0529-4 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10922327B2 (en) 2013-09-20 2021-02-16 Ebay Inc. Search guidance
US11640408B2 (en) 2013-09-20 2023-05-02 Ebay Inc. Search guidance
US11222064B2 (en) * 2015-12-31 2022-01-11 Ebay Inc. Generating structured queries from images
CN110489594A (en) * 2018-05-14 2019-11-22 北京松果电子有限公司 Image vision mask method, device, storage medium and equipment
CN113222018A (en) * 2021-05-13 2021-08-06 郑州大学 Image classification method
CN113222018B (en) * 2021-05-13 2022-06-28 郑州大学 Image classification method

Similar Documents

Publication Publication Date Title
Guillaumin et al. Multimodal semi-supervised learning for image classification
EP3248143B1 (en) Reducing computational resources utilized for training an image-based classifier
US11042586B2 (en) Clustering search results based on image composition
Liu et al. Label to region by bi-layer sparsity priors
US10482146B2 (en) Systems and methods for automatic customization of content filtering
WO2017097231A1 (en) Topic processing method and device
US11803971B2 (en) Generating improved panoptic segmented digital images based on panoptic segmentation neural networks that utilize exemplar unknown object classes
Mei et al. Coherent image annotation by learning semantic distance
US10007864B1 (en) Image processing system and method
Lee et al. MAP-based image tag recommendation using a visual folksonomy
WO2015032670A1 (en) Method of classification of images and corresponding device
KR101472451B1 (en) System and Method for Managing Digital Contents
Li et al. Technique of image retrieval based on multi-label image annotation
Abdel-Nabi et al. Content based image retrieval approach using deep learning
US20150254280A1 (en) Hybrid Indexing with Grouplets
Chen et al. An annotation rule extraction algorithm for image retrieval
Kumar et al. Fusion of CNN-QCSO for Content Based Image Retrieval
CN112784893B (en) Image data clustering method and device, electronic equipment and storage medium
Dorado et al. Semantic labeling of images combining color, texture and keywords
Manjula et al. Visual and tag-based social image search based on hypergraph ranking method
Haldurai et al. Parallel indexing on color and texture feature extraction using R-tree for content based image retrieval
Zhong et al. Region level annotation by fuzzy based contextual cueing label propagation
Sabitha et al. Hybrid approach for image search reranking
CN105701150A (en) Intuitionistic fuzzy similarity degree based image retrieving method and system
Yanai et al. Region-based automatic web image selection

Legal Events

Code   Title   Description
121    Ep: the epo has been informed by wipo that ep was designated in this application   Ref document number: 14755693; Country of ref document: EP; Kind code of ref document: A1
NENP   Non-entry into the national phase   Ref country code: DE
122    Ep: pct application non-entry in european phase   Ref document number: 14755693; Country of ref document: EP; Kind code of ref document: A1