CN115457309A

CN115457309A - Image unsupervised classification method based on natural language

Info

Publication number: CN115457309A
Application number: CN202210992923.4A
Authority: CN
Inventors: 孟超越; 常智山; 史建华; 周志扬
Original assignee: Mingtai Beijing Technology Co ltd
Current assignee: Mingtai Beijing Technology Co ltd
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2022-12-09

Abstract

The invention discloses an unsupervised image classification method based on natural language, relating to the field of image classification. The method comprises the following steps: S1, setting keywords for the classification targets; S2, data acquisition; S3, image classification label generation: converting the class number generated in the previous step into a one-hot code and multiplying the one-hot code by the similarity; S4, training an image classification model; S5, inference: the image classification model is used directly, without any natural language processing. The method achieves unsupervised classification of images by exploiting the correlation between an image and its title or related text description. Because semantic information associated with the images is introduced, classification uncertainty is reduced and the resulting classes carry specific meanings. Because the keywords are set manually, images can be classified according to the requirements of a specific scene, which saves manual labeling cost and time and improves development efficiency.

Description

Image unsupervised classification method based on natural language

Technical Field

The invention relates to the field of image classification, in particular to an unsupervised image classification method based on natural language.

Background

Image classification is largely achieved through supervised learning: the classification targets are labeled manually, a model is trained on the labels, and classification is finally realized. In engineering practice, labeling cost can grow exponentially with the number of targets and the difficulty of recognizing them, so it is desirable for machines to replace manual labeling. In recent years, much research on unsupervised image classification has therefore appeared. These methods are essentially clustering algorithms based on the features of the images themselves, such as the DeepCluster method, and some use additional contextual information to assist clustering.

Unsupervised methods solve the problem of automatic labeling for classification to a certain extent. For example, Facebook's DeepCluster method is effective and representative. Its overall process is: the network generates feature vectors, which are clustered; the clustering (k-means) results are used as pseudo labels to update the network parameters; the network then predicts the pseudo labels and generates new feature vectors; the new vectors are clustered again; …; and these two processes are iterated continuously.

The drawbacks of the existing methods are that the resulting classes have no definite meaning and that image similarity is difficult to express quantitatively according to specific requirements.

Disclosure of Invention

The invention aims to provide an unsupervised image classification method based on natural language that makes the result of unsupervised image classification purposeful and directly usable in specific scenes. The method exploits the correlation between images and natural language, uses the result of natural language classification to guide the classification of the images, and can thereby classify images according to a specific purpose.

To achieve this purpose, the invention provides the following technical solution:

An unsupervised image classification method based on natural language, comprising the following steps:

S1, setting keywords for the classification targets;

S2, data acquisition, comprising the following steps:

S2.1, acquiring image data and the title and description information related to the image data by using distributed crawler technology;

S2.2, analyzing the similarity between the keywords and the title information by using natural language processing technology, and classifying the images into the corresponding categories;

S2.3, establishing a mapping table, numbering the manually set categories, and recording the image categories, the similarity, and the corresponding text names;

S3, image classification label generation: converting the class number generated in the previous step into a one-hot code, and multiplying the one-hot code by the similarity;

S4, training an image classification model;

S5, inference process: the image classification model is used directly, without a natural language processing process.

Preferably, the similarity between the keywords and the title information in S2.2 is analyzed using cosine similarity.
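As a minimal sketch of how the similarity in S2.2 might be computed, the function below takes the cosine of the angle between bag-of-words count vectors of a keyword and a title. The source does not specify how text is turned into vectors, so the whitespace tokenizer and the function name `cosine_similarity` are assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the words that both texts contain.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)
```

For instance, `cosine_similarity("explosive", "homemade explosive device found")` evaluates to 0.5, because the keyword matches one of four equally weighted title words. A practical system would more likely compare word or sentence embeddings, but the cosine formula stays the same.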

Preferably, the mapping table in S2.3 is established by crawling images that have titles or related text descriptions from the Internet, storing the text information in a text file, numbering the text and the images uniformly, and establishing the mapping table according to the numbers.
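The uniform numbering described here can be sketched as follows. The record layout (`id`, `image`, `text`), the tab-separated format, and both function names are hypothetical; the source only states that text and images share one number and that the text is stored in a text file.

```python
import csv
import io


def build_mapping_table(pairs):
    """Give each (image_path, text) pair one shared number."""
    return [
        {"id": item_id, "image": image_path, "text": text}
        for item_id, (image_path, text) in enumerate(pairs)
    ]


def to_tsv(table):
    """Serialize the table as tab-separated text, ready to write to a text file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["id", "image", "text"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()
```

Keeping the number as the only join key means the category and similarity columns of S2.3 can be added later without touching the image files themselves.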

Preferably, the similarity in S2.3 is the similarity between the text and the keyword; images whose associated text exceeds a set similarity threshold are assigned to the category corresponding to the keyword, and the similarity is recorded to generate the classification label.
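A minimal sketch of this thresholding step is shown below. The threshold value 0.3, the function name `assign_categories`, and the dictionary-based inputs are assumptions; the source says only that a set threshold is used.

```python
def assign_categories(similarities, class_ids, threshold=0.3):
    """Return (class_id, keyword, similarity) for every keyword above the threshold.

    similarities: keyword -> similarity between the image's text and that keyword
    class_ids:    keyword -> number of the manually set category
    threshold:    hypothetical cut-off; the source does not give a value
    """
    labels = []
    for keyword, sim in similarities.items():
        if sim > threshold:
            labels.append((class_ids[keyword], keyword, sim))
    return sorted(labels)  # ordered by class number
```

An image whose text scores 0.6 for "explosive" and 0.4 for "gun" keeps both labels, while a 0.1 score for "drugs" is dropped.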

Preferably, training the image classification model in S4 comprises pre-training using the DeepCluster method and then training the classification model using transfer learning.

Compared with the prior art, the invention has the beneficial effects that:

The method achieves unsupervised classification of images by exploiting the correlation between an image and its title or related text description. Because semantic information associated with the images is introduced, classification uncertainty is reduced and the resulting classes carry specific meanings. Because the keywords are set manually, images can be classified according to the requirements of a specific scene, which saves manual labeling cost and time and improves development efficiency.

Detailed Description

Examples

Setting keywords of a classification target:

For example, to analyze a person's expression, the keywords are set to happiness, anger, sadness, joy, etc.; for counter-terrorism purposes, the keywords are set to guns, explosives, drugs, etc.

Data acquisition:

1) Acquire image data and the title and description information related to the image data by using distributed crawler technology;

2) Analyze the similarity between the keywords and the title information by using natural language processing (for example, a method … … based on cosine similarity), and classify the images into the corresponding categories. For example, if the similarity between the description information and "explosive" is 0.8, the image label is: explosive (similarity 0.8). One image may carry several labels, for example explosive (similarity 0.6) and gun (similarity 0.4) at the same time;

3) Establish a mapping table, number the manually set categories, and record the image categories, the similarity, and the corresponding text names.

Image classification label generation:

The class number generated in the previous step is converted into a one-hot code and then multiplied by the similarity. For example, the code 00000 … 010 (assuming the car class is numbered second to last) multiplied by a similarity of, say, 0.7 becomes 000 … 0.7. Such a label captures the intrinsic features of the image more easily. Consider labeling an image of a Beetle car, assuming a similarity of 0.7 to "car" and 0.3 to "beetle". If the label is 00000 … 010, the image is treated during training as having no beetle features; when other beetle images are learned later, the network, in order to avoid contradiction, may fail to extract effective beetle features, which is not beneficial to network learning. Changing the label to 0.3 … 0.7 is more reasonable. (By analogy, a person learning the classification who is told that the image is a car image that also resembles a beetle will understand both the car and the beetle more reasonably, and the ability to recognize beetles will not be reduced.)
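The label construction described above can be sketched as follows. Summing the scaled one-hot vectors when an image has several categories is an assumption inferred from the 0.3 … 0.7 example; the function name `make_soft_label` and the class numbering used below are hypothetical.

```python
def make_soft_label(num_classes, assignments):
    """Build a weighted label from (class_id, similarity) pairs.

    Each class number becomes a one-hot vector, is multiplied by its
    similarity, and the scaled vectors are summed (assumed combination rule).
    """
    label = [0.0] * num_classes
    for class_id, similarity in assignments:
        one_hot = [1.0 if i == class_id else 0.0 for i in range(num_classes)]
        scaled = [similarity * v for v in one_hot]
        label = [a + b for a, b in zip(label, scaled)]
    return label
```

With five classes, "beetle" numbered 0 and "car" numbered 3 (second to last), `make_soft_label(5, [(3, 0.7)])` gives `[0.0, 0.0, 0.0, 0.7, 0.0]`, and `make_soft_label(5, [(0, 0.3), (3, 0.7)])` gives `[0.3, 0.0, 0.0, 0.7, 0.0]`.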

Training an image classification model:

1) Pre-train using the DeepCluster method;

2) Train the classification model using transfer learning.
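The source does not state which loss function is used when training on the similarity-weighted labels. One common choice for such labels, shown here purely as an assumption, is cross-entropy against soft targets; the names `log_softmax` and `soft_label_cross_entropy` are illustrative and not taken from the source.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw class scores."""
    m = max(logits)
    log_total = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - log_total for x in logits]


def soft_label_cross_entropy(logits, soft_label):
    """Cross-entropy between model scores and a similarity-weighted label.

    Assumed loss: each class contributes its label weight times the
    negative log-probability the model assigns to that class.
    """
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(soft_label, log_probs))
```

With this loss, a label of 0.3 for "beetle" and 0.7 for "car" penalizes a model that ignores either class, which matches the motivation given for weighted labels. In practice the same computation would be done by a deep learning framework on batches of images.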

Inference process:

The trained image classification model is used directly, without a natural language processing process, so pictures that have no specific description can also be classified correctly.
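A minimal sketch of this inference step follows. The `model` argument is a hypothetical callable that returns one score per class for an image, and `class_names` is assumed to be the keyword list ordered by class number from the mapping table; no title or description is consulted.

```python
def classify_image(model, image, class_names):
    """Classify an image using the trained model alone (no text input).

    model:       hypothetical callable, image -> list of per-class scores
    class_names: keywords ordered by their class numbers
    """
    scores = model(image)
    best = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best], scores[best]
```

Because the keywords were fixed in S1, the predicted class number maps straight back to a meaningful name such as "explosive" rather than to an anonymous cluster index.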

Claims

1. An unsupervised image classification method based on natural language, characterized by comprising the following steps:

S1, setting keywords for the classification targets;

S2, data acquisition, comprising the following steps:

S2.3, establishing a mapping table, numbering the manually set categories, and recording the image categories, the similarity, and the corresponding text names;

S4, training an image classification model;

S5, inference process: the image classification model is used directly, without a natural language processing process.

2. The unsupervised image classification method based on natural language according to claim 1, characterized in that: the similarity between the keywords and the title information in S2.2 is analyzed using cosine similarity.

3. The unsupervised image classification method based on natural language according to claim 1, characterized in that: the mapping table in S2.3 is established by crawling images that have titles or related text descriptions from the Internet, storing the text information in a text file, numbering the text and the images uniformly, and establishing the mapping table according to the numbers.

4. The unsupervised image classification method based on natural language according to claim 1, characterized in that: the similarity in S2.3 is the similarity between the text and the keywords; images whose associated text exceeds a set threshold are assigned to the categories corresponding to the keywords, the similarity is recorded, and the classification labels are generated.

5. The unsupervised image classification method based on natural language according to claim 1, characterized in that: training the image classification model in S4 comprises pre-training using the DeepCluster method and training the classification model using transfer learning.