CN115457309A - Image unsupervised classification method based on natural language - Google Patents

Info

Publication number
CN115457309A
Authority
CN
China
Prior art keywords
classification
image
images
natural language
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210992923.4A
Other languages
Chinese (zh)
Inventor
孟超越
常智山
史建华
周志扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mingtai Beijing Technology Co ltd
Original Assignee
Mingtai Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mingtai Beijing Technology Co ltd
Priority to CN202210992923.4A
Publication of CN115457309A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unsupervised image classification method based on natural language, in the field of image classification. The method comprises the following steps: S1, setting keywords for the classification target; S2, data acquisition; S3, image classification label generation, in which the class number generated in the previous step is converted into a one-hot code and multiplied by the similarity; S4, training an image classification model; S5, inference, in which the image classification model is used directly and no natural language processing is needed. The method achieves unsupervised classification of images by exploiting the correlation between an image and its title or related text description. Introducing the semantic information associated with the images reduces classification uncertainty and gives the resulting classes specific meanings. Because the keywords are set manually, images can be classified according to the requirements of a specific scenario, which saves manual labeling cost and time and improves development efficiency.

Description

Image unsupervised classification method based on natural language
Technical Field
The invention relates to the field of image classification, in particular to an unsupervised image classification method based on natural language.
Background
Image classification is largely carried out by supervised learning: the classification targets are labeled manually, a model is trained on the labeled data, and classification is finally achieved. In engineering practice, labeling cost can grow exponentially with the number of targets and the difficulty of recognizing them, so it is desirable for machines to replace manual labeling. In recent years, many studies on unsupervised image classification have therefore appeared. These methods are essentially clustering algorithms based on features of the images themselves, such as the DeepCluster method, sometimes with additional contextual information to assist the clustering.
Unsupervised methods solve the problem of automatic labeling to a certain extent. For example, Facebook's DeepCluster method works well and is representative. Its overall process consists of clustering the feature vectors generated by the network, using the clustering (k-means) result as pseudo-labels to update the network parameters, then having the network predict the pseudo-labels and generate new feature vectors, re-clustering the new vectors, and so on, iterating these two processes continuously.
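The alternation described above can be illustrated in miniature. The following is only a sketch of the clustering half of the loop, under stated assumptions: the "features" are fixed 2-D points rather than network outputs, `kmeans_pseudo_labels` is a hypothetical helper name, and the network update is left as a comment, since DeepCluster itself trains a convolutional network between clustering rounds.

```python
import math

def kmeans_pseudo_labels(features, k, iterations=10):
    """Cluster feature vectors with plain k-means and return one pseudo-label per vector.

    Assumption: the first k vectors serve as initial centroids so the result
    is deterministic; real implementations initialize differently.
    """
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each vector gets the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy features standing in for network outputs (hypothetical values).
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
pseudo_labels = kmeans_pseudo_labels(features, k=2)
# In DeepCluster the network would now be trained to predict pseudo_labels,
# new features would be extracted, and the clustering would be repeated.
```

The cluster indices produced this way carry no meaning of their own, which is the limitation the next paragraph points out.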
The drawbacks of the existing methods are that the resulting classes have no definite meaning and that image similarity is difficult to quantify according to specific requirements.
Disclosure of Invention
The invention aims to provide an unsupervised image classification method based on natural language that makes the result of unsupervised image classification purposeful and directly usable in specific scenarios. The method exploits the correlation between images and natural language and uses the result of natural language classification to guide the classification of images, so that classification can be carried out for a specific purpose.
In order to achieve the purpose, the invention provides the following technical scheme:
An unsupervised image classification method based on natural language, comprising the following steps:
S1, setting keywords for the classification target;
S2, data acquisition, comprising the following steps:
S2.1, acquiring image data and the title and description information related to the image data by using distributed crawler technology;
S2.2, analyzing the similarity between the keywords and the title information by using natural language processing technology, and classifying the images into the corresponding categories;
S2.3, establishing a mapping table, numbering the manually set categories, and recording the image categories, the similarities, and the text names corresponding to them;
S3, image classification label generation: converting the class number generated in the previous step into a one-hot code, and multiplying the one-hot code by the similarity;
S4, training an image classification model;
S5, inference process: the image classification model is used directly, without a natural language processing step.
Preferably, the similarity between the keywords and the title information in S2.2 is computed using the cosine theorem (cosine similarity).
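As an illustration of the cosine measure, the sketch below compares two texts through their word-count vectors. The vectorization is an assumption: the patent does not say how keywords and titles are turned into vectors, so whitespace tokenization and raw counts are used here only for simplicity, and `cosine_similarity` is a hypothetical helper name.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the words of text_a; words missing from text_b count as 0.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)

score = cosine_similarity("red sports car", "a red car on the road")
```

In practice a keyword such as "car" would be compared against each crawled title, and word embeddings could replace raw counts without changing the formula.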
Preferably, the mapping table in S2.3 is established by capturing images that have a title or a related text description from the Internet, storing the text information in a text file, numbering the texts and the images uniformly, and establishing the mapping table according to these numbers.
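One possible shape for such a mapping table is sketched below. The field names, the file naming scheme, and the `build_mapping_table` helper are all hypothetical; the patent only requires that each text and its image share a number.

```python
def build_mapping_table(crawled_items):
    """Number each (image, text) pair and return a table keyed by that number.

    `crawled_items` is a hypothetical list of (image_path, text) tuples.
    """
    table = {}
    for number, (image_path, text) in enumerate(crawled_items, start=1):
        table[number] = {
            "image_file": image_path,
            "text_file": f"{number:06d}.txt",  # where the text would be stored
            "text": text,
        }
    return table

items = [("img/a.jpg", "vintage car at a show"), ("img/b.jpg", "beetle on a leaf")]
mapping = build_mapping_table(items)
```

Category and similarity fields can later be added to each record under the same number.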
Preferably, the similarity in S2.3 is the similarity between the text and the keywords; images whose related text exceeds a set threshold are classified into the category corresponding to the keyword, and the similarity is recorded to generate the classification label.
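The thresholding rule can be sketched as follows. The threshold value and the `assign_categories` helper are assumptions chosen for illustration; the patent leaves the threshold to the implementer.

```python
def assign_categories(similarities, threshold):
    """Keep every keyword whose similarity to the text exceeds the threshold.

    `similarities` maps keyword -> similarity score for one image's text.
    Returns a list of (keyword, similarity) records, highest score first.
    """
    kept = [(kw, s) for kw, s in similarities.items() if s > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

labels = assign_categories({"explosive": 0.6, "gun": 0.4, "drug": 0.1}, threshold=0.3)
```

With these example scores the image keeps two categories, each recorded together with its similarity.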
Preferably, training the image classification model in S4 comprises pre-training with the DeepCluster method and then training the classification model with a transfer learning method.
Compared with the prior art, the invention has the beneficial effects that:
The method achieves unsupervised classification of images by exploiting the correlation between an image and its title or related text description. Introducing the semantic information associated with the images reduces classification uncertainty and gives the resulting classes specific meanings. Because the keywords are set manually, images can be classified according to the requirements of a specific scenario, which saves manual labeling cost and time and improves development efficiency.
Detailed Description
Examples
Setting keywords of a classification target:
If a person's facial expression is to be analyzed, the keywords are set to joy, anger, sadness, happiness, etc.; if the purpose is counter-terrorism, the keywords are set to guns, explosives, drugs, etc.
Data acquisition:
1) Acquiring image data and title and description information related to the image data by using a distributed crawler technology;
2) Analyzing the similarity between the keywords and the title information by using natural language processing technology (for example, a method using the cosine theorem, among others), and classifying the images into the corresponding categories. For example, if the similarity between the description information and "explosive" is 0.8, the image label is: explosive (similarity 0.8). One image may also carry several labels, for example explosive (similarity 0.6) and gun (similarity 0.4) at the same time;
3) Establishing a mapping table, numbering the manually set categories, and recording the image categories, the similarities, and the corresponding text names.
Image classification label generation:
The class number generated in the previous step is converted into a one-hot code and then multiplied by the similarity. For example, 00000 … 010 (assuming the car class is numbered second to last) multiplied by the similarity (assume 0.7) becomes 000 … 0.7. A label of this kind makes it easier to capture intrinsic features. Consider labeling an image of a Beetle car, and assume its similarity to "car" is 0.7 and its similarity to "beetle" is 0.3. If the label is 00000 … 010, the image is treated during training as having no beetle features. When other beetle images are learned later, the network, in order to avoid contradiction, may fail to extract features that are in fact effective for beetles, which hinders learning. Changing the label to 0.3 … 0.7 is more reasonable. By analogy with a person learning to classify: told that the image shows a car that also resembles a beetle, the person understands both the car and the beetle better, and the ability to recognize beetles is not reduced.
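The label construction in this paragraph can be written out as a short sketch. The class numbering (5 classes, "beetle" at index 0, "car" at index 3) and the `soft_label` helper are assumptions made for the example; only the rule of multiplying a one-hot code by the similarity comes from the text.

```python
def soft_label(num_classes, class_similarities):
    """Build a label vector: one one-hot code per class, scaled by its similarity, summed.

    `class_similarities` maps class number (0-based index) -> similarity.
    """
    label = [0.0] * num_classes
    for class_index, similarity in class_similarities.items():
        one_hot = [1.0 if i == class_index else 0.0 for i in range(num_classes)]
        label = [a + similarity * b for a, b in zip(label, one_hot)]
    return label

# Hypothetical numbering: 5 classes, "beetle" is class 0, "car" is class 3 (second to last).
car_only = soft_label(5, {3: 0.7})
car_and_beetle = soft_label(5, {0: 0.3, 3: 0.7})
```

Here `car_only` corresponds to the single-label case and `car_and_beetle` to the label that keeps both similarities.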
Training an image classification model:
1) Pre-training using the DeepCluster method;
2) Training the classification model using a transfer learning approach.
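The patent does not state which loss function is used when training on similarity-weighted labels. As one plausible choice, and purely as an assumption, the sketch below computes a cross-entropy between softmax outputs and such a label; the function names and the example scores are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(logits, target):
    """Cross-entropy between predicted probabilities and a similarity-weighted label."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

target = [0.3, 0.0, 0.0, 0.7, 0.0]  # e.g. beetle 0.3, car 0.7
good = soft_cross_entropy([2.0, 0.0, 0.0, 3.0, 0.0], target)  # scores favor car, then beetle
bad = soft_cross_entropy([0.0, 3.0, 0.0, 0.0, 0.0], target)   # scores favor an unrelated class
```

A model whose scores agree with the label receives the lower loss, so minimizing it pushes the network toward both labeled classes in proportion to their similarities.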
Inference process:
The image classification model is used directly, without a natural language processing step, so that pictures that have no specific description can also be classified correctly.
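At inference time only the image model's class scores are needed. A minimal sketch, assuming the model outputs one score per numbered class and that the mapping from class number to keyword is available (the names and scores below are hypothetical):

```python
def classify(scores, class_names):
    """Return the name and score of the highest-scoring class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best], scores[best]

class_names = ["beetle", "gun", "explosive", "car", "drug"]  # hypothetical numbering
prediction = classify([0.25, 0.01, 0.02, 0.70, 0.02], class_names)
```

No title or description is consulted here; the text was used only to build the training labels.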

Claims (5)

1. An unsupervised image classification method based on natural language, characterized by comprising the following steps:
S1, setting keywords for the classification target;
S2, data acquisition, comprising the following steps:
S2.1, acquiring image data and the title and description information related to the image data by using distributed crawler technology;
S2.2, analyzing the similarity between the keywords and the title information by using natural language processing technology, and classifying the images into the corresponding categories;
S2.3, establishing a mapping table, numbering the manually set categories, and recording the image categories, the similarities, and the corresponding text names;
S3, image classification label generation: converting the class number generated in the previous step into a one-hot code, and multiplying the one-hot code by the similarity;
S4, training an image classification model;
S5, inference process: the image classification model is used directly, without a natural language processing step.
2. The unsupervised image classification method based on natural language according to claim 1, characterized in that: the similarity between the keywords and the title information in S2.2 is computed using the cosine theorem.
3. The unsupervised image classification method based on natural language according to claim 1, characterized in that: the mapping table in S2.3 is established by capturing images that have a title or a related text description from the Internet, storing the text information in a text file, numbering the texts and the images uniformly, and establishing the mapping table according to these numbers.
4. The unsupervised image classification method based on natural language according to claim 1, characterized in that: the similarity in S2.3 is the similarity between the text and the keywords; images whose related text exceeds a set threshold are classified into the category corresponding to the keyword, the similarity is recorded, and the classification label is generated.
5. The unsupervised image classification method based on natural language according to claim 1, characterized in that: training the image classification model in S4 comprises pre-training with the DeepCluster method and training the classification model with a transfer learning method.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210992923.4A CN115457309A (en) 2022-08-18 2022-08-18 Image unsupervised classification method based on natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210992923.4A CN115457309A (en) 2022-08-18 2022-08-18 Image unsupervised classification method based on natural language

Publications (1)

Publication Number Publication Date
CN115457309A true CN115457309A (en) 2022-12-09

Family

ID=84298597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210992923.4A Pending CN115457309A (en) 2022-08-18 2022-08-18 Image unsupervised classification method based on natural language

Country Status (1)

Country Link
CN (1) CN115457309A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091982A (en) * 2023-04-03 2023-05-09 浪潮电子信息产业股份有限公司 Image detection method, device, electronic equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
US20220270369A1 (en) Intelligent cataloging method for all-media news based on multi-modal information fusion understanding
Amir et al. IBM Research TRECVID-2003 Video Retrieval System.
CN110705490B (en) Visual emotion recognition method
CN113946677A (en) Event identification and classification method based on bidirectional cyclic neural network and attention mechanism
CN108229285B (en) Object classification method, object classifier training method and device and electronic equipment
CN116975340A (en) Information retrieval method, apparatus, device, program product, and storage medium
CN115457309A (en) Image unsupervised classification method based on natural language
CN111444720A (en) Named entity recognition method for English text
US20070005529A1 (en) Cross descriptor learning system, method and program product therefor
CN114741556A (en) Short video frequency classification method based on scene segment and multi-mode feature enhancement
CN111259109B (en) Method for converting audio frequency into video frequency based on video big data
CN110688461B (en) Online text education resource label generation method integrating multi-source knowledge
Zhang et al. Movie genre classification by language augmentation and shot sampling
CN116958677A (en) Internet short video classification method based on multi-mode big data
CN116821297A (en) Stylized legal consultation question-answering method, system, storage medium and equipment
CN111737507A (en) Single-mode image Hash retrieval method
CN115481254A (en) Method, system, readable storage medium and equipment for analyzing video effect content of movie and television play script
CN113377959A (en) Few-sample social media rumor detection method based on meta learning and deep learning
CN112579666A (en) Intelligent question-answering system and method and related equipment
CN111309933A (en) Automatic marking system for cultural resource data
Vishwakarma et al. Multilevel profiling of situation and dialogue-based deep networks for movie genre classification using movie trailers
Reddy et al. Automatic caption generation for annotated images by using clustering algorithm
Guangli et al. Video summarization based on ListNet scoring mechanism
Yu et al. Harvesting web images for realistic facial expression recognition
Gupta et al. Multimodal Meme Sentiment Analysis with Image Inpainting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination