CN117953320A - Training method and device for image category annotation model and electronic equipment

Training method and device for image category annotation model and electronic equipment

Info

Publication number: CN117953320A
Application number: CN202410025012.3A
Authority: CN (China)
Prior art keywords: image, training, model, category, label
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 游鹏, 李志涵, 张朋, 张学涵, 王仁根, 汪志强
Current Assignee: Zhejiang Dahua Technology Co Ltd
Original Assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2024-01-08
Filing date: 2024-01-08
Publication date: 2024-04-30

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a training method and device for an image category annotation model, and electronic equipment, wherein the method comprises the following steps: training a pre-training model based on a training set to obtain a first model, the training set comprising first images with target category labels and the corresponding target category labels; performing category prediction on second images without target category labels through the first model to obtain predicted category labels of the second images; determining category information of the second images through a CLIP model, selecting target category labels from the predicted category labels based on the category information of the second images and the predicted category labels, and taking the corresponding second images as new first images; and updating the training set based on the new first images, and performing iterative training on the first model based on the updated training set. The embodiment not only realizes efficient automatic annotation of image categories, but also further ensures annotation accuracy.

Description

Training method and device for image category annotation model and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a training method and device for an image category annotation model and electronic equipment.
Background
In image processing, it is often necessary to classify images by model, while accurate models need to be trained with a large amount of high quality annotation data.
In the related art, faced with massive amounts of data to be annotated, annotation is accomplished by means of manual labeling assisted by some automated tools.
However, the manual labeling method often consumes a lot of manpower and time.
Disclosure of Invention
The embodiment of the application provides a training method and device for an image category labeling model and electronic equipment, which are used for accurately and efficiently automatically labeling image categories.
In a first aspect, an embodiment of the present application provides a training method for an image category annotation model, where the method includes:
Training the pre-training model based on the training set to obtain a first model; the training set comprises a first image with target class labels and corresponding target class labels;
Performing category prediction on a second image without a target category label through the first model to obtain a predicted category label of the second image;
Determining category information of the second image through a CLIP (Contrastive Language-Image Pre-training) model, which is pre-trained on contrastive text-image pairs; selecting a target category label from the predicted category label based on the category information of the second image and the predicted category label; and taking the corresponding second image as a new first image;
updating the training set based on the new first image, and performing iterative training on the first model based on the updated training set.
According to the scheme, a small sample (the initial first images) is used as a training set to train the pre-training model to obtain the first model, so that the problem that a large amount of data is required for training is solved, and manual labeling is reduced; the second images are automatically labeled through the first model; after labeling, the predicted category labels are not directly used as target category labels; instead, category information of the second images is determined through the CLIP model, the predicted category labels which do not meet the requirements are filtered out based on the category information of the second images and the predicted category labels, the predicted category labels which meet the requirements are used as target category labels, and the corresponding second images are used as new first images, so that the training set is continuously updated; training is performed based on the continuously updated training set, and labeling proceeds while iterating, so that the labeling effect of the first model improves and the accuracy of the labeling result (the predicted category labels) increases; this not only realizes efficient automatic annotation of image categories, but also further ensures annotation accuracy.
In some alternative embodiments, training the pre-training model based on the training set to obtain the first model includes:
Amplifying the images in the training set based on the target class label distribution in the training set, and carrying out data enhancement on the amplified images;
Training the pre-training model based on the image with the enhanced data and the corresponding target class label to obtain the first model.
According to the scheme, through carrying out data amplification and data enhancement on the images in the training set, the number of the images in the training set is increased, and the category distribution of the images in the training set is more uniform, so that the first model can be trained more accurately based on the images after the data amplification and the data enhancement.
In some optional embodiments, before determining the category information of the second image by the CLIP model, the method further includes:
And adjusting the CLIP model based on the training set.
According to the scheme, the CLIP model is adjusted based on the training set, so that the CLIP model accurately learns the association between the images in the training set and the categories, and the CLIP model can accurately determine the category information of the second image.
In some alternative embodiments, adjusting the CLIP model based on the training set includes:
converting the target category labels in the training set into text information and converting the images in the training set into characteristic information;
And adjusting the CLIP model based on the text information and the characteristic information.
In some alternative embodiments, the category information includes a score under each preset category label.
According to the scheme, the scores of the second images under the labels of the preset categories are output through the CLIP model, so that the possibility that the second images are marked into various categories can be accurately evaluated.
In some alternative embodiments, selecting a target category label from the predicted category label based on the category information of the second image and the predicted category label includes:
for any second image, if the preset category label with the highest score for the second image is different from the predicted category label of the second image, and/or if the score of the second image under the predicted category label is smaller than a preset score, determining that the predicted category label of the second image is not the target category label;
otherwise, determining that the predicted category label of the second image is the target category label.
According to the scheme, the score of the second image under each preset category label characterizes the possibility that the second image should be labeled with that preset category label. If the preset category label with the highest score for the second image is different from the predicted category label of the second image, another preset category label is more suitable for the second image, and the predicted category label is not suitable to serve as the target category label; if the score of the second image under the predicted category label is smaller than the preset score, the second image is unlikely to be correctly labeled with the predicted category label, which is therefore not suitable to serve as the target category label. If at least one of the above conditions is satisfied, the predicted category label is filtered out, and it is determined that the predicted category label of the second image is not the target category label; otherwise, if neither condition is satisfied, the predicted category label of the second image is determined to be suitable and is determined to be the target category label, so that the predicted category labels are accurately filtered.
In a second aspect, an embodiment of the present application provides a training apparatus for an image category annotation model, where the apparatus includes:
The training module is used for training the pre-training model based on the training set to obtain a first model; the training set comprises a first image with target class labels and corresponding target class labels;
the prediction module is used for carrying out category prediction on the second image without the target category label through the first model to obtain a predicted category label of the second image;
the screening module is used for determining the category information of the second image through the CLIP model, selecting a target category label from the prediction category label based on the category information of the second image and the prediction category label, and taking the corresponding second image as a new first image;
And the iteration module is used for updating the training set based on the new first image and carrying out iterative training on the first model based on the updated training set.
In some alternative embodiments, the training module is specifically configured to:
Amplifying the images in the training set based on the target class label distribution in the training set, and performing data enhancement on the amplified images;
Training the pre-training model based on the image with the enhanced data and the corresponding target class label to obtain the first model.
In some optional embodiments, before the screening module determines the category information of the second image through the CLIP model, the screening module is further configured to:
And adjusting the CLIP model based on the training set.
In some alternative embodiments, the screening module is specifically configured to:
converting the target category labels in the training set into text information and converting the images in the training set into characteristic information;
And adjusting the CLIP model based on the text information and the characteristic information.
In some alternative embodiments, the category information includes a score under each preset category label.
In some alternative embodiments, the screening module is specifically configured to:
for any second image, if the preset category label with the highest score for the second image is different from the predicted category label of the second image, and/or if the score of the second image under the predicted category label is smaller than a preset score, determining that the predicted category label of the second image is not the target category label;
otherwise, determining that the predicted category label of the second image is the target category label.
In a third aspect, an embodiment of the present application provides an electronic device, including at least one processor and at least one memory, where the memory stores a computer program, and when the program is executed by the processor, causes the processor to execute the training method of the image class labeling model in any one of the first aspects.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program executable by a processor, where the program when executed on the processor causes the processor to perform the training method of the image class annotation model according to any one of the above first aspects.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a first training method of an image category annotation model according to an embodiment of the present application;
FIG. 2 is a flowchart of a second training method of an image category annotation model according to an embodiment of the present application;
FIG. 3 is a flowchart of a third training method of an image category annotation model according to an embodiment of the present application;
FIG. 4 is a flowchart of a fourth training method of an image category annotation model according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a training device for an image class annotation model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
In the description of the present application, it should be noted that, unless explicitly stated and limited otherwise, the term "connected" should be interpreted broadly, and for example, it may be directly connected, or it may be indirectly connected through an intermediate medium, or it may be communication between two devices. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
In image processing, it is often necessary to classify images by model, while accurate models need to be trained with a large amount of high quality annotation data.
In the related art, the method is realized by means of manual marking and a part of automatic tools in the face of huge data to be marked.
The manual labeling method often consumes a great deal of manpower and time cost.
In view of this, an embodiment of the present application provides a training method and apparatus for an image class labeling model, and an electronic device, where the method includes: training the pre-training model based on the training set to obtain a first model; the training set comprises a first image with target class labels and corresponding target class labels; performing category prediction on a second image without a target category label through the first model to obtain a predicted category label of the second image; determining category information of the second image through a CLIP model, selecting a target category label from the predicted category label based on the category information of the second image and the predicted category label, and taking the corresponding second image as a new first image; updating the training set based on the new first image, and performing iterative training on the first model based on the updated training set.
According to the scheme, a small sample (the initial first images) is used as a training set to train the pre-training model to obtain the first model, so that the problem that a large amount of data is required for training is solved, and manual labeling is reduced; the second images are automatically labeled through the first model; after labeling, the predicted category labels are not directly used as target category labels; instead, category information of the second images is determined through the CLIP model, the predicted category labels which do not meet the requirements are filtered out based on the category information of the second images and the predicted category labels, the predicted category labels which meet the requirements are used as target category labels, and the corresponding second images are used as new first images, so that the training set is continuously updated; training is performed based on the continuously updated training set, and labeling proceeds while iterating, so that the labeling effect of the first model improves and the accuracy of the labeling result (the predicted category labels) increases; this not only realizes efficient automatic annotation of image categories, but also further ensures annotation accuracy.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems with reference to the drawings and specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 1 is a flow chart of a first training method of an image category annotation model according to an embodiment of the present application; as shown in fig. 1, the method includes the following steps:
Step S101: training the pre-training model based on the training set to obtain a first model.
The training set comprises a first image with target class labels and corresponding target class labels.
In implementation, in order to reduce manual labeling, only a small number of first images with target class labels are provided, and it is difficult to train a model capable of accurate labeling directly from such a small sample; based on this, in this embodiment, a pre-training model is further provided, and the small sample (the initial first images) is used as a training set to perform migration training on the pre-training model, so as to obtain the first model.
The pre-training model is obtained through training of a large sample, and the labeling standard of the large sample can be different from the labeling standard adopted in the embodiment.
The specific training mode of the pre-training model is not limited in this embodiment; by way of example, the pre-training model is obtained by training and fine-tuning on ImageNet (a dataset).
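The patent provides no code; the following Python sketch merely illustrates what the migration training of step S101 might look like under assumptions not stated in the source: an ImageNet-pretrained ResNet-50 from torchvision as the pre-training model, an AdamW optimizer, and illustrative hyperparameters.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_first_model(num_classes: int) -> nn.Module:
    # The "pre-training model": an ImageNet-pretrained backbone (assumption).
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    # Replace the classification head to match this task's category labels.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def train_first_model(model: nn.Module, loader, epochs: int = 10,
                      lr: float = 1e-4, device: str = "cuda") -> nn.Module:
    # Migration (transfer) training on the small labeled set (step S101).
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:  # first images + target category labels
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```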
Step S102: and carrying out category prediction on the second image without the target category label through the first model to obtain a predicted category label of the second image.
In this embodiment, in order to reduce manual labeling, only a small number of first images with target class labels are set, and a large number of second images without target class labels exist; the trained first model has a labeling function, and can label the second image based on the first model to obtain a prediction type label of the second image.
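Continuing the sketch above, step S102 could be a plain inference pass that assigns each second image the category with the highest logit; the loader yielding (index, image) batches is a hypothetical convention, not part of the patent.

```python
import torch

@torch.no_grad()
def predict_labels(model, unlabeled_loader, device: str = "cuda"):
    # Step S102: obtain a predicted category label for each second image
    # as the argmax over the first model's class logits.
    model.to(device).eval()
    predictions = []                              # (image index, label) pairs
    for indices, images in unlabeled_loader:      # hypothetical (idx, image) loader
        logits = model(images.to(device))
        predictions.extend(zip(indices.tolist(),
                               logits.argmax(dim=1).tolist()))
    return predictions
```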
Step S103: and determining the category information of the second image through the CLIP model, selecting a target category label from the prediction category label based on the category information of the second image and the prediction category label, and taking the corresponding second image as a new first image.
In practice, since the first model is obtained by performing migration training on a small sample, the labeling effect is not very accurate, that is, in the initial stage, a large number of predicted class labels of the second image are wrong; based on the above, the embodiment determines the category information of the second image by means of the CLIP model, filters out part of the prediction category labels which do not meet the requirements based on the category information of the second image and the prediction category labels, and takes the prediction category labels which meet the requirements as target category labels, so that the labeling result of the part of the second image is accurate.
Step S104: updating the training set based on the new first image, and performing iterative training on the first model based on the updated training set.
Because the first model is obtained by migration training of a small sample, the labeling effect is not very accurate; the new first image has accurate prediction category label, so that the corresponding second image can be used as the new first image to continuously update the training set; based on the training of the continuously updated training set, the label is trained while iterating continuously, so that the labeling effect of the first model is improved, and the accuracy of the labeling result (the prediction type label) is increased.
Of course, if there are no more undesirable predicted category labels, it is indicated that all the predicted category labels of the second images are accurate and all the samples are accurately labeled; at this time, the iterative updating of the first model is stopped.
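Putting steps S101 to S104 together, the overall iteration might be outlined as below. This is a sketch, not the patent's implementation: `build_loader`, `clip_filter` (standing in for the CLIP-based screening of step S103), and `move` are hypothetical helpers.

```python
def iterative_annotation(train_set, unlabeled_set, build_loader, clip_filter,
                         num_classes):
    # Outline of Fig. 1: train (S101), predict (S102), screen with CLIP (S103),
    # fold accepted second images back into the training set (S104), and stop
    # once no predicted category label is filtered out any more.
    model = build_first_model(num_classes)
    while unlabeled_set:
        model = train_first_model(model, build_loader(train_set))
        preds = predict_labels(model, build_loader(unlabeled_set, labeled=False))
        accepted = clip_filter(unlabeled_set, preds)      # labels CLIP agrees with
        move(accepted, src=unlabeled_set, dst=train_set)  # new "first images"
        if len(accepted) == len(preds):  # nothing was filtered out: all remaining
            break                        # predicted labels are deemed accurate
        if not accepted:                 # guard (not from the patent): nothing
            break                        # newly accepted, avoid an endless loop
    return model, train_set
```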
According to the scheme, a small sample (the initial first images) is used as a training set to train the pre-training model to obtain the first model, so that the problem that a large amount of data is required for training is solved, and manual labeling is reduced; the second images are automatically labeled through the first model; after labeling, the predicted category labels are not directly used as target category labels; instead, category information of the second images is determined through the CLIP model, the predicted category labels which do not meet the requirements are filtered out based on the category information of the second images and the predicted category labels, the predicted category labels which meet the requirements are used as target category labels, and the corresponding second images are used as new first images, so that the training set is continuously updated; training is performed based on the continuously updated training set, and labeling proceeds while iterating, so that the labeling effect of the first model improves and the accuracy of the labeling result (the predicted category labels) increases; this not only realizes efficient automatic annotation of image categories, but also further ensures annotation accuracy.
Fig. 2 is a flow chart of a second training method of an image category annotation model according to an embodiment of the application; as shown in fig. 2, the method includes the following steps:
step S201: and amplifying the images in the training set based on the target class label distribution in the training set, and carrying out data enhancement on the amplified images.
As described above, in order to reduce manual labeling, a small number of first images with target class labels are set, that is, the images in the training set are not too many, and in order to improve the training effect on the first model, the data of the images can be amplified based on the target class label distribution in the training set, and then the amplified data set is subjected to data enhancement.
The fewer the images under a certain class of labels, the greater the number of amplifications. The purpose of data amplification is to balance the number of materials in each class, which is achieved by directly copying the images whose class labels are few in number; the purpose of data enhancement is to enrich the training samples, which is achieved by applying some data enhancement strategies to the amplified data set.
The data enhancement mode in this embodiment is not specifically limited; for example, random mirror flipping, horizontal flipping, normalization, and HSV (a color model) transformation are performed on an image, e.g., the Hue, Saturation, and Value (brightness) of the image are randomly modified, so as to obtain data-enhanced images having the same class label as the original image.
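As a hedged sketch of the amplification and enhancement just described: minority-class samples are directly copied until class counts balance, then flips, HSV-style color jitter, and normalization are applied. The transform parameters below are illustrative assumptions, not values from the patent.

```python
import random
from collections import Counter
from torchvision import transforms

def amplify_by_copying(samples):
    """samples: list of (image, label) pairs; copy minority-class entries
    until every category reaches the size of the largest one."""
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    amplified = list(samples)
    for label, count in counts.items():
        pool = [s for s in samples if s[1] == label]
        amplified += random.choices(pool, k=target - count)
    return amplified

# Enhancement pipeline; ColorJitter perturbs hue/saturation/brightness,
# approximating the HSV transformation described above.
enhance = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, saturation=0.2, hue=0.05),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```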
Step S202: training the pre-training model based on the image with the enhanced data and the corresponding target class label to obtain the first model.
In implementation, by performing data amplification and data enhancement on the images in the training set, not only is the number of images in the training set increased, but the category distribution of the images in the training set also becomes more uniform; therefore, a more accurate first model can be trained based on the images after data amplification and data enhancement.
Step S203: and carrying out category prediction on the second image without the target category label through the first model to obtain a predicted category label of the second image.
Step S204: and determining the category information of the second image through the CLIP model, selecting a target category label from the prediction category label based on the category information of the second image and the prediction category label, and taking the corresponding second image as a new first image.
Step S205: updating the training set based on the new first image, and performing iterative training on the first model based on the updated training set.
The specific implementation of steps S203 to S205 may refer to other embodiments, and will not be described herein.
According to the scheme, through carrying out data amplification and data enhancement on the images in the training set, the number of the images in the training set is increased, and the category distribution of the images in the training set is more uniform, so that the first model can be trained more accurately based on the images after the data amplification and the data enhancement.
Fig. 3 is a flowchart of a third training method of an image category annotation model according to an embodiment of the present application; as shown in fig. 3, the method includes the following steps:
Step S301: training the pre-training model based on the training set to obtain a first model.
The training set comprises a first image with target class labels and corresponding target class labels.
Step S302: and carrying out category prediction on the second image without the target category label through the first model to obtain a predicted category label of the second image.
The specific implementation of steps S301 to S302 may refer to other embodiments, and will not be described herein.
Step S303: and adjusting the CLIP model based on the training set.
In implementation, the CLIP model is adjusted based on the training set, so that the CLIP model accurately learns the association between the images and the categories in the training set, and the CLIP model can accurately determine the category information of the second image.
Step S304: and determining the category information of the second image through the CLIP model, selecting a target category label from the prediction category label based on the category information of the second image and the prediction category label, and taking the corresponding second image as a new first image.
Step S305: updating the training set based on the new first image, and performing iterative training on the first model based on the updated training set.
The specific implementation of steps S304 to S305 may refer to other embodiments, and will not be described herein.
According to the scheme, the CLIP model is adjusted based on the training set, so that the CLIP model accurately learns the association between the images in the training set and the categories, and the CLIP model can accurately determine the category information of the second image.
In some alternative embodiments, the foregoing step S303 may be implemented in, but is not limited to, the following manner:
converting the target category labels in the training set into text information and converting the images in the training set into characteristic information;
And adjusting the CLIP model based on the text information and the characteristic information.
For example, the training set includes images and corresponding target category labels; to learn the association between the two, the target category labels are converted into text information and the images are converted into characteristic information, and the text information and the characteristic information are used to adjust the CLIP model; the adjusted CLIP model can then perform picture retrieval on the second images according to each category.
It will be appreciated that during iterative updating of the first model, the images in the training set are increasing, and therefore, the CLIP model is continually adjusted.
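By way of illustration only, and assuming the Hugging Face `transformers` CLIP implementation (the patent names no library), the adjustment might convert target category labels into text prompts and fine-tune CLIP's built-in contrastive alignment on the training set. The prompt template, checkpoint name, and learning rate are assumptions.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

def adjust_clip(batches, class_names, epochs: int = 3, lr: float = 1e-6,
                device: str = "cuda"):
    # Labels -> text information via a prompt template (assumption);
    # images -> characteristic information via CLIP's image encoder.
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in batches:   # PIL images + target category labels
            texts = [f"a photo of a {class_names[l]}" for l in labels]
            inputs = processor(text=texts, images=images,
                               return_tensors="pt", padding=True).to(device)
            # return_loss=True makes CLIPModel compute its contrastive loss.
            loss = model(**inputs, return_loss=True).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model, processor
```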
In some alternative embodiments, the category information includes a score under each preset category label.
In this embodiment, the CLIP model outputs the scores of the second images under the labels of the preset categories, so that the possibility that each second image is labeled as each category can be accurately evaluated.
Here, the preset category labels are all of the target category labels associated with the first images.
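Under the same assumed interfaces as the sketch above, scoring a second image under every preset category label could look like the following: a softmax over the image's similarity to each label's text prompt yields one score per preset category label.

```python
import torch

@torch.no_grad()
def clip_scores(model, processor, image, preset_labels, device: str = "cuda"):
    # One score per preset category label; the prompt template is an assumption.
    texts = [f"a photo of a {name}" for name in preset_labels]
    inputs = processor(text=texts, images=image,
                       return_tensors="pt", padding=True).to(device)
    logits = model(**inputs).logits_per_image  # shape: (1, num_preset_labels)
    return logits.softmax(dim=-1).squeeze(0)
```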
Correspondingly, fig. 4 is a flow chart of a fourth training method of an image category annotation model according to an embodiment of the present application; as shown in fig. 4, the method includes the following steps:
Step S401: training the pre-training model based on the training set to obtain a first model.
The training set comprises a first image with target class labels and corresponding target class labels.
Step S402: and carrying out category prediction on the second image without the target category label through the first model to obtain a predicted category label of the second image.
The specific implementation of steps S401 to S402 may refer to other embodiments, and will not be described herein.
Step S403: determining, through the CLIP model, the score of each second image under each preset category label; for any second image, if the preset category label with the highest score for the second image is different from the predicted category label of the second image, and/or if the score of the second image under the predicted category label is smaller than a preset score, determining that the predicted category label of the second image is not the target category label; otherwise, determining that the predicted category label of the second image is the target category label.
In this embodiment, the score of the second image under each preset category label characterizes the possibility that the second image is labeled as various preset category labels, and if the highest score preset category label of the second image is different from the predicted category label of the second image, it is indicated that other preset category labels more suitable for the second image exist, and the predicted category label is not suitable for serving as the target category label; if the score of the second image under the predicted class label is less than the preset score, the second image is not highly likely to be marked as the predicted class label and is not suitable as the target class label. Thus, if at least one of the above conditions is met, filtering out the predicted category label, determining that the predicted category label of the second image is not the target category label; otherwise, if none of the above conditions is satisfied, the predicted class label of the second image is determined to be appropriate, and is determined to be the target class label.
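The two filtering conditions can be stated compactly in code; this is only a sketch of step S403, and the threshold value below is a placeholder, since the patent does not fix the preset score.

```python
def is_target_label(scores, predicted_idx: int, preset_score: float = 0.5):
    # Condition 1: the highest-scoring preset category label must coincide
    # with the predicted category label from the first model.
    if scores.argmax().item() != predicted_idx:
        return False
    # Condition 2: the score under the predicted label must not fall below
    # the preset score (0.5 is a placeholder, not a value from the patent).
    if scores[predicted_idx].item() < preset_score:
        return False
    return True
```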
Step S404: and taking the second image corresponding to the target category label as a new first image.
Step S405: updating the training set based on the new first image, and performing iterative training on the first model based on the updated training set.
The specific implementation of steps S404 to S405 may refer to other embodiments, and will not be described herein.
According to the scheme, the score of the second image under each preset category label characterizes the possibility that the second image should be labeled with that preset category label. If the preset category label with the highest score for the second image is different from the predicted category label of the second image, another preset category label is more suitable for the second image, and the predicted category label is not suitable to serve as the target category label; if the score of the second image under the predicted category label is smaller than the preset score, the second image is unlikely to be correctly labeled with the predicted category label, which is therefore not suitable to serve as the target category label. If at least one of the above conditions is satisfied, the predicted category label is filtered out, and it is determined that the predicted category label of the second image is not the target category label; otherwise, if neither condition is satisfied, the predicted category label of the second image is determined to be suitable and is determined to be the target category label, so that the predicted category labels are accurately filtered.
As shown in fig. 5, an embodiment of the present application provides a training apparatus 500 for an image class annotation model, which includes:
The training module 501 is configured to train the pre-training model based on the training set to obtain a first model; the training set comprises a first image with target class labels and corresponding target class labels;
the prediction module 502 is configured to perform category prediction on a second image without a target category label through the first model, so as to obtain a predicted category label of the second image;
A screening module 503, configured to determine, by using a CLIP model, category information of the second image, select, based on the category information of the second image and a predicted category label, a target category label from the predicted category label, and use the corresponding second image as a new first image;
An iteration module 504, configured to update the training set based on the new first image, and perform iterative training on the first model based on the updated training set.
In some alternative embodiments, training module 501 is specifically configured to:
Amplifying the images in the training set based on the target class label distribution in the training set, and carrying out data enhancement on the amplified images;
Training the pre-training model based on the image with the enhanced data and the corresponding target class label to obtain the first model.
In some alternative embodiments, before the screening module 503 determines the category information of the second image through the CLIP model, the screening module is further configured to:
And adjusting the CLIP model based on the training set.
In some alternative embodiments, the screening module 503 is specifically configured to:
converting the target category labels in the training set into text information and converting the images in the training set into characteristic information;
And adjusting the CLIP model based on the text information and the characteristic information.
In some alternative embodiments, the category information includes a score under each preset category label.
In some alternative embodiments, the screening module 503 is specifically configured to:
for any second image, if the preset category label with the highest score for the second image is different from the predicted category label of the second image, and/or if the score of the second image under the predicted category label is smaller than a preset score, determining that the predicted category label of the second image is not the target category label;
otherwise, determining that the predicted category label of the second image is the target category label.
Since the device is the device in the method according to the embodiment of the present application, and the principle of the device for solving the problem is similar to that of the method, the implementation of the device may refer to the implementation of the method, and the repetition is omitted.
Based on the same technical concept, the embodiment of the present application further provides an electronic device 600, as shown in fig. 6, including at least one processor 601 and a memory 602 connected to the at least one processor, where a specific connection medium between the processor 601 and the memory 602 is not limited in the embodiment of the present application, and in fig. 6, the processor 601 and the memory 602 are connected by a bus 603 as an example. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one thick line is shown in fig. 6, but not only one bus or one type of bus.
The processor 601 is the control center of the electronic device, and may use various interfaces and lines to connect various parts of the electronic device, and run or execute instructions stored in the memory 602 and call data stored in the memory 602, thereby implementing data processing. Optionally, the processor 601 may include one or more processing units, and the processor 601 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application program, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 601. In some embodiments, the processor 601 and the memory 602 may be implemented on the same chip, or, in some embodiments, they may be implemented separately on separate chips.
The processor 601 may be a general-purpose processor such as a CPU, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the training method of the image category annotation model may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
The memory 602, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 602 may include at least one type of storage medium, which may include, for example, flash memory, hard disk, multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, optical disk, and the like. The memory 602 may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 602 in embodiments of the present application may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.
In an embodiment of the present application, the memory 602 stores a computer program that, when executed by the processor 601, causes the processor 601 to perform:
Training the pre-training model based on the training set to obtain a first model; the training set comprises a first image with target class labels and corresponding target class labels;
Performing category prediction on a second image without a target category label through the first model to obtain a predicted category label of the second image;
Determining category information of the second image through a CLIP model, selecting a target category label from the predicted category label based on the category information of the second image and the predicted category label, and taking the corresponding second image as a new first image;
updating the training set based on the new first image, and performing iterative training on the first model based on the updated training set.
In some alternative embodiments, processor 601 specifically performs:
Amplifying the images in the training set based on the target class label distribution in the training set, and carrying out data enhancement on the amplified images;
Training the pre-training model based on the image with the enhanced data and the corresponding target class label to obtain the first model.
In some alternative embodiments, before determining the category information of the second image by the CLIP model, the processor 601 further performs:
And adjusting the CLIP model based on the training set.
In some alternative embodiments, processor 601 specifically performs:
converting the target category labels in the training set into text information and converting the images in the training set into characteristic information;
And adjusting the CLIP model based on the text information and the characteristic information.
In some alternative embodiments, the category information includes a score under each preset category label.
In some alternative embodiments, processor 601 specifically performs:
for any second image, if the preset category label with the highest score for the second image is different from the predicted category label of the second image, and/or if the score of the second image under the predicted category label is smaller than a preset score, determining that the predicted category label of the second image is not the target category label;
otherwise, determining that the predicted category label of the second image is the target category label.
Because the electronic device is the electronic device in the method according to the embodiment of the present application, and the principle of solving the problem by the electronic device is similar to that of the method, the implementation of the electronic device may refer to the implementation of the method, and the repetition is omitted.
Based on the same technical concept, the embodiment of the application also provides a computer readable storage medium, which stores a computer program executable by a processor, and when the program runs on the processor, the program causes the processor to execute the steps of the training method of the image category annotation model.
In some alternative embodiments, aspects of the method for training an image class annotation model provided by the present application may also be implemented in the form of a program product, which contains computer-executable instructions for causing a computer device to perform the steps of the method for training an image class annotation model according to the various exemplary embodiments of the present application described above when the program product is run on the computer device.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, electronic devices (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. The training method of the image category annotation model is characterized by comprising the following steps of:
Training the pre-training model based on the training set to obtain a first model; the training set comprises a first image with target class labels and corresponding target class labels;
Performing category prediction on a second image without a target category label through the first model to obtain a predicted category label of the second image;
Determining category information of the second image through a CLIP model, selecting a target category label from the predicted category label based on the category information of the second image and the predicted category label, and taking the corresponding second image as a new first image;
updating the training set based on the new first image, and performing iterative training on the first model based on the updated training set.
2. The method of claim 1, wherein training the pre-training model based on the training set to obtain the first model comprises:
Amplifying the images in the training set based on the target class label distribution in the training set, and carrying out data enhancement on the amplified images;
Training the pre-training model based on the image with the enhanced data and the corresponding target class label to obtain the first model.
3. The method of claim 1, further comprising, prior to determining the category information of the second image by a CLIP model:
And adjusting the CLIP model based on the training set.
4. The method of claim 3, wherein adjusting the CLIP model based on the training set comprises:
converting the target category labels in the training set into text information and converting the images in the training set into characteristic information;
And adjusting the CLIP model based on the text information and the characteristic information.
5. The method of claim 1, wherein the category information includes a score under each preset category label.
6. The method of claim 5, wherein selecting a target category label from the predicted category label based on the category information of the second image and the predicted category label comprises:
for any second image, if the preset category label with the highest score for the second image is different from the predicted category label of the second image, and/or if the score of the second image under the predicted category label is smaller than a preset score, determining that the predicted category label of the second image is not the target category label;
otherwise, determining that the predicted category label of the second image is the target category label.
7. A training device for an image class annotation model, the device comprising:
The training module is used for training the pre-training model based on the training set to obtain a first model; the training set comprises a first image with target class labels and corresponding target class labels;
the prediction module is used for carrying out category prediction on the second image without the target category label through the first model to obtain a predicted category label of the second image;
the screening module is used for determining the category information of the second image through the CLIP model, selecting a target category label from the prediction category label based on the category information of the second image and the prediction category label, and taking the corresponding second image as a new first image;
And the iteration module is used for updating the training set based on the new first image and carrying out iterative training on the first model based on the updated training set.
8. The apparatus of claim 7, wherein the training module is configured to:
Amplifying the images in the training set based on the target class label distribution in the training set, and carrying out data enhancement on the amplified images;
Training the pre-training model based on the image with the enhanced data and the corresponding target class label to obtain the first model.
9. An electronic device comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the method of any of claims 1-6.
10. A computer readable storage medium, characterized in that it stores a computer program executable by a computer, which when run on the computer causes the computer to perform the method according to any one of claims 1 to 6.
Priority Applications (1)

CN202410025012.3A, priority date 2024-01-08, filed 2024-01-08: Training method and device for image category annotation model and electronic equipment

Publications (1)

CN117953320A, published 2024-04-30

Family

ID=90803161

Country Status (1)

CN: CN117953320A (pending)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination