WO2024022376A1 - Image processing method and apparatus, and device and medium - Google Patents


Info

Publication number
WO2024022376A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample image
unlabeled sample
label
target object
target
Application number
PCT/CN2023/109269
Other languages
French (fr)
Chinese (zh)
Inventor
吕永春
朱徽
周迅溢
蒋宁
吴海英
Original Assignee
马上消费金融股份有限公司
Application filed by 马上消费金融股份有限公司
Publication of WO2024022376A1 publication Critical patent/WO2024022376A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/7753 Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • G06V 10/82 Arrangements using neural networks
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands

Definitions

  • This application relates to the field of deep learning technology, and in particular, to an image processing method, device, equipment and medium.
  • Semi-supervised learning (SSL) uses a large number of unlabeled samples and a small number of labeled samples to perform pattern recognition. Semi-supervised learning can effectively save the cost of manually labeling samples while improving the model training effect.
  • When unlabeled samples are used for semi-supervised learning, a teacher model is usually used to process the unlabeled samples and generate prediction results; low-confidence prediction results are then filtered out based on a preset threshold to generate pseudo labels for guiding the training of a student model, and the student model is trained based on the pseudo labels and the labeled samples.
  • This application provides an image processing method, device, equipment and medium.
  • Embodiments of the present application provide an image processing method, including: acquiring an unlabeled sample image, where the unlabeled sample image includes at least one target object; performing N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image, where the pseudo-label data set includes the pseudo label corresponding to the target object in each round of target detection, and N is a positive integer greater than 1; determining the confidence of the pseudo label corresponding to each target object based on the pseudo labels in the pseudo-label data set; and, if the confidence of at least one pseudo label corresponding to a target object in the unlabeled sample image is greater than a preset threshold, determining the unlabeled sample image to be the target unlabeled sample image.
  • an image processing device including:
  • An image acquisition module, used to acquire an unlabeled sample image, where the unlabeled sample image includes at least one target object;
  • a label data set acquisition module, used to perform N rounds of target detection on the unlabeled sample image and obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image, where the pseudo-label data set includes the pseudo label corresponding to the target object in each round of target detection, and N is a positive integer greater than 1;
  • a confidence determination module, configured to determine the confidence of the pseudo label corresponding to each target object based on the pseudo labels in the pseudo-label data set;
  • an image determination module, used to determine the unlabeled sample image as the target unlabeled sample image if the confidence of the pseudo label corresponding to at least one target object in the unlabeled sample image is greater than the preset threshold.
  • embodiments of the present application provide a computer device, including: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer execution instructions
  • the processor executes computer execution instructions stored in the memory to implement the above method.
  • embodiments of the present application provide a computer-readable storage medium.
  • Computer instructions are stored in the computer-readable storage medium. When the computer instructions are executed by a processor, they are used to implement the above image processing method.
  • embodiments of the present application provide a computer program product, which includes computer instructions that implement the above image processing method when executed by at least one processor.
  • embodiments of the present application provide a computer program that, when executed by a processor, implements the above image processing method.
  • Figure 1 is a schematic scene diagram of the image processing method provided by the embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of the target detection results provided by the embodiment of the present application.
  • Figure 5 is a schematic diagram of a target detection result provided by another embodiment of the present application.
  • Figure 6 is a schematic diagram of a target detection result provided by another embodiment of the present application.
  • Figure 7 is a schematic diagram of the results of target detection provided by another embodiment of the present application.
  • Figure 8 is a schematic diagram of the results of target detection provided by another embodiment of the present application.
  • Figure 9 is a schematic flow chart of the model training method provided by the embodiment of the present application.
  • Figure 10 is a schematic diagram of the end-to-end semi-supervised target detection framework training provided by the embodiment of this application.
  • Figure 11 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of the model training device provided by the embodiment of the present application.
  • Figure 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of a computer device provided by another embodiment of the present application.
  • Deep learning has achieved great success in fields such as computer vision, multimedia, and image processing.
  • the success of deep learning relies on the training of large-scale neural networks
  • the training of large-scale neural networks relies on the driving of large-scale labeled data sets.
  • Building such large-scale labeled data sets for training models by supervised learning is time-consuming and labor-intensive, and the labeling process relies heavily on expert experience; in contrast, unlabeled image data is very easy and fast to obtain. Therefore, semi-supervised learning has emerged.
  • Semi-supervised learning can effectively improve model performance by using labeled data to provide necessary guidance information for model training, combined with supplementary information from a large amount of unlabeled data.
  • Semi-supervised learning can effectively save the cost of manual annotation while improving the model training effect.
  • semi-supervised image classification and semi-supervised object detection tasks are related applications of semi-supervised learning.
  • The semi-supervised target detection task is more difficult and complex than semi-supervised image classification, because an image contains more targets and exploiting unlabeled images is harder.
  • The accuracy of pseudo labels is key for such methods to achieve excellent performance.
  • Current semi-supervised target detection methods usually generate prediction results through the teacher model, use preset thresholds to filter low-confidence candidate boxes and categories, and generate pseudo labels used to guide student model training.
  • When a single teacher model generates pseudo labels based on a fixed threshold, the generated pseudo-label information often contains a large amount of noise and has poor reliability, and cannot effectively improve model performance when it is subsequently used to train a student model.
  • embodiments of the present application provide an image processing method, device, equipment and medium to solve the existing problem of pseudo labels generated from unlabeled samples in semi-supervised learning that are prone to noise and have poor reliability.
  • Therefore, the pseudo-label generation process needs to be optimized. Specifically, multiple rounds of prediction are performed on unlabeled sample images, the confidence of the pseudo label corresponding to each target object in each round of prediction is counted, and the pseudo labels whose confidence meets the requirements are screened out. This improves the reliability of the pseudo labels generated for target unlabeled sample images, reduces noise, and improves the performance of semi-supervised target detection methods.
  • Figure 1 is a schematic diagram of a scene of an image processing method provided by an embodiment of the present application.
  • The unlabeled sample image set in the computer device 100 usually includes several images, and each unlabeled sample image has not been manually labeled.
  • For an unlabeled sample image 101, a prediction result 102 can be obtained through the teacher model (that is, the image 101 is input to the teacher model, which performs target detection and outputs the prediction result 102). The prediction result 102 may include a target object 1021 (a person is taken as an example) and a candidate frame identifying the location area where the target object 1021 is located (i.e., the dotted box in Figure 1). A pseudo label of the category to which the target object 1021 belongs can be attached (for example, the category "person", or a further refined category such as "female"); at the same time, a candidate box label can also be attached, which identifies the position information of the target object 1021.
  • FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present application. As shown in Figure 2, the method may specifically include the following steps:
  • Step S201 Obtain an unlabeled sample image, which includes at least one target object.
  • the unlabeled sample image may include multiple target objects or a single target object.
  • the target object can be a person, a dog, a cat, a flower, etc.
  • The categories to which different target objects belong can be the same or different (for example, cats and dogs both belong to animals, while flowers belong to plants).
  • In some embodiments, the unlabeled sample images can be classified in advance. Specifically, the classification can be performed according to the category to which the target object in the unlabeled sample image belongs. For example, unlabeled sample images with the target object "people" can be classified as class I images, unlabeled sample images with the target object "animals" can be classified as class II images, and unlabeled sample images with the target object "plants" can be classified as class III images.
  • In addition, the categories can be divided into broad categories and more detailed sub-categories according to the actual situation.
  • the different target objects may be of the same category.
  • the unlabeled sample image includes two target objects, and both target objects are women.
  • different target objects can also be of different categories.
  • For example, an unlabeled sample image includes three target objects: the first target object is a man, the second target object is a woman, and the third target object is a cat. The first target object and the second target object both belong to the category "human" (but their genders differ, that is, they belong to different subcategories), while the third target object belongs to the category "animal".
  • Step S202 Perform N rounds of target detection on the unlabeled sample image to obtain a pseudo label data set corresponding to each target object in the unlabeled sample image.
  • The pseudo-label data set includes the pseudo label corresponding to the target object in each round of target detection, and N is a positive integer greater than 1.
  • pseudo-labels may include category labels and other labels used to describe the characteristics of the target object.
  • the value of N can be 3-5 rounds.
  • Suppose the unlabeled sample image includes multiple target objects of different categories, such as a "person" and a "cat".
  • The unlabeled sample image is input to, for example, a teacher model to perform N rounds of target detection, and each target object in the unlabeled sample image has a corresponding pseudo label after each round of target detection.
  • For example, in the first round of target detection the target object "person" obtains a corresponding pseudo label, and the target object "cat" also obtains a corresponding pseudo label; likewise, in the N-th round of target detection each target object again obtains a corresponding pseudo label.
  • Writing the pseudo label obtained for a target object in the i-th round as $\hat{y}^i$, the pseudo labels obtained in each round can be expressed as a set. After N rounds, the pseudo-label data set corresponding to the target object "person" is $\{\hat{y}_{person}^1, \dots, \hat{y}_{person}^N\}$, and the pseudo-label data set corresponding to the target object "cat" is $\{\hat{y}_{cat}^1, \dots, \hat{y}_{cat}^N\}$.
  • The teacher model can be used to perform N rounds of target detection and generate the prediction results (i.e., pseudo labels) of each target object in each round.
  • prediction results include the category to which the target object belongs.
  • For specific examples of the target object, reference may also be made to FIG. 3 and the related description of step S302 below.
  • Step S203 Determine the confidence of the pseudo label corresponding to each target object based on the pseudo labels in the pseudo label data set.
  • the pseudo labels of the target objects obtained in each round of target detection can be compared with each other to obtain the confidence of the pseudo labels.
  • Confidence can be understood as a measure of the similarity of the pseudo-labels determined in each round of target detection. The lower the value, the lower the similarity.
  • the value range can be [0, 1].
  • For example, the pseudo-label data set corresponding to the target object "person" is $\{\hat{y}_{person}^1, \dots, \hat{y}_{person}^N\}$ and the pseudo-label data set corresponding to the target object "cat" is $\{\hat{y}_{cat}^1, \dots, \hat{y}_{cat}^N\}$. At this time, the pseudo labels obtained by the target object "person" in each round can be compared, that is, the elements of its pseudo-label data set can be compared with each other to check whether they are the same. If the comparison finds that the category labels differ across rounds (for example, some category labels are male and the rest are female), it can be determined that there is a difference in the category labels corresponding to the target object, and the confidence is not 1; if the pseudo labels obtained by the target object in every round of detection are all the same, the confidence is 1.
  • Step S204 If the confidence level of at least one pseudo-label corresponding to the target object in the unlabeled sample image is greater than the preset threshold, the unlabeled sample image is determined to be the target unlabeled sample image.
  • After obtaining the confidence of the pseudo label corresponding to each target object in the unlabeled sample image, it can be determined based on the confidence whether the unlabeled sample image is the target unlabeled sample image.
  • As mentioned in step S201, there may be one target object or multiple target objects in the unlabeled sample image.
  • When there is a single target object, it can be determined whether the confidence of the pseudo label corresponding to that target object is greater than the preset threshold; if so, the unlabeled sample image is the target unlabeled sample image. When there are multiple target objects, it can be judged whether the confidence of the pseudo label corresponding to each target object is greater than the preset threshold; if the confidence of the pseudo label corresponding to at least one of the multiple target objects is greater than the preset threshold, the unlabeled sample image is determined to be the target unlabeled sample image.
  • In this way, the embodiment of this application performs N rounds of target detection on unlabeled sample images to obtain the pseudo label corresponding to each target object in each round of target detection, determines the confidence of the pseudo label corresponding to each target object based on the pseudo labels obtained in each round, selects the unlabeled sample images whose pseudo labels have high confidence as target unlabeled sample images, and uses the target unlabeled sample images as the final samples for training the student model, effectively filtering out noise information and improving target detection performance, as sketched below.
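  • The selection logic of steps S201 to S204 can be summarized by the following minimal sketch. It assumes a teacher_model callable that returns one category pseudo label per detected object, and uses the frequency of the most common label across rounds as the confidence; the function names, the round count, and the threshold value are illustrative rather than part of the application text.

```python
from typing import Callable, List

def select_target_unlabeled_images(
    images: list,
    teacher_model: Callable[[object], List[str]],  # one pseudo label per detected object
    n_rounds: int = 3,       # N > 1; the text suggests 3-5 rounds
    threshold: float = 0.8,  # preset confidence threshold (illustrative)
) -> list:
    """Steps S201-S204: keep an unlabeled image if at least one of its
    target objects has a pseudo-label confidence above the threshold."""
    targets = []
    for image in images:
        # S202: N rounds of target detection, one pseudo label per object per round
        rounds = [teacher_model(image) for _ in range(n_rounds)]
        # S203: confidence = frequency of the most common label across the rounds
        confidences = []
        for m in range(len(rounds[0])):
            labels = [rounds[i][m] for i in range(n_rounds)]
            best = max(set(labels), key=labels.count)
            confidences.append(labels.count(best) / n_rounds)
        # S204: the image qualifies if any object's confidence exceeds the threshold
        if any(c > threshold for c in confidences):
            targets.append(image)
    return targets
```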
  • In some embodiments, image samples can also be processed in the form of image data sets, where an image data set can include one or more unlabeled sample images. This process is described below with reference to Figure 3. It can be understood that in the following embodiments there may be more than one unlabeled sample image in the image data set, and the processing of each unlabeled sample image is similar.
  • FIG 3 is a schematic flowchart of an image processing method provided by an embodiment of the present application. As shown in Figure 3, the method may specifically include the following steps:
  • Step S301 Obtain an image data set.
  • the image data set includes at least one unlabeled sample image, and each unlabeled sample image includes at least one target object.
  • each unlabeled sample image in the image data set may include multiple target objects or a single target object.
  • the target object may be a person, a dog, a cat, a flower, etc.
  • Different target objects may belong to the same or different categories (for example, cats and dogs both belong to animals, while flowers belong to plants).
  • In some embodiments, the unlabeled sample images in the image data set can be classified in advance. Specifically, this can be done according to the category to which the target object in the unlabeled sample image belongs. For example, unlabeled sample images with the target object "people" are classified as class I images, unlabeled sample images with the target object "animals" are classified as class II images, and unlabeled sample images with the target object "plants" are classified as class III images. In addition, the categories can be divided into broad categories and more detailed sub-categories according to the actual situation.
  • For example, the target object in the unlabeled sample image T1 and the unlabeled sample image T2 is a woman, the target object in the unlabeled sample image T3 is a dog, and the target object in the unlabeled sample image T4 is a flower. Then the unlabeled sample images T1 and T2 are classified into the category "woman", the unlabeled sample image T3 is classified into the category "dog", and the unlabeled sample image T4 is classified into the category "plant". Unlabeled sample images of different types are distinguished and then separately subjected to subsequent target detection to obtain corresponding detection results.
  • The unlabeled sample image may also include multiple target objects, and the different target objects may be of the same category. For example, the unlabeled sample image may include two target objects, both of which are women.
  • different target objects can also be of different categories.
  • For example, an unlabeled sample image includes three target objects: the first target object is a man, the second target object is a woman, and the third target object is a cat. The first target object and the second target object both belong to the category "human" (but their genders differ, that is, they belong to different subcategories), while the third target object belongs to the category "animal".
  • Step S302 Perform N rounds of target detection on each unlabeled sample image in the image data set to obtain a pseudo-label data set corresponding to each target object in each unlabeled sample image.
  • the pseudo-label data set includes the pseudo-labels corresponding to the target objects in each round of object detection.
  • the pseudo labels include at least category labels, and N is a positive integer greater than 1. For example, the value of N is 3-5 rounds.
  • Suppose the unlabeled sample image includes multiple target objects of different categories, such as a "person" and a "cat".
  • The unlabeled sample image is input to, for example, a teacher model to perform N rounds of target detection, and each target object in the unlabeled sample image has a corresponding pseudo label after each round of target detection.
  • For example, in the first round of target detection the target object "person" obtains a corresponding pseudo label, and the target object "cat" also obtains a corresponding pseudo label; likewise, in the N-th round of target detection each target object again obtains a corresponding pseudo label.
  • Writing the pseudo label obtained for a target object in the i-th round as $\hat{y}^i$, the pseudo labels obtained in each round can be expressed as a set. After N rounds, the pseudo-label data set corresponding to the target object "person" is $\{\hat{y}_{person}^1, \dots, \hat{y}_{person}^N\}$, and the pseudo-label data set corresponding to the target object "cat" is $\{\hat{y}_{cat}^1, \dots, \hat{y}_{cat}^N\}$.
  • The teacher model can be used to perform N rounds of target detection and generate the prediction results (i.e., pseudo labels) of each target object in each round.
  • prediction results include the category to which the target object belongs.
  • Figure 4 is a schematic diagram of the teacher model prediction process provided by the embodiment of the present application. As shown in Figure 4, the teacher model 40 can be used to perform target detection on the unlabeled sample image A, and the prediction result of each round for the unlabeled sample image A is recorded (Figure 4 includes the results of N rounds).
  • In the prediction result 41 of the first round, the target object corresponds to a category label indicating that the target object is female; in the prediction result 42 of the N-th round, the category label corresponding to the target object indicates that the target object is male.
  • Step S303 Determine the confidence of the pseudo label corresponding to each target object based on the pseudo labels in the pseudo label data set.
  • In some embodiments, the pseudo labels of the target objects obtained in each round of target detection can be compared with each other. For example, the category label of the target object in the prediction result 41 obtained in the first round of target detection (the category label indicates female) is compared with the category label of the target object in the prediction result 42 obtained in the N-th round of target detection (the category label indicates male). Through comparison, the two category labels are different (that is, one category label is male and the other is female), so it can be determined that there is a difference in the category labels corresponding to the target object, and the confidence is not 1.
  • The value range of the confidence level can be [0, 1]. If the pseudo labels obtained by the target object in each round of detection are all the same, the confidence is 1; if the pseudo labels are all different, the confidence is 0. Confidence can be understood as a measure of the similarity of the pseudo labels determined in each round of target detection: the lower the value, the lower the similarity.
  • Step S304 Determine the target unlabeled sample image in the image data set according to the confidence level, where the confidence level that there is at least one pseudo label corresponding to the target object in the target unlabeled sample image is greater than a preset threshold.
  • For an unlabeled sample image A in which the confidence of the pseudo label of each target object obtained over the N rounds of target detection is less than the preset threshold (for example, a preset threshold of 1), the image is filtered out and will not be used as a target unlabeled sample image.
  • Figure 5 is a schematic diagram of the target detection results provided by another embodiment of the present application.
  • As shown in Figure 5, the teacher model 40 performs N rounds of target detection on the unlabeled sample image B. The category label corresponding to the target object included in the prediction result 51 obtained in the first round of target detection is female, and the category label corresponding to the target object included in the prediction result 52 obtained in the N-th round of target detection is also female. Since the pseudo labels corresponding to the target object are the same from the first round of target detection through the N-th round, the confidence of the pseudo label corresponding to the target object can be set to 1. At this time, the unlabeled sample image B can be used as the target unlabeled sample image.
  • For an unlabeled sample image C that includes, for example, the target objects "person" and "dog", the pseudo label corresponding to the target object "person" has a confidence, and the pseudo label corresponding to the target object "dog" also has a confidence. If the confidence of at least one of these pseudo labels is greater than the preset threshold, this unlabeled sample image can be used as the target unlabeled sample image; that is, even if only one target object meets the requirement, the unlabeled sample image C can still be used as the target unlabeled sample image.
  • the target unlabeled sample image is used as training data to train the student model.
  • The student model's network parameters are obtained through gradient descent on the loss function, and the parameters of the teacher model are updated based on the parameters of the student model; for example, the parameters of the teacher model can be updated through the Exponential Moving Average (EMA) strategy.
  • In addition, when using the target unlabeled sample image to train the student model, if the confidence of the category label corresponding to a certain target object in the target unlabeled sample image is less than the preset threshold (as in the above-mentioned unlabeled sample image C), this target object is eliminated during training, and only target objects whose category label confidence is greater than the preset threshold (such as "person" in the unlabeled sample image C) are retained, as sketched below.
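  • A minimal sketch of this per-object elimination, assuming each pseudo label is represented as a dict with a cls_conf field (the field and object names, and the confidence values, are illustrative):

```python
def retain_reliable_objects(pseudo_labels, threshold=0.8):
    """Keep only target objects whose category-label confidence exceeds the
    preset threshold; low-confidence objects are dropped from training."""
    return [obj for obj in pseudo_labels if obj["cls_conf"] > threshold]

# Hypothetical values: only the object above the threshold is retained.
kept = retain_reliable_objects(
    [{"name": "person", "cls_conf": 0.9}, {"name": "dog", "cls_conf": 0.4}]
)
print(kept)  # [{'name': 'person', 'cls_conf': 0.9}]
```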
  • In this way, the embodiment of this application performs N rounds of target detection on unlabeled sample images to obtain the pseudo labels corresponding to each target object in each round of target detection, determines the confidence of the pseudo label corresponding to each target object based on the pseudo labels obtained in each round, and uses the unlabeled sample images whose pseudo labels have high confidence as the final samples for training the student model, effectively filtering out noise information and improving target detection performance.
  • In some embodiments, the above step S302 can be implemented through the following steps: perform N rounds of target detection on all unlabeled sample images, and obtain the attribute category determined for each target object in each unlabeled sample image in each round of target detection; based on the attribute categories determined in each round of target detection for each target object in each unlabeled sample image, determine the category label corresponding to each target object in each round of target detection; and, according to the category label corresponding to each target object in each unlabeled sample image in each round of target detection, determine the pseudo-label data set corresponding to each target object in each unlabeled sample image.
  • For example, suppose both unlabeled sample image A and unlabeled sample image B contain only the single target object "person".
  • For unlabeled sample image A, the first round of target detection determines that the attribute category corresponding to the target object "person" is "female" (the attribute category here refers to the gender of the "person"), so the category label "female" can be attached to the target object determined in the first round of target detection. If the N-th round of target detection determines that the attribute category corresponding to the target object "person" is "male", the category label "male" is attached to the target object determined in the N-th round of target detection.
  • For unlabeled sample image B, in the prediction result 51 obtained in the first round of target detection the category of the target object "person" is female, so the corresponding category label is "female"; in the prediction result 52 obtained in the N-th round of target detection, the category to which the target object "person" belongs is still female, and the corresponding category label remains unchanged, still "female".
  • In this way, N rounds of target detection are performed on the unlabeled sample images, and each round of target detection produces a prediction result for each target object in the unlabeled sample image, so that the category label of each target object in each round of target detection, as well as the confidence of the category label, can be determined. The reliability of the category label can then be judged based on the confidence, reducing the noise of pseudo-label information.
  • In some embodiments, the pseudo labels may also include candidate box labels. Therefore, the above step S302 may specifically include the following steps: perform N rounds of target detection on all unlabeled sample images, and obtain the position information of each target object in each unlabeled sample image in the unlabeled sample image in each round of target detection; according to the position information of each target object in each unlabeled sample image in the unlabeled sample image in each round of target detection, determine the candidate box label corresponding to each target object in each unlabeled sample image in each round of target detection; and, according to the category labels and candidate box labels corresponding to each target object in each unlabeled sample image in each round of target detection, determine the pseudo-label data set corresponding to each target object in each unlabeled sample image.
  • Figure 6 is a schematic diagram of the target detection result provided by another embodiment of the present application.
  • Suppose there is a target object "person" in the unlabeled sample image 61. After target detection by the teacher model 40, a pseudo label can be attached to the target object, and the pseudo label can include a category label and a candidate box label.
  • the category label can be used to indicate the category to which the target object belongs (for example, the category to which the target object belongs is female), and the candidate box label is used to indicate the position of the target object in the unlabeled sample image 61.
  • For example, the candidate box label can take the form (Z1, Z2, Z3, Z4), where Z1, Z2, Z3, and Z4 respectively represent the position coordinates of the four corners of the dotted box in the prediction result 62.
  • the unlabeled sample image may include multiple target objects.
  • the unlabeled sample image may include people and animals (such as cats, dogs, etc.).
  • Figure 7 is a schematic diagram of the results of target detection provided by another embodiment of the present application.
  • As shown in Figure 7, the unlabeled sample image 70 includes a female target object and a male target object. The teacher model 40 performs target detection on the unlabeled sample image 70 and obtains a prediction result 71. The prediction result 71 includes the pseudo label of the female target object and the pseudo label of the male target object, and each pseudo label specifically includes a category label and a candidate box label.
  • the category label is used to indicate the gender of the target object
  • the candidate box label is used to indicate the position of the target object in the unlabeled sample image.
  • The embodiment of this application divides the pseudo labels into two categories, category labels and candidate box labels, and filters the target unlabeled sample images based on the confidence of the two labels corresponding to the target object, so that more reliable target unlabeled sample images can be obtained and the interference of noise information is reduced, thereby further improving the target detection performance of the student model obtained by subsequent training.
  • In some embodiments, it is necessary to determine, based on all pseudo labels of each target object in the pseudo-label data set, the confidence corresponding to the category label of the target object and the confidence corresponding to the candidate box label of the target object.
  • each target object has its corresponding pseudo label.
  • For example, the pseudo label corresponding to the male target object in Figure 7 includes the category label "male" and the candidate box label (Z1', Z2', Z3', Z4'), and each label corresponds to a confidence level.
  • the pseudo-labels obtained by the target object in each round of target detection can be compared with each other to determine the confidence of the pseudo-label corresponding to the target object.
  • Specifically, the gender indicated by the category label corresponding to the male target object in each round of target detection can be determined first.
  • Suppose that in the first round of target detection the corresponding category label is "male", in the second round of target detection the corresponding category label is "male", and in the third round of target detection the corresponding category label is "female". Comparing the category labels obtained in the three rounds, the probability that the category label is "male" is 2/3, and the probability that the category label is "female" is 1/3. Taking the maximum probability as the confidence of the category label, the confidence of the category label corresponding to the male target object is 2/3, as sketched below.
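  • This frequency-based rule can be sketched as follows, using the example labels above (the helper name is illustrative):

```python
from collections import Counter

def category_label_confidence(labels_per_round):
    """Confidence = frequency of the most common category label across rounds."""
    label, count = Counter(labels_per_round).most_common(1)[0]
    return label, count / len(labels_per_round)

print(category_label_confidence(["male", "male", "female"]))  # ('male', 0.666...)
```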
  • FIG. 8 is a schematic diagram of the results of target detection provided by yet another embodiment of the present application.
  • the unlabeled sample image 70 includes a first target object 701 and a second target object 702 .
  • the teacher model 40 performs three rounds of target detection on the unlabeled sample image 70 to obtain the first pseudo-label image 81, the second pseudo-label image 82, and the third pseudo-label image 83 respectively.
  • The first target object 701 detected in the first pseudo-label image 81 includes a category label (i.e., "female") and a candidate box label (i.e., (Z1'', Z2'', Z3'', Z4'')), and the second target object 702 includes a category label (i.e., "male") and a candidate box label (i.e., (Z1', Z2', Z3', Z4')).
  • The first target object 701 detected in the second pseudo-label image 82 includes a category label (i.e., "female") and a candidate box label (i.e., (Z1'', Z2'', Z3'', Z4'')), and the second target object 702 includes a category label (i.e., "male") and a candidate box label (i.e., (Z1', Z2', Z3', Z4')).
  • The first target object 701 detected in the third pseudo-label image 83 includes a category label (i.e., "male") and a candidate box label (i.e., (Z1'', Z2'', Z3'', Z4'')), and the second target object 702 includes a category label (i.e., "male") and a candidate box label (i.e., (Z1''', Z2''', Z3''', Z4''')).
  • the category labels of the same target detected in each round are compared to determine the confidence of the category label of the target.
  • For example, the category label of the first target object 701 in the first pseudo-label image 81 can be compared respectively with the category label of the first target object 701 in the second pseudo-label image 82 and with the category label of the first target object 701 in the third pseudo-label image 83. Through comparison, the category label of the first target object 701 in the third pseudo-label image 83 is "male", while the category labels of the first target object 701 in the first pseudo-label image 81 and in the second pseudo-label image 82 are both "female"; that is, the category label of the first target object 701 in the third pseudo-label image 83 differs from both of the other two. Based on this, the confidence of the category label of the first target object 701 can be determined: the label "female" appears in 2 of the 3 rounds, giving a confidence of 2/3.
  • Similarly, the confidence of the candidate box label corresponding to the male target object in Figure 7 can also be calculated. Specifically, the candidate box labels corresponding to the male target object in each round of target detection can be obtained from the pseudo-label data set; based on the coordinate information in the candidate box labels, the intersection-over-union between each pair of candidate areas can be calculated, and the average of these values determines the confidence of the candidate box label.
  • Taking Figure 8 as an example, the coordinate information of the candidate box label corresponding to the second target object 702 in the pseudo-label image 81 is (Z1', Z2', Z3', Z4'), the coordinate information of the candidate box label corresponding to the second target object 702 in the pseudo-label image 82 is also (Z1', Z2', Z3', Z4'), and the coordinate information of the candidate box label corresponding to the second target object 702 in the pseudo-label image 83 is (Z1''', Z2''', Z3''', Z4''').
  • The coordinate information of the candidate box label corresponding to the second target object 702 in the pseudo-label image 81 is the same as that in the pseudo-label image 82 (both are (Z1', Z2', Z3', Z4')), that is, the two candidate frame areas completely overlap and their coincidence degree is 1; the coordinate information of the candidate box label in the pseudo-label image 83 differs from that in the pseudo-label images 81 and 82, and the candidate frame areas only partially overlap. Suppose the coincidence degree between the candidate frame of the second target object 702 in the pseudo-label image 81 and that in the pseudo-label image 83 is 0.5, and the coincidence degree between the candidate frame in the pseudo-label image 82 and that in the pseudo-label image 83 is likewise 0.5. Averaging the pairwise coincidence degrees gives a candidate box label confidence of (1 + 0.5 + 0.5) / 3 = 2/3, as sketched below.
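  • The coincidence-based confidence can be sketched as follows, representing each candidate box by (x1, y1, x2, y2) corner coordinates and using intersection-over-union as the coincidence measure; the boxes below are hypothetical values chosen to reproduce the pairwise overlaps of 1, 0.5, and 0.5 from the example:

```python
from itertools import combinations

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def box_label_confidence(boxes_per_round):
    """Average pairwise IoU of the boxes predicted for one object across rounds."""
    pairs = list(combinations(boxes_per_round, 2))
    return sum(iou(a, b) for a, b in pairs) / len(pairs)

b81 = (0.0, 0.0, 2.0, 2.0)  # pseudo-label image 81
b82 = (0.0, 0.0, 2.0, 2.0)  # pseudo-label image 82: identical box, IoU 1
b83 = (0.0, 0.0, 2.0, 1.0)  # pseudo-label image 83: partial overlap, IoU 0.5
print(box_label_confidence([b81, b82, b83]))  # (1 + 0.5 + 0.5) / 3 = 2/3
```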
  • That is, the confidence of the category label corresponding to each target object is determined based on the category labels obtained for the target object in each round of target detection, and the confidence of the candidate box label corresponding to each target object is determined based on the candidate box labels obtained for the target object in each round of target detection.
  • In this way, the embodiment of the present application can quickly determine the confidence of the pseudo label corresponding to each target object and use the confidence as a measure of the reliability of the pseudo label, thereby effectively filtering out the pseudo-label noise existing in unlabeled sample images and improving screening efficiency.
  • In some embodiments, the confidence of the pseudo label corresponding to a target object can be determined by comparing the category labels obtained by the target object in each round of target detection with each other; the reliability of the category labels is determined one by one for the target objects detected in all rounds.
  • For example, in the e-th round, for the unlabeled sample image x_u, the category labels of the m detected target objects can be expressed as $\{c_1^e, c_2^e, \dots, c_m^e\}$; in the (e+n)-th round, for the unlabeled sample image x_u, the category labels of the m detected target objects can be expressed as $\{c_1^{e+n}, c_2^{e+n}, \dots, c_m^{e+n}\}$. The category labels of these n+1 rounds are compared, and the changes in the category label of each target object are counted.
  • Here, m represents the m-th target object in the unlabeled sample image x_u, and m, e, and n are all positive integers greater than or equal to 1.
  • In this way, the embodiment of the present application can determine the similarity of the category labels by comparing the category labels of the targets obtained in each round of target detection. The higher the similarity, the higher the reliability of the category label. Based on this, it can be accurately determined which unlabeled sample images can be used as training samples for the subsequent student model, improving the reliability of the training samples.
  • In some embodiments, the confidence of the pseudo label corresponding to a target object can be determined based on the candidate box label corresponding to the target object in the pseudo-label image obtained in each round of target detection. Specifically, the position of the target object in the unlabeled sample image in each round of target detection can be determined from the candidate box label obtained in that round, and the coincidence degree of these positions across rounds can be obtained; based on the coincidence degree, the confidence of the candidate box label of the target object is obtained. That is, the positions of the target object in the unlabeled sample image in each round of target detection are used to determine the confidence of the candidate box pseudo label of each target object one by one.
  • For example, in the e-th round, for the unlabeled sample image x_u, the candidate box labels corresponding to the m detected target objects can be expressed as $\{b_1^e, b_2^e, \dots, b_m^e\}$; in the (e+n)-th round, the candidate box labels corresponding to the m detected target objects can be expressed as $\{b_1^{e+n}, b_2^{e+n}, \dots, b_m^{e+n}\}$.
  • Here, m represents the m-th target object in the unlabeled sample image x_u, and m, e, and n are all positive integers greater than or equal to 1.
  • In some embodiments, the following steps can also be implemented: determine the first unlabeled sample image in the image data set as the target unlabeled sample image.
  • The first unlabeled sample image is an image in which the confidence of the category label corresponding to at least one target object among all target objects in the unlabeled sample image is greater than the preset first category label threshold, and the confidence of the candidate box label corresponding to the at least one target object is greater than the preset first candidate box label threshold.
  • During specific implementation, the teacher model can first be used to predict the target unlabeled sample image to determine the pseudo labels in the target unlabeled sample image (these pseudo labels have undergone the aforementioned verification and have high reliability), which then participate in the training of the student model.
  • For example, the preset first category label threshold can be 0.8, and the preset first candidate box label threshold can be 0.7.
  • Suppose that in the unlabeled sample image X, the confidence of the category label corresponding to the target object "person" is less than 0.8 while the confidence of its candidate box label is 0.8; then the target object "person" does not meet the above first category label threshold. However, the confidence of the category label corresponding to the target object "dog" is 0.9 and the confidence of the candidate box label corresponding to the target object "dog" is 0.8, that is, the target object "dog" satisfies both the first category label threshold and the first candidate box label threshold; the unlabeled sample image X can therefore be used as the target unlabeled sample image.
  • In some embodiments, a second unlabeled sample image can also be screened out from the image data set; a third unlabeled sample image is then screened out from the second unlabeled sample image, and the third unlabeled sample image is determined as the target unlabeled sample image.
  • The second unlabeled sample image is an unlabeled sample image in which the confidence of the category label corresponding to at least one target object among all target objects in the unlabeled sample image is greater than the preset second category label threshold; the third unlabeled sample image is an image in which the confidence of the candidate box label corresponding to at least one target object among all target objects in the second unlabeled sample image is greater than the preset second candidate box label threshold.
  • The number of target objects in the unlabeled sample image X and the unlabeled sample image X1 is not limited; for example, the unlabeled sample image X includes two target objects (such as "person" and "dog"), and the unlabeled sample image X1 includes one target object (such as "flower"). If the confidence of the category label corresponding to at least one target object in the unlabeled sample image X is greater than the preset second category label threshold, the unlabeled sample image X can be used as the second unlabeled sample image; if, further, the confidence of the candidate box label corresponding to at least one target object in the second unlabeled sample image is greater than the preset second candidate box label threshold, the unlabeled sample image X will be used as the target unlabeled sample image.
  • The embodiment of this application filters unlabeled sample images by setting category label thresholds and candidate box label thresholds, which can screen out unlabeled sample images with better reliability, reduce pseudo-label noise in target unlabeled sample images, and improve the accuracy of sample extraction, as sketched below.
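  • The image-level screening can be sketched as follows, assuming the per-object confidences have already been computed; the dict layout, the "person" category confidence, and the threshold values (0.8 and 0.7 from the example above) are illustrative:

```python
def is_target_unlabeled_image(objects, cls_threshold=0.8, box_threshold=0.7):
    """An image qualifies if at least one object passes BOTH thresholds."""
    return any(
        obj["cls_conf"] > cls_threshold and obj["box_conf"] > box_threshold
        for obj in objects
    )

# "person" fails the category threshold, but "dog" (0.9, 0.8) passes both,
# so unlabeled sample image X is kept as a target unlabeled sample image.
image_x = [
    {"name": "person", "cls_conf": 0.7, "box_conf": 0.8},  # 0.7 is hypothetical
    {"name": "dog", "cls_conf": 0.9, "box_conf": 0.8},
]
print(is_target_unlabeled_image(image_x))  # True
```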
  • Figure 9 is a schematic flowchart of a model training method provided by an embodiment of the present application. This method can be applied to computer equipment. As shown in Figure 9, the method may specifically include the following steps: Step S901, obtain unlabeled sample images and labeled sample images, where the number of unlabeled sample images is greater than the number of labeled sample images. Step S902, perform N rounds of target detection on the unlabeled sample images, and obtain the pseudo labels corresponding to the target objects in each round of target detection, where the pseudo labels include at least one of category labels and candidate box labels, and N is a positive integer greater than 1.
  • Step S903 Based on the pseudo labels corresponding to the target objects obtained in each round of target detection, extract the target unlabeled sample images from the unlabeled sample images.
  • Step S904 Train the student model based on the target unlabeled sample image and the labeled sample image, where the student model obtained after training is used to update the teacher model.
  • When training the student model, the required samples include labeled sample images with real labels and target unlabeled sample images with pseudo labels, where the number of labeled sample images is less than the number of target unlabeled sample images with pseudo labels.
  • In some embodiments, a strong enhancement method (operation) can also be used to strongly enhance the target unlabeled sample image; the strong enhancement method can specifically include color transformation, random erasing, color filling and other methods.
  • The overall training loss is calculated when training the student model. The loss function is as follows:

$$\mathcal{L} = \frac{1}{N_l}\sum_{b=1}^{N_l}\Big[\mathcal{L}_{cls}\big(x_b^l, y_b^l\big) + \mathcal{L}_{reg}\big(x_b^l, y_b^l\big)\Big] + \lambda_u \cdot \frac{1}{N_u}\sum_{b=1}^{N_u}\Big[\mathcal{L}_{cls}\big(x_b^u, \hat{y}_b^u\big) + \mathcal{L}_{reg}\big(x_b^u, \hat{y}_b^u\big)\Big]$$

  • Here, $N_l$ is the number of labeled sample images in this batch and $N_u$ is the number of target unlabeled sample images; $x_b^l$ represents the b-th labeled sample image and $y_b^l$ represents the label corresponding to the b-th labeled sample image; $x_b^u$ represents the b-th target unlabeled sample image and $\hat{y}_b^u$ represents the pseudo label of the b-th target unlabeled sample image; $\mathcal{L}_{cls}$ represents the classification loss, $\mathcal{L}_{reg}$ represents the bounding box regression loss, and $\lambda_u$ is a hyperparameter used to balance the weight of the unsupervised loss.
  • During specific implementation, steps S201-S204 or steps S301-S304 can be executed to extract the target unlabeled sample images, and the student model can then be trained together with the obtained labeled sample images. It should be noted that the embodiment of the present application does not limit the order in which the target unlabeled sample images are extracted and the labeled sample images are acquired.
  • the embodiments of this application can improve the target detection performance of the trained student model by using unlabeled sample images carrying highly reliable pseudo-labels and combining them with a small number of labeled sample images for training.
  • Figure 10 is a schematic diagram of the end-to-end semi-supervised target detection framework training provided by the embodiment of this application.
  • During training, the unlabeled sample images enter the teacher model and the student model through the weak enhancement method and the strong enhancement method respectively, and prediction results are generated through the teacher model and the student model respectively.
  • The strong enhancement methods include color transformation, random erasing and color filling; the weak enhancement methods include image shearing, rotation/reflection/flip transformation, scaling transformation, translation transformation, scale transformation, etc.
  • The prediction results generated by the student model are directly involved in the unsupervised loss calculation, while the prediction results generated by the teacher model are recorded: multiple rounds of target detection are performed on the weakly enhanced unlabeled sample images through the teacher model, and the prediction results of each round of target detection, including the pseudo labels corresponding to each target object in the unlabeled sample image, are obtained. The confidence of the pseudo label corresponding to each target object is then calculated based on the prediction results of each round of target detection, the unlabeled sample images that meet the confidence threshold requirements are selected, and the unsupervised loss is calculated in combination with the prediction results generated by the student model.
  • After the student model is updated, the parameters of the student model can be used to update the parameters of the teacher model through the EMA strategy, as sketched below.
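  • A minimal sketch of the EMA update, with model parameters as plain name-to-value dicts; the decay value is illustrative, as the application does not specify one:

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """Per parameter: teacher <- decay * teacher + (1 - decay) * student."""
    for name, s in student_params.items():
        teacher_params[name] = decay * teacher_params[name] + (1.0 - decay) * s
    return teacher_params
```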
  • the image processing method and model training method provided by the embodiments of this application can be applied to semi-supervised target detection tasks.
  • In these methods, the confidence of the pseudo label corresponding to each target object in the unlabeled sample image is calculated, and noisy pseudo labels are then screened out based on the confidence, improving the utilization of unlabeled sample images in semi-supervised target detection methods and thereby improving the performance of semi-supervised target detection.
  • FIG 11 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • The image processing device 1100 includes: an image acquisition module 1101, a label data set acquisition module 1120, a confidence determination module 1130 and an image determination module 1140.
  • the image acquisition module 1101 is used to acquire an unlabeled sample image, wherein the unlabeled sample image includes at least one target object;
  • the label data set acquisition module 1120 is used to perform N rounds of target detection on the unlabeled sample image and obtain the pseudo-label data set corresponding to each target object in the unlabeled sample image, where the pseudo-label data set includes the pseudo label corresponding to the target object in each round of target detection, and N is a positive integer greater than 1;
  • the confidence determination module 1130 is used to determine the confidence of the pseudo label corresponding to each target object according to the pseudo labels in the pseudo-label data set; the image determination module 1140 is used to determine the unlabeled sample image as the target unlabeled sample image if the confidence of the pseudo label corresponding to at least one target object in the unlabeled sample image is greater than the preset threshold.
  • In some embodiments, the image acquisition module 1101 may be specifically configured to: acquire an image data set, where the image data set includes at least one unlabeled sample image, and each unlabeled sample image includes at least one target object.
  • In some embodiments, the label data set acquisition module 1120 may be configured to: for each unlabeled sample image in the image data set, perform N rounds of target detection on the unlabeled sample image to obtain the pseudo-label data set corresponding to each target object in the unlabeled sample image.
  • In some embodiments, the pseudo labels include category labels; the label data set acquisition module 1120 can be specifically configured to: perform N rounds of target detection on the unlabeled sample images to obtain the attribute category determined for each target object in the unlabeled sample images in each round of target detection; determine, based on the attribute categories determined in each round of target detection, the category label corresponding to each target object in each round of target detection; and determine, according to the category labels, the pseudo-label data set corresponding to each target object in each unlabeled sample image.
  • the pseudo labels also include candidate box labels.
  • the label data set acquisition module 1120 can also be used to: perform N rounds of target detection on the unlabeled sample images to obtain the position information of each target object in the unlabeled sample image in each round of target detection; determine, according to the position information of each target object in each round of target detection, the candidate box label corresponding to each target object in each round of target detection; and determine, according to the category label and candidate box label corresponding to each target object in each round of target detection, the pseudo label data set corresponding to each target object in the unlabeled sample image.
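  • before per-object candidate box labels can be derived, the position information from different rounds has to be associated with the same target object; a standard way to do this is intersection-over-union (IoU) matching, sketched below as an assumed implementation detail that the application itself does not spell out.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def match_boxes_across_rounds(reference_boxes, round_boxes, iou_threshold=0.5):
    """For each reference box (e.g. from round 1), find the best-overlapping
    box in a later round; objects with no sufficient overlap get None."""
    matches = []
    for ref in reference_boxes:
        best = max(round_boxes, key=lambda b: iou(ref, b), default=None)
        matches.append(best if best is not None and iou(ref, best) >= iou_threshold else None)
    return matches

round1 = [(10, 10, 50, 80)]                       # box for one target object
round2 = [(12, 11, 52, 78), (200, 5, 230, 40)]    # candidates from a later round
print(match_boxes_across_rounds(round1, round2))  # [(12, 11, 52, 78)]
```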
  • the confidence determination module 1130 may be configured to: based on all pseudo labels of each target object in the pseudo label data set, determine the confidence corresponding to the category label of the target object and the confidence corresponding to the candidate box label of the target object.
  • the image determination module 1140 may be configured to: determine the first unlabeled sample image in the image data set as the target unlabeled sample image, wherein the first unlabeled sample image is an unlabeled sample image in which the confidence of the category label corresponding to at least one target object among all target objects is greater than the preset first category label threshold, and the confidence of the candidate box label corresponding to the at least one target object is greater than the preset first candidate box label threshold.
  • the image determination module 1140 may be configured to: filter out a second unlabeled sample image from the image data set, where the second unlabeled sample image is an unlabeled sample image in which the confidence of the category label corresponding to at least one target object among all target objects is greater than the preset second category label threshold; and filter out a third unlabeled sample image from the second unlabeled sample image and determine the third unlabeled sample image as the target unlabeled sample image, where the third unlabeled sample image is an image in which the confidence of the candidate box label corresponding to at least one target object among all target objects in the second unlabeled sample image is greater than the preset second candidate box label threshold.
  • the image determination module 1140 may be configured to: filter out a fourth unlabeled sample image from the image data set, where the fourth unlabeled sample image is an image in which the confidence of the candidate box label corresponding to at least one target object among all target objects is greater than the preset second candidate box label threshold; and filter out a fifth unlabeled sample image from the fourth unlabeled sample image and determine the fifth unlabeled sample image as the target unlabeled sample image, where the fifth unlabeled sample image is an image in which the confidence of the category label corresponding to at least one target object among all target objects in the fourth unlabeled sample image is greater than the preset second category label threshold.
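  • whichever order the filtering runs in (category label first, as with the second and third images, or candidate box label first, as with the fourth and fifth images), the net effect per image is a two-threshold test; the sketch below illustrates it, with the field names and threshold values being invented for illustration.

```python
def is_target_image(objects, category_threshold=0.9, box_threshold=0.9):
    """An image qualifies when at least one of its objects passes both the
    category-label and candidate-box-label confidence thresholds.
    `objects` is a list of dicts with invented keys 'cls_conf'/'box_conf'."""
    return any(
        obj["cls_conf"] > category_threshold and obj["box_conf"] > box_threshold
        for obj in objects
    )

images = {
    "img_1": [{"cls_conf": 1.0, "box_conf": 0.95}],
    "img_2": [{"cls_conf": 0.6, "box_conf": 0.97}],
}
targets = [name for name, objs in images.items() if is_target_image(objs)]
print(targets)  # ['img_1']
```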
  • the image acquisition module 1101 is also used to: acquire labeled sample images;
  • the image processing device 1100 further includes: a training module (not shown), configured to train the student model according to the target unlabeled sample image and the labeled sample image.
  • the training module can be specifically used to: use the teacher model to perform N rounds of target detection on the unlabeled sample images; the training module can also be used to update the parameters of the teacher model based on the exponential moving average (EMA) strategy using the trained student model.
  • the training module can also be used to perform strong enhancement on the target unlabeled sample image using a strong enhancement operation, where the strong enhancement operation includes at least one of the following operations: color transformation, random elimination, and color filling.
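  • a hedged sketch of the strong enhancement operations listed above (color transformation, random elimination, color filling) on a toy nested-list "image"; real pipelines would operate on image tensors, and the exact transforms and parameters here are illustrative assumptions.

```python
import random

def color_transform(image, scale=1.2):
    """Naive color jitter: scale every pixel value, clipped to [0, 255]."""
    return [[min(255, int(px * scale)) for px in row] for row in image]

def random_erase(image, fill=0):
    """Randomly eliminate one pixel per row by overwriting it with `fill`
    (a crude stand-in for random erasing / color filling)."""
    erased = []
    for row in image:
        j = random.randrange(len(row))
        erased.append([fill if i == j else px for i, px in enumerate(row)])
    return erased

def strong_augment(image):
    """Apply the strong enhancement operations in sequence."""
    return random_erase(color_transform(image))

toy_image = [[100, 150], [200, 250]]  # toy 2x2 single-channel "image"
print(strong_augment(toy_image))
```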
  • the device provided by the embodiment of the present application can be used to perform the method in the embodiment shown in Figure 2. Its implementation principles and technical effects are similar and will not be described again here.
  • Figure 12 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • the image processing device 1200 includes an image data set acquisition module 1210, a label data set acquisition module 1220, a confidence determination module 1230 and an image determination module 1240.
  • the image data set acquisition module 1210 is used to acquire an image data set.
  • the image data set includes at least one unlabeled sample image, and each unlabeled sample image includes at least one target object.
  • the label data set acquisition module 1220 is used to perform N rounds of target detection on each unlabeled sample image in the image data set, and obtain a pseudo label data set corresponding to each target object in each unlabeled sample image.
  • the pseudo-label data set includes pseudo-labels corresponding to the target object in each round of target detection.
  • the pseudo-labels at least include category labels, and N is a positive integer greater than 1.
  • the confidence determination module 1230 is configured to determine the confidence of the pseudo label corresponding to each target object based on the pseudo labels in the pseudo label data set.
  • the image determination module 1240 is used to determine the target unlabeled sample image in the image data set according to the confidence, wherein the confidence of the pseudo label corresponding to at least one target object in the target unlabeled sample image is greater than a preset threshold.
  • the above-mentioned label data set acquisition module 1220 can be specifically configured to: perform N rounds of target detection on all unlabeled sample images to obtain the attribute category of each target object in each unlabeled sample image determined in each round of target detection; determine, based on the attribute category determined for each target object in each round of target detection, the category label corresponding to each target object in each unlabeled sample image in each round of target detection; and determine, according to the category label corresponding to each target object in each round of target detection, the pseudo label data set corresponding to each target object in each unlabeled sample image.
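  • one plausible implementation of the step above is to record the attribute category predicted for each object in each round and accumulate the per-round category labels into the pseudo label data set; the dict-based record below is an assumption made for illustration.

```python
def build_pseudo_label_dataset(rounds):
    """`rounds` maps round index -> {object_id: attribute category}.
    Returns {object_id: [category label in round 1, ..., round N]}."""
    dataset = {}
    for round_idx in sorted(rounds):
        for obj_id, category in rounds[round_idx].items():
            dataset.setdefault(obj_id, []).append(category)
    return dataset

rounds = {
    1: {"obj_person": "female", "obj_cat": "cat"},
    2: {"obj_person": "male", "obj_cat": "cat"},
    3: {"obj_person": "female", "obj_cat": "cat"},
}
print(build_pseudo_label_dataset(rounds))
# {'obj_person': ['female', 'male', 'female'], 'obj_cat': ['cat', 'cat', 'cat']}
```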
  • the above-mentioned label data set acquisition module 1220 can also be used to: perform N rounds of target detection on all unlabeled sample images to obtain the position information of each target object in each unlabeled sample image in each round of target detection, from which the candidate box label corresponding to each target object is determined.
  • the confidence determination module 1230 may be configured to: determine, based on all pseudo labels of each target object in the pseudo label data set, the confidence corresponding to the category label of the target object and the confidence corresponding to the candidate box label of the target object.
  • the image determination module 1240 may be specifically configured to determine the first unlabeled sample image in the image data set as the target unlabeled sample image.
  • the first unlabeled sample image is an image in which the confidence of the category label corresponding to at least one target object among all target objects in the unlabeled sample image is greater than the preset first category label threshold, and the confidence of the candidate box label corresponding to the at least one target object is greater than the preset first candidate box label threshold.
  • the image determination module 1240 can be specifically configured to: filter out the second unlabeled sample image from the image data set; and filter out the third unlabeled sample image from the second unlabeled sample image and determine the third unlabeled sample image as the target unlabeled sample image. The second unlabeled sample image is an unlabeled sample image in which the confidence of the category label corresponding to at least one target object among all target objects is greater than the preset second category label threshold; the third unlabeled sample image is an image in which the confidence of the candidate box label corresponding to at least one target object among all target objects in the second unlabeled sample image is greater than the preset second candidate box label threshold.
  • the image determination module 1240 can be specifically configured to: filter out the fourth unlabeled sample image from the image data set; and filter out the fifth unlabeled sample image from the fourth unlabeled sample image and determine the fifth unlabeled sample image as the target unlabeled sample image. The fourth unlabeled sample image is an image in which the confidence of the candidate box label corresponding to at least one target object among all target objects is greater than the preset second candidate box label threshold; the fifth unlabeled sample image is an image in which the confidence of the category label corresponding to at least one target object among all target objects in the fourth unlabeled sample image is greater than the preset second category label threshold.
  • the device provided by the embodiment of the present application can be used to execute the method in the embodiment shown in Figure 3. Its implementation principles and technical effects are similar and will not be described again here.
  • Figure 13 is a schematic structural diagram of a model training device provided by an embodiment of the present application.
  • the model training device can be integrated in a computer device, or can be independent of the computer device and collaborate with the computer device to implement this solution.
  • the model training device 1300 includes an image acquisition module 1310, a target detection module 1320, an image acquisition module 1330 and a model training module 1340.
  • the image acquisition module 1310 is used to acquire unlabeled sample images and labeled sample images.
  • the number of unlabeled sample images is larger than that of labeled sample images.
  • the target detection module 1320 is used to perform N rounds of target detection on unlabeled sample images, and obtain pseudo labels corresponding to the target objects obtained in each round of target detection.
  • Pseudo labels include at least one of category labels and candidate box labels, and N is a positive integer greater than 1.
  • the image acquisition module 1330 is used to extract the target unlabeled sample image from the unlabeled sample image according to the pseudo label corresponding to the target object obtained in each round of target detection.
  • the model training module 1340 is used to train the student model based on the target unlabeled sample image and the labeled sample image, where the student model obtained after training is used to update the teacher model.
  • the device provided by the embodiment of the present application can be used to perform the method in the embodiment shown in Figure 9. Its implementation principles and technical effects are similar and will not be described again here.
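  • to tie the modules of this device together, the following is a minimal sketch of one training iteration: a supervised loss on labeled images plus an unsupervised loss on the screened target unlabeled sample images, with the teacher supplying pseudo labels. The toy squared-error losses and list-based "models" are placeholders the application does not prescribe; the gradient step and EMA teacher refresh are only indicated in comments.

```python
# Hedged sketch of one semi-supervised training iteration. A real
# implementation would use detection losses, an optimizer step, and an
# EMA refresh of the teacher after each iteration.

def supervised_loss(predictions, labels):
    return sum((p - y) ** 2 for p, y in zip(predictions, labels))

def unsupervised_loss(predictions, pseudo_labels):
    return sum((p - y) ** 2 for p, y in zip(predictions, pseudo_labels))

def train_step(student, teacher, labeled_batch, target_unlabeled_batch, lam=1.0):
    """One iteration: L = L_sup + lam * L_unsup, with the teacher supplying
    pseudo labels for the screened target unlabeled sample images."""
    x_l, y_l = labeled_batch
    x_u = target_unlabeled_batch
    pseudo = teacher(x_u)  # high-confidence pseudo labels from the teacher
    loss = supervised_loss(student(x_l), y_l) + lam * unsupervised_loss(student(x_u), pseudo)
    # ...backpropagate `loss`, update the student, then EMA-update the teacher...
    return loss

student = lambda xs: [0.9 * x for x in xs]  # stand-in "student model"
teacher = lambda xs: [0.8 * x for x in xs]  # stand-in "teacher model"
print(round(train_step(student, teacher, ([1.0, 2.0], [1.0, 2.0]), [3.0]), 2))  # 0.14
```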
  • the division of the above device into modules is only a division of logical functions. In actual implementation, the modules can be fully or partially integrated into one physical entity, or they can be physically separate. These modules can all be implemented in the form of software called through processing elements, or all in the form of hardware; alternatively, some modules can be implemented in the form of software called through processing elements and some in the form of hardware.
  • for example, the image acquisition module can be a separate processing element, or can be integrated into a chip of the above-mentioned device; it can also be stored in the memory of the above-mentioned device in the form of program code, and one of the processing elements of the above-mentioned device calls and executes the functions of the above image acquisition module.
  • the implementation of other modules is similar.
  • all or part of these modules can be integrated together or implemented independently.
  • the processing element here may be an integrated circuit with signal processing capabilities.
  • each step of the above method or each of the above modules can be completed by hardware integrated logic circuits in the processor element or by instructions in the form of software.
  • Figure 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 1400 includes: at least one processor 1410, a memory 1420, a bus 1430 and a communication interface 1440.
  • the processor 1410, the communication interface 1440 and the memory 1420 communicate with each other through the bus 1430.
  • the communication interface 1440 is used to communicate with other devices.
  • the communication interface includes a communication interface for data transmission and a display interface or operation interface for human-computer interaction.
  • the processor 1410 is configured to execute computer execution instructions stored in the memory 1420. Specifically, the processor 1410 can execute relevant steps in the image processing method described in the corresponding embodiments of FIG. 2 and FIG. 3.
  • the processor may be a central processing unit, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • the one or more processors included in the computer device may be the same type of processor, such as one or more CPUs; or they may be different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory 1420 is used to store computer execution instructions.
  • the memory may include high-speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
  • Figure 15 is a schematic structural diagram of a computer device provided by another embodiment of the present application.
  • the computer device 1500 includes: at least one processor 1510, a memory 1520, a bus 1530 and a communication interface 1540.
  • the processor 1510, the communication interface 1540 and the memory 1520 complete communication with each other through the bus 1530.
  • Communication interface 1540 is used to communicate with other devices.
  • the communication interface includes a communication interface for data transmission and a display interface or operation interface for human-computer interaction.
  • the processor 1510 is configured to execute computer execution instructions stored in the memory 1520. Specifically, the processor 1510 can execute relevant steps in the model training method described in the corresponding embodiment of FIG. 9 above.
  • the processor may be a central processing unit, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • the computer equipment includes one or more processors, which can be the same type of processor, such as one or more central processing units (Central Processing Unit, CPU); or they can be different types of processors, such as one or more CPUs and one or more ASICs.
  • Memory 1520 is used to store computer execution instructions.
  • the memory may include high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
  • Embodiments of the present application also provide a readable storage medium.
  • Computer instructions are stored in the readable storage medium.
  • when the computer device executes the computer instructions, the computer device performs the image processing method or model training method provided by the various embodiments described above.
  • Embodiments of the present application also provide a program product.
  • the program product includes computer instructions, and the computer instructions are stored in a readable storage medium.
  • At least one processor of the computer device can read the computer instructions from the readable storage medium, and the at least one processor executes the computer instructions so that the computer device implements the image processing method or model training method provided by the above-mentioned various embodiments.
  • An embodiment of the present application also provides a computer program, which can be executed by a processor of a computer device to implement the image processing method or model training method provided by the various embodiments above.
  • "At least one" refers to one or more, and "plurality" refers to two or more.
  • “And/or” describes the association of associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A exists alone, A and B exist simultaneously, and B exists alone, where A, B can be singular or plural.
  • the character “/” generally means that the related objects before and after are an “or” relationship; in the formula, the character “/” means that the related objects before and after are a “division” relationship.
  • "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • for example, at least one of a, b or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b and c can be singular or plural.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Provided in the present application are an image processing method and apparatus, and a device and a medium. The method comprises: acquiring an image data set, wherein the image data set comprises at least one label-free sample image, and each label-free sample image comprises at least one target object; performing N rounds of target detection on each label-free sample image in the image data set, so as to obtain a pseudo-label data set corresponding to each target object in the label-free sample image, wherein the pseudo-label data set comprises a pseudo label corresponding to the target object in each round of target detection, and the pseudo label comprises at least a category label, N being a positive integer greater than 1; according to the pseudo labels in the pseudo-label data set, determining the confidence level of the pseudo labels corresponding to each target object; and if at least one target object whose corresponding pseudo label has a confidence level greater than a preset threshold value is present in the label-free sample image, determining the label-free sample image in the image data set as a target label-free sample image. By means of the technical solution, a label-free sample image having pseudo labels with high reliability can be screened out.

Description

Image processing method, apparatus, device and medium
This application claims priority to the Chinese patent application No. 202210904939.5, titled "Image processing method, apparatus, device and medium" and filed on July 29, 2022, the contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of deep learning technology, and in particular to an image processing method, apparatus, device and medium.
Background
Semi-Supervised Learning (SSL) is a key research topic in the fields of pattern recognition and machine learning, and is a learning method that combines supervised learning and unsupervised learning. Semi-supervised learning uses a large number of unlabeled samples together with a small number of labeled samples to perform pattern recognition, and can effectively reduce the cost of manually labeling samples while improving the model training effect.
In the related art, when unlabeled samples are used for semi-supervised learning, a teacher model is usually used to process the unlabeled samples to generate prediction results; low-confidence prediction results are then filtered out based on a preset threshold to produce pseudo labels for guiding the student model to be trained, and the student model is then trained based on the pseudo labels and the labeled samples.
Summary
This application provides an image processing method, apparatus, device and medium.
In a first aspect, embodiments of the present application provide an image processing method, including:
acquiring an unlabeled sample image, wherein the unlabeled sample image includes at least one target object;
performing N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image, wherein the pseudo-label data set includes the pseudo label corresponding to the target object in each round of target detection, and N is a positive integer greater than 1;
determining the confidence of the pseudo label corresponding to each target object according to the pseudo labels in the pseudo-label data set; and
if the confidence of the pseudo label corresponding to at least one target object in the unlabeled sample image is greater than a preset threshold, determining the unlabeled sample image as a target unlabeled sample image.
In a second aspect, embodiments of the present application provide an image processing apparatus, including:
an image acquisition module, used to acquire an unlabeled sample image, wherein the unlabeled sample image includes at least one target object;
a label data set acquisition module, used to perform N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image, wherein the pseudo-label data set includes the pseudo label corresponding to the target object in each round of target detection, and N is a positive integer greater than 1;
a confidence determination module, used to determine the confidence of the pseudo label corresponding to each target object according to the pseudo labels in the pseudo-label data set; and
an image determination module, used to determine the unlabeled sample image as a target unlabeled sample image if the confidence of the pseudo label corresponding to at least one target object in the unlabeled sample image is greater than a preset threshold.
In a third aspect, embodiments of the present application provide a computer device, including: a processor, and a memory communicatively connected to the processor;
the memory stores computer execution instructions;
the processor executes the computer execution instructions stored in the memory to implement the above method.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which computer instructions are stored; when the computer instructions are executed by a processor, they are used to implement the above image processing method.
In a fifth aspect, embodiments of the present application provide a computer program product, including computer instructions that implement the above image processing method when executed by at least one processor.
In a sixth aspect, embodiments of the present application provide a computer program that, when executed by a processor, implements the above image processing method.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application.
Figure 1 is a schematic diagram of a scene of the image processing method provided by an embodiment of the present application;
Figure 2 is a schematic flowchart of the image processing method provided by an embodiment of the present application;
Figure 3 is a schematic flowchart of the image processing method provided by an embodiment of the present application;
Figure 4 is a schematic diagram of target detection results provided by an embodiment of the present application;
Figure 5 is a schematic diagram of target detection results provided by another embodiment of the present application;
Figure 6 is a schematic diagram of target detection results provided by another embodiment of the present application;
Figure 7 is a schematic diagram of target detection results provided by another embodiment of the present application;
Figure 8 is a schematic diagram of target detection results provided by yet another embodiment of the present application;
Figure 9 is a schematic flowchart of the model training method provided by an embodiment of the present application;
Figure 10 is a schematic diagram of training of the end-to-end semi-supervised target detection framework provided by an embodiment of the present application;
Figure 11 is a schematic structural diagram of the image processing device provided by an embodiment of the present application;
Figure 12 is a schematic structural diagram of the image processing device provided by an embodiment of the present application;
Figure 13 is a schematic structural diagram of the model training device provided by an embodiment of the present application;
Figure 14 is a schematic structural diagram of the computer device provided by an embodiment of the present application;
Figure 15 is a schematic structural diagram of the computer device provided by another embodiment of the present application.
Through the above drawings, specific embodiments of the present application have been shown and will be described in more detail below. These drawings and textual descriptions are not intended to limit the scope of the present application's concepts in any way, but to illustrate the concepts of the application for those skilled in the art with reference to specific embodiments.
Detailed Description
In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.
Deep learning has achieved great success in fields such as computer vision, multimedia and image processing. However, the success of deep learning relies on the training of large-scale neural networks, which in turn is driven by large-scale labeled data sets. In practical applications, building such large-scale labeled data sets for supervised training is time-consuming and labor-intensive, and the labeling process relies heavily on expert experience, whereas unlabeled image data can be obtained easily and quickly. Semi-supervised learning has therefore emerged: by using labeled data to provide the necessary guidance for model training, combined with the supplementary information of a large amount of unlabeled data, it can effectively improve model performance while saving manual labeling costs.
Related applications of semi-supervised learning mainly include semi-supervised image classification and semi-supervised target detection tasks. Among them, the semi-supervised target detection task is harder: images contain more targets, so the use of unlabeled images is more difficult and complex. In the training of a semi-supervised target detection model, the accuracy of pseudo-label generation (pseudo labels are the target classes assigned to unlabeled samples after classification; during training they can be used like real labels) is the key to achieving excellent performance. Current semi-supervised target detection methods usually generate prediction results through the teacher model and use preset thresholds to filter out low-confidence candidate boxes and categories, producing pseudo labels used to guide student model training. With this approach, in which a single teacher model generates pseudo labels based on a fixed threshold, the generated pseudo-label information often contains a large amount of noise and has poor reliability, and it cannot effectively improve model performance when subsequently used to train the student model.
In view of the above problems, embodiments of the present application provide an image processing method, apparatus, device and medium to solve the problem that, in existing semi-supervised learning, pseudo labels generated from unlabeled samples are prone to noise and have poor reliability. To improve the reliability of the generated pseudo-label information, the pseudo-label generation process needs to be optimized. Specifically, by performing multiple rounds of prediction on an unlabeled sample image, counting the confidence of the pseudo label corresponding to each target object in each round of prediction, and screening out the pseudo labels whose confidence meets the requirements, the reliability of the pseudo labels generated for target unlabeled sample images can be improved, noise can be reduced, and the performance of the semi-supervised target detection method can be improved.
Figure 1 is a schematic diagram of a scene of the image processing method provided by an embodiment of the present application. As shown in Figure 1, the unlabeled sample image set in the computer device 100 usually includes several images, none of which has been manually labeled. Taking an image 101 among the unlabeled sample images as an example, a prediction result 102 can be obtained by predicting it through the teacher model (that is, the image 101 is input into the teacher model, which performs target detection and outputs the prediction result 102). The prediction result 102 may include a target object 1021 (taking a person as an example) and a candidate box identifying the location area of the target object 1021 (the dotted box in Figure 1). A pseudo label of the category to which the target object 1021 belongs can be attached (for example, the category "person", or a further refined category such as "female"), and a candidate box label can also be attached (the candidate box label identifies the position information of the target object 1021). Attaching pseudo labels is equivalent to the manual labeling process, which reduces manual labeling overhead while achieving the effect of labeled sample images.
The technical solution of the present application is described in detail below through specific embodiments. It should be noted that the following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.
Figure 2 is a schematic flowchart of the image processing method provided by an embodiment of the present application. As shown in Figure 2, the method may specifically include the following steps:
Step S201: acquire an unlabeled sample image, where the unlabeled sample image includes at least one target object.
In this embodiment, the unlabeled sample image may include multiple target objects or a single target object. For example, taking a single target object as an example, the target object can be a person, a dog, a cat, a flower and so on; different target objects may belong to the same category or to different categories (for example, cats and dogs are both animals, while flowers are plants).
For example, taking an unlabeled sample image that includes only a single target object as an example, unlabeled sample images can be classified in advance according to the category of the target object they contain. For example, unlabeled sample images with the target object "person" can be grouped into a first class of images, those with the target object "animal" into a second class, and those with the target object "plant" into a third class. In addition, categories can be divided into broad categories and more refined subcategories according to the actual situation. For example, "person" is a broad category, while "gender" is a subcategory within "person"; likewise, "adult", "baby" and "elderly person" are subcategories of "person". Similarly, "animal" is a broad category, while "cat", "dog", "bird" and so on are subcategories under "animal".
For example, when an unlabeled sample image includes multiple target objects, different target objects may belong to the same category; for instance, the image includes two target objects, both of which are women. Different target objects may also belong to different categories; for instance, an unlabeled sample image includes three target objects, where the first is a man, the second is a woman, and the third is a cat. The first and second target objects then both belong to the category "person" (but differ in "gender", i.e. belong to different subcategories), while the third belongs to the category "animal".
Step S202: perform N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image, where the pseudo-label data set includes the pseudo label corresponding to the target object in each round of target detection, and N is a positive integer greater than 1.
For example, the pseudo labels may include category labels and other labels used to describe the characteristics of the target object.
For example, the value of N can be 3 to 5 rounds.
In this embodiment, taking an unlabeled sample image that includes multiple target objects of different categories as an example, the unlabeled sample image is input into, for example, a teacher model for N rounds of target detection, and each target object in the unlabeled sample image has a corresponding pseudo label after each round of target detection.
For example, take an unlabeled sample image that includes two target objects, a person and a cat. When the unlabeled sample image is input into the teacher model for N rounds of target detection, the target object "person" obtains a corresponding pseudo label in the N-th round of target detection, and the target object "cat" also obtains a corresponding pseudo label in the N-th round. Using the letter R to denote the person and the letter M to denote the cat, and writing $l_R^e$ for the pseudo label of R in round e, the pseudo labels obtained in the e-th round of target detection can be represented by the set $\{l_R^e, l_M^e\}$, and the pseudo labels obtained in the N-th round by the set $\{l_R^N, l_M^N\}$.
Here, $l_R^e$ denotes the pseudo label corresponding to the target object "person" in the e-th round of target detection, and $l_M^e$ denotes the pseudo label corresponding to the target object "cat" in the e-th round; similarly, $l_R^N$ and $l_M^N$ denote the pseudo labels corresponding to "person" and "cat" in the N-th round, where e and N are both positive integers greater than 1.
Thus, after completing N rounds of target detection, the pseudo-label data set corresponding to the target object "person" is $\{l_R^1, l_R^2, \dots, l_R^N\}$, and the pseudo-label data set corresponding to the target object "cat" is $\{l_M^1, l_M^2, \dots, l_M^N\}$.
Here, for any given unlabeled sample image, the teacher model can be used to perform N rounds of target detection and produce a prediction result (i.e. a pseudo label) for each target object in each round; for example, the prediction result includes the category to which the target object belongs.
For specific examples of the target object, reference may also be made to Figure 3 and the related description in step S302 below.
Step S203: determine the confidence of the pseudo label corresponding to each target object according to the pseudo labels in the pseudo-label data set.
In this embodiment, the pseudo labels of a target object obtained in each round of target detection can be compared with each other to obtain the confidence of the pseudo label. The confidence can be understood as a measure of the similarity of the pseudo labels determined across the rounds of target detection; the lower the value, the lower the similarity, and its value range can be [0, 1].
Continuing with the target objects "person" and "cat" as an example, after N rounds of target detection the pseudo-label data set corresponding to "person" is $\{l_R^1, l_R^2, \dots, l_R^N\}$ and that corresponding to "cat" is $\{l_M^1, l_M^2, \dots, l_M^N\}$. At this time, the pseudo labels obtained for "person" in the rounds can be compared, that is, the elements $l_R^1, l_R^2, \dots, l_R^N$ of the pseudo-label data set are compared to check whether they are identical. If the comparison shows that the category labels differ (for example, some category labels are male and the rest are female), it can be determined that the category labels corresponding to this target object diverge, and the confidence is not 1; if they are all identical, that is, the pseudo label obtained for this target object in every round of detection is the same, the confidence is 1.
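One way to make the comparison above precise is to define the confidence of a target object's pseudo label as its agreement ratio across the N rounds. This formalization, using the notation introduced in step S202, is an illustration rather than the application's stated formula:

```latex
% Agreement-ratio confidence for a target object o over N detection rounds,
% where l_o^e is the pseudo label from round e and \hat{l}_o is the most
% frequent label; c_o = 1 exactly when all rounds agree.
\[
  c_o = \frac{1}{N} \sum_{e=1}^{N} \mathbb{1}\left[ l_o^{e} = \hat{l}_o \right],
  \qquad
  \hat{l}_o = \arg\max_{l} \left|\{\, e : l_o^{e} = l \,\}\right|.
\]
```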
Step S204: if the confidence of the pseudo label corresponding to at least one target object in the unlabeled sample image is greater than a preset threshold, determine the unlabeled sample image as a target unlabeled sample image.
After the confidence of the pseudo label corresponding to each target object in the unlabeled sample image is obtained, whether the unlabeled sample image is a target unlabeled sample image can be determined based on the confidence.
For example, as described in step S201, the unlabeled sample image may contain one target object or multiple target objects. When there is one target object, it can be determined whether the confidence of the pseudo label corresponding to that target object is greater than the preset threshold; if so, the unlabeled sample image is a target unlabeled sample image. When there are multiple target objects, whether the confidence of the pseudo label corresponding to each target object is greater than the preset threshold can be judged separately; if the confidence of the pseudo label corresponding to at least one of the target objects is greater than the preset threshold, the unlabeled sample image can be determined as a target unlabeled sample image.
The embodiment of this application performs N rounds of target detection on an unlabeled sample image to obtain the pseudo label corresponding to each target object in each round of target detection, then determines the confidence of the pseudo label corresponding to each target object based on the pseudo labels obtained in each round, selects the unlabeled sample images whose pseudo labels have high confidence as target unlabeled sample images, and uses the target unlabeled sample images as the samples finally used to train the student model, effectively filtering out noisy information and thereby improving target detection performance.
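Putting steps S201 to S204 together, a minimal end-to-end sketch of the screening procedure might look as follows; the `detect` function is a stand-in for one round of teacher-model target detection and, like the threshold value, is an assumption made for illustration.

```python
import random
from collections import Counter

def detect(image, round_idx):
    """Stand-in for one round of teacher-model target detection: returns a
    category label per target object. A real system would run a detector;
    the seeded random choice merely simulates round-to-round disagreement."""
    random.seed(f"{image}:{round_idx}")
    return {"person_1": random.choice(["male", "female"])}

def screen_images(images, n_rounds=5, threshold=0.99):
    """Steps S201-S204: N detection rounds, per-object confidence, selection.
    With threshold > (N-1)/N, only images whose objects receive the same
    pseudo label in every round are kept."""
    targets = []
    for image in images:
        per_object = {}
        for e in range(1, n_rounds + 1):
            for obj, label in detect(image, e).items():
                per_object.setdefault(obj, []).append(label)
        confidences = [
            Counter(labels).most_common(1)[0][1] / n_rounds
            for labels in per_object.values()
        ]
        if any(c > threshold for c in confidences):
            targets.append(image)
    return targets

print(screen_images(["img_A", "img_B", "img_C"]))  # the subset with stable pseudo labels
```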
The image processing method provided by the embodiment of the present application has been described above with reference to Figure 2. In practice, image samples can also be processed in the form of an image data set, which may include one or more unlabeled sample images. This process is described below with reference to Figure 3. It can be understood that, in the following embodiments, there may be more than one unlabeled sample image in the image data set, and the processing of each unlabeled sample image is similar.
Figure 3 is a schematic flowchart of the image processing method provided by an embodiment of the present application. As shown in Figure 3, the method may specifically include the following steps:
Step S301: acquire an image data set, where the image data set includes at least one unlabeled sample image, and each unlabeled sample image includes at least one target object.
In this embodiment, each unlabeled sample image in the image data set may include multiple target objects or a single target object. For example, taking a single target object as an example, the target object can be a person, a dog, a cat, a flower and so on; different target objects may belong to different categories (for example, cats and dogs are animals, while flowers are plants).
For example, taking an unlabeled sample image that includes only a single target object as an example, the unlabeled sample images in the image data set can be classified in advance according to the category of the target object they contain. For example, unlabeled sample images with the target object "person" are grouped into a first class of images, those with the target object "animal" into a second class, and those with the target object "plant" into a third class. In addition, categories can be divided into broad categories and more refined subcategories according to the actual situation; for example, "person" is a broad category, "gender" is a subcategory within "person", and "adult", "baby" and "elderly person" are likewise subcategories of "person", while "cat", "dog", "bird" and so on are subcategories under the broad category "animal".
For example, the target objects in the unlabeled sample images T1 and T2 are both a woman, the target object in the unlabeled sample image T3 is a dog, and the target object in the unlabeled sample image T4 is a flower; then T1 and T2 are classified into the "woman" class of images, T3 into the "dog" class, and T4 into the "plant" class. Unlabeled sample images of different classes are distinguished and then separately subjected to subsequent target detection to obtain the corresponding detection results.
For example, in this embodiment, an unlabeled sample image may also include multiple target objects, and different target objects may belong to the same category; for instance, the image includes two target objects, both of which are women. Different target objects may also belong to different categories; for instance, an unlabeled sample image includes three target objects, where the first is a man, the second is a woman, and the third is a cat. The first and second target objects then both belong to the category "person" (but differ in "gender", i.e. belong to different subcategories), while the third belongs to the category "animal".
Step S302: perform N rounds of target detection on each unlabeled sample image in the image data set to obtain a pseudo-label data set corresponding to each target object in each unlabeled sample image. The pseudo-label data set includes the pseudo label corresponding to the target object in each round of target detection, the pseudo labels include at least category labels, and N is a positive integer greater than 1. For example, the value of N is 3 to 5 rounds.
In this embodiment, taking an unlabeled sample image that includes multiple target objects of different categories as an example, the unlabeled sample image is input into, for example, a teacher model for N rounds of target detection, and each target object in the unlabeled sample image has a corresponding pseudo label after each round of target detection.
For example, take an unlabeled sample image that includes two target objects, a person and a cat. When the unlabeled sample image is input into the teacher model for N rounds of target detection, the target object "person" obtains a corresponding pseudo label in the N-th round of target detection, and the target object "cat" also obtains a corresponding pseudo label in the N-th round. Using the letter R to denote the person and the letter M to denote the cat, and writing $l_R^e$ for the pseudo label of R in round e, the pseudo labels obtained in the e-th round of target detection can be represented by the set $\{l_R^e, l_M^e\}$, and the pseudo labels obtained in the N-th round by the set $\{l_R^N, l_M^N\}$.
Here, $l_R^e$ denotes the pseudo label corresponding to the target object "person" in the e-th round of target detection, and $l_M^e$ denotes the pseudo label corresponding to the target object "cat" in the e-th round; similarly, $l_R^N$ and $l_M^N$ denote the pseudo labels corresponding to "person" and "cat" in the N-th round, where e and N are both positive integers greater than 1.
Thus, after completing N rounds of target detection, the pseudo-label data set corresponding to the target object "person" is $\{l_R^1, l_R^2, \dots, l_R^N\}$, and the pseudo-label data set corresponding to the target object "cat" is $\{l_M^1, l_M^2, \dots, l_M^N\}$.
Here, for any given unlabeled sample image, the teacher model can be used to perform N rounds of target detection and produce a prediction result (i.e. a pseudo label) for each target object in each round; for example, the prediction result includes the category to which the target object belongs. Figure 4 is a schematic diagram of the teacher model prediction process provided by an embodiment of the present application. As shown in Figure 4, target detection can be performed on the unlabeled sample image A through the teacher model 40, and the prediction result of each round for the unlabeled sample image A is recorded (Figure 4 includes N rounds of results).
Taking the unlabeled sample image A as including one target object "person" as an example, in the prediction result 41 obtained in the first round of target detection, the target object corresponds to a category label indicating that the target object is female. Similarly, in the prediction result 42 obtained in the N-th round of target detection, the category label corresponding to the target object indicates that the target object is male. It can thus be seen that, when the teacher model performs multiple rounds of target detection on the same unlabeled sample image, it may give different pseudo labels to the same target object; that is, the pseudo labels given by the teacher model may be unreliable, which may introduce noise into the samples finally used to train the student model.
Step S303: determine the confidence of the pseudo label corresponding to each target object according to the pseudo labels in the pseudo-label data set.
In this embodiment, the pseudo labels of a target object obtained in different rounds of target detection can be compared with one another. Exemplarily, with continued reference to Figure 4, the category label of the target object in the prediction result 41 obtained in the first round of target detection (which indicates female) is compared with the category label of the target object in the prediction result 42 obtained in the N-th round (which indicates male). The comparison shows that the two category labels differ (i.e., one category label is female and the other is male), so it can be determined that the category labels corresponding to this target object are inconsistent and the confidence is not 1. The confidence can take values in the range [0, 1]: when the pseudo labels obtained for the target object are identical in every round of detection, the confidence is 1; when the pseudo labels obtained in every round are all different, the confidence is 0. The confidence can be understood as a measure of how similar the pseudo labels determined in the individual rounds of target detection are; the lower its value, the lower the similarity.
Step S304: determine a target unlabeled sample image in the image data set according to the confidence, where the target unlabeled sample image contains at least one target object whose corresponding pseudo label has a confidence greater than a preset threshold.
In this embodiment, with continued reference to Figure 4, an unlabeled sample image A in which the confidence of the pseudo label of the target object obtained over the N rounds of target detection is less than the preset threshold (for example, a preset threshold of 1) is filtered out and is not used as a target unlabeled sample image.

Exemplarily, Figure 5 is a schematic diagram of target detection results provided by another embodiment of the present application. As shown in Figure 5, after the teacher model 40 performs N rounds of target detection on the unlabeled sample image B, it is found that the category label of the target object contained in the prediction result 51 of the first round is female, and the category label of the target object contained in the prediction result 52 of the N-th round is also female. Since the pseudo labels corresponding to the target object are identical from the first round through the N-th round of target detection, the pseudo label corresponding to this target object can be given a confidence of 1, and the unlabeled sample image B can then serve as a target unlabeled sample image.

In addition, in this embodiment, if an unlabeled sample image contains multiple different target objects, for example both the target object "person" and the target object "dog", then the pseudo label corresponding to the target object "person" has its own confidence and the pseudo label corresponding to the target object "dog" has its own confidence. In this case, if the confidence of the pseudo label corresponding to any one of these target objects is greater than the preset threshold, the unlabeled sample image can be used as a target unlabeled sample image.

For example, suppose there is an unlabeled sample image C in which the confidence of the category label corresponding to "person" is 1 and greater than the preset threshold, while the confidence of the category label corresponding to "dog" is 0 and less than the preset threshold; this unlabeled sample image C can still be used as a target unlabeled sample image.
In this embodiment, the target unlabeled sample image serves as training data for training the student model, whose network parameters are obtained through gradient descent on the loss function, while the parameters of the teacher model are updated based on the parameters of the student model. Specifically, the parameters of the teacher model can be updated through an Exponential Moving Average (EMA) strategy.
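As an illustration of the EMA update described above, the following Python sketch shows one common way to blend student parameters into a teacher model; the momentum value, the function name, and the use of PyTorch modules are assumptions made for illustration rather than details specified by this embodiment.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               momentum: float = 0.999) -> None:
    """Update teacher parameters as an exponential moving average of
    student parameters; assumes both models share the same architecture."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        # teacher <- momentum * teacher + (1 - momentum) * student
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
```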
When the student model is trained with a target unlabeled sample image, if the confidence of the category label corresponding to some target object in that image is less than the preset threshold (as with the unlabeled sample image C described above), that target object is removed during training, and only the target objects whose category-label confidence is greater than the preset threshold (for example, the "person" in the unlabeled sample image C) are retained.

In the embodiments of the present application, N rounds of target detection are performed on unlabeled sample images to obtain the pseudo label corresponding to each target object in each round; then, based on the pseudo labels obtained for each target object in every round, the confidence of the pseudo label corresponding to each target object is determined, and the unlabeled sample images corresponding to high-confidence pseudo labels are used as the samples that finally train the student model. This effectively filters out noisy information and thereby improves target detection performance.
In other embodiments, the above step S302 can be implemented specifically through the following steps: performing N rounds of target detection on all unlabeled sample images to obtain the attribute category determined for each target object in each unlabeled sample image in each round of target detection; determining, according to the attribute category determined for each target object in each unlabeled sample image in each round, the category label corresponding to each target object in each unlabeled sample image in each round of target detection; and determining, according to the category label corresponding to each target object in each unlabeled sample image in each round, the pseudo-label data set corresponding to each target object in each unlabeled sample image.
In this embodiment, with continued reference to Figures 4 and 5, take the case where the image data set includes the unlabeled sample image A and the unlabeled sample image B, each of which contains only a single target object, "person". Referring to Figure 4, for the unlabeled sample image A, the attribute category corresponding to the target object "person" determined in the first round of target detection is "female" (here the attribute category refers to the gender of the "person"), so the target object determined in the first round can be given the category label "female". In the N-th round of target detection, the attribute category corresponding to the target object "person" is determined to be "male", so the target object determined in the N-th round can be given the category label "male".

Similarly, for the unlabeled sample image B, in the detection result 51 obtained in the first round of target detection, the category of the target object "person" is female, so the corresponding category label is "female"; in the detection result 52 obtained in the N-th round, the category of the target object "person" is still female, so the corresponding category label remains "female".

In the embodiments of the present application, N rounds of target detection are performed on the unlabeled sample images, and each round yields a prediction result for each target object in the image. From these, the category label of each target object in each round, as well as the confidence of that category label, can be determined; the reliability of the category label can then be assessed based on this confidence, reducing the noise in the pseudo-label information.
Further, on the basis of the above embodiments, in other embodiments the pseudo label may also include a candidate box label. Accordingly, the above step S302 may specifically include the following steps: performing N rounds of target detection on all unlabeled sample images to obtain the position information of each target object in each unlabeled sample image within that image in each round of target detection; determining, according to the position information of each target object in each unlabeled sample image in each round, the candidate box label corresponding to each target object in each unlabeled sample image in each round of target detection; and determining, according to the category label and the candidate box label corresponding to each target object in each unlabeled sample image in each round, the pseudo-label data set corresponding to each target object in each unlabeled sample image.

In this embodiment, Figure 6 is a schematic diagram of target detection results provided by another embodiment of the present application. As shown in Figure 6, the unlabeled sample image 61 contains one target object, "person". After the teacher model 40 performs one round of target detection on the unlabeled sample image 61 and obtains the prediction result 62, a pseudo label can be attached to this target object. The pseudo label may include a category label and a candidate box label. The category label can indicate the category to which the target object belongs (for example, the target object belongs to the category female), while the candidate box label indicates the position of the target object in the unlabeled sample image 61. Specifically, the candidate box label may take the form (Z1, Z2, Z3, Z4), where Z1, Z2, Z3 and Z4 respectively denote the position coordinates of the four corners of the dashed box in the prediction result 62.
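To make the structure of such a pseudo label concrete, the following Python sketch models one detected object as a category label plus four corner coordinates; the class name, field names, and example values are illustrative assumptions rather than terms defined by this embodiment.

```python
from dataclasses import dataclass
from typing import Tuple

Corner = Tuple[float, float]  # (x, y) position coordinate of one box corner

@dataclass
class PseudoLabel:
    """Pseudo label of one target object in one round of target detection."""
    category: str                               # category label, e.g. "female"
    box: Tuple[Corner, Corner, Corner, Corner]  # (Z1, Z2, Z3, Z4) corner coordinates

label = PseudoLabel(category="female",
                    box=((10.0, 20.0), (110.0, 20.0), (110.0, 220.0), (10.0, 220.0)))
```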
Further, in practical applications, an unlabeled sample image may contain multiple target objects; for example, it may contain both people and animals (such as cats, dogs, and so on). Exemplarily, Figure 7 is a schematic diagram of target detection results provided by another embodiment of the present application. As shown in Figure 7, the unlabeled sample image 70 contains one female target object and one male target object, and the prediction result 71 is obtained after the teacher model 40 performs target detection on the unlabeled sample image 70. The prediction result 71 includes a pseudo label for the female target object and a pseudo label for the male target object, and each pseudo label in turn includes a category label and a candidate box label. The category label indicates the gender of the target object, while the candidate box label indicates the position of the target object in the unlabeled sample image.

In the embodiments of the present application, the pseudo labels are divided into two kinds, category labels and candidate box labels, and the target unlabeled sample images are screened based on the confidences of the two kinds of labels corresponding to each target object. This yields more reliable target unlabeled sample images and reduces the interference of noisy information, thereby further improving the target detection performance of the student model obtained in subsequent training.
Further, in some embodiments, when obtaining the confidence of the pseudo label of a target obtained in each round of target detection, the confidence corresponding to the category label of the target object and the confidence corresponding to the candidate box label of the target object need to be determined according to all pseudo labels of each target object in the pseudo-label data set.

In this embodiment, with continued reference to Figure 7, every target object has its corresponding pseudo labels. For example, the pseudo labels corresponding to the male target object in Figure 7 include the category label "male" and the candidate box label (Z1', Z2', Z3', Z4'), and each label corresponds to a confidence.

When computing the confidence of each label of a target object, the pseudo labels obtained for that target object in each round of target detection can be compared with one another to determine the confidence of the pseudo label corresponding to that target object.
Exemplarily, taking the computation of the confidence of the category label corresponding to the male target object in the unlabeled sample image 70 in Figure 7 as an example, the gender indicated by the category label of this male target object in each round of target detection can first be determined. For example, with N = 3, suppose the category label corresponding to this male target object is "male" in the first round of target detection, "male" in the second round, and "female" in the third round. Comparing the category labels obtained in the three rounds, the probability that the category label is "male" is 2/3, and the probability that it is "female" is 1/3. Taking the maximum probability as the confidence of the category label, the confidence of the category label corresponding to the male target object is 2/3.
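A minimal Python sketch of this majority-fraction rule is given below; the function name and the use of collections.Counter are illustrative assumptions, not elements specified by this embodiment.

```python
from collections import Counter
from typing import List, Tuple

def category_label_confidence(labels: List[str]) -> Tuple[str, float]:
    """Return the most frequent category label across rounds and its frequency
    fraction, used here as the confidence of the category label."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

# Example from the text: three rounds yield "male", "male", "female".
print(category_label_confidence(["male", "male", "female"]))  # ('male', 0.666...)
```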
Exemplarily, Figure 8 is a schematic diagram of target detection results provided by yet another embodiment of the present application. As shown in Figure 8, the unlabeled sample image 70 contains a first target object 701 and a second target object 702. By performing three rounds of target detection on the unlabeled sample image 70, the teacher model 40 obtains a first pseudo-label image 81, a second pseudo-label image 82 and a third pseudo-label image 83. In the first pseudo-label image 81, the detected first target object 701 has a category label ("female") and a candidate box label ((Z1'', Z2'', Z3'', Z4'')), and the second target object 702 has a category label ("male") and a candidate box label ((Z1', Z2', Z3', Z4')). In the second pseudo-label image 82, the detected first target object 701 has a category label ("female") and a candidate box label ((Z1'', Z2'', Z3'', Z4'')), and the second target object 702 has a category label ("male") and a candidate box label ((Z1', Z2', Z3', Z4')). In the third pseudo-label image 83, the detected first target object 701 has a category label ("male") and a candidate box label ((Z1'', Z2'', Z3'', Z4'')), and the second target object 702 has a category label ("male") and a candidate box label ((Z1''', Z2''', Z3''', Z4''')).

In this embodiment, the category labels of the same target detected in the individual rounds are compared with one another to determine the confidence of that target's category label. Specifically, the category label of the first target object 701 in the first pseudo-label image 81 can be compared with its category labels in the second pseudo-label image 82 and in the third pseudo-label image 83. The comparison shows that the category label of the first target object 701 in the third pseudo-label image 83 is "male", while its category labels in the first pseudo-label image 81 and the second pseudo-label image 82 are both "female"; that is, the category label in the third pseudo-label image 83 differs from those in the first and second pseudo-label images. The confidence of the category label of the first target object 701 can be determined accordingly.
Exemplarily, the confidence of the candidate box label corresponding to the male target object in Figure 7 can also be computed. For example, the candidate box labels corresponding to the male target object in each round of target detection can be obtained from the pseudo-label data set; based on the coordinate information in these candidate box labels, the intersection-over-union (IoU) between every two candidate regions is computed and averaged, which yields the confidence of the candidate box label.

Specifically, with continued reference to Figure 8 and taking the second target object 702 in the unlabeled sample image 70 as an example, the coordinate information of the candidate box label corresponding to the second target object 702 is (Z1', Z2', Z3', Z4') in the pseudo-label image 81, (Z1', Z2', Z3', Z4') in the pseudo-label image 82, and (Z1''', Z2''', Z3''', Z4''') in the pseudo-label image 83. In Figure 8, the coordinate information of the candidate box label of the second target object 702 in the pseudo-label image 81 is identical to that in the pseudo-label image 82 (both are (Z1', Z2', Z3', Z4')), i.e., the candidate box regions completely overlap, whereas the coordinate information in the pseudo-label images 81 and 82 differs from that in the pseudo-label image 83, with only partial overlap. Suppose the overlap between the candidate box regions of the second target object 702 in the pseudo-label images 81 and 82 is 1, the overlap between the pseudo-label images 81 and 83 is 0.5, and the overlap between the pseudo-label images 82 and 83 is also 0.5; the average is then (0.5 + 0.5 + 1)/3 = 2/3, i.e., the confidence of the candidate box label of the second target object 702 is 2/3.
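The following Python sketch computes this mean pairwise IoU; representing each box by its (x1, y1, x2, y2) extremes instead of four explicit corners is a simplifying assumption made for illustration, as are the function names.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def box_label_confidence(boxes: List[Box]) -> float:
    """Mean pairwise IoU of one object's candidate boxes across rounds;
    assumes at least two rounds of detection results."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 1.0  # a single box trivially agrees with itself
    return sum(iou(a, b) for a, b in pairs) / len(pairs)
```

For three boxes whose pairwise IoUs are 1, 0.5 and 0.5, box_label_confidence returns 2/3, matching the worked example above.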
It should be noted that the confidence of the category label corresponding to each target object is determined from the category labels obtained for that target object in every round of target detection, while the confidence of the candidate box label corresponding to each target object is determined from the candidate box labels obtained for that target object in every round.

By comparing the pseudo labels obtained for each target object in every round of target detection, the embodiments of the present application can quickly determine the confidence of the pseudo label corresponding to each target object and use this confidence as a measure of the pseudo label's reliability, thereby effectively filtering out the pseudo-label noise present in the unlabeled sample images and improving screening efficiency.

Further, in other embodiments, if the pseudo label of a target object includes only a category label, the confidence of the pseudo label corresponding to that target object can be determined simply by comparing the category labels obtained for the target object in every round of target detection with one another.
In this embodiment, for a given unlabeled sample image, the reliability of the category label of each detected target object is judged one by one over all rounds. For the e-th round of target detection, given an unlabeled sample image $x_u$, the category labels of the m detected target objects are denoted $\{y_1^e, y_2^e, \ldots, y_m^e\}$. In the (e+n)-th round, for the unlabeled sample image $x_u$, the category labels of the m detected target objects are denoted $\{y_1^{e+n}, y_2^{e+n}, \ldots, y_m^{e+n}\}$. The category labels of these n+1 rounds are compared, and the change in the category label of each target object is counted, where m denotes the m-th target object in the unlabeled sample image $x_u$, m is a positive integer greater than or equal to 1, e is a positive integer greater than or equal to 1, and n is a positive integer greater than or equal to 1.

By comparing the category labels of the targets obtained in every round of target detection, the embodiments of the present application can determine the similarity of the category labels; the higher the similarity, the higher the reliability of the category label. On this basis, the unlabeled sample images that can serve as training samples for the subsequent student model can be accurately screened out, improving the reliability of the training samples.
In other embodiments, if the pseudo label of a target object includes only a candidate box label, the confidence of the pseudo label corresponding to that target object can be determined based on the candidate box label corresponding to the target object in the pseudo-label image obtained in each round of target detection. Specifically, the position of the target object in the unlabeled sample image in each round can be determined from the candidate box label of the target object obtained in that round; the degree of overlap between the positions of the target object in the unlabeled sample image across rounds is obtained; and the confidence of the candidate box label of the target object is obtained from this degree of overlap.
In this embodiment, for a given unlabeled sample image, for the candidate box labels of the target objects detected in all rounds, the position of each target object in the unlabeled sample image in each round of target detection is obtained, and the confidence of the candidate box pseudo label of each target object is judged one by one based on these positions. Specifically, for the e-th round, given an unlabeled sample image $x_u$, the candidate box labels corresponding to the m detected target objects are denoted:

$$\{(Z_1^{e,1}, Z_2^{e,1}, Z_3^{e,1}, Z_4^{e,1}), \ldots, (Z_1^{e,m}, Z_2^{e,m}, Z_3^{e,m}, Z_4^{e,m})\}$$

In the (e+n)-th round, for the unlabeled sample image $x_u$, the candidate box labels corresponding to the m detected target objects are denoted:

$$\{(Z_1^{e+n,1}, Z_2^{e+n,1}, Z_3^{e+n,1}, Z_4^{e+n,1}), \ldots, (Z_1^{e+n,m}, Z_2^{e+n,m}, Z_3^{e+n,m}, Z_4^{e+n,m})\}$$

In the above expressions, m denotes the m-th target object in the unlabeled sample image; $(Z_1^{e,m}, Z_2^{e,m}, Z_3^{e,m}, Z_4^{e,m})$ denotes the four corner coordinates of the candidate box of the m-th target object in the e-th round of target detection, and $(Z_1^{e+n,m}, Z_2^{e+n,m}, Z_3^{e+n,m}, Z_4^{e+n,m})$ denotes the four corner coordinates of the candidate box of the m-th target object in the (e+n)-th round. By comparing the detection results of these n+1 rounds, computing the intersection-over-union between every two candidate regions and averaging, the confidence of the candidate box label corresponding to each target object can be determined, where m is a positive integer greater than or equal to 1, e is a positive integer greater than or equal to 1, and n is a positive integer greater than or equal to 1.
In other embodiments, when the pseudo label of a target object includes both a category label and a candidate box label, the extraction of target unlabeled sample images can be implemented specifically through the following step: determining a first unlabeled sample image in the image data set as a target unlabeled sample image, where the first unlabeled sample image is an image in which, among all target objects in the unlabeled sample image, the confidence of the category label corresponding to at least one target object is greater than a preset first category-label threshold and the confidence of the candidate box label corresponding to that at least one target object is greater than a preset first candidate-box-label threshold.

In this embodiment, when a target unlabeled sample image is used for training the student model, the teacher model can first make predictions on it to determine the pseudo labels in the target unlabeled sample image (these pseudo labels have passed the aforementioned verification and are therefore highly reliable), which then participate in the training of the student model.

Two thresholds can be set (i.e., the preset first category-label threshold and the preset first candidate-box-label threshold); for example, the preset first category-label threshold may be 0.8 and the preset first candidate-box-label threshold may be 0.7. In the image data set, if there is an unlabeled sample image X in which the confidence of the category label corresponding to any one target object is greater than 0.8 and the confidence of its candidate box label is greater than 0.7, the unlabeled sample image X can serve as a target unlabeled sample image.

Exemplarily, the unlabeled sample image X may contain two target objects (for example, "person" and "dog"). Suppose the confidence of the category label corresponding to the target object "person" is 0.7 and the confidence of the candidate box label corresponding to the target object "person" is 0.8; the target object "person" then fails the first category-label threshold. If, however, the confidence of the category label corresponding to the target object "dog" is 0.9 and the confidence of the candidate box label corresponding to the target object "dog" is 0.8, i.e., the target object "dog" satisfies both the first category-label threshold and the first candidate-box-label threshold, then the unlabeled sample image X can serve as a target unlabeled sample image.
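A minimal sketch of this screening rule is given below in Python; the thresholds 0.8 and 0.7 follow the example above, and the function and type names are illustrative assumptions.

```python
from typing import List, Tuple

# Per target object: (category-label confidence, candidate-box-label confidence).
ObjectConfidence = Tuple[float, float]

def is_target_unlabeled_image(objects: List[ObjectConfidence],
                              cls_threshold: float = 0.8,
                              box_threshold: float = 0.7) -> bool:
    """An image qualifies if at least one of its target objects exceeds both
    the category-label threshold and the candidate-box-label threshold."""
    return any(cls_conf > cls_threshold and box_conf > box_threshold
               for cls_conf, box_conf in objects)

# Image X from the example: "person" -> (0.7, 0.8), "dog" -> (0.9, 0.8).
print(is_target_unlabeled_image([(0.7, 0.8), (0.9, 0.8)]))  # True, thanks to "dog"
```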
In other implementations, a second unlabeled sample image may also be screened out from the image data set; a third unlabeled sample image is then screened out from the second unlabeled sample images, and the third unlabeled sample image is determined as the target unlabeled sample image.

Here, a second unlabeled sample image is an unlabeled sample image in which, among all its target objects, the confidence of the category label corresponding to at least one target object is greater than a preset second category-label threshold; a third unlabeled sample image is an image in which, among all target objects in the second unlabeled sample image, the confidence of the candidate box label corresponding to at least one target object is greater than a preset second candidate-box-label threshold.

In this embodiment, take an image data set including an unlabeled sample image X and an unlabeled sample image X1 as an example; the number of target objects in the unlabeled sample images X and X1 is not limited. For example, the unlabeled sample image X contains two target objects (for example, "person" and "dog"), and the unlabeled sample image X1 contains one target object (for example, "flower"). When the confidence of the category label corresponding to the target object "person" in the unlabeled sample image X is greater than the preset second category-label threshold, and/or the confidence of the category label corresponding to the target object "dog" in the unlabeled sample image X is greater than the preset second category-label threshold, the unlabeled sample image X can serve as a second unlabeled sample image.

Further, if the confidence of the candidate box label corresponding to the target object "person" in the unlabeled sample image X is also greater than the preset second candidate-box threshold; or the confidence of the candidate box label corresponding to the target object "dog" in the unlabeled sample image X is also greater than the preset second candidate-box threshold; or the confidences of the candidate box labels corresponding to both the target object "person" and the target object "dog" in the unlabeled sample image X are greater than the preset second candidate-box threshold, then the unlabeled sample image X will serve as a target unlabeled sample image, as illustrated in the sketch below.
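To make the difference from the single-stage rule above explicit, the following sketch screens in two stages; under this reading, the object that passes the category-label threshold need not be the same object that passes the candidate-box-label threshold. The function and type names are illustrative assumptions.

```python
from typing import List, Tuple

ObjectConfidence = Tuple[float, float]  # (category-label conf, candidate-box-label conf)

def is_target_by_two_stage_screening(objects: List[ObjectConfidence],
                                     cls_threshold: float,
                                     box_threshold: float) -> bool:
    """Stage 1 keeps the image if any object passes the second category-label
    threshold; stage 2 keeps it if any object (not necessarily the same one)
    passes the second candidate-box-label threshold."""
    is_second_image = any(cls_conf > cls_threshold for cls_conf, _ in objects)
    if not is_second_image:
        return False
    return any(box_conf > box_threshold for _, box_conf in objects)
```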
By setting a category-label threshold and a candidate-box-label threshold to screen the unlabeled sample images, the embodiments of the present application can screen out unlabeled sample images of better reliability, reduce the pseudo-label noise in the target unlabeled sample images, and improve the accuracy of sample extraction.
Figure 9 is a schematic flowchart of the model training method provided by an embodiment of the present application; this method can be applied to computer equipment. As shown in Figure 9, the method may specifically include the following steps. Step S901: obtain unlabeled sample images and labeled sample images, where the number of unlabeled sample images is greater than the number of labeled sample images. Step S902: perform N rounds of target detection on the unlabeled sample images and obtain the pseudo label corresponding to each target object in each round of target detection, where the pseudo label includes at least one of a category label and a candidate box label, and N is a positive integer greater than 1. Step S903: extract target unlabeled sample images from the unlabeled sample images according to the pseudo labels corresponding to the target objects obtained in each round of target detection. Step S904: train the student model according to the target sample images and the labeled sample images, where the student model obtained after training is used to update the teacher model.
In this embodiment, before the student model is trained, the required samples include labeled sample images carrying real labels and target unlabeled sample images carrying pseudo labels, where the number of labeled sample images is smaller than the number of target unlabeled sample images carrying pseudo labels. Meanwhile, a strong augmentation method (operation) can also be used to strongly augment the target unlabeled sample images; the strong augmentation method may specifically include color transformation, random erasing, color filling and other methods. When training the student model, the overall training loss is computed; the loss function is specifically as follows:

$$\mathcal{L} = \mathcal{L}_s + \lambda_u \mathcal{L}_u$$

$$\mathcal{L}_s = \frac{1}{N_l} \sum_{b=1}^{N_l} \left( \mathcal{L}_{\mathrm{cls}}\!\left(x_b^l, y_b^l\right) + \mathcal{L}_{\mathrm{reg}}\!\left(x_b^l, y_b^l\right) \right)$$

$$\mathcal{L}_u = \frac{1}{N_u} \sum_{b=1}^{N_u} \left( \mathcal{L}_{\mathrm{cls}}\!\left(x_b^u, \hat{y}_b^u\right) + \mathcal{L}_{\mathrm{reg}}\!\left(x_b^u, \hat{y}_b^u\right) \right)$$
In the above formulas, $N_l$ is the number of labeled sample images in the batch and $N_u$ is the number of target unlabeled sample images; $x_b^l$ denotes the b-th labeled sample image, $y_b^l$ denotes the label corresponding to the b-th labeled sample image, $x_b^u$ denotes the b-th target unlabeled sample image, and $\hat{y}_b^u$ denotes the pseudo label of the b-th target unlabeled sample image. $\mathcal{L}_{\mathrm{cls}}$ denotes the classification loss, $\mathcal{L}_{\mathrm{reg}}$ denotes the bounding box regression loss, and $\lambda_u$ is a hyperparameter used to balance the weight of the unsupervised loss. After the training of the student model is completed, the EMA strategy can be used to update the parameters of the teacher model.
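The combination of the supervised and unsupervised terms can be sketched in Python as follows; the per-image loss tensors and the function name are illustrative assumptions, since the embodiment does not fix a particular detector implementation.

```python
import torch

def total_loss(sup_losses: torch.Tensor,
               unsup_losses: torch.Tensor,
               lambda_u: float) -> torch.Tensor:
    """Combine per-image supervised and unsupervised losses.

    sup_losses:   N_l per-image losses L_cls + L_reg on labeled images
    unsup_losses: N_u per-image losses L_cls + L_reg on pseudo-labeled images
    """
    l_s = sup_losses.mean()    # (1 / N_l) * sum over labeled sample images
    l_u = unsup_losses.mean()  # (1 / N_u) * sum over target unlabeled images
    return l_s + lambda_u * l_u
```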
For the extraction of the target unlabeled sample images involved in steps S901 to S904, reference can be made to the description of the above embodiments, which is not repeated here. That is, steps S201-S204 or steps S301-S304 can be executed to extract the target unlabeled sample images, which are then combined with the obtained labeled sample images to train the student model. It should be noted that the embodiments of the present application do not limit the order in which the target unlabeled sample images are extracted and the labeled sample images are obtained.

By using unlabeled sample images carrying highly reliable pseudo labels and training with them together with a small number of labeled sample images, the embodiments of the present application can improve the target detection performance of the trained student model.
Figure 10 is a schematic diagram of the training of the end-to-end semi-supervised target detection framework provided by an embodiment of the present application. As shown in Figure 10, the unlabeled sample images enter the teacher model after weak augmentation and enter the student model after strong augmentation, and prediction results are produced by the teacher model and the student model respectively. The strong augmentation methods include color transformation, random erasing and color filling, while the weak augmentation methods include image cropping, rotation/reflection/flip transformation, zoom transformation, translation transformation, scale transformation, and so on.
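As an illustration of such weak and strong augmentation pipelines, the following sketch uses torchvision transforms; the specific transform choices and parameter values are assumptions standing in for the operations listed above, not settings prescribed by this embodiment, and a full detection pipeline would additionally have to transform the box coordinates consistently.

```python
import torchvision.transforms as T

# Weak augmentation for the teacher branch: mild geometric changes.
weak_augmentation = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    T.ToTensor(),
])

# Strong augmentation for the student branch: color transformation and random erasing.
strong_augmentation = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.ToTensor(),
    T.RandomErasing(p=0.7, scale=(0.02, 0.2)),  # randomly eliminates image patches
])
```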
In this embodiment, the prediction results produced by the student model participate directly in the unsupervised loss computation, while the prediction results produced by the teacher model can be recorded. The teacher model performs multiple rounds of target detection on the weakly augmented unlabeled sample images to obtain the prediction result of each round, which includes the pseudo label corresponding to each target object in the unlabeled sample image. Then, based on the prediction results of each round of target detection, the confidence of the pseudo label corresponding to each target object is computed; based on the confidence of the pseudo label of each target object in each unlabeled sample image, the unlabeled sample images meeting the confidence-threshold requirement are screened out and, together with the prediction results produced by the student model, used for the unsupervised loss computation. After the unsupervised loss computation is completed and the parameters of the student model are updated by backpropagation, the parameters of the student model can be used to update the parameters of the teacher model through the EMA strategy. The image processing method and model training method provided by the embodiments of the present application can be applied to semi-supervised target detection tasks: by recording the prediction results of the teacher model over multiple rounds, the confidence of the pseudo label corresponding to each target object in the unlabeled sample images is computed, and noisy pseudo labels are then screened out according to this confidence, improving the utilization of unlabeled sample images in semi-supervised target detection methods and thereby improving semi-supervised target detection performance.
The following are apparatus embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the apparatus embodiments of the present application, please refer to the method embodiments of the present application.
Figure 11 is a schematic structural diagram of the image processing apparatus provided by an embodiment of the present application. As shown in Figure 11, the image processing apparatus 1100 includes: an image acquisition module 1101, a label data set acquisition module 1120, a confidence determination module 1130 and an image determination module 1140. The image acquisition module 1101 is configured to obtain an unlabeled sample image, where the unlabeled sample image includes at least one target object. The label data set acquisition module 1120 is configured to perform N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image, the pseudo-label data set including the pseudo label corresponding to the target object in each round of target detection, where N is a positive integer greater than 1. The confidence determination module 1130 is configured to determine the confidence of the pseudo label corresponding to each target object according to the pseudo labels in the pseudo-label data set. The image determination module 1140 is configured to determine the unlabeled sample image as a target unlabeled sample image if the unlabeled sample image contains at least one target object whose corresponding pseudo label has a confidence greater than a preset threshold.

In some embodiments, the image acquisition module 1101 may be specifically configured to: obtain an image data set, the image data set including at least one unlabeled sample image, each unlabeled sample image including at least one target object.
In some embodiments, the label data set acquisition module 1120 may be specifically configured to: for each unlabeled sample image in the image data set, perform N rounds of target detection on the unlabeled sample image to obtain the pseudo-label data set corresponding to each target object in the unlabeled sample image.

In some embodiments, the pseudo label includes a category label; the label data set acquisition module 1120 may be specifically configured to: perform N rounds of target detection on the unlabeled sample image to obtain the attribute category determined for each target object in the unlabeled sample image in each round of target detection; determine, according to the attribute category determined for each target object in the unlabeled sample image in each round, the category label corresponding to each target object in the unlabeled sample image in each round of target detection; and determine, according to the category label corresponding to each target object in the unlabeled sample image in each round, the pseudo-label data set corresponding to each target object in the unlabeled sample image.

In some embodiments, the pseudo label further includes a candidate box label, and the label data set acquisition module 1120 may also be configured to: perform N rounds of target detection on the unlabeled sample image to obtain the position information of each target object in the unlabeled sample image within that image in each round of target detection; determine, according to the position information of each target object in each unlabeled sample image in each round, the candidate box label corresponding to each target object in each unlabeled sample image in each round of target detection; and determine, according to the category label and candidate box label corresponding to each target object in the unlabeled sample image in each round, the pseudo-label data set corresponding to each target object in the unlabeled sample image.

In some embodiments, the confidence determination module 1130 may be specifically configured to: determine, according to all pseudo labels of each target object in the pseudo-label data set, the confidence corresponding to the category label of the target object and the confidence corresponding to the candidate box label of the target object.

In some embodiments, the image determination module 1140 may be specifically configured to: determine a first unlabeled sample image in the image data set as a target unlabeled sample image, where the first unlabeled sample image is an image in which, among all target objects in each unlabeled sample image, the confidence of the category label corresponding to at least one target object is greater than a preset first category-label threshold and the confidence of the candidate box label corresponding to the at least one target object is greater than a preset first candidate-box-label threshold.

In some embodiments, the image determination module 1140 may be specifically configured to: screen out a second unlabeled sample image from the image data set, the second unlabeled sample image being an unlabeled sample image in which, among all target objects in the unlabeled sample image, the confidence of the category label corresponding to at least one target object is greater than a preset second category-label threshold; and screen out a third unlabeled sample image from the second unlabeled sample image and determine the third unlabeled sample image as the target unlabeled sample image, the third unlabeled sample image being an image in which, among all target objects in the second unlabeled sample image, the confidence of the candidate box label corresponding to at least one target object is greater than a preset second candidate-box-label threshold.

In some embodiments, the image determination module 1140 may be specifically configured to: screen out a fourth unlabeled sample image from the image data set, the fourth unlabeled sample image being an image in which, among all target objects corresponding to the unlabeled sample image, the confidence of the candidate box label corresponding to at least one target object is greater than the preset second candidate-box-label threshold; and screen out a fifth unlabeled sample image from the fourth unlabeled sample image and determine the fifth unlabeled sample image as the target unlabeled sample image, the fifth unlabeled sample image being an image in which, among all target objects corresponding to the fourth unlabeled sample image, the confidence of the category label corresponding to at least one target object is greater than the preset second category-label threshold.
In some embodiments, the image acquisition module 1101 is further configured to: obtain labeled sample images.

The image processing apparatus 1100 further includes: a training module (not shown) configured to train the student model according to the target sample images and the labeled sample images.
In some embodiments, the training module may be specifically configured to: use the teacher model to perform the N rounds of target detection on the unlabeled sample images; the training module may also be configured to use the trained student model to update the parameters of the teacher model based on the exponential moving average (EMA) strategy.

In some embodiments, the training module may also be configured to: perform strong augmentation on the target unlabeled sample image using a strong augmentation operation, the strong augmentation operation including at least one of the following operations: color transformation, random erasing, color filling.

The apparatus provided by the embodiments of the present application can be used to execute the method in the embodiment shown in Figure 2 above; its implementation principles and technical effects are similar and are not repeated here.
图12为本申请实施例提供的图像处理装置的结构示意图,如图12所示,该图像处理装置1200包括图像数据集获取模块1210、标签数据集获取模块1220、置信度确定模块1230和图像确定模块1240。其中,图像数据集获取模块1210用于获取图像数据集。图像数据集包括至少一个无标签样本图像,每个无标签样本图像包括至少一个目标对象。标签数据集获取模块1220用于对图像数据集中每个无标签样本图像进行N轮目标检测,得到每个无标签样本图像中的每个目标对象对应的伪标签数据集。伪标签数据集包括目标对象在每一轮目标检测中对应的伪标签,伪标签至少包括类别标签,N为大于1的正整数。置信度确定模块1230用于根据伪标签数据集中的伪标签确定每个目标对象对应的伪标签的置信度。图像确定模块1240用于根据置信度,在图像数据集中确定目标无标签样本图像。其中,目标无标签样本图像中存在至少一个目标对象对应的伪标签的置信度大于预设阈值。Figure 12 is a schematic structural diagram of an image processing device provided by an embodiment of the present application. As shown in Figure 12, the image processing device 1200 includes an image data set acquisition module 1210, a label data set acquisition module 1220, a confidence determination module 1230 and an image determination module. Module 1240. Among them, the image data set acquisition module 1210 is used to acquire an image data set. The image data set includes at least one unlabeled sample image, and each unlabeled sample image includes at least one target object. The label data set acquisition module 1220 is used to perform N rounds of target detection on each unlabeled sample image in the image data set, and obtain a pseudo label data set corresponding to each target object in each unlabeled sample image. The pseudo-label data set includes pseudo-labels corresponding to the target object in each round of target detection. The pseudo-labels at least include category labels, and N is a positive integer greater than 1. The confidence determination module 1230 is configured to determine the confidence of the pseudo label corresponding to each target object based on the pseudo labels in the pseudo label data set. The image determination module 1240 is used to determine the target unlabeled sample image in the image data set according to the confidence level. Wherein, the confidence that there is at least one pseudo-label corresponding to the target object in the target unlabeled sample image is greater than a preset threshold.
In some embodiments, the label data set acquisition module 1220 may be specifically configured to: perform N rounds of target detection on all unlabeled sample images to obtain the attribute category determined in each round of target detection for each target object in each unlabeled sample image; determine, according to the attribute category determined in each round of target detection for each target object in each unlabeled sample image, the category label corresponding to each target object in each unlabeled sample image in each round of target detection; and determine, according to the category label corresponding to each target object in each unlabeled sample image in each round of target detection, the pseudo-label data set corresponding to each target object in each unlabeled sample image.
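A minimal sketch of collecting the per-round category labels into a per-object pseudo-label data set follows; it assumes detections have already been matched to a stable object index across the N rounds, which the module text does not spell out.

```python
from collections import defaultdict
from typing import Dict, List

def build_category_label_sets(rounds: List[List[int]]) -> Dict[int, List[int]]:
    """rounds[r][k] is the attribute category predicted for object k in round r.
    Returns, per object index, the list of its per-round category labels."""
    label_sets: Dict[int, List[int]] = defaultdict(list)
    for round_categories in rounds:
        for obj_idx, category in enumerate(round_categories):
            label_sets[obj_idx].append(category)
    return dict(label_sets)
```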
In some embodiments, if the pseudo-labels further include candidate box labels, the label data set acquisition module 1220 may be further configured to: perform N rounds of target detection on all unlabeled sample images to obtain the position information, in the unlabeled sample image, of each target object in each unlabeled sample image in each round of target detection; determine, according to that position information, the candidate box label corresponding to each target object in each unlabeled sample image in each round of target detection; and determine, according to the category label and the candidate box label corresponding to each target object in each unlabeled sample image in each round of target detection, the pseudo-label data set corresponding to each target object in each unlabeled sample image.
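Associating each round's detections with the same physical object typically relies on box overlap; an intersection-over-union (IoU) helper is sketched below, with any matching rule (for example, pairing boxes whose IoU exceeds 0.5) being an assumption rather than a disclosed detail.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0
```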
In some embodiments, the confidence determination module 1230 may be specifically configured to determine, according to all pseudo-labels of each target object in the pseudo-label data set, the confidence corresponding to the category label of the target object and the confidence corresponding to the candidate box label of the target object.
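The disclosure does not fix a formula for these confidences. One plausible reading, sketched below, scores the category label by cross-round agreement and the candidate box label by cross-round overlap (reusing the `iou` helper above); both definitions are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean
from typing import List, Tuple

def category_label_confidence(category_labels: List[int]) -> float:
    """Fraction of the N rounds agreeing with the most frequent category label."""
    top_count = Counter(category_labels).most_common(1)[0][1]
    return top_count / len(category_labels)

def box_label_confidence(boxes: List[Tuple[float, float, float, float]]) -> float:
    """Mean pairwise IoU of one object's boxes across the N rounds."""
    n = len(boxes)
    if n < 2:
        return 1.0
    return mean(iou(boxes[i], boxes[j]) for i in range(n) for j in range(i + 1, n))
```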
In some embodiments, the image determination module 1240 may be specifically configured to determine a first unlabeled sample image in the image data set as the target unlabeled sample image, where the first unlabeled sample image is an image in which, among all target objects in the unlabeled sample image, the confidence of the category label corresponding to at least one target object is greater than a preset first category label threshold and the confidence of the candidate box label corresponding to the at least one target object is greater than a preset first candidate box label threshold.
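Read together with the wording of claim 7 below, this first screening strategy asks for a single object that clears both thresholds; a short sketch building on the confidences above could be:

```python
from typing import List, Tuple

def is_first_unlabeled_sample_image(object_confidences: List[Tuple[float, float]],
                                    cls_threshold: float,
                                    box_threshold: float) -> bool:
    """object_confidences holds (category_confidence, box_confidence) per object."""
    return any(c > cls_threshold and b > box_threshold
               for c, b in object_confidences)
```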
In some embodiments, the image determination module 1240 may be specifically configured to: screen a second unlabeled sample image out of the image data set; and screen a third unlabeled sample image out of the second unlabeled sample image and determine the third unlabeled sample image as the target unlabeled sample image. The second unlabeled sample image is an unlabeled sample image in which the confidence of the category label corresponding to at least one target object among all target objects in the unlabeled sample image is greater than a preset second category label threshold; the third unlabeled sample image is an image in which the confidence of the candidate box label corresponding to at least one target object among all target objects in the second unlabeled sample image is greater than a preset second candidate box label threshold.
In some embodiments, the image determination module 1240 may be specifically configured to: screen a fourth unlabeled sample image out of the image data set; and screen a fifth unlabeled sample image out of the fourth unlabeled sample image and determine the fifth unlabeled sample image as the target unlabeled sample image. The fourth unlabeled sample image is an image in which the confidence of the candidate box label corresponding to at least one target object among all target objects corresponding to the unlabeled sample image is greater than the preset second candidate box label threshold, and the fifth unlabeled sample image is an image in which the confidence of the category label corresponding to at least one target object among all target objects corresponding to the fourth unlabeled sample image is greater than the preset second category label threshold.
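This strategy and the preceding one are the same two-stage screening applied in opposite orders (category first, then box; or box first, then category); at the image level the two orders select the same set, differing only in which filter runs first. A combined sketch, again an illustration rather than the disclosed implementation:

```python
from typing import Dict, List, Tuple

def screen_sequential(image_confidences: Dict[str, List[Tuple[float, float]]],
                      cls_threshold: float, box_threshold: float,
                      box_first: bool = False) -> List[str]:
    """image_confidences maps an image id to (category_confidence,
    box_confidence) pairs, one per detected object. Stage one keeps images
    with at least one object above the first threshold; stage two keeps,
    among those, images with at least one object above the other threshold."""
    def passes_cls(objs): return any(c > cls_threshold for c, _ in objs)
    def passes_box(objs): return any(b > box_threshold for _, b in objs)
    first, second = (passes_box, passes_cls) if box_first else (passes_cls, passes_box)
    stage1 = {img: objs for img, objs in image_confidences.items() if first(objs)}
    return [img for img, objs in stage1.items() if second(objs)]
```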
The apparatus provided in this embodiment of the present application can be used to perform the method in the embodiment shown in Figure 3. Its implementation principles and technical effects are similar and are not described again here.
Figure 13 is a schematic structural diagram of a model training apparatus provided by an embodiment of the present application. The model training apparatus may be integrated in a computer device, or may be independent of the computer device and cooperate with the computer device to implement this solution. As shown in Figure 13, the model training apparatus 1300 includes an image acquisition module 1310, a target detection module 1320, an image acquisition module 1330, and a model training module 1340. The image acquisition module 1310 is configured to acquire unlabeled sample images and labeled sample images, where the number of unlabeled sample images is greater than the number of labeled sample images. The target detection module 1320 is configured to perform N rounds of target detection on the unlabeled sample images and obtain the pseudo-label corresponding to each target object in each round of target detection, where a pseudo-label includes at least one of a category label and a candidate box label, and N is a positive integer greater than 1. The image acquisition module 1330 is configured to extract a target unlabeled sample image from the unlabeled sample images according to the pseudo-labels corresponding to the target objects obtained in each round of target detection. The model training module 1340 is configured to train a student model according to the target sample image and the labeled sample image, where the student model obtained after training is used to update a teacher model.
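A hypothetical training step tying these modules together is sketched below in PyTorch style; the loss-function interfaces are assumptions, and `ema_update` refers to the EMA sketch given earlier.

```python
import torch

def train_step(student, teacher, optimizer,
               labeled_images, labels, target_unlabeled_images,
               supervised_loss_fn, pseudo_label_loss_fn,
               ema_decay: float = 0.999) -> float:
    """One semi-supervised step: supervised loss on labeled images plus a
    pseudo-label loss on the selected target unlabeled sample images."""
    student.train()
    with torch.no_grad():
        pseudo_labels = teacher(target_unlabeled_images)  # teacher supplies pseudo-labels
    loss = (supervised_loss_fn(student(labeled_images), labels)
            + pseudo_label_loss_fn(student(target_unlabeled_images), pseudo_labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student, decay=ema_decay)  # teacher tracks the trained student
    return loss.item()
```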
The apparatus provided in this embodiment of the present application can be used to perform the method in the embodiment shown in Figure 9. Its implementation principles and technical effects are similar and are not described again here.
It should be noted that the division of the above apparatus into modules is merely a division of logical functions; in actual implementation, the modules may be fully or partially integrated into one physical entity, or may be physically separate. These modules may all be implemented in the form of software invoked by a processing element, or all in the form of hardware; alternatively, some modules may be implemented in the form of software invoked by a processing element and others in the form of hardware. For example, the image acquisition module may be a separately established processing element, or may be integrated into a chip of the above apparatus; it may also be stored in the memory of the above apparatus in the form of program code, with a processing element of the above apparatus invoking and executing the functions of the image acquisition module. The other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. The processing element here may be an integrated circuit with signal processing capability. During implementation, the steps of the above method or the above modules may be completed by integrated logic circuits of hardware in the processor element or by instructions in the form of software.
Figure 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application. As shown in Figure 14, the computer device 1400 includes at least one processor 1410, a memory 1420, a bus 1430, and a communication interface 1440. The processor 1410, the communication interface 1440, and the memory 1420 communicate with one another through the bus 1430. The communication interface 1440 is used to communicate with other devices, and includes a communication interface for data transmission as well as a display interface or an operation interface for human-computer interaction. The processor 1410 is configured to execute the computer-executable instructions stored in the memory 1420, and may specifically perform the relevant steps of the image processing method described in the embodiments corresponding to Figure 2 and Figure 3 above. The processor may be a central processing unit, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs. The memory 1420 is used to store the computer-executable instructions. The memory may include a high-speed RAM memory, and may also include a non-volatile memory, for example, at least one disk memory.
Figure 15 is a schematic structural diagram of a computer device provided by another embodiment of the present application. As shown in Figure 15, the computer device 1500 includes at least one processor 1510, a memory 1520, a bus 1530, and a communication interface 1540. The processor 1510, the communication interface 1540, and the memory 1520 communicate with one another through the bus 1530. The communication interface 1540 is used to communicate with other devices, and includes a communication interface for data transmission as well as a display interface or an operation interface for human-computer interaction. The processor 1510 is configured to execute the computer-executable instructions stored in the memory 1520, and may specifically perform the relevant steps of the model training method described in the embodiment corresponding to Figure 9 above. The processor may be a central processing unit, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computer device may be of the same type, such as one or more central processing units (CPUs), or of different types, such as one or more CPUs and one or more ASICs. The memory 1520 is used to store the computer-executable instructions. The memory may include a high-speed random access memory (RAM), and may also include a non-volatile memory, for example, at least one disk memory.
An embodiment of the present application further provides a readable storage medium having computer instructions stored therein; when at least one processor of a computer device executes the computer instructions, the computer device performs the image processing method or the model training method provided by the various implementations described above.
An embodiment of the present application further provides a program product, which includes computer instructions stored in a readable storage medium. At least one processor of a computer device can read the computer instructions from the readable storage medium, and the at least one processor executes the computer instructions so that the computer device implements the image processing method or the model training method provided by the various implementations described above.
An embodiment of the present application further provides a computer program, which can be executed by a processor of a computer device to implement the image processing method or the model training method provided by the various implementations described above.
In this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it; in a formula, the character "/" indicates a "division" relationship between them. "At least one of the following items" or a similar expression refers to any combination of these items, including a single item or any combination of plural items. For example, at least one of a, b, or c may represent a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.
It can be understood that the various numerical designations involved in the embodiments of the present application are merely distinctions made for convenience of description and are not intended to limit the scope of the embodiments of the present application. In the embodiments of the present application, the magnitudes of the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (17)

  1. An image processing method, comprising:
    acquiring an unlabeled sample image, wherein the unlabeled sample image comprises at least one target object;
    performing N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image, wherein the pseudo-label data set comprises the pseudo-labels corresponding to the target object in each round of target detection, and N is a positive integer greater than 1;
    determining, according to the pseudo-labels in the pseudo-label data set, the confidence of the pseudo-label corresponding to each target object; and
    if the confidence of a pseudo-label corresponding to at least one target object in the unlabeled sample image is greater than a preset threshold, determining the unlabeled sample image as a target unlabeled sample image.
  2. The method according to claim 1, wherein the acquiring an unlabeled sample image comprises:
    acquiring an image data set, wherein the image data set comprises at least one unlabeled sample image, and each unlabeled sample image comprises at least one target object.
  3. The method according to claim 2, wherein the performing N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image comprises:
    for each unlabeled sample image in the image data set, performing N rounds of target detection on the unlabeled sample image to obtain the pseudo-label data set corresponding to each target object in the unlabeled sample image.
  4. The method according to claim 3, wherein the pseudo-labels comprise category labels; and
    the performing N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image comprises:
    performing N rounds of target detection on the unlabeled sample image to obtain the attribute category determined in each round of target detection for each target object in the unlabeled sample image;
    determining, according to the attribute category determined in each round of target detection for each target object in the unlabeled sample image, the category label corresponding to each target object in the unlabeled sample image in each round of target detection; and
    determining, according to the category label corresponding to each target object in the unlabeled sample image in each round of target detection, the pseudo-label data set corresponding to each target object in the unlabeled sample image.
  5. The method according to claim 4, wherein the pseudo-labels further comprise candidate box labels, and the performing N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image further comprises:
    performing N rounds of target detection on the unlabeled sample image to obtain the position information, in the unlabeled sample image, of each target object in the unlabeled sample image in each round of target detection;
    determining, according to the position information of each target object in each unlabeled sample image in the unlabeled sample image in each round of target detection, the candidate box label corresponding to each target object in each unlabeled sample image in each round of target detection; and
    determining, according to the category label and the candidate box label corresponding to each target object in the unlabeled sample image in each round of target detection, the pseudo-label data set corresponding to each target object in the unlabeled sample image.
  6. The method according to claim 5, wherein the determining, according to the pseudo-labels in the pseudo-label data set, the confidence of the pseudo-label corresponding to each target object comprises:
    determining, according to all pseudo-labels of each target object in the pseudo-label data set, the confidence corresponding to the category label of the target object and the confidence corresponding to the candidate box label of the target object.
  7. The method according to claim 5 or 6, wherein the determining the unlabeled sample image as a target unlabeled sample image if the confidence of a pseudo-label corresponding to at least one target object in the unlabeled sample image is greater than a preset threshold comprises:
    determining a first unlabeled sample image in the image data set as the target unlabeled sample image, wherein the first unlabeled sample image is an image in which, among all target objects in each unlabeled sample image, the confidence of the category label corresponding to at least one target object is greater than a preset first category label threshold and the confidence of the candidate box label corresponding to the at least one target object is greater than a preset first candidate box label threshold.
  8. The method according to claim 5 or 6, wherein the determining the unlabeled sample image as a target unlabeled sample image if the confidence of a pseudo-label corresponding to at least one target object in the unlabeled sample image is greater than a preset threshold comprises:
    screening a second unlabeled sample image out of the image data set, wherein the second unlabeled sample image is an unlabeled sample image in which the confidence of the category label corresponding to at least one target object among all target objects in the unlabeled sample image is greater than a preset second category label threshold; and
    screening a third unlabeled sample image out of the second unlabeled sample image, and determining the third unlabeled sample image as the target unlabeled sample image, wherein the third unlabeled sample image is an image in which the confidence of the candidate box label corresponding to at least one target object among all target objects in the second unlabeled sample image is greater than a preset second candidate box label threshold.
  9. The method according to claim 5 or 6, wherein the determining the unlabeled sample image as a target unlabeled sample image if the confidence of a pseudo-label corresponding to at least one target object in the unlabeled sample image is greater than a preset threshold comprises:
    screening a fourth unlabeled sample image out of the image data set, wherein the fourth unlabeled sample image is an image in which the confidence of the candidate box label corresponding to at least one target object among all target objects corresponding to the unlabeled sample image is greater than a preset second candidate box label threshold; and
    screening a fifth unlabeled sample image out of the fourth unlabeled sample image, and determining the fifth unlabeled sample image as the target unlabeled sample image, wherein the fifth unlabeled sample image is an image in which the confidence of the category label corresponding to at least one target object among all target objects corresponding to the fourth unlabeled sample image is greater than a preset second category label threshold.
  10. The method according to any one of claims 1-9, further comprising:
    acquiring a labeled sample image; and
    after the determining the unlabeled sample image as a target unlabeled sample image, the method further comprises:
    training a student model according to the target sample image and the labeled sample image.
  11. The method according to claim 10, wherein the performing N rounds of target detection on the unlabeled sample image comprises:
    performing N rounds of target detection on the unlabeled sample image by using a teacher model; and
    after the training a student model according to the target sample image and the labeled sample image, the method further comprises:
    updating parameters of the teacher model based on an exponential moving average (EMA) strategy by using the trained student model.
  12. The method according to claim 10 or 11, wherein before the training of the student model, the method further comprises:
    performing strong augmentation on the target unlabeled sample image by using a strong augmentation operation, wherein the strong augmentation operation comprises at least one of the following operations: color transformation, random erasing, and color filling.
  13. An image processing apparatus, comprising:
    an image acquisition module, configured to acquire an unlabeled sample image, wherein the unlabeled sample image comprises at least one target object;
    a label data set acquisition module, configured to perform N rounds of target detection on the unlabeled sample image to obtain a pseudo-label data set corresponding to each target object in the unlabeled sample image, wherein the pseudo-label data set comprises the pseudo-labels corresponding to the target object in each round of target detection, and N is a positive integer greater than 1;
    a confidence determination module, configured to determine, according to the pseudo-labels in the pseudo-label data set, the confidence of the pseudo-label corresponding to each target object; and
    an image determination module, configured to determine the unlabeled sample image as a target unlabeled sample image if the confidence of a pseudo-label corresponding to at least one target object in the unlabeled sample image is greater than a preset threshold.
  14. A computer device, comprising a processor and a memory communicatively connected to the processor, wherein
    the memory stores first computer-executable instructions and second computer-executable instructions; and
    the processor executes the first computer-executable instructions stored in the memory to implement the method according to any one of claims 1-12.
  15. A computer-readable storage medium having computer instructions stored therein, wherein the computer instructions, when executed by a processor, are used to implement the method according to any one of claims 1-12.
  16. A computer program product, comprising computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1-12.
  17. A computer program which, when executed by a processor, implements the method according to any one of claims 1-12.
PCT/CN2023/109269 2022-07-29 2023-07-26 Image processing method and apparatus, and device and medium WO2024022376A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210904939.5A CN117523327A (en) 2022-07-29 2022-07-29 Image processing method, device, equipment and medium
CN202210904939.5 2022-07-29

Publications (1)

Publication Number Publication Date
WO2024022376A1 true WO2024022376A1 (en) 2024-02-01

Family

ID=89705355

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/109269 WO2024022376A1 (en) 2022-07-29 2023-07-26 Image processing method and apparatus, and device and medium

Country Status (2)

Country Link
CN (1) CN117523327A (en)
WO (1) WO2024022376A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666979A (en) * 2020-05-13 2020-09-15 北京科技大学 Underwater scene target detection integration method and system for label generation
CN112150478A (en) * 2020-08-31 2020-12-29 温州医科大学 Method and system for constructing semi-supervised image segmentation framework
CN113256646A (en) * 2021-04-13 2021-08-13 浙江工业大学 Cerebrovascular image segmentation method based on semi-supervised learning
CN114037876A (en) * 2021-12-16 2022-02-11 马上消费金融股份有限公司 Model optimization method and device
CN114298173A (en) * 2021-12-13 2022-04-08 上海高德威智能交通***有限公司 Data processing method, device and equipment
CN114331971A (en) * 2021-12-08 2022-04-12 之江实验室 Ultrasonic endoscope target detection method based on semi-supervised self-training

Also Published As

Publication number Publication date
CN117523327A (en) 2024-02-06

Similar Documents

Publication Publication Date Title
Singh et al. Image classification: a survey
JP7375006B2 (en) Image semantic segmentation network training method, device, equipment and computer program
JP7128022B2 (en) Form a dataset for fully supervised learning
JP6547069B2 (en) Convolutional Neural Network with Subcategory Recognition Function for Object Detection
CN111524106B (en) Skull fracture detection and model training method, device, equipment and storage medium
CN105122270B (en) The method and system of people is counted using depth transducer
WO2017113232A1 (en) Product classification method and apparatus based on deep learning
CA3066029A1 (en) Image feature acquisition
Wang et al. FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection
CN112528058B (en) Fine-grained image classification method based on image attribute active learning
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
US20230027723A1 (en) Stain-free detection of embryo polarization using deep learning
CN114692778B (en) Multi-mode sample set generation method, training method and device for intelligent inspection
CN115861400A (en) Target object detection method, training method and device and electronic equipment
CN117557784B (en) Target detection method, target detection device, electronic equipment and storage medium
CN114219936A (en) Object detection method, electronic device, storage medium, and computer program product
CN108509876A (en) For the object detecting method of video, device, equipment, storage medium and program
WO2024022376A1 (en) Image processing method and apparatus, and device and medium
CN111597937A (en) Fish gesture recognition method, device, equipment and storage medium
CN115937991A (en) Human body tumbling identification method and device, computer equipment and storage medium
JP2024516642A (en) Behavior detection method, electronic device and computer-readable storage medium
JP7479007B2 (en) Information processing device, program, system, and method for detecting grapes from an image
Mohsin et al. Convolutional neural networks for real-time wood plank detection and defect segmentation
CN114220041A (en) Target recognition method, electronic device, and storage medium
Wang et al. An object detection algorithm based on the feature pyramid network and single shot multibox detector

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23845584

Country of ref document: EP

Kind code of ref document: A1