CN115050002A - Image annotation model training method and device, electronic equipment and storage medium

Info

Publication number: CN115050002A
Authority: CN (China)
Prior art keywords: image, labeled, labeling, images, data set
Legal status: Pending
Application number: CN202210809767.3A
Other languages: Chinese (zh)
Inventors: 祝露, 邵全全, 张松, 严玮, 别晓芳, 王汉超
Current Assignee: Zero Beam Technology Co., Ltd.
Original Assignee: Zero Beam Technology Co., Ltd.
Application filed by Zero Beam Technology Co., Ltd.; priority to CN202210809767.3A; published as CN115050002A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces


Abstract

An embodiment of the present application provides an image annotation model training method and apparatus, an electronic device, and a storage medium. The image annotation model training method includes: acquiring a data set to be annotated that comprises at least two images to be annotated; determining the similarity between the images to be annotated in the data set; annotating each image in the data set with a first annotation model to be trained to obtain a first annotation result for each image; determining a sample subset according to the similarities between the images and the first annotation results; and training the first annotation model with each image included in the sample subset and its corresponding manual annotation result. Applying this scheme alleviates the poor algorithm-annotation quality that arises when an algorithm assists manual annotation, makes full use of the annotated data in the data set, and improves the annotation quality of the algorithm.

Description

Image annotation model training method and device, electronic equipment and storage medium
Technical Field
Embodiments of the present application relate to the technical field of image processing, and in particular to an image annotation model training method and apparatus, an electronic device, and a storage medium.
Background
With the development of science and technology, autonomous driving has become increasingly popular. For an autonomous driving algorithm to handle ever more complex scenarios, it must be supported by massive amounts of real image data, so the demand for image annotation keeps growing; massive, high-quality, fine-grained image annotation can greatly improve the safety and practicality of autonomous driving.
At present, images are annotated in an algorithm-assisted manual mode: an annotation model first annotates an image, and the annotation result is then corrected manually.
However, autonomous driving involves a wide variety of complex scenarios, which places high demands on the quantity and quality of the samples used to train the annotation model. Moreover, the model's training data set often deviates from the data set to be annotated, so the algorithm's annotation quality is poor, and the already-annotated data in the data set is not fully exploited during annotation.
Disclosure of Invention
In view of the above, embodiments of the present application provide an image annotation model training method and apparatus, an electronic device, and a storage medium to at least partially solve the above problems.
According to a first aspect of embodiments of the present application, there is provided an image annotation model training method, including: acquiring a data set to be annotated comprising at least two images to be annotated; determining the similarity between the images to be annotated in the data set; annotating each image in the data set respectively through a first annotation model to be trained to obtain a first annotation result of each image, wherein the first annotation model is used for annotating a target object in an image; determining a sample subset according to the similarity between the images and the first annotation result of each image, wherein the sample subset comprises at least two images to be annotated, the similarity between the images in the sample subset is smaller than a similarity threshold, and the accuracy of the first annotation result of each image in the sample subset is smaller than an accuracy threshold; acquiring a manual annotation result corresponding to each image in the sample subset; and training the first annotation model through each image included in the sample subset and the corresponding manual annotation result.
According to a second aspect of embodiments of the present application, there is provided an image annotation apparatus, comprising: an acquisition module for acquiring a data set to be annotated comprising at least two images to be annotated, and for acquiring the manual annotation result corresponding to each image to be annotated in the sample subset; an annotation module for annotating each image in the data set respectively through a first annotation model to be trained, to obtain a first annotation result of each image; an image processing module for determining the similarity between the images in the data set, and for determining a sample subset according to the similarities and the first annotation results; and a training module for training the annotation model through each image included in the sample subset and the corresponding manual annotation result.
According to a third aspect of embodiments of the present application, there is provided an electronic device, including: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other through the communication bus; the memory is used for storing at least one executable instruction that causes the processor to perform the operations corresponding to the method according to the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to the first aspect.
According to a fifth aspect of embodiments herein, there is provided a computer program product, which, when executed by a processor, implements the method of the first aspect.
With the annotation method provided by this scheme, the similarity between the images to be annotated is determined; the images are annotated by the first annotation model to obtain first annotation results; a sample subset is selected according to the similarities and the first annotation results; the manual annotation results for the sample subset are then obtained; and the first annotation model is trained with each image in the sample subset and its corresponding manual annotation result. By determining the similarities between the images and obtaining their first annotation results, a sample subset with low inter-image similarity and low annotation accuracy can be selected, and obtaining manual annotation results for this subset yields a representative, well-annotated sample subset. Training the annotation model on this subset makes the model more targeted and its annotation results more accurate, improving the algorithm's annotation quality during the annotation process.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some of the embodiments described in the embodiments of the present application, and those skilled in the art can derive other drawings from these drawings.
FIG. 1 is a flowchart of an image annotation model training method according to an embodiment of the present application;
FIG. 2 is a flowchart of a target detection task labeling model training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an image annotation model training apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the drawings in the embodiments of the present application. It is apparent that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of protection of the embodiments of the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of the present application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
As shown in fig. 1, fig. 1 is a flowchart of an image annotation model training method provided in an embodiment of the present application, where the method includes the following steps 101 to 106:
step 101, acquiring a data set to be annotated comprising at least two images to be annotated.
Determine the entire data set to be annotated; the data set includes at least two images to be annotated, which can be of any type, for example: landscape pictures, pictures of people, road video frames, and so on.
Step 102, determining the similarity between the images to be annotated in the data set to be annotated.
Determine a similarity measure between the images to be annotated in the data set: the more similar two images are, the higher the similarity measure; the less similar they are, the lower it is.
Step 103, labeling each image to be labeled in the data set to be labeled respectively through a first labeling model to be trained, and obtaining a first labeling result of each image to be labeled, wherein the first labeling model is used for labeling a target object in the image.
Annotate the images to be annotated through the first annotation model to be trained to obtain the first annotation result corresponding to each image, where the first annotation result differs according to the annotation task. For example: if the annotation task is to identify the object category in an image, the first annotation result indicates the category of the object in the image to be annotated; if the annotation task is to annotate all vehicles in an image, the first annotation result indicates the positions of all vehicles in the image, and so on.
It should be understood that different annotation tasks correspond to different first annotation models to be trained; the first annotation model to be trained is determined according to the specific visual perception annotation task.
It should also be appreciated that the first annotation model to be trained is an annotation model pre-trained on a generic data set or a proprietary data set and already has annotation capability; it is not a blank model that has not been trained at all.
Step 104, determining a sample subset according to the similarity between the images to be annotated and the first annotation result of each image to be annotated, wherein the sample subset comprises at least two images to be annotated, the similarity between the images in the sample subset is smaller than a similarity threshold, and the accuracy of the first annotation result of each image in the sample subset is smaller than an accuracy threshold.
A similarity threshold between images and an accuracy threshold for annotation results are set in advance, and the images to be annotated whose similarity is below the similarity threshold and whose accuracy is below the accuracy threshold are determined as the sample subset; in this way a representative sample subset with low annotation accuracy is selected. The sample subset includes at least two images.
It should be understood that, since the first annotation model to be trained is an annotation model pre-trained on a generic or proprietary data set, the annotation results it outputs may not be completely accurate for more complex images to be annotated.
It should also be understood that, since the similarity between the images in the sample subset is below the preset similarity threshold, the images in the sample subset have low mutual similarity and are therefore representative.
Step 105, acquiring the manual annotation result corresponding to each image to be annotated in the sample subset.
Output the representative, low-accuracy sample subset to the manual end, where the annotation results are corrected manually according to the annotation task and the correct annotation results are entered into the annotation system.
Step 106, training the first annotation model through each image to be annotated included in the sample subset and the corresponding manual annotation result.
Train the first annotation model to be trained with the manually corrected sample subset; since the annotation results have been corrected manually, the sample subset is both representative and accurately annotated.
In this embodiment of the application, the similarity between the images to be annotated is determined; the images are annotated by the first annotation model to obtain first annotation results; a sample subset is selected according to the similarities and the first annotation results; the manual annotation results for the sample subset are obtained; and the first annotation model is trained with each image in the sample subset and its corresponding manual annotation result. In this way a sample subset with low inter-image similarity and low annotation accuracy can be selected, and obtaining its manual annotation results yields a representative, well-annotated sample subset; training the annotation model on this subset makes the model more targeted and its results more accurate, improving the algorithm's annotation quality during the annotation process.
In one possible implementation, when determining the similarity between the images to be annotated in the data set, each image in the data set may be encoded through a pre-trained encoding model to obtain a feature vector of each image, and the distances between the feature vectors of different images in the data set may then be calculated respectively as the similarity between the two corresponding images.
Feature-encode the images to be annotated: the encoding model converts each image into a 256-dimensional vector, and the distances between the feature vectors in the data set are calculated; the closer the distance, the more similar the two images, and the farther the distance, the less similar they are.
It should be understood that the encoding model is a model trained by self-supervised learning, which can take many forms, such as SimSiam, SimCLR, MoCo, etc.; the embodiments of the present application are not limited in this regard.
It should further be understood that, when performing feature encoding, sampling can be carried out with a variety of sampling methods, such as farthest point sampling and inverse density sampling; the embodiments of the present application are not limited in this regard either.
In this embodiment of the application, feature-encoding the images to be annotated with the encoding model yields a feature vector for each image, and calculating the distances between the feature vectors corresponding to different images yields the similarities between them.
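As a minimal sketch of this step (illustration only, not part of the patent text), the Python snippet below encodes each image with a pre-trained self-supervised encoder and computes pairwise Euclidean distances between the resulting feature vectors; the encoder callable and the 256-dimensional output are assumptions, since the patent does not fix a specific library or architecture.

import numpy as np

def encode_images(images, encoder):
    # encoder: any pre-trained self-supervised model (e.g. SimSiam, SimCLR
    # or MoCo) assumed here to map one image to a 256-dimensional vector.
    return np.stack([encoder(img) for img in images])  # shape (n, 256)

def pairwise_distances(features):
    # Euclidean distance between every pair of feature vectors; a smaller
    # distance corresponds to a higher similarity between the two images.
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))  # shape (n, n)

The Euclidean metric is one natural choice here; cosine distance on normalized features would serve the same purpose.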
In one possible implementation, when determining the sample subset according to the similarities between the images to be annotated and their first annotation results, the images in the data set whose similarity to another image exceeds the similarity threshold may first be deduplicated to obtain a candidate data set; the accuracy of the first annotation result of each image in the candidate data set is then calculated respectively; finally, at least two images whose accuracy is below the accuracy threshold are extracted from the candidate data set, and the set of extracted images is determined as the sample subset.
Deduplicate the images to be annotated according to the similarity between different images: from each image pair whose similarity exceeds the similarity threshold, delete one image. The similarity threshold can be set to any value, which is not limited by the present application. For example, with a threshold of 8: if the distance between the feature vectors of two images to be annotated is less than 8 (that is, their similarity is above the threshold), either one of the two images is deleted, and this is repeated until the distances between the feature vectors of all remaining images are greater than 8. This image set is taken as the candidate data set; the accuracy of the first annotation result of each image in the candidate data set is calculated, and the images whose accuracy is below the accuracy threshold are selected as the sample subset.
It should be understood that, since the first annotation model to be trained needs to be trained to annotate well, representative, inaccurately annotated images must be selected as the sample subset, and representative images can be screened out by deduplicating the images in the data set to be annotated.
In this embodiment of the application, the images to be annotated are deduplicated to obtain the candidate data set, the accuracy of each image's first annotation result in the candidate data set is calculated, and the images whose accuracy is below the accuracy threshold are taken as the sample subset; in this way a representative, poorly annotated sample subset can be selected.
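A minimal sketch of the deduplication pass described above, reusing the distance matrix from the previous sketch; the greedy scan order and the example threshold of 8 are assumptions for illustration:

def deduplicate(dist, threshold=8.0):
    # Keep an image only if its feature distance to every image already
    # kept exceeds the threshold; dropping one image of each too-similar
    # pair leaves a candidate data set in which all pairwise distances
    # are greater than the threshold.
    keep = []
    for i in range(dist.shape[0]):
        if all(dist[i, j] > threshold for j in keep):
            keep.append(i)
    return keep  # indices of the candidate data set

A greedy pass keeps the first image of each near-duplicate cluster; any other tie-breaking order would satisfy the same distance constraint.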
In one possible implementation, when calculating the accuracy of the first annotation result of each image in the candidate data set, a probability distribution map of each image may first be determined, which indicates, for each pixel in the corresponding image, the probability that the pixel belongs to the image of a target object. A candidate-box overlay map of each image in the candidate data set is then determined; the overlay map is formed by overlaying at least two candidate boxes, each of which indicates the region where a candidate object in the image is located. Then, for each image in the candidate data set, the intersection-over-union of the probability distribution map corresponding to the image and the candidate-box overlay map is calculated, and the accuracy of the image's first annotation result is determined according to this intersection-over-union.
During annotation, the first annotation model outputs the first annotation result together with the probability distribution map and the candidate-box overlay map; the two maps are determined for each image to be annotated, and their intersection-over-union is calculated as the accuracy of that image's first annotation result.
In this embodiment of the application, by determining the probability distribution map and the candidate-box overlay map of each image and calculating their intersection-over-union as the accuracy of the first annotation result, a sample subset with low annotation accuracy can be selected from the candidate data set according to the accuracy of the first annotation results.
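A minimal sketch of this accuracy computation, assuming the two maps are given as NumPy arrays of the same spatial shape; the 0.5 binarization threshold is an assumption, since the patent does not specify how the probability map is binarized:

import numpy as np

def annotation_accuracy(prob_map, box_overlay, prob_thresh=0.5):
    # prob_map: per-pixel probability of belonging to a target object.
    # box_overlay: boolean mask formed by overlaying the candidate boxes.
    # The intersection-over-union of the two masks is taken as the
    # accuracy of the image's first annotation result.
    pred = prob_map >= prob_thresh
    inter = np.logical_and(pred, box_overlay).sum()
    union = np.logical_or(pred, box_overlay).sum()
    return inter / union if union > 0 else 0.0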
In one possible implementation, when at least two images whose accuracy is below the accuracy threshold are extracted from the candidate data set and the set of extracted images is determined as the sample subset, 2N images whose accuracy is below the accuracy threshold may be randomly extracted from the candidate data set, where N is a positive integer; the 2N images are then sorted by accuracy in ascending order, and the set of the first N images among the sorted 2N images is determined as the sample subset.
A sampling algorithm extracts 2N images whose accuracy is below the accuracy threshold from the candidate data set; the extracted 2N images are sorted by accuracy in ascending order, and the first N low-accuracy images are taken as the sample subset, where N is a positive integer.
It should be understood that the accuracies corresponding to the images to be annotated can also be sorted in descending order; with that ordering, the last N low-accuracy images are taken as the sample subset.
In this embodiment of the application, 2N images are extracted from the candidate data set, the accuracies corresponding to the images are sorted, and the N lowest-accuracy images are selected as the sample subset. Since the candidate data set is the deduplicated data set to be annotated and is therefore somewhat representative, the images in the sample subset are representative images with low accuracy.
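A minimal sketch of this draw-2N-keep-N selection; the guard for the case where fewer than 2N images qualify is an assumption added for robustness:

import random

def select_sample_subset(candidates, accuracies, acc_thresh, n):
    # Randomly draw 2N images whose accuracy is below the threshold,
    # sort them by accuracy in ascending order, and keep the first N.
    low = [i for i in candidates if accuracies[i] < acc_thresh]
    drawn = random.sample(low, min(2 * n, len(low)))
    drawn.sort(key=lambda i: accuracies[i])
    return drawn[:n]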
In one possible implementation, when determining the candidate-box overlay map of each image in the candidate data set, the probability that each candidate box in the image contains a candidate object image may be determined for each image, and at least two candidate boxes whose probability is greater than a preset probability threshold are then overlaid to obtain the candidate-box overlay map.
In the candidate-box overlay map output by the first annotation model, each image to be annotated contains a large number of candidate boxes, but a candidate box does not necessarily enclose a target object; it may enclose some other object. For example, if the annotation task is to identify vehicles on a road, the annotation model boxes the vehicles on the road during annotation but may also box other objects, such as pedestrians on the road. The probability that each candidate box contains a candidate object image is therefore calculated, and the candidate boxes whose probability is greater than the probability threshold are overlaid to form the candidate-box overlay map.
It should be understood that the candidate boxes in the overlay map need not be selected by a probability threshold; alternatively, all candidate boxes can be ranked in descending order of the probability that they contain an object image, and the top-ranked portion overlaid to form the overlay map. For example: calculate the probability that each of ten thousand candidate boxes contains a target object, sort these probabilities in descending order, and overlay the first one thousand candidate boxes into the overlay map.
In this embodiment of the application, the probability that each candidate box contains an object image is calculated, and the candidate boxes whose probability exceeds the preset probability threshold are overlaid into the candidate-box overlay map, so that the accuracy of the first annotation result calculated from the probability distribution map and the overlay map is more reliable.
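A minimal sketch covering both selection variants described above, threshold-based and top-k; the (x1, y1, x2, y2) box format and the rasterization into a boolean mask are assumptions for illustration:

import numpy as np

def build_box_overlay(boxes, scores, image_shape, prob_thresh=None, top_k=None):
    # boxes: candidate boxes output by the first annotation model.
    # scores: probability that each box contains a candidate object image.
    if top_k is not None:
        order = np.argsort(scores)[::-1][:top_k]  # top-k highest-scoring boxes
        kept = [boxes[i] for i in order]
    else:
        kept = [b for b, s in zip(boxes, scores) if s > prob_thresh]
    overlay = np.zeros(image_shape, dtype=bool)
    for x1, y1, x2, y2 in kept:
        overlay[int(y1):int(y2), int(x1):int(x2)] = True  # union of kept boxes
    return overlay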
In one possible implementation, each image to be annotated in the sample subset is additionally annotated through a second annotation model to obtain a second annotation result of each image, the second annotation model likewise being used to annotate target objects in images; the second annotation result of each image in the sample subset is then displayed to assist the user in manually annotating the images in the sample subset.
Perform a second annotation of the screened sample subset with the second annotation model to obtain the second annotation results, which reduces the workload of manual correction and improves annotation efficiency. The second annotation model may be the first annotation model itself or another annotation model.
In this embodiment of the application, the sample subset is annotated through the second annotation model before the manual annotation results are obtained, which assists the manual correction of the annotation results of the images in the sample subset, reduces the manual workload, and improves overall annotation efficiency.
In order to better understand the image annotation model training method disclosed in the present application, the target detection task is taken below as an example to describe the method in detail.
In the related art, image data is mainly annotated in an algorithm-assisted manual mode: images are annotated by an annotation model and the results are then corrected manually. However, the model's training data set often deviates from the data set to be annotated, so the algorithm's annotation quality is poor.
To solve the above problems, the present application provides an image annotation model training method. The scheme can be used in the above target detection task and, of course, also in various other tasks such as 3D object detection, lane line detection, and semantic segmentation, which are not enumerated here one by one.
As shown in fig. 2, fig. 2 is a flowchart of a target detection task annotation model training method according to an embodiment of the present application, where the method includes the following steps 201 to 209:
step 201, obtaining a data set to be marked of the vehicle-end perception street view.
And selecting a data set to be marked after the pretreatment such as cleaning and screening.
Step 202, annotating each image to be annotated in the data set respectively through the first annotation model, and obtaining a first annotation result of each image.
Annotate the images with the first annotation model to be trained.
Step 203, performing feature encoding on the images to be annotated in the data set through the self-supervised encoding model, and calculating the similarity between the images.
Compress the images to be annotated into 256-dimensional feature vectors with the self-supervised encoding model, and compute the pairwise similarities between the images.
Step 204, determining a sample subset according to the similarity between the images to be annotated and the first annotation result of each image to be annotated.
Select representative images with low similarity and inaccurate first annotation results, so that the sample subset is representative.
Step 205, annotating the sample subset through the second annotation model to obtain second annotation results.
Annotating the sample subset with the second annotation model reduces the workload of manual correction.
Step 206, outputting the images to be annotated contained in the sample subset, together with their corresponding first and second annotation results, to the manual end, and obtaining the manual annotation results.
The annotation results of the images in the sample subset are corrected manually, yielding a well-annotated sample subset.
Step 207, training the first annotation model through each image to be annotated included in the sample subset and the corresponding manual annotation result.
The first annotation model is trained with the well-annotated sample subset.
Step 208, judging whether the data set to be annotated has been fully annotated; if so, ending the current process, otherwise executing step 209.
Step 209, annotating the images in the data set that have not yet been annotated through the trained first annotation model to obtain first annotation results, and returning to step 203.
The remaining unannotated data samples are re-annotated based on the iterated first annotation model.
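For illustration only, the sketch below strings steps 201 to 209 together as a single loop, reusing the helper functions sketched earlier. The callables annotate_fn, second_annotate_fn, manual_correct_fn and train_fn are hypothetical stand-ins for the first annotation model, the second annotation model, the human correction step and model training, and the assumption that annotate_fn returns a dict carrying the accuracy of its own result is likewise illustrative.

def annotation_loop(dataset, annotate_fn, second_annotate_fn, manual_correct_fn,
                    train_fn, encoder, sim_thresh=8.0, acc_thresh=0.5, n=100):
    unlabeled = list(range(len(dataset)))
    corrected = {}
    while unlabeled:                                        # step 208
        # Steps 202/209: (re-)annotate the remaining images with model 1.
        first = {i: annotate_fn(dataset[i]) for i in unlabeled}
        # Step 203: encode the images and measure pairwise similarity.
        feats = encode_images([dataset[i] for i in unlabeled], encoder)
        keep = deduplicate(pairwise_distances(feats), sim_thresh)
        candidates = [unlabeled[k] for k in keep]
        # Step 204: pick a representative, low-accuracy sample subset.
        acc = {i: first[i]["accuracy"] for i in candidates}
        subset = select_sample_subset(candidates, acc, acc_thresh, n)
        if not subset:
            break  # nothing left below the accuracy threshold
        # Steps 205-206: second-model pre-annotation, then manual correction.
        for i in subset:
            corrected[i] = manual_correct_fn(dataset[i], first[i],
                                             second_annotate_fn(dataset[i]))
        # Step 207: fine-tune model 1 on the corrected sample subset.
        train_fn([(dataset[i], corrected[i]) for i in subset])
        unlabeled = [i for i in unlabeled if i not in corrected]
    return corrected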
As shown in fig. 3, fig. 3 is a schematic diagram of an image annotation model training apparatus provided in an embodiment of the present application, where the apparatus includes:
an obtaining module 301, configured to obtain a to-be-annotated dataset including at least two to-be-annotated images, and obtain an artificial annotation result corresponding to each to-be-annotated image in a sample subset;
the labeling module 302 is configured to label each image to be labeled in the data set to be labeled through a first labeling model to be trained, and obtain a first labeling result of each image to be labeled;
the image processing module 303 is configured to determine similarity between images to be annotated in the data set to be annotated, and determine a sample subset according to the similarity between the images to be annotated and the first annotation result of each image to be annotated;
the training module 304 is configured to train the annotation model according to each image to be annotated and the corresponding manual annotation result included in the sample subset.
In one possible implementation, each image to be annotated in the data set is encoded through a pre-trained encoding model to obtain a feature vector of each image, and the distances between the feature vectors of different images are calculated respectively as the similarity between the two corresponding images.
In one possible implementation, the images in the data set whose similarity exceeds the similarity threshold are deduplicated to obtain a candidate data set; the accuracy of the first annotation result of each image in the candidate data set is calculated respectively; at least two images whose accuracy is below the accuracy threshold are extracted from the candidate data set, and the set of extracted images is determined as the sample subset.
In one possible implementation, a probability distribution map of each image in the candidate data set is determined respectively, indicating the probability that each pixel of the corresponding image belongs to the image of a target object; a candidate-box overlay map of each image is determined respectively, formed by overlaying at least two candidate boxes, each indicating the region where a candidate object is located; for each image in the candidate data set, the intersection-over-union of its probability distribution map and candidate-box overlay map is calculated, and the accuracy of its first annotation result is determined according to this intersection-over-union.
In one possible implementation, 2N images whose accuracy is below the accuracy threshold are randomly extracted from the candidate data set, where N is a positive integer; the 2N images are sorted by accuracy in ascending order; and the set of the first N images is determined as the sample subset.
In one possible implementation, for each image in the candidate data set, the probability that each candidate box in the image contains a candidate object image is determined, and at least two candidate boxes whose probability is greater than the preset probability threshold are overlaid to obtain the candidate-box overlay map.
In one possible implementation, each image in the sample subset is annotated through the second annotation model respectively to obtain a second annotation result of each image, the second annotation model being used to annotate target objects in images; the second annotation results of the images in the sample subset are displayed to assist the user in manually annotating them.
It should be noted that, since the information interaction and execution processes between the modules of the image annotation apparatus are based on the same concept as the image annotation model training method embodiments, specific details can be found in the description of the method embodiments and are not repeated here.
Referring to fig. 4, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, and the specific embodiment of the present application does not limit a specific implementation of the electronic device.
As shown in fig. 4, the electronic device may include: a processor 402, a communication interface 404, a memory 406, and a communication bus 408.
Wherein:
the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408.
A communication interface 404 for communicating with other electronic devices or servers.
The processor 402 is configured to execute the program 410, and may specifically execute relevant steps in the above-described embodiment of the image annotation model training method.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs or one or more GPUs, or of different types, such as one or more CPUs together with one or more GPUs and one or more ASICs.
And a memory 406 for storing a program 410. Memory 406 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 410 may be specifically configured to enable the processor 402 to execute the image annotation model training method in any one of the embodiments.
For the specific implementation of each step in the program 410, reference may be made to the corresponding steps and unit descriptions in any of the foregoing image annotation model training method embodiments, which are not repeated here. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments.
In this embodiment of the application, the similarity between the images to be annotated is determined, the images are annotated by the first annotation model to obtain first annotation results, a sample subset is selected according to the similarities and the first annotation results, the manual annotation results for the sample subset are obtained, and the first annotation model is trained with each image in the sample subset and its corresponding manual annotation result. A representative, well-annotated sample subset is thereby obtained, and training the annotation model on it makes the model more targeted and its annotation results more accurate, improving the algorithm's annotation quality during annotation.
The embodiment of the present application further provides a computer program product, which includes computer instructions for instructing a computing device to execute an operation corresponding to any one of the methods in the foregoing method embodiments.
It should be noted that, according to implementation needs, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to embodiments of the present application may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the methods described herein can be processed by software stored on a recording medium and executed by a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It can be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes a memory component (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the image annotation model training method described herein. Further, when a general-purpose computer accesses code for implementing the image annotation model training method shown herein, the execution of the code transforms the general-purpose computer into a special-purpose computer for performing that method.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only intended to illustrate the embodiments of the present application, not to limit them. Those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so all equivalent technical solutions also fall within the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application shall be defined by the claims.

Claims (11)

1. An image annotation model training method is characterized by comprising the following steps:
acquiring a data set to be annotated comprising at least two images to be annotated;
determining the similarity between the images to be labeled in the data set to be labeled;
labeling each image to be labeled in the data set to be labeled respectively through a first labeling model to be trained to obtain a first labeling result of each image to be labeled, wherein the first labeling model is used for labeling a target object in the image;
determining a sample subset according to the similarity between the images to be labeled and the first labeling result of each image to be labeled, wherein the sample subset comprises at least two images to be labeled, the similarity between the images to be labeled in the sample subset is smaller than a similarity threshold, and the accuracy of the first labeling result of each image to be labeled in the sample subset is smaller than an accuracy threshold;
acquiring an artificial labeling result corresponding to each image to be labeled in the sample subset;
and training the first labeling model through each image to be labeled and the corresponding artificial labeling result included in the sample subset.
2. The method according to claim 1, wherein the determining the similarity between the images to be labeled in the data set to be labeled comprises:
respectively encoding each image to be labeled included in the data set to be labeled through a pre-trained encoding model to obtain a feature vector of each image to be labeled;
and respectively calculating the distance between the feature vectors of different images to be labeled in the data set to be labeled as the similarity between the two corresponding images to be labeled.
3. The method according to claim 1, wherein the determining a sample subset according to the similarity between the images to be labeled and the first labeling result of each image to be labeled comprises:
carrying out deduplication processing on the images to be labeled in the data set to be labeled of which the corresponding similarity is greater than a similarity threshold value to obtain a candidate data set;
respectively calculating the accuracy of the first labeling result of each image to be labeled in the candidate data set;
and extracting at least two images to be labeled of which the corresponding accuracy is smaller than an accuracy threshold from the candidate data set, and determining the set of the extracted images to be labeled as the sample subset.
4. The method according to claim 3, wherein the respectively calculating the accuracy of the first labeling result of each image to be labeled in the candidate data set comprises:
respectively determining a probability distribution map of each image to be labeled in the candidate data set, wherein the probability distribution map is used for indicating the probability that each pixel point in the corresponding image to be labeled is contained in the image of a target object;
respectively determining a candidate-box overlay map of each image to be labeled in the candidate data set, wherein the candidate-box overlay map is formed by overlaying at least two candidate boxes, and the candidate boxes are used for indicating the regions where candidate objects in the image to be labeled are located;
and for each image to be labeled in the candidate data set, calculating the intersection-over-union of the probability distribution map corresponding to the image to be labeled and the candidate-box overlay map, and determining the accuracy of the first labeling result of the image to be labeled according to the intersection-over-union.
5. The method according to claim 3, wherein the extracting at least two images to be labeled of which the corresponding accuracy is smaller than an accuracy threshold from the candidate data set, and determining the set of the extracted images to be labeled as the sample subset comprises:
randomly extracting 2N images to be labeled of which the corresponding accuracy is smaller than the accuracy threshold from the candidate data set, wherein N is a positive integer;
sorting the 2N images to be labeled in ascending order of the corresponding accuracy;
and determining the set of the first N images to be labeled among the sorted 2N images to be labeled as the sample subset.
6. The method according to claim 4, wherein the respectively determining the candidate-box overlay map of each image to be labeled in the candidate data set comprises:
for each image to be labeled in the candidate data set, determining the probability that each candidate box in the image to be labeled contains a candidate object image;
and overlaying at least two candidate boxes of which the probability of containing a candidate object image is greater than a preset probability threshold value to obtain the candidate-box overlay map.
7. The method of claim 1, further comprising:
labeling each image to be labeled in the sample subset through a second labeling model respectively to obtain a second labeling result of each image to be labeled, wherein the second labeling model is used for labeling the target object in the image;
and displaying the second labeling result of each image to be labeled in the sample subset so as to assist a user in manually labeling the images to be labeled in the sample subset.
8. An image annotation model training apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a data set to be annotated comprising at least two images to be annotated and acquiring an artificial annotation result corresponding to each image to be annotated in the sample subset;
the labeling module is used for labeling each image to be labeled in the data set to be labeled through a first labeling model to be trained to obtain a first labeling result of each image to be labeled;
the image processing module is used for determining the similarity between the images to be labeled in the data set to be labeled and determining a sample subset according to the similarity between the images to be labeled and the first labeling result of each image to be labeled;
and the training module is used for training the labeling model through each image to be labeled included in the sample subset and the corresponding artificial labeling result.
9. An electronic device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other through the communication bus;
the memory is used for storing at least one executable instruction which causes the processor to execute the corresponding operation of the method according to any one of claims 1-7.
10. A computer storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
11. A computer program product, characterized in that the computer program, when being executed by a processor, carries out the image annotation model training method according to any one of claims 1 to 7.
CN202210809767.3A 2022-07-11 2022-07-11 Image annotation model training method and device, electronic equipment and storage medium Pending CN115050002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210809767.3A CN115050002A (en) 2022-07-11 2022-07-11 Image annotation model training method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115050002A true CN115050002A (en) 2022-09-13

Family

ID=83165906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210809767.3A Pending CN115050002A (en) 2022-07-11 2022-07-11 Image annotation model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115050002A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964634A (en) * 2022-12-10 2023-04-14 北京自动化控制设备研究所 Data annotation optimization method
CN115964634B (en) * 2022-12-10 2024-04-02 北京自动化控制设备研究所 Data annotation optimization method
CN116257800A (en) * 2023-05-12 2023-06-13 智慧眼科技股份有限公司 Labeling method and system for training samples
CN116257800B (en) * 2023-05-12 2023-08-25 智慧眼科技股份有限公司 Labeling method and system for training samples


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination