CN115705685A - Image data set labeling method and device and electronic equipment

Publication number: CN115705685A (Pending)
Application number: CN202110895391.8A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 魏晓林
Assignee: China Mobile Communications Group Co Ltd; China Mobile Shanghai ICT Co Ltd; CM Intelligent Mobility Network Co Ltd
Prior art keywords: image, target, annotated, labeling, image set
Classification: Image Analysis
Abstract

The invention discloses an image data set labeling method and device and an electronic device, belonging to the technical field of automatic driving. The method comprises the following steps: acquiring annotation task requirement information and an image set to be annotated, wherein the image set to be annotated comprises at least two frames of images to be annotated, and the annotation task requirement information comprises scene requirement information and annotation object requirement information; performing image recognition on each image to be annotated in the image set through a preset model to determine a pre-annotation result for each image; updating the pre-annotation result of a target image to be annotated, in response to a first input on that result, to obtain a target annotation result of the target image; and finally, performing error correction detection on the target annotation results of all images to be annotated in the first image set and outputting the error correction result. The invention can improve the efficiency of the image annotation process.

Description

Image data set labeling method and device and electronic equipment
Technical Field
The invention belongs to the technical field of automatic driving, and particularly relates to an image data set labeling method and device and electronic equipment.
Background
In the prior art, in the field of intelligent traffic and automatic driving, training an Artificial Intelligence (AI) model based on roadside visual perception requires a large amount of labeled data in order to raise recognition and detection precision to the level demanded by automatic driving applications. In the labeling process, a large amount of data is typically labeled manually, and the labeling efficiency is low.
Disclosure of Invention
The invention aims to provide an image data set annotation method, an image data set annotation device and electronic equipment, which can solve the problem of low annotation efficiency in a data annotation process in the related art.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, the present invention provides a method for annotating an image data set, the method comprising:
acquiring annotation task demand information and an image set to be annotated, wherein the image set to be annotated comprises at least two frames of images to be annotated, and the annotation task demand information comprises scene demand information and annotation object demand information;
according to the scene demand information, image recognition processing is carried out on each image to be annotated in the image set to be annotated by adopting a preset model so as to determine a pre-annotation result of each image to be annotated in the image set to be annotated;
receiving a first input of a pre-annotation result of a target image to be annotated, wherein the target image to be annotated is any image in a first image set, and the first image set is an image set matched with the annotation object demand information and the scene demand information in the image set to be annotated;
responding to the first input, updating a pre-labeling result of the target image to be labeled to obtain a target labeling result of the target image to be labeled;
and carrying out error correction detection on the target labeling results of all the images to be labeled in the first image set, and outputting error correction results.
In a second aspect, the present invention also provides an image data set labeling apparatus, including:
the first acquisition module is used for acquiring annotation task demand information and an image set to be annotated, wherein the image set to be annotated comprises at least two frames of images to be annotated, and the annotation task demand information comprises scene demand information and annotation object demand information;
the first determining module is used for performing image recognition processing on each image to be annotated in the image set to be annotated by adopting a preset model according to the scene demand information so as to determine a pre-annotation result of each image to be annotated in the image set to be annotated;
the first receiving module is used for receiving a first input of a user on a pre-annotation result of a target image to be annotated, wherein the target image to be annotated is any image in a first image set, and the first image set is an image set matched with the annotation object demand information and the scene demand information in the image set to be annotated;
the updating module is used for responding to the first input and updating the pre-labeling result of the target image to be labeled so as to obtain the target labeling result of the target image to be labeled;
and the error correction module is used for carrying out error correction detection on the target labeling results of all the images to be labeled in the first image set and outputting error correction results.
In a third aspect, the present invention also provides an electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, which, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, the invention also provides a computer readable storage medium on which a program or instructions are stored, which program or instructions, when executed by a processor, implement the steps of the method according to the first aspect.
In the embodiment of the invention, image recognition processing is first performed on each image to be annotated in the image set to be annotated through a preset model to determine the pre-annotation result of each image, and the target annotation result of the target image to be annotated is then obtained by updating the pre-annotation result in response to a first input. Annotation personnel therefore only need to manually label or adjust a part of the images to be annotated, rather than labeling every image in the set manually, which improves the efficiency of the image annotation process. In addition, in the embodiment of the present invention, error correction detection is performed on the target annotation results of all the images to be annotated in the first image set and an error correction result is output, which improves the reliability of the annotation results.
Drawings
FIG. 1 is a flow chart of a method for annotating an image data set according to the present invention;
FIG. 2 is a flow chart of another method for annotating an image data set according to the present invention;
FIG. 3 is a block diagram of an image dataset annotation apparatus according to the present invention;
FIG. 4 is a structural diagram of an electronic device provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The image data set labeling method, the image data set labeling device, the electronic device and the computer readable storage medium provided by the present invention are described in detail below with reference to the accompanying drawings by specific embodiments and application scenarios thereof.
Referring to fig. 1, which is a flowchart of an image data set annotation method provided by the present invention, as shown in fig. 1, the image data set annotation method can include the following steps:
Step 101, obtaining annotation task requirement information and an image set to be annotated, wherein the image set to be annotated comprises at least two frames of images to be annotated, and the annotation task requirement information comprises scene requirement information and annotation object requirement information.
In a specific implementation, the image set to be annotated may be images acquired by roadside image acquisition equipment at different times and in different weather scenes (for example, sunny days and cloudy days), and each frame of image to be annotated may include: people, cars, street lights, bicycles, motorcycles, and the like.
At this time, the scene requirement information may include: the scene coverage required of the images to be annotated (for example, different images to be annotated need to cover scenes such as cloudy days, sunny days, rainy days, daytime, night, urban roads and suburban roads) and the scene distribution (for example, a preset ratio of cloudy days to sunny days to rainy days). The annotation object requirement information may include: the categories of objects the images to be annotated need to contain (such as people or vehicles) and the quantity distribution of each category of objects (for example, a preset ratio of people to automobiles to bicycles).
It should be noted that the annotation task requirement information may further include other conditions that need to be met in order to implement AI model training, such as the number of images to be annotated that need to be included in the set of images to be annotated, which is not exhaustive here.
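As a non-limiting sketch, the annotation task requirement information may be organized as nested dictionaries; every field name and number below is an illustrative assumption rather than a structure mandated by the invention:

# A minimal sketch of annotation task requirement information.
# Every field name and number here is a hypothetical example.
annotation_task_requirements = {
    "scene_requirements": {
        "scenes": ["sunny", "cloudy", "rainy", "daytime", "night",
                   "urban_road", "suburban_road"],
        "scene_distribution": {"cloudy": 3, "sunny": 3, "rainy": 4},  # assumed ratio
    },
    "object_requirements": {
        "categories": ["person", "car", "bicycle"],
        "category_distribution": {"person": 3, "car": 3, "bicycle": 4},  # assumed ratio
    },
    "total_images": 10000,  # an example of the further conditions noted above
}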
Step 102, according to the scene requirement information, performing image recognition processing on each image to be annotated in the image set to be annotated by adopting a preset model, so as to determine a pre-annotation result of each image to be annotated in the image set to be annotated.
In a specific implementation, performing image recognition processing on each image to be annotated in the image set by using a preset model according to the scene requirement information, so as to determine the pre-annotation result of each image, can be understood as follows: the images to be annotated are first divided according to their corresponding scenes. For example, suppose the scene requirement information indicates that the images to be annotated need to cover 5 scenes; the image set to be annotated can then be divided into 5 image subsets in one-to-one correspondence with the 5 scenes, where the images in each subset were captured in the scene corresponding to that subset. Then, image recognition is performed on the divided images one by one using an existing model, and the pre-annotation result of the currently recognized image is determined from the recognition output (for example, whether the image includes people, cars or other objects).
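A minimal sketch of this scene-based partition, assuming each image record carries a scene label (the record structure is hypothetical):

from collections import defaultdict

def partition_by_scene(images):
    # Group the images to be annotated into per-scene subsets;
    # e.g. 5 required scenes yield 5 one-to-one image subsets.
    subsets = defaultdict(list)
    for image in images:
        subsets[image["scene"]].append(image)
    return subsets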
In implementation, automatic semantic matching analysis can be adopted in the system background to label the scene classification of each image to be annotated in a consistent way, and in the subsequent image recognition processing, different preset models, or preset models with different parameters, can be used to pre-annotate the images belonging to different scenes.
The process of performing image recognition on the divided images to be labeled one by using the existing model may include: in the process of carrying out image identification on the target image to be annotated, carrying out reverse annotation on the target image to be annotated by adopting an image identification model matched with the annotation task requirement information so as to obtain a pre-annotation result of the target image to be annotated.
The image recognition model matched with the annotation task requirement information may include a plurality of image recognition models to respectively perform image recognition on a plurality of types of objects, for example: the annotation task requirement information indicates that image recognition needs to be performed on people, vehicles and bicycles, and the image recognition model matched with the annotation task requirement information may include: human body recognition models, vehicle recognition models, bicycle recognition models, and the like.
Of course, when a certain image recognition model can recognize a plurality of objects, one image recognition model may be used to recognize a plurality of object types indicated in the annotation task requirement information, and the method is not particularly limited herein.
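Assuming one recognition model per object category is available in a model library (the registry interface below is a hypothetical illustration, not the patent's API), the pre-annotation step may dispatch each image to the matching models and merge their outputs; a single multi-class model, as noted above, would collapse the loop into one call:

def pre_annotate(image, required_categories, model_registry):
    # Run each category-specific recognition model named in the
    # annotation task requirement information, e.g. a human body
    # model, a vehicle model and a bicycle model, and merge results.
    pre_annotation = []
    for category in required_categories:
        model = model_registry.get(category)
        if model is not None:
            pre_annotation.extend(model.detect(image))
    return pre_annotation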
Step 103, receiving a first input of a pre-annotation result of a target image to be annotated, wherein the target image to be annotated is any image in a first image set, and the first image set is an image set in the image set to be annotated, which is matched with the annotation object requirement information and the scene requirement information.
In a specific implementation, the operation type of the first input may be a mouse input, a keyboard input, or even a touch input, and the first input may be an input performed by a user or an electronic device such as a computer, and is not limited in particular herein.
In addition, since the number of images to be labeled in the image set is large and only a part of them can satisfy the annotation object requirement information and the scene requirement information, manual intervention on the labeling information is applied only to that part of the images, which reduces the labor consumed in the labeling process.
In practical application, while a certain image in the image set is being labeled, the labeled quantity of a certain object category is tracked; when the quantity specified for that category in the annotation object requirement information is reached, first prompt information can be output to indicate that the category needs no further labeling. Likewise, when the number of labeled images for the scene to which the image belongs reaches the quantity required for that scene in the scene requirement information, second prompt information can be output to indicate that images of that scene need no further labeling.
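A sketch of the quota check behind the first and second prompt information, assuming plain counter dictionaries (all names hypothetical):

def check_quotas(object_counts, object_quota, scene_counts, scene_quota, scene):
    # First prompt: an object category has reached the quantity required
    # by the annotation object requirement information.
    for category, count in object_counts.items():
        if count >= object_quota.get(category, float("inf")):
            print(f"Prompt 1: category '{category}' needs no further labeling.")
    # Second prompt: the current scene has reached its required image count.
    if scene_counts.get(scene, 0) >= scene_quota.get(scene, float("inf")):
        print(f"Prompt 2: scene '{scene}' needs no further labeling.")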
In implementation, at the start of the labeling task for a frame, information such as the object categories, initial positions and scene of the frame can be determined through user input or from the pre-annotation results, so that the system can review the validity of the frame and proceed with the labeling task only if the review passes, where passing the review means that the scene and object categories of the frame have not yet reached the quantities required by the annotation task requirement information.
Step 104, responding to the first input, and updating the pre-labeling result of the target image to be labeled to obtain the target labeling result of the target image to be labeled.
In a specific implementation, the updating of the pre-annotation result of the target image to be annotated may be adjusting the pre-annotation result of the target image to be annotated, or performing supplementary annotation on an object without the pre-annotation result in the target image to be annotated.
Step 105, performing error correction detection on the target labeling results of all the images to be labeled in the first image set, and outputting error correction results.
In practical applications, the error correction detection may determine the similarity between objects of the same labeled type according to the target labeling results of all the images to be labeled in the first image set, and conclude that the category of a certain object may be labeled incorrectly when its similarity to the other objects of that type is significantly low.
Preferably, feature calculation may be performed on the category information in the target labeling results to obtain the features of the labeled objects, and 5 features of objects of the same type are sampled so that, for each remaining object of that type, the average feature distance between its feature and the 5 sampled features can be computed. For example, assume a third object among the remaining objects has feature T1 and the 5 sampled features of the same type are T1', T2', T3', T4' and T5'; the average feature distance of the third object is then (|T1 - T1'| + |T1 - T2'| + |T1 - T3'| + |T1 - T4'| + |T1 - T5'|)/5, where |T1 - T1'| denotes the feature distance between T1 and T1'. The average feature distance thus expresses the similarity between the third object and the other objects of the same type: the larger the distance, the lower the similarity, and the more likely the currently labeled type of the third object is wrong. Therefore, when the similarity is lower than a certain threshold, it is concluded that the labeled type of the corresponding object is wrong.
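A minimal numeric sketch of the mean feature distance above, assuming scalar features for brevity (real features would be vectors, with |·| replaced by a vector norm):

def mean_feature_distance(t1, sampled_features):
    # Average |T1 - Ti'| over the 5 sampled same-category features.
    return sum(abs(t1 - t) for t in sampled_features) / len(sampled_features)

# Example with 5 sampled features T1'..T5' of the same labeled category:
avg = mean_feature_distance(0.9, [1.0, 1.1, 0.95, 1.05, 1.02])  # = 0.124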
In practical application, when an object with a wrong labeling type is found, the labeling frame of the object can be highlighted on the display screen, so that the user knows from the highlighted labeling frame that the labeling type of the object is wrong.
Further, when an object with a wrong annotation category is found, the annotation information of that object may be adjusted, or its target annotation result may be automatically deleted, another frame of image to be annotated may be reselected from the image set, and the processes of step 101 to step 104 may be performed on the reselected frame, so that the reselected frame and its target annotation result replace the deleted or adjusted annotation. This ensures that the finally obtained target annotation results of all images to be annotated satisfy the annotation task requirement information.
In the embodiment of the invention, image recognition processing is first performed on each image to be annotated in the image set through a preset model to determine the pre-annotation result of each image, and the target annotation result of the target image to be annotated is then obtained by updating the pre-annotation result in response to a first input. Annotation personnel therefore only need to manually label or adjust a part of the images to be annotated, rather than labeling every image in the set manually, which improves the efficiency of the image annotation process. In addition, error correction detection is performed on the target annotation results of all the images to be annotated in the first image set and an error correction result is output, which improves the reliability of the annotation results.
As an optional implementation manner, before the receiving the first input of the pre-annotation result of the target image to be annotated, the method further includes:
receiving a second input of a static object in the first target image to be annotated;
responding to the second input, and determining a manual annotation result of the static object in the first target image to be annotated;
according to the artificial annotation result, determining an artificial annotation result of a static object in each frame of image to be annotated in a second image set, wherein the image set to be annotated comprises the second image set, the second image set comprises the first target image to be annotated, the images to be annotated in the second image set are acquired by the same image acquisition device in the same scene, and the first image set comprises the second image set;
the receiving of the first input of the pre-annotation result of the target image to be annotated comprises:
receiving a first input of a pre-labeling result of a first object in a target image to be labeled, wherein the artificial labeling result of the first object is not matched with the pre-labeling result of the first object, and a static object in the target image to be labeled comprises the first object.
In a specific implementation, the second input may be a click input through a mouse at a position of a static object in the first target image to be labeled, and the second input may be an input performed by a user or an electronic device such as a computer, which is not limited in this respect.
Of course, in practical applications, the second input may also be a touch input, and the like, and is not limited specifically herein.
In addition, the first target image to be annotated may be any frame of image acquired by a camera with a fixed position, and preferably, the first target image to be annotated may be a first frame of image acquired by a camera with a fixed position in a certain scene. In practical applications, the fixed camera can capture multiple frames of images, and at this time, the static objects in the multiple frames of images may be the same.
On this basis, determining the artificial annotation result of the static object in each frame of the second image set according to the artificial annotation result can be understood as follows: for images captured by the same fixed-position camera in the same scene, the position of a static object in the images is considered unchanged, so a single manual annotation on one of the images can be directly reused to mark the position of the static object in the subsequent images, which avoids repeated manual annotation operations.
In this embodiment, only some of the frames collected by the fixed-position camera need to be manually labeled; these frames may be images collected by that camera in different scenes, that is, every frame in the second image set is captured by the same fixed-position camera in the same scene. Repeated labeling of static objects in the images collected by the camera is thus avoided, the amount of manual labeling is reduced, and the efficiency of the image labeling process is further improved.
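A sketch of the static-object label reuse for frames from the same fixed-position camera in the same scene (the frame and label structures are assumed for illustration):

def propagate_static_labels(first_frame_labels, second_image_set):
    # Static objects keep the same image position across frames from the
    # same fixed camera, so the manual labels made once on the first frame
    # are copied to every other frame of the second image set.
    for frame in second_image_set:
        frame["static_labels"] = [dict(label) for label in first_frame_labels]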
As an optional implementation manner, before the receiving the second input of the static object in the first target image to be annotated, the method further includes:
receiving a third input of the first target image to be annotated;
in response to the third input, determining a target area in the first target image to be annotated;
determining a target area in each frame of image to be annotated in the second image set according to the target area in the first target image to be annotated;
the receiving of the second input of the static object in the first target image to be annotated includes:
and receiving a second input of the static object positioned in the target area in the first target image to be annotated.
Also, the above-mentioned second input may be an input for acquiring position information of the static object in the first target image to be annotated.
In a specific implementation, the target region may also be referred to as a "region of interest", which can be understood as an image region whose objects are useful for training the AI model, or which the AI model to be trained is able to recognize. For example, in training an intelligent-driving AI model, pedestrians, automobiles and bicycles on the road are useful for training, while roadside objects such as telegraph poles and greenery are not. In addition, an image captured by the camera may include objects too far away to be effectively recognized; in that case, the target area may also be the image area containing objects within a certain distance of the camera.
On this basis, determining the target area in each frame of the second image set according to the target area in the first target image to be annotated is similar to determining the artificial annotation result of the static object in each frame of the second image set according to the artificial annotation result in the previous embodiment. It can be understood as follows: for images captured by the same fixed-position camera in the same scene, the position of the target area on the images is considered unchanged, so a user manually delineates the target area on one of the images, and that first delineation can be directly reused to mark the target area of the subsequent images, which avoids delineating the target area repeatedly.
Further, a mask tool may be used to delineate the region of interest, so that when the user labels an object outside the region of interest, prompt information can be output indicating that objects outside the region of interest need not be labeled; alternatively, the region of interest can be directly highlighted on the display screen so that the user only labels objects inside it.
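A sketch of the region-of-interest check behind the prompt above, simplified to a rectangular target area (the invention may use an arbitrary mask region):

def in_target_area(label_box, roi):
    # label_box and roi given as (x_min, y_min, x_max, y_max); an object
    # is flagged when its labeling box lies entirely outside the ROI.
    x1, y1, x2, y2 = label_box
    rx1, ry1, rx2, ry2 = roi
    outside = x2 < rx1 or x1 > rx2 or y2 < ry1 or y1 > ry2
    if outside:
        print("Prompt: object lies outside the region of interest; no labeling needed.")
    return not outside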
In addition, in implementation, part of an object may lie inside the region of interest while the rest lies outside it. In this case, an edge detection algorithm may be adopted to score the accuracy of historical labeling tasks: for box labeling, the accuracy is that of the minimum bounding rectangle, measured in pixels; for semantic labeling, the accuracy is the boundary coincidence degree of the target object, measured in percent, where one part of the target object is located inside the region of interest and the other part outside it.
The process of scoring the accuracy of historical labeling tasks with the edge detection algorithm, taking the accuracy of the minimum bounding rectangle as the labeling box accuracy, comprises the following steps:
1) Extracting the semantic segmentation image of the image to be annotated and, from the semantic segmentation image, the pixel sets of the different vertexes that have a sequential relationship;
2) Performing edge detection on the image to be annotated according to the edge detection algorithm to obtain the edge detection image of the image to be annotated;
3) For every two adjacent vertexes, sequentially determining the coincidence degree between the edge detection image and each of the connecting lines between the pixel points in the two pixel sets;
4) For every two adjacent vertexes, marking the connecting line with the highest coincidence degree with the edge detection image as a target edge, and marking the intersection points between target edges as target vertexes.
By scoring the accuracy of historical annotation tasks with the edge detection algorithm and taking the accuracy of the minimum bounding rectangle as the labeling box accuracy, the annotation of image edges and image vertexes is optimized according to the coincidence degree between the edge image and the semantic segmentation image, which improves the accuracy of the image data annotation.
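A simplified sketch of steps 1)-4), assuming the edge detection image is given as a set of pixel coordinates and each candidate connecting line as a non-empty list of pixel coordinates (all helper names are hypothetical):

def coincidence(line, edge_pixels):
    # Fraction of the line's pixels that also lie on the edge detection image.
    return sum(1 for p in line if p in edge_pixels) / len(line)

def select_target_edge(candidate_lines, edge_pixels):
    # Among the connecting lines between the pixel sets of two adjacent
    # vertexes, keep as the target edge the line that coincides most with
    # the edge detection image; intersections of target edges then give
    # the target vertexes.
    return max(candidate_lines, key=lambda line: coincidence(line, edge_pixels))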
In the embodiment, the number of the objects to be labeled in the image to be labeled can be reduced, so that the labeling of invalid objects is avoided, and the labeling efficiency can be improved.
As an optional implementation manner, the performing error correction detection on the target labeling results of all the images to be labeled in the first image set includes:
selecting N frames of first images from the first image set, and acquiring N pieces of feature data of N second objects in the N frames of first images, wherein the target labeling results of the N second objects all comprise target classification sub-results, and N is an integer greater than 1;
acquiring a second image in the first image set, and acquiring feature data of a third object in the second image, wherein a target labeling result of the third object comprises the target classification sub-result;
determining target similarity between the third object and the N second objects according to the feature data of the third object and the feature data of the N second objects;
and determining that the target labeling result of the third object is wrong under the condition that the target similarity is smaller than a target threshold, wherein the target threshold is inversely related to the model maturity of the image recognition model pre-associated with the target classification sub-result.
In a specific implementation, the N frames of first images may be N images including the second object randomly selected from the first image set.
In addition, the feature data of the second or third object may be the labeling feature of that object, which can be extracted from its target labeling result by a feature extraction algorithm; the feature extraction process is the same as existing feature algorithms and is not detailed here.
The determining of the target similarity between the third object and the N second objects according to the feature data of the third object and the feature data of the N second objects may be to calculate feature similarities between the feature data of the third object and the feature data of each of the N second objects, respectively, to obtain N similarity values, and take an average value of the N similarity values as the target similarity.
Of course, in practical applications, the target similarity may also be calculated in other manners, such as: acquiring average feature data of the N second objects, and then taking the calculated similarity between the third object and the average feature data as a target similarity, etc., where a process of calculating the target similarity is not particularly limited.
In addition, the target labeling results of the N second objects each include a target classification sub-result, and the target labeling result of the third object includes the target classification sub-result, which can be understood as: the N second objects and the third object belong to the same labeling class of objects.
That is, in the present embodiment, feature data of objects belonging to the same labeling category are compared with each other, similarity between the objects is measured according to a feature distance between the feature data, and it is determined that there is an error in the labeling result of an object whose similarity with the feature data of other objects is lower than a target threshold, so that the error can be corrected, and after an error correction result is obtained, the error correction result is output as a final labeling result.
In practical applications, in view of the fact that the image recognition models of different types of objects have different degrees of maturity, in the present embodiment, the target threshold and the model maturity of the image recognition model pre-associated with the target classification sub-result are inversely correlated, so that different error correction judgment criteria are adopted for the labeling results recognized by different image recognition models.
For example, assume the object categories include people, cars, cats and pigs, and that the maturities of the image recognition models for these objects differ from one another; the target threshold associated with each image recognition model is then as shown in Table 1 below:
TABLE 1

Object class    Model maturity level    Target threshold
Human           1                       0.18
Vehicle         1                       0.21
Cat             2                       0.35
Pig             3                       0.35
As can be seen from Table 1, the model maturity of the human body recognition model is high (maturity level 1) and its associated target threshold is 0.18, while the model maturity of the pig recognition model is low (maturity level 3) and its associated target threshold is 0.35. That is, the value of the target threshold is inversely related to the model maturity of the image recognition model with which it is associated.
The higher the model maturity of a certain recognition model is, the higher the accuracy of the result recognized by the recognition model is, so that the target threshold value associated with the recognition model is reduced, and the probability of judging the labeling result recognized based on the recognition model as an incorrect labeling result can be reduced; accordingly, by increasing the target threshold value associated with the recognition model having the lower model maturity, the probability of determining the labeling result recognized based on the recognition model as an erroneous labeling result can be increased. In other words, in the present embodiment, different error correction schemes are adopted for different types of objects, so as to improve the pertinence of the error correction process.
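A sketch that combines the similarity test with the maturity-dependent thresholds of Table 1 (the dictionary values simply restate the illustrative table):

MATURITY_THRESHOLDS = {"human": 0.18, "vehicle": 0.21, "cat": 0.35, "pig": 0.35}

def label_judged_wrong(target_similarity, category):
    # Mature models get a lower target threshold, so their labels are
    # less likely to be flagged; immature models get a higher one.
    threshold = MATURITY_THRESHOLDS.get(category)
    if threshold is None:
        return None  # no pre-associated model: fall back to manual sampling
    return target_similarity < threshold  # True: target labeling result is wrong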
It should be noted that, in practical applications, there may be object categories that the preset models cannot identify. For example, the annotation task requirement information may indicate that 90 categories of objects need to be annotated while the existing models can only identify 60 of them; the remaining 30 categories have no corresponding preset model, so their feature data cannot be determined and they have no pre-associated target threshold. In that case, the 30 categories may be corrected manually, for example by manual sampling inspection, which is not elaborated here.
For convenience of understanding, the following describes another image data set annotation method provided by the present invention by taking the flowchart shown in fig. 2 as an example, and as shown in fig. 2, the another image data set annotation method may include the following steps:
step 201, initializing a labeling task requirement.
This step may specifically include the following four initialization processes (a consolidated code sketch follows item 4):
1. Confirming the requirements of the same batch of tasks and initializing the system tasks;
Here, the same batch of tasks means a batch of image data to be labeled for a certain training scene task, whose quantity, object conditions and the like are entirely determined by the training requirements of that task.
2. Initializing the category and quantity distribution requirement of the labeled objects;
the process may specifically include:
1) Storing the overall task categories and quantities in dictionary form and initializing them.
In the labeling process, the overall task categories and quantities are updated each time one labeling task is completed, until the labeled quantity of each category indicated in the labeling task requirements reaches the corresponding quantity.
2) Storing the single-frame categories and quantities in dictionary form and initializing them.
In the labeling process, the single-frame categories and quantities are updated each time the labeling of one frame of image is completed, until the number of labeled images of each category indicated in the labeling task requirements reaches the corresponding number.
3. Initializing a labeling scene requirement;
the process can be specifically understood as follows: the number and the distribution of the images to be annotated required in various scenes such as sunny days, cloudy days, rainy days, daytime, dark days, urban roads, suburban roads and the like are set, for example: by the following program code:
scene _ init = { sunny day: n1; in cloudy days: n2; in rainy days: n3; day time: m1; in black days: m2, urban road: s1, suburban road: s2}
The quantity distribution of the images to be annotated required in sunny days, cloudy days, rainy days, daytime, dark days, urban roads and suburban road scenes is set to be N1, N2, N3, M1, M2, S1 and S2.
Similar to the initialization of the category and quantity distribution requirements of the labeled objects, the scene requirement data is updated each time the labeling of the image to be labeled in a certain scene is completed until the labeled quantity of the scene reaches the required quantity of the scene.
4. Setting the feature distance thresholds ω.
The feature distance thresholds ω may correspond one-to-one with the image recognition models in the model library and are used to determine whether the recognized type of an object is wrong in the subsequent error detection (i.e., error correction) process.
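A consolidated sketch of the four initialization processes above; the dictionary layouts, variable names and numbers are assumptions for illustration:

# 1/2. Overall-task and single-frame category counters in dictionary form.
task_totals = {"person": 0, "car": 0, "bicycle": 0}    # updated per completed task
frame_counts = {"person": 0, "car": 0, "bicycle": 0}   # updated per completed frame

# 3. Scene requirements; the patent leaves N1..S2 symbolic, values assumed here.
N1, N2, N3, M1, M2, S1, S2 = 300, 300, 400, 600, 400, 500, 500
scene_init = {"sunny": N1, "cloudy": N2, "rainy": N3, "daytime": M1,
              "night": M2, "urban_road": S1, "suburban_road": S2}

# 4. One feature distance threshold per image recognition model (assumed values).
feature_distance_thresholds = {"human_model": 0.18, "vehicle_model": 0.21}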
Step 202, the system automatically divides a task scene set.
In this step, the system divides the image to be labeled in the image set to be labeled into different sub-image sets according to different scenes, so as to divide the scenes for labeling in the subsequent labeling process.
Step 203, the system model library automatically performs pre-labeling processing on the images.
In this step, the system model library stores a preset model in advance, and at this time, the preset model may be used to perform reverse recognition processing on the matched image to be annotated, so as to obtain a pre-annotation result of the image to be annotated.
Step 204, applying for the labeling task of one frame.
This step indicates that a single frame of image serves as the annotation object of a one-frame annotation task; after the annotation task of one frame is completed, the annotation task of the next frame is applied for, until all images required by the annotation task requirement information are annotated.
In implementation, each single-frame annotation task is also audited in real time: the user can manually click to submit information such as the target categories, positions and scene of the single-frame image for audit. If the audit passes, that is, the current single-frame annotation task matches the annotation task requirement information, the subsequent steps are executed; otherwise, the next frame's annotation task is applied for, or the target category, position, scene and other information of the single-frame image are clicked again.
Step 205, judging whether the scene region of interest has been delineated.
If the determination result in this step is "no", step 206 is executed; if the determination result in this step is yes, step 207 is executed.
In addition, in this step, the region of interest is the target region in the embodiment of the method shown in fig. 1, and is not described herein again.
Step 206, delineating the region of interest on the first frame of the scene.
In this step, a mask tool may be used to cut out the road in the images collected by the roadside unit in the same scene, and a road mask image is set to cover the road first, which prevents the subsequent automatic model detection and manual review operations from being invalidated. That is, for multiple frames captured in the same scene by the same fixed-position sensing device, the region of interest is manually and accurately delineated only on the first frame; for the subsequent frames, the system automatically migrates and reuses the region of interest of the first frame as their region of interest.
Step 207, judging whether the scene static objects have been labeled.
If the determination result in this step is "no", step 208 is executed; if the determination result in this step is yes, step 209 is executed.
Step 208, labeling the static objects in the first frame of the scene.
In this step, a mask tool can be adopted to perform accurate initial manual labeling of the static objects. For multiple frames collected by the same fixed-position sensing device in the same scene, only the static objects in the first frame are manually and accurately labeled; for the subsequent frames, the system automatically migrates and reuses the labeling results of the static objects in the first frame as their static-object labeling results.
Step 209, preprocessing the category information and the position information of the target objects.
It can be understood that if the determination results in step 205 and step 207 above are "yes", the region of interest, the static object labels and the like of the first frame captured by a certain scene's fixed-position sensing device are copied to the other frames captured by the same device in that scene, to determine the region of interest and static object labels in those frames.
Step 210, submitting for audit.
Step 211, judging whether the audit is passed.
If the determination result in this step is "no", step 209 is executed again; if the determination result in this step is yes, step 212 is executed.
This step indicates that the validity of the single-image labeling task is reviewed for reasonableness, and subsequent labeling operations are performed only after the validity review passes.
Step 212, task adjustment and supplementary annotation of the frame.
This step can be understood as: and the labeling personnel performs manual intervention on the pre-labeling result so as to perform manual adjustment and supplementary operation on the category, the initial position, the scene and the like of the object.
In this step, only the objects with incomplete annotation need to be artificially annotated on the basis of the annotation results obtained in steps 203 to 209, that is, the embodiment of the present invention can narrow the range of artificial annotation on the basis of the annotation results obtained in steps 203 to 209.
Step 213, submitting the task.
Step 214, the system automatically submits for audit.
Step 215, judging whether the labeling is completed.
If the determination result in this step is "no", step 212 is executed again; if the determination result in this step is yes, step 216 is executed.
The step specifically refers to judging whether the labeling task of the single-frame image is completed.
In addition, when the annotation task of a single-frame image is completed, the image can be filed under the corresponding category directories. For example, if a picture contains a "car", a "bicycle" and a "pedestrian", it is filed under each corresponding category directory, so that the car, bicycle and person directories each contain the picture.
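A sketch of the directory filing step, assuming the labeled categories of a completed frame are known (paths and names hypothetical):

import shutil
from pathlib import Path

def file_into_category_dirs(image_path, labeled_categories, root="dataset"):
    # A frame containing "car", "bicycle" and "pedestrian" is copied into
    # each of the three corresponding category directories.
    for category in set(labeled_categories):
        target_dir = Path(root) / category
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(image_path, target_dir)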
Step 216, the system automatically calculates the labeling accuracy.
In this step, the system may use an edge detection algorithm to perform accuracy scoring on the historical annotation task. The specific working principle is the same as that of the edge detection algorithm in the embodiment of the method shown in fig. 1, and is not described herein again.
Step 217, judging whether the batch task is completed.
If the determination result in this step is "no", step 204 is executed again; if the determination result in this step is yes, step 218 is executed.
This step represents judging whether the labeling tasks of all the frame images included in the same batch of tasks are completed.
Step 218, freezing the task and automatically performing label classification error detection.
In the case of completing the batch task, multiple frames of images may be stored under each category of directory. In this step, the error correction detection can be performed on the images in each classified directory to determine whether the images in each classified directory have classification errors.
In this step, different categories may correspond to different image recognition models, and the images of each category are checked with the error detection scheme matching the corresponding model. Specifically, the similarity may be determined from the labeling feature data of the multiple frames in the category corresponding to a certain image recognition model, and the target threshold corresponding to that model's maturity is obtained; the classification of any single frame whose similarity is smaller than the target threshold is judged possibly wrong, so its labeling result needs to be adjusted or replaced.
Of course, if there is no corresponding image recognition model in a certain category, a manual sampling detection mode may be adopted for performing classification error detection, which is not described herein again.
The specific implementation process of this step is the same as the error correction process in the embodiment of the method shown in fig. 1, and is not described herein again.
Step 219, judge whether the label classification is completely correct.
Under the condition that the judgment result in the step is yes, the labeling process is ended, and a labeling result is output; if the determination result in this step is "no", step 220 is executed.
Step 220, automatically classifying the task list and performing task parameter reduction and activation.
After step 220, step 204 is performed again in accordance with the activated task.
When the labeling results of some objects in a batch of labeling tasks are wrong, the wrong labeling results are deleted, leaving fewer labeling results than the labeling task requirements demand. Accordingly, in this step, the quantities of the corresponding categories in the annotation task requirements are reduced according to the categories and numbers of the objects whose labeling results were wrong, so that the reactivated annotation task makes up exactly for them. For example, if L labeling results of M target objects in a batch of labeling tasks are wrong, the remaining task requirements can be modified into task requirement information indicating that L labeling results of the M target objects still need to be labeled, and labeling of the M target objects is activated L more times based on this task requirement information.
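A sketch of the task parameter reduction described above, with hypothetical counter names:

def build_reactivated_task(wrong_label_counts):
    # wrong_label_counts maps each target object category to the number of
    # its deleted (wrong) labeling results; the reactivated task requirement
    # asks for exactly this shortfall to be labeled again via step 204.
    return {category: count
            for category, count in wrong_label_counts.items() if count > 0}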
As can be seen from the above, the another image data set annotation method provided by the embodiment of the present invention has the following beneficial effects:
1) Through the mask tool, in the image set to be labeled captured by the same fixed-position sensing device in the same scene, accurate manual static-object labeling is performed only on the first frame, and the system automatically migrates and reuses that frame's labeling results for the subsequent frames. The mask tool is also used to cut out the road in the images collected by the roadside unit in the same scene, and a road mask image is set to cover the road first, which prevents the subsequent automatic model detection and manual review operations from being invalidated;
2) A reasonable model pre-labeling mechanism is set up: a mask tool delineates the region of interest, and the total number and distribution of categories in a single frame to be labeled, together with the labeling requirements of the overall task, are preliminarily screened through category recognition models and category pre-confirmation, which ensures the validity of the labeled images. For labeled images, a category label error detection mechanism is added to improve the training effect of the model;
3) For the same batch of labeling tasks, classification is performed according to the common object categories in traffic scenes; the system applies different error detection schemes according to whether each category has a corresponding image recognition model, automatically sets a target threshold for each category according to the maturity of its image recognition model, and performs label error detection based on the average feature distance of random samples, realizing intelligent error detection of labeling categories.
Referring to fig. 3, an embodiment of the present invention further provides an image dataset annotation apparatus, as shown in fig. 3, the image dataset annotation apparatus 300 may include the following modules:
the first obtaining module 301 is configured to obtain annotation task demand information and an image set to be annotated, where the image set to be annotated includes at least two frames of images to be annotated, and the annotation task demand information includes scene demand information and annotation object demand information;
a first determining module 302, configured to perform image recognition processing on each image to be annotated in the set of images to be annotated by using a preset model according to the scene requirement information, so as to determine a pre-annotation result of each image to be annotated in the set of images to be annotated;
a first receiving module 303, configured to receive a first input of a pre-annotation result of a target image to be annotated, where the target image to be annotated is any image in a first image set, and the first image set is an image set of the image set to be annotated, which is matched with the annotation object requirement information and the scene requirement information;
an updating module 304, configured to update a pre-annotation result of the target image to be annotated in response to the first input, so as to obtain a target annotation result of the target image to be annotated;
and the error correction module 305 is configured to perform error correction detection on the target annotation results of all the images to be annotated in the first image set, and output an error correction result.
Optionally, the image data set annotation apparatus 300 further includes:
the second receiving module is used for receiving second input of the static object in the first target image to be annotated;
the second determination module is used for responding to the second input and determining the artificial annotation result of the static object in the first target image to be annotated;
a third determining module, configured to determine, according to the artificial annotation result, an artificial annotation result of a static object in each frame of image to be annotated in a second image set, where the image set to be annotated includes the second image set, the second image set includes the first target image to be annotated, images to be annotated in the second image set are acquired by a same image acquisition device in a same scene, and the first image set includes the second image set;
the first receiving module is specifically configured to:
receiving a first input of a pre-labeling result of a first object in a target image to be labeled, wherein the artificial labeling result of the first object is not matched with the pre-labeling result of the first object, and a static object in the first target image to be labeled comprises the first object.
Optionally, the image dataset labeling apparatus 300 further includes:
the third receiving module is used for receiving third input of the first target image to be annotated;
a fourth determining module, configured to determine, in response to the third input, a target region in the first target image to be annotated;
a fifth determining module, configured to determine, according to a target area in the first target image to be annotated, a target area in each frame of image to be annotated in the second image set;
the second receiving module is specifically configured to:
and receiving a second input of the static object positioned in the target area in the first target image to be annotated.
Optionally, the error correction module 305 includes:
a first obtaining unit, configured to select N frames of first images from the first image set and obtain feature data of each of N second objects in the N frames of first images, where each first image contains one of the second objects, the target annotation results of the N second objects all include a target classification sub-result, and N is an integer greater than 1;
a second obtaining unit, configured to obtain a second image in the first image set and obtain feature data of a third object in the second image, where the second image contains the third object and the target annotation result of the third object includes the same target classification sub-result;
a first determining unit, configured to determine target similarity between the third object and the N second objects according to the feature data of the third object and the feature data of the N second objects;
a second determining unit, configured to determine that the target annotation result of the third object is wrong when the target similarity is smaller than a target threshold, where the target threshold is inversely related to the model maturity of the image recognition model pre-associated with the target classification sub-result.
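The error correction units lend themselves to a short numerical sketch. The patent does not fix a similarity measure or a threshold schedule, so the cosine similarity, the mean aggregation, and the linear maturity-to-threshold mapping below are all assumptions; only the inverse relation between maturity and threshold comes from the text.

```python
import numpy as np

def target_threshold(maturity: float, hi: float = 0.9, lo: float = 0.5) -> float:
    """Inverse relation from the patent: the more mature the image recognition
    model pre-associated with the classification sub-result, the lower the
    threshold (its labels are trusted more, so fewer are flagged). The linear
    form and the [0.5, 0.9] range are assumptions for this sketch."""
    return hi - (hi - lo) * float(np.clip(maturity, 0.0, 1.0))

def check_third_object(third_feat: np.ndarray,
                       second_feats: np.ndarray,
                       maturity: float) -> bool:
    """Second determining unit (sketch): compare the third object's feature
    vector against the N second objects sharing its target classification
    sub-result; flag the annotation as wrong when the mean cosine similarity
    falls below the maturity-dependent target threshold."""
    norms = np.linalg.norm(second_feats, axis=1) * np.linalg.norm(third_feat)
    sims = second_feats @ third_feat / np.where(norms == 0, 1.0, norms)
    return float(sims.mean()) < target_threshold(maturity)

# Example: N = 3 reference "car" feature vectors vs. one suspicious crop.
refs = np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]])
suspect = np.array([0.1, 0.9])   # looks nothing like the references
print(check_third_object(suspect, refs, maturity=0.5))  # True -> flagged as wrong
```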
The image data set labeling apparatus 300 according to the embodiment of the present invention can implement the processes of the method embodiments shown in fig. 1 or fig. 2 and obtain the same beneficial effects, which are not described herein again.
Optionally, as shown in fig. 4, an embodiment of the present invention further provides an electronic device 400, which includes a processor 401, a memory 402, and a program or instruction stored in the memory 402 and executable on the processor 401. When executed by the processor 401, the program or instruction implements the processes of the method embodiments shown in fig. 1 or fig. 2 and achieves the same technical effects; to avoid repetition, the details are not described here again.
It should be noted that the electronic device in the embodiment of the present invention includes the mobile electronic device and the non-mobile electronic device described above.
An embodiment of the present invention further provides a computer-readable storage medium on which a program or instruction is stored. When executed by a processor, the program or instruction implements the processes of the method embodiments shown in fig. 1 or fig. 2 and achieves the same technical effects; to avoid repetition, the details are not described here again.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element. Further, the scope of the methods and apparatus of the embodiments of the present invention is not limited to performing functions in the order illustrated or discussed; functions may also be performed substantially simultaneously or in a reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the description of the foregoing embodiments, it will be clear to those skilled in the art that the methods of the foregoing embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, though in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present invention, or the portions thereof contributing to the prior art, may be embodied in the form of a software product, which is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, it is not limited to those embodiments, which are illustrative rather than restrictive. Those skilled in the art will appreciate that various changes and modifications can be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An image data set labeling method, comprising:
acquiring annotation task demand information and an image set to be annotated, wherein the image set to be annotated comprises at least two frames of images to be annotated, and the annotation task demand information comprises scene demand information and annotation object demand information;
performing, according to the scene demand information, image recognition processing on each image to be annotated in the image set to be annotated using a preset model, so as to determine a pre-annotation result of each image to be annotated in the image set to be annotated;
receiving a first input of a pre-annotation result of a target image to be annotated, wherein the target image to be annotated is any image in a first image set, and the first image set is an image set matched with the annotation object demand information and the scene demand information in the image set to be annotated;
in response to the first input, updating the pre-annotation result of the target image to be annotated to obtain a target annotation result of the target image to be annotated; and
performing error correction detection on the target annotation results of all the images to be annotated in the first image set, and outputting an error correction result.
2. The method according to claim 1, wherein before receiving the first input of the pre-annotation result of the target image to be annotated, the method further comprises:
receiving a second input of a static object in a first target image to be annotated;
in response to the second input, determining a manual annotation result of the static object in the first target image to be annotated; and
determining, according to the manual annotation result, a manual annotation result of the static object in each frame of image to be annotated in a second image set, wherein the image set to be annotated comprises the second image set, the second image set comprises the first target image to be annotated, the images to be annotated in the second image set are acquired by the same image acquisition device in the same scene, and the first image set comprises the second image set;
wherein the receiving the first input of the pre-annotation result of the target image to be annotated comprises:
receiving a first input of a pre-annotation result of a first object in the target image to be annotated, wherein the manual annotation result of the first object does not match the pre-annotation result of the first object, and the static object in the first target image to be annotated comprises the first object.
3. The method according to claim 2, wherein before receiving the second input of the static object in the first target image to be annotated, the method further comprises:
receiving a third input of the first target image to be annotated;
in response to the third input, determining a target area in the first target image to be annotated; and
determining, according to the target area in the first target image to be annotated, a target area in each frame of image to be annotated in the second image set;
wherein the receiving the second input of the static object in the first target image to be annotated comprises:
receiving a second input of the static object located in the target area in the first target image to be annotated.
4. The method according to claim 1, wherein the performing error correction detection on the target annotation results of all the images to be annotated in the first image set comprises:
selecting N frames of first images from the first image set, and acquiring feature data of each of N second objects in the N frames of first images, wherein the target annotation results of the N second objects all comprise a target classification sub-result, and N is an integer greater than 1;
acquiring a second image in the first image set, and acquiring feature data of a third object in the second image, wherein the target annotation result of the third object comprises the same target classification sub-result;
determining target similarity between the third object and the N second objects according to the feature data of the third object and the feature data of the N second objects; and
determining that the target annotation result of the third object is wrong when the target similarity is smaller than a target threshold, wherein the target threshold is inversely related to the model maturity of an image recognition model pre-associated with the target classification sub-result.
5. An image data set labeling apparatus, comprising:
a first acquisition module, configured to acquire annotation task demand information and an image set to be annotated, wherein the image set to be annotated comprises at least two frames of images to be annotated, and the annotation task demand information comprises scene demand information and annotation object demand information;
a first determining module, configured to perform, according to the scene demand information, image recognition processing on each image to be annotated in the image set to be annotated using a preset model, so as to determine a pre-annotation result of each image to be annotated in the image set to be annotated;
a first receiving module, configured to receive a first input of a pre-annotation result of a target image to be annotated, wherein the target image to be annotated is any image in a first image set, and the first image set is an image set matched with the annotation object demand information and the scene demand information in the image set to be annotated;
an updating module, configured to update the pre-annotation result of the target image to be annotated in response to the first input, so as to obtain a target annotation result of the target image to be annotated; and
an error correction module, configured to perform error correction detection on the target annotation results of all the images to be annotated in the first image set, and output an error correction result.
6. The apparatus of claim 5, further comprising:
a second receiving module, configured to receive a second input of a static object in a first target image to be annotated;
a second determining module, configured to determine, in response to the second input, a manual annotation result of the static object in the first target image to be annotated; and
a third determining module, configured to determine, according to the manual annotation result, a manual annotation result of the static object in each frame of image to be annotated in a second image set, wherein the image set to be annotated comprises the second image set, the second image set comprises the first target image to be annotated, the images to be annotated in the second image set are acquired by the same image acquisition device in the same scene, and the first image set comprises the second image set;
wherein the first receiving module is specifically configured to:
receive a first input of a pre-annotation result of a first object in the target image to be annotated, wherein the manual annotation result of the first object does not match the pre-annotation result of the first object, and the static object in the first target image to be annotated comprises the first object.
7. The apparatus of claim 6, further comprising:
a third receiving module, configured to receive a third input of the first target image to be annotated;
a fourth determining module, configured to determine, in response to the third input, a target area in the first target image to be annotated; and
a fifth determining module, configured to determine, according to the target area in the first target image to be annotated, a target area in each frame of image to be annotated in the second image set;
wherein the second receiving module is specifically configured to:
receive a second input of the static object located in the target area in the first target image to be annotated.
8. The apparatus of claim 5, wherein the error correction module comprises:
a first obtaining unit, configured to select N frames of first images from the first image set and obtain feature data of each of N second objects in the N frames of first images, wherein each first image contains one of the second objects, the target annotation results of the N second objects all comprise a target classification sub-result, and N is an integer greater than 1;
a second obtaining unit, configured to obtain a second image in the first image set and obtain feature data of a third object in the second image, wherein the second image contains the third object, and the target annotation result of the third object comprises the same target classification sub-result;
a first determining unit, configured to determine target similarity between the third object and the N second objects according to the feature data of the third object and the feature data of the N second objects; and
a second determining unit, configured to determine that the target annotation result of the third object is wrong when the target similarity is smaller than a target threshold, wherein the target threshold is inversely related to the model maturity of an image recognition model pre-associated with the target classification sub-result.
9. An electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the image data set labeling method according to any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the image data set labeling method according to any one of claims 1 to 4.
CN202110895391.8A 2021-08-05 2021-08-05 Image data set labeling method and device and electronic equipment Pending CN115705685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110895391.8A CN115705685A (en) 2021-08-05 2021-08-05 Image data set labeling method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115705685A (en) 2023-02-17

Family

ID=85178821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110895391.8A Pending CN115705685A (en) 2021-08-05 2021-08-05 Image data set labeling method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115705685A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665177A (en) * 2023-07-31 2023-08-29 福思(杭州)智能科技有限公司 Data processing method, device, electronic device and storage medium
CN116665177B (en) * 2023-07-31 2023-10-13 福思(杭州)智能科技有限公司 Data processing method, device, electronic device and storage medium
CN116664332A (en) * 2023-08-01 2023-08-29 安徽金海迪尔信息技术有限责任公司 Agricultural production monitoring system based on digital twinning
CN116664332B (en) * 2023-08-01 2023-10-13 安徽金海迪尔信息技术有限责任公司 Agricultural production monitoring system based on digital twinning

Similar Documents

Publication Publication Date Title
CN108230339A (en) A kind of gastric cancer pathological section based on pseudo label iteration mark marks complementing method
CN110969166A (en) Small target identification method and system in inspection scene
CN111967313B (en) Unmanned aerial vehicle image annotation method assisted by deep learning target detection algorithm
CN115705685A (en) Image data set labeling method and device and electronic equipment
CN111507226B (en) Road image recognition model modeling method, image recognition method and electronic equipment
CN110942454A (en) Agricultural image semantic segmentation method
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
CN113011350A (en) Method and device for recognizing and processing regional image and electronic equipment
CN112966665A (en) Pavement disease detection model training method and device and computer equipment
CN113962960A (en) Pavement disease detection method based on deep learning
CN113780200A (en) Computer vision-based pavement multi-disease area detection and positioning method
CN111400533A (en) Image screening method and device, electronic equipment and storage medium
CN103886609B (en) Vehicle tracking method based on particle filtering and LBP features
CN116597270A (en) Road damage target detection method based on attention mechanism integrated learning network
CN117876383B (en) Yolov5 l-based highway surface strip-shaped crack detection method
CN113435407A (en) Small target identification method and device for power transmission system
CN117152513A (en) Vehicle boundary positioning method for night scene
CN110222772B (en) Medical image annotation recommendation method based on block-level active learning
CN115424217A (en) AI vision-based intelligent vehicle identification method and device and electronic equipment
CN112784834A (en) Automatic license plate identification method in natural scene
CN113033386B (en) High-resolution remote sensing image-based transmission line channel hidden danger identification method and system
CN114742996A (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN116935134A (en) Point cloud data labeling method, point cloud data labeling system, terminal and storage medium
CN110765900A (en) DSSD-based automatic illegal building detection method and system
CN114882469A (en) Traffic sign detection method and system based on DL-SSD model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination