CN112488222B - Crowdsourcing data labeling method, system, server and storage medium - Google Patents


Info

Publication number: CN112488222B
Application number: CN202011418175.6A
Authority: CN (China)
Prior art keywords: image, target image, annotator, processed, target
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN112488222A (en)
Inventors: 何云, 熊迹, 何豪杰, 罗跃军
Current assignee: Heading Data Intelligence Co Ltd
Original assignee: Heading Data Intelligence Co Ltd
Events: application filed by Heading Data Intelligence Co Ltd; priority to CN202011418175.6A; publication of CN112488222A; application granted; publication of CN112488222B; legal status active

Classifications

    • G06F 18/22 — Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06N 3/045 — Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N 3/08 — Neural networks; Learning methods
    • G06V 10/462 — Extraction of image or video features; Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a crowdsourcing data annotation method, system, server and readable storage medium. A target image is identified on an image to be processed and its image features are extracted; the features are matched against the recorded processing features of each annotator, and the annotator best suited to those features is selected to annotate the image. This solves the technical problem in the prior art that existing annotator-selection schemes ignore the differences among annotators, so that differences in annotators' technical backgrounds, professional knowledge and interests lead to low annotation efficiency.

Description

Crowdsourcing data labeling method, system, server and storage medium
Technical Field
The invention relates to the technical field of the internet, and in particular to a crowdsourcing data labeling method, a crowdsourcing data labeling system, a server and a storage medium.
Background
Data annotation is the work of preparing artificial-intelligence training data, performed by data processors with the help of labeling tools. Common types of data annotation include image annotation, voice annotation, text annotation and video annotation. Basic annotation forms include 2D bounding boxes, 3D bounding boxes, text transcription, image dotting and target object contour lines.
Image annotation and video annotation, when classified by the content of the annotation work, can both be treated as image annotation, since video is simply a sequence of images played continuously. In real application scenarios, image data annotation is widely used in face recognition, autonomous-vehicle perception and the like. In autonomous driving, how can an automobile recognize other vehicles, pedestrians, obstacles, green belts and even the sky while driving itself? Image annotation differs from voice annotation in that an image contains shapes, target points and structural divisions, so textual marks alone cannot meet the data requirements. Image data annotation therefore requires a relatively complex process: the annotator marks the contours of different targets in different colors, then labels the corresponding contours, using the labels to summarize the contents within each contour, so that a model can learn to identify the different targets in the image.
In traditional manual labeling, a manager drafts a labeling specification, annotators label the data, and the normativity of the labeled data is then checked and corrected. In common crowdsourced labeling, massive data is divided into many simple subtasks that are distributed through a network platform to a large number of volunteers for labeling. However, data labeling is in fact a relatively complex process, and a great deal of time is consumed in data sorting, publishing, labeling, quality inspection and submission. Moreover, because each annotator's professional background and fields of interest differ, their understanding of the labeling specification also differs, so both the labeled results and the labeling time vary from person to person.
Disclosure of Invention
The invention provides a crowdsourced data labeling method, system, server and storage medium to solve the technical problem in the prior art that, because each annotator's professional background and fields of interest differ, their understanding of the labeling specification also differs, so that part of the labeled data deviates significantly and part of the labeling is very time-consuming.
To solve the above technical problem, the invention provides a crowdsourcing data labeling method, which comprises the following steps:
identifying a target image on an image to be processed, cutting the target image on the image to be processed, and extracting image characteristics of the target image;
recording processing characteristics when an annotator marks an image, and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics and the image characteristics;
and calculating the matching degree of the image to be processed and all the annotators, and selecting the annotator with the highest matching degree to finish the annotation of the image to be processed.
Preferably, the step of identifying the target image on the image to be processed, cutting the target image on the image to be processed, and extracting the image feature of the target image specifically includes:
identifying a target image on an image to be processed by using a target detection algorithm, and generating positioning information of the target image;
cutting the target image according to the positioning information, preprocessing the target image, and storing the target image in a target database;
and extracting the target image in the target database by using a residual error network to obtain image characteristics.
Preferably, the step of recording the processing characteristics of the annotator when annotating the image and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics with the image characteristics further includes:
performing primary classification on the target image according to the image characteristics;
selecting target images of different categories from the target database, pushing them to an annotator, and prompting the annotator to select familiar category images;
and responding to the category image selected by the annotator, and pushing the to-be-processed image corresponding to the target image of the selected category to the annotator.
Preferably, the step of identifying the target image on the image to be processed, cutting the target image on the image to be processed, and extracting the image feature of the target image specifically includes:
and identifying the target image on the image to be processed by using a full connection layer of a convolutional neural network model, extracting the image characteristics of the target image, and cutting the target image on the image to be processed.
Preferably, the step of recording the processing feature when the annotator annotates the image, and matching the processing feature with the image feature to obtain the matching degree between the processing feature and the image feature specifically includes:
recording processing characteristics of the annotators when the images are annotated, and calculating the processing capacity of the annotators on the images of different categories by using a crowdsourcing layer of a convolutional neural network model according to the processing characteristics of all the annotators;
calculating the similarity degree of the target image and different types of images according to the image characteristics of the target image by using an output layer of a convolutional neural network model;
and multiplying the degree of similarity between the target image and any category by the annotator's processing capacity for that category to obtain the matching degree for that category, then summing the matching degrees over all categories to obtain the matching degree between the processing features and the image features.
Preferably, the method further comprises the following steps:
extracting images in a sample database by using a residual error network to obtain sample characteristics, and sequentially passing the sample characteristics through a full connection layer, an output layer and a crowdsourcing layer to obtain a training result;
and comparing the training result with the corresponding image category data in the sample database, calculating the accuracy, and repeatedly carrying out forward propagation and backward propagation to obtain the convolutional neural network model with the highest accuracy.
Preferably, the method further comprises the following steps:
and saving the image to be processed, together with the annotation completed by the annotator with the highest matching degree, to the sample database.
The invention also provides a crowdsourcing data labeling system, which comprises the following components:
the characteristic extraction unit is used for identifying a target image on an image to be processed, cutting the target image on the image to be processed and extracting the image characteristic of the target image;
the characteristic matching unit is used for recording the processing characteristics of the annotator when the annotator marks the image and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics and the image characteristics;
and the crowdsourcing labeling unit is used for calculating the matching degree between the image to be processed and all the annotators, and selecting the annotator with the highest matching degree to complete the annotation of the image to be processed.
The invention also provides a crowdsourcing data annotation server, which comprises: the system comprises a memory, a processor and a crowdsourced data annotation program stored on the memory and capable of running on the processor, wherein the crowdsourced data annotation program when executed by the processor realizes the steps of the crowdsourced data annotation method.
The invention further provides a readable storage medium, on which a crowdsourcing data annotation program is stored, and when being executed by a processor, the crowdsourcing data annotation program implements the steps of the crowdsourcing data annotation method as described above.
According to the method and the device, the image features of the image to be processed are matched against the recorded processing features of the annotators, and each image is assigned to the best-matched annotator. This reduces the labeling deviation caused by differences in annotators' professional backgrounds and fields of interest, improves labeling quality and labeling speed, shortens labeling time and raises labeling efficiency.
Drawings
FIG. 1 is a schematic diagram of a server structure of a hardware operating environment according to an embodiment of the crowdsourced data annotation method of the invention;
FIG. 2 is a flowchart illustrating a method for annotating crowdsourced data according to another embodiment of the invention;
FIG. 3 is a flowchart illustrating a method for annotating crowdsourced data according to another embodiment of the invention;
FIG. 4 is a flowchart illustrating a method for annotating crowdsourced data according to another embodiment of the invention;
FIG. 5 is a functional block diagram of the crowd-sourced data annotation system of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with specific embodiments; the examples given are intended to illustrate the invention, not to limit its scope.
Referring to fig. 1, fig. 1 is a schematic diagram of a server structure of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the server may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may comprise a display screen (Display) and may optionally further comprise a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory); alternatively, the memory 1005 may be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the architecture shown in FIG. 1 does not constitute a limitation on the servers, and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a crowdsourced data annotation program.
In the server shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and exchanging data with it; the user interface 1003 is mainly used for connecting peripheral equipment; and the server invokes, via the processor 1001, the crowdsourced data tagging program stored in the memory 1005 to perform the following operations:
identifying a target image on an image to be processed, cutting the target image on the image to be processed, and extracting image characteristics of the target image;
recording processing characteristics when an annotator marks an image, and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics and the image characteristics;
and calculating the matching degree of the image to be processed and all the annotators, and selecting the annotator with the highest matching degree to finish the annotation of the image to be processed.
Further, the steps of identifying a target image on the image to be processed, cutting the target image on the image to be processed, and extracting the image features of the target image specifically include:
identifying a target image on an image to be processed by using a target detection algorithm, and generating positioning information of the target image;
cutting the target image according to the positioning information, preprocessing the target image, and storing the target image in a target database;
and extracting the target image in the target database by using a residual error network to obtain image characteristics.
Further, the step of recording the processing characteristics of the annotator when annotating the image and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics with the image characteristics further comprises the following steps:
performing primary classification on the target image according to the image characteristics;
selecting target images of different categories from the target database, pushing them to an annotator, and prompting the annotator to select familiar category images;
and responding to the category image selected by the annotator, and pushing the to-be-processed image corresponding to the target image of the selected category to the annotator.
Further, the steps of identifying a target image on the image to be processed, cutting the target image on the image to be processed, and extracting the image features of the target image specifically include:
and identifying the target image on the image to be processed by using a full connection layer of a convolutional neural network model, extracting the image characteristics of the target image, and cutting the target image on the image to be processed.
Further, the step of recording the processing characteristics of the annotator when annotating the image, and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics and the image characteristics specifically includes:
recording the processing characteristics of the annotators when the annotators mark the images, and calculating the processing capacity of the annotators on the images of different categories according to the processing characteristics of all the annotators by using a crowdsourcing layer of a convolutional neural network model;
calculating the similarity degree of the target image and different types of images according to the image characteristics of the target image by using an output layer of a convolutional neural network model;
and multiplying the degree of similarity between the target image and any category by the annotator's processing capacity for that category to obtain the matching degree for that category, then summing the matching degrees over all categories to obtain the matching degree between the processing features and the image features.
Further, the crowd-sourced data annotation method further comprises the following steps:
extracting images in a sample database by using a residual error network to obtain sample characteristics, and sequentially passing the sample characteristics through a full connection layer, an output layer and a crowdsourcing layer to obtain a training result;
and comparing the training result with the corresponding image category data in the sample database, calculating the accuracy, and repeatedly carrying out forward propagation and backward propagation to obtain the convolutional neural network model with the highest accuracy.
Further, the crowd-sourced data annotation method further comprises the following steps:
and saving the image to be processed, together with the annotation completed by the annotator with the highest matching degree, to the sample database.
By collecting the processing features of each annotator and the image features of the images to be annotated, and then matching each image to the most suitable annotator for labeling, this embodiment greatly alleviates the technical problem that differences in annotators' professional backgrounds, fields of interest and understanding of the labeling specification cause large deviations in part of the labeled data and long labeling times for part of the data. The effect is improved labeling quality and speed, shorter labeling time and higher labeling efficiency.
Based on the hardware structure, the embodiment of the crowdsourcing data annotation method is provided.
The crowd-sourced data annotation method described with reference to fig. 2 includes the following steps:
s10, identifying a target image on the image to be processed, cutting the target image on the image to be processed, and extracting the image characteristics of the target image;
It is easy to understand that existing image recognition algorithms are quite mature: for the image types that need labeling, the recognition step can be completed quickly as long as the corresponding model has been trained in advance. Because an image to be processed may contain several parts that need labeling, this embodiment crops each recognized target image and extracts its features separately, ensuring an accurate correspondence between image features and a single image category.
S20, recording processing characteristics of the annotator when the annotator marks the image, and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics and the image characteristics;
It should be noted that, in this embodiment, the processing features recorded while an annotator labels images include at least the labeling time and the labeling availability. Labeling time is easy to measure; labeling availability can be measured as follows: the image to be processed is sent to several annotators, an annotation produced identically by at least two of them is taken as an available annotation, and the remaining annotations are sent to other annotators with higher matching degrees to select the available one.
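The availability statistic described above can be sketched as follows. This is an illustrative reading of the scheme, and the data shapes (`annotations` as a mapping from annotator to label) are assumptions, not part of the patent:

```python
from collections import Counter

def available_annotations(annotations):
    """annotations: {annotator_id: label} for one image.
    A label produced identically by at least two annotators
    counts as an available annotation."""
    counts = Counter(annotations.values())
    return {label for label, n in counts.items() if n >= 2}

def availability_rate(history):
    """history: per-annotator list of booleans (annotation was available).
    Returns the fraction of available annotations, a simple stand-in
    for the 'labeling availability' processing feature."""
    return sum(history) / len(history) if history else 0.0
```

For instance, if three annotators label one image as car, car, bus, only the "car" annotation is available, and each annotator's running availability rate accumulates across images.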
And S30, calculating the matching degree of the image to be processed and all the annotators, and selecting the annotator with the highest matching degree to finish the annotation of the image to be processed.
It is worth emphasizing that whether an annotator can produce an available annotation is normally the primary consideration, so the matching degree is strongly correlated with labeling availability: when an annotator is judged certain (100%) to complete an available annotation, that annotator's matching degree reaches its maximum. When many annotators have relevant labeling experience, several may share the highest matching degree; in that case annotators are further ranked by parameters such as remaining workload and labeling activity, so that an annotator who can complete the annotation as soon as possible is selected and issued the labeling task.
In this embodiment, the image features of the target image identified in the image to be labeled are matched against the processing features of the annotators, and the labeling task is issued to the annotator with the highest matching degree. The annotator's professional background and fields of interest are therefore closer to the content of the image to be labeled, which reduces the deviation of the labeled data, improves labeling availability, reduces the workload of quality inspectors, and improves the annotator's labeling experience.
Referring to fig. 3, the identifying a target image on an image to be processed, cutting the target image on the image to be processed, and extracting image features of the target image specifically includes:
s11, identifying a target image on the image to be processed by using a target detection algorithm, and generating positioning information of the target image;
It is easy to understand that, in this embodiment, the target image on the image to be processed is identified using the object detection algorithm YOLOv3, an end-to-end detection algorithm with high real-time performance and accuracy.
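The detection step can be sketched with the YOLOv3 model abstracted as a callable returning per-detection dictionaries; the callable's interface and the 0.5 confidence threshold are assumptions for illustration, not the patent's specification:

```python
def localize_targets(image, detector, conf_threshold=0.5):
    """Run an object detector (e.g. a YOLOv3 model wrapped as a callable)
    over the image to be processed and return positioning information
    for each target as (class_name, confidence, (x, y, w, h))."""
    return [(d["class"], d["conf"], tuple(d["box"]))
            for d in detector(image)
            if d["conf"] >= conf_threshold]
```

The returned boxes are the positioning information used for cropping in step S12.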
S12, cutting the target image according to the positioning information, preprocessing the target image, and storing the target image in a target database;
It should be noted that, because the cropped target images differ in pixel dimensions, which is inconvenient for subsequent processing, they are preprocessed to unify pixel size and file format and then stored in a dedicated target database for later use and storage.
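The cropping and size-unification steps can be sketched in plain Python on an image represented as a 2-D list of pixels (a deliberate simplification for exposition; a real pipeline would use an image library):

```python
def crop(image, box):
    """Cut the target out of the image using the detector's
    positioning information box = (x, y, w, h)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def resize_nearest(image, size):
    """Nearest-neighbour resize so every cropped target is stored
    with the same uniform pixel dimensions (size x size)."""
    h, w = len(image), len(image[0])
    return [[image[i * h // size][j * w // size] for j in range(size)]
            for i in range(size)]
```

Every target patch entering the target database thus has identical dimensions, which is the precondition for batch feature extraction in step S13.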
And S13, extracting the target image in the target database by using a residual error network to obtain image characteristics.
It should be emphasized that this embodiment performs feature extraction with ResNet-34, a residual network whose skip connections alleviate, to some extent, the problems of vanishing and exploding gradients. The last fully connected layer of ResNet-34 is removed, and only the preceding 33 convolutional layers are used to extract features from the preprocessed target images.
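The residual connection that gives such networks their gradient-friendly behaviour reduces to y = x + F(x). The toy sketch below shows this on plain vectors; it is an illustration of the principle, not the 33-layer network itself:

```python
def residual_block(x, transform):
    """y = x + F(x): the identity path lets gradients flow through
    unchanged, alleviating vanishing/exploding gradients."""
    return [xi + fi for xi, fi in zip(x, transform(x))]

def extract_features(x, blocks):
    """Pass the input through a stack of residual blocks, as a
    truncated ResNet (final classifier removed) would."""
    for f in blocks:
        x = residual_block(x, f)
    return x
```

Even if a block's transform contributes nothing, the identity path preserves the input, which is why very deep stacks of such blocks remain trainable.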
Referring to fig. 4, the step of recording the processing characteristics of the annotator when annotating the image and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics with the image characteristics further includes:
s21, performing primary classification on the target image according to the image characteristics;
It is easy to understand that the preliminary classification groups the objects recognized in the image into broad classes; for example, no-parking, speed-limit-30 and speed-limit-60 signs are all grouped into the broad class of traffic signs, and the specific sign information is classified by the annotator.
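The preliminary classification amounts to a fine-to-coarse label map; the label names below are hypothetical examples echoing the traffic-sign case above:

```python
# Hypothetical fine-to-coarse mapping for the preliminary classification.
COARSE_CLASS = {
    "no_parking": "traffic_sign",
    "speed_limit_30": "traffic_sign",
    "speed_limit_60": "traffic_sign",
    "sedan": "vehicle",
    "truck": "vehicle",
}

def preliminary_class(fine_label):
    """Group a recognized object into its broad class; the specific
    information within the class is left to the annotator."""
    return COARSE_CLASS.get(fine_label, "other")
```

Only the broad class is needed to route an image to an annotator familiar with that category; the fine distinction is the annotator's job.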
S22, selecting target images of different categories from the target database, pushing them to an annotator, and prompting the annotator to select familiar category images;
It should be noted that, because some classification tasks require certain domain knowledge, assigning categories at random may leave an annotator unable to label, or labeling incorrectly, which wastes the annotator's time and reduces labeling availability.
S23, responding to the category image selected by the annotator, and pushing the to-be-processed image corresponding to the target image of the selected category to the annotator.
It is worth emphasizing that, owing to the limitations of the object detection algorithm, some objects in the image to be processed may be misidentified or missed, and in some scenes the localization may be inaccurate. The complete image to be processed is therefore pushed to the annotator for correction, further ensuring labeling availability.
Specifically, the steps of identifying a target image on an image to be processed, cutting the target image on the image to be processed, and extracting image features of the target image specifically include:
and identifying the target image on the image to be processed by using a full connection layer of a convolutional neural network model, extracting the image characteristics of the target image, and cutting the target image on the image to be processed.
It should be noted that this embodiment identifies the image to be processed and extracts its features with a convolutional neural network model trained in advance; as the number of samples grows, the recognition accuracy of the model can be improved effectively.
Specifically, the step of recording the processing feature when the annotator annotates the image, and matching the processing feature with the image feature to obtain the matching degree between the processing feature and the image feature specifically includes:
recording the processing characteristics of the annotators when the annotators mark the images, and calculating the processing capacity of the annotators on the images of different categories according to the processing characteristics of all the annotators by using a crowdsourcing layer of a convolutional neural network model;
it is easy to understand that, in general, the maximum value of the processing capacity of the same annotator for a certain category is 1 by default, that is, all completed annotations are available annotations, and the minimum value is 0, that is, all completed annotations are unavailable annotations.
Calculating the similarity degree of the target image and different types of images according to the image characteristics of the target image by using an output layer of a convolutional neural network model;
It should be noted that the output layer uses the softmax function to compute the probability that the target belongs to each category of label. The crowdsourcing layer is trained on the output layer together with the image categories to learn a relational mapping that converts the probability of an image belonging to a given class into each annotator's labeling availability for targets of that class.
Multiplying the degree of similarity between the target image and any category by the annotator's processing capacity for that category gives the matching degree for that category; summing the matching degrees over all categories gives the matching degree between the processing features and the image features.
It is worth emphasizing that, in the present embodiment, the matching degree is strongly related to the annotation availability, and the present embodiment uses the product of the similarity between the image and a certain category and the processing capability of the annotator on the image of the category as the coefficient of the annotation availability of the annotator on the image of the category, and then adds the coefficients of all categories of the image to obtain the overall matching degree of the image.
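The matching-degree calculation described above can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function and annotator names are hypothetical, `similarity` stands in for the output layer's softmax over categories, and `capabilities` stands in for the per-category processing capability values (each in [0, 1]) produced by the crowdsourcing layer.

```python
import math

def softmax(logits):
    # Output layer: convert raw category scores into per-category probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matching_degree(similarity, capability):
    # Per-category match = similarity x capability; the overall match is their sum.
    return sum(s * c for s, c in zip(similarity, capability))

def best_annotator(similarity, capabilities):
    # Select the annotator whose matching degree for this image is highest.
    return max(capabilities, key=lambda name: matching_degree(similarity, capabilities[name]))

# Toy data: three categories, two annotators (names and values illustrative).
similarity = softmax([2.0, 0.5, 0.1])
capabilities = {
    "annotator_a": [0.9, 0.2, 0.1],  # strong on category 0
    "annotator_b": [0.3, 0.8, 0.7],  # strong on categories 1 and 2
}
print(best_annotator(similarity, capabilities))  # annotator_a — category 0 dominates
```

Because the softmax probabilities sum to 1 and each capability lies in [0, 1], the matching degree is also bounded by [0, 1], consistent with the availability bounds stated above.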
Specifically, the method further comprises the following steps:
extracting images in a sample database by using a residual error network to obtain sample characteristics, and sequentially passing the sample characteristics through a full connection layer, an output layer and a crowdsourcing layer to obtain a training result;
It should be noted that, in this embodiment, a residual network is introduced into the convolutional neural network model and the sample features are extracted through it; this mitigates the vanishing-gradient and exploding-gradient problems to a certain extent, thereby improving the recognition accuracy of the model.
And comparing the training result with the corresponding image category data in the sample database, calculating the accuracy, and repeatedly carrying out forward propagation and backward propagation to obtain the convolutional neural network model with the highest accuracy.
It can be understood that, because the data in the sample database carry corresponding labels, the image features can be used to train the convolutional neural network model; by disclosing the construction of the sample database and the training method of the model, the technical scheme is completed and the recognition accuracy is improved.
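The training procedure above — repeated forward and backward propagation while retaining the model with the highest accuracy — can be sketched generically. All names here are illustrative: the toy "model" is a one-dimensional threshold classifier standing in for the residual network with its fully connected, output and crowdsourcing layers, and `update` stands in for one round of forward plus backward propagation.

```python
def train_and_select(init_params, update, evaluate, epochs):
    """Keep the parameter set with the highest accuracy seen during training."""
    params = init_params
    best_params, best_acc = params, evaluate(params)
    for _ in range(epochs):
        params = update(params)   # one round of forward + backward propagation
        acc = evaluate(params)    # compare predictions with sample-database labels
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc

# Toy stand-in: "training" nudges a decision threshold upward; accuracy is the
# fraction of (value, label) samples classified correctly by value > threshold.
samples = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]

def accuracy(threshold):
    return sum((x > threshold) == bool(y) for x, y in samples) / len(samples)

best_t, best_acc = train_and_select(0.0, lambda t: t + 0.05, accuracy, epochs=20)
print(best_acc)  # reaches 1.0 once the threshold separates the two label groups
```

The key point mirrored from the embodiment is model selection: the returned parameters are not necessarily those of the final epoch, but those with the highest measured accuracy.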
Specifically, the method further comprises the following steps:
saving the image to be processed, together with the annotation completed by the annotator with the highest matching degree, to the sample database.
It is emphasized that, in this embodiment, adding the annotated images and annotation data to the sample database increases the number of samples available to the convolutional neural network model and improves its recognition accuracy;
it should be added that, in this embodiment, the data in the sample database can also be used to identify the processing characteristics of a new annotator and to improve the processing capability of some annotators. For example, when a new annotator performs initial annotations on data drawn from the sample database, the annotation results can be compared with the stored results to quickly obtain the new annotator's processing characteristics and processing capability; meanwhile, when an annotation result is found to be unusable, the corresponding annotation result in the sample database is pushed to the new annotator, which helps improve that annotator's processing capability.
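Qualifying a new annotator against the sample database, as described above, amounts to measuring per-category agreement with the stored labels. A minimal sketch, with all identifiers and category names hypothetical rather than taken from the patent:

```python
def estimate_capability(gold_labels, new_annotations):
    """Per-category agreement rate of a new annotator with sample-database labels."""
    hits, totals = {}, {}
    for image_id, gold in gold_labels.items():
        if image_id not in new_annotations:
            continue  # the new annotator did not label this sample image
        totals[gold] = totals.get(gold, 0) + 1
        if new_annotations[image_id] == gold:
            hits[gold] = hits.get(gold, 0) + 1
    # Capability in [0, 1] per category, matching the bounds stated earlier.
    return {c: hits.get(c, 0) / n for c, n in totals.items()}

gold = {"img1": "sign", "img2": "sign", "img3": "lane", "img4": "lane"}
new = {"img1": "sign", "img2": "lane", "img3": "lane", "img4": "lane"}
print(estimate_capability(gold, new))  # {'sign': 0.5, 'lane': 1.0}
```

The resulting per-category values can then feed directly into the matching-degree computation as the new annotator's processing capabilities.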
This embodiment completes the technical scheme by disclosing specific methods for image recognition, feature extraction and matching-degree calculation. It improves the recognition and feature-acquisition capability for target images and the accuracy of annotator matching, thereby capturing each annotator's professional background, fields of interest, and understanding of the annotation standard. This effectively solves the technical problems of large deviations in part of the annotation results and long annotation times for part of the data, which arise when images to be processed are pushed to annotators without reference to these features, and achieves the technical effects of improving annotation quality, annotation speed, and the annotators' annotation experience.
The invention also provides a crowdsourcing data labeling system, which comprises the following components:
the feature extraction unit 10 is configured to identify a target image on an image to be processed, cut the target image on the image to be processed, and extract an image feature of the target image;
the feature matching unit 20 is configured to record a processing feature when the annotator marks an image, and match the processing feature with the image feature to obtain a matching degree between the processing feature and the image feature;
and the crowdsourcing marking unit 30 is configured to calculate matching degrees of the image to be processed and all markers, and select the marker with the highest matching degree to complete marking of the image to be processed.
Since the system adopts all the technical solutions of all the embodiments above, it achieves all the beneficial effects brought by those technical solutions, which are not described in detail here.
The invention also provides a crowdsourcing data annotation server. Since the server adopts all the technical solutions of all the embodiments above, it achieves all the beneficial effects brought by those technical solutions, which are not described in detail here.
The invention further provides a readable storage medium storing a crowdsourcing data annotation program; when the program is executed by a processor, the steps of the crowdsourcing data annotation method are implemented.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (9)

1. A crowdsourcing data labeling method is characterized by comprising the following steps:
identifying a target image on an image to be processed, cutting the target image on the image to be processed, and extracting image characteristics of the target image;
recording processing characteristics when an annotator marks an image, and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics and the image characteristics;
calculating the matching degree of the image to be processed and all the annotators, and selecting the annotator with the highest matching degree to finish the annotation of the image to be processed;
the step of recording the processing characteristics of the annotator when the annotator marks the image, and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics and the image characteristics specifically comprises:
recording the processing characteristics of the annotators when the annotators mark the images, and calculating the processing capacity of the annotators on the images of different categories according to the processing characteristics of all the annotators by using a crowdsourcing layer of a convolutional neural network model;
calculating the similarity degree of the target image and different types of images according to the image characteristics of the target image by using an output layer of a convolutional neural network model;
and multiplying the similarity degree of any type of the target image and the processing capacity of the type of the annotator to obtain the matching degree of the type of the target image, and adding the matching degrees of the target image and all types of the annotator to obtain the matching degree of the processing characteristics and the image characteristics.
2. The method according to claim 1, wherein the identifying a target image on the image to be processed, clipping the target image on the image to be processed, and extracting image features of the target image specifically comprises:
identifying a target image on an image to be processed by using a target detection algorithm, and generating positioning information of the target image;
cutting the target image according to the positioning information, preprocessing the target image, and storing the target image in a target database;
and extracting the target image in the target database by using a convolutional neural network model to obtain image characteristics.
3. The method as claimed in claim 2, wherein the step of recording a processing feature when the annotator annotates the image, matching the processing feature with the image feature, and obtaining a matching degree between the processing feature and the image feature further comprises:
performing primary classification on the target image according to the image characteristics;
selecting target images of different categories from the target database, pushing them to an annotator, and prompting the annotator to select the categories of images they are familiar with;
and responding to the category image selected by the annotator, and pushing the to-be-processed image corresponding to the target image of the selected category to the annotator.
4. The method according to claim 3, wherein the step of identifying the target image on the image to be processed, cutting the target image on the image to be processed, and extracting the image feature of the target image specifically comprises:
and identifying the target image on the image to be processed by using a full connection layer of a convolutional neural network model, extracting the image characteristics of the target image, and cutting the target image on the image to be processed.
5. The method of claim 1, further comprising:
extracting images in a sample database by using a convolutional neural network model to obtain sample characteristics, and sequentially passing the sample characteristics through a full connection layer, an output layer and a crowdsourcing layer to obtain a training result;
and comparing the training result with the corresponding image category data in the sample database, calculating the accuracy, and repeatedly carrying out forward propagation and backward propagation to obtain the convolutional neural network model with the highest accuracy.
6. The method of claim 5, further comprising:
saving the image to be processed, together with the annotation completed by the annotator with the highest matching degree, to the sample database.
7. A crowd-sourced data annotation system, comprising:
the characteristic extraction unit is used for identifying a target image on an image to be processed, cutting the target image on the image to be processed and extracting the image characteristic of the target image;
the characteristic matching unit is used for recording the processing characteristics of the annotator when the annotator marks the image and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics and the image characteristics;
the crowdsourcing marking unit is used for calculating the matching degree of the image to be processed and all the markers, and selecting the marker with the highest matching degree to finish marking of the image to be processed;
the step of recording the processing characteristics of the annotator when the annotator marks the image, and matching the processing characteristics with the image characteristics to obtain the matching degree of the processing characteristics and the image characteristics specifically comprises:
recording the processing characteristics of the annotators when the annotators mark the images, and calculating the processing capacity of the annotators on the images of different categories according to the processing characteristics of all the annotators by using a crowdsourcing layer of a convolutional neural network model;
calculating the similarity degree of the target image and different types of images according to the image characteristics of the target image by using an output layer of a convolutional neural network model;
and multiplying the similarity degree of any type of the target image and the processing capacity of the type of the annotator to obtain the matching degree of the type of the target image, and adding the matching degrees of the target image and all types of the annotator to obtain the matching degree of the processing characteristics and the image characteristics.
8. A server, characterized in that the server comprises: memory, a processor and a crowdsourced data annotation program stored on the memory and executable on the processor, the crowdsourced data annotation program when executed by the processor implementing the steps of the crowdsourced data annotation method of any one of claims 1 to 6.
9. A readable storage medium, on which a crowdsourced data annotation program is stored, the crowdsourced data annotation program, when executed by a processor, implementing the steps of the crowdsourced data annotation method of any one of claims 1 to 6.
CN202011418175.6A 2020-12-05 2020-12-05 Crowdsourcing data labeling method, system, server and storage medium Active CN112488222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011418175.6A CN112488222B (en) 2020-12-05 2020-12-05 Crowdsourcing data labeling method, system, server and storage medium


Publications (2)

Publication Number Publication Date
CN112488222A CN112488222A (en) 2021-03-12
CN112488222B true CN112488222B (en) 2022-07-01

Family

ID=74940292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011418175.6A Active CN112488222B (en) 2020-12-05 2020-12-05 Crowdsourcing data labeling method, system, server and storage medium

Country Status (1)

Country Link
CN (1) CN112488222B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435800A (en) * 2021-08-26 2021-09-24 平安科技(深圳)有限公司 Method and device for executing labeling task based on big data, electronic equipment and medium
CN113469291B (en) * 2021-09-01 2021-11-30 平安科技(深圳)有限公司 Data processing method and device, electronic equipment and storage medium
CN114299349B (en) * 2022-03-04 2022-05-13 南京航空航天大学 Crowdsourcing image learning method based on multi-expert system and knowledge distillation

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US10529115B2 (en) * 2017-03-20 2020-01-07 Google Llc Generating cartoon images from photos
US11803883B2 (en) * 2018-01-29 2023-10-31 Nielsen Consumer Llc Quality assurance for labeled training data
CN109800320B (en) * 2019-01-04 2023-08-18 平安科技(深圳)有限公司 Image processing method, device and computer readable storage medium
CN110490238A (en) * 2019-08-06 2019-11-22 腾讯科技(深圳)有限公司 A kind of image processing method, device and storage medium
CN110532224A (en) * 2019-08-13 2019-12-03 武汉中海庭数据技术有限公司 A kind of file management system and method for deep learning mark sample
CN110909803B (en) * 2019-11-26 2023-04-18 腾讯科技(深圳)有限公司 Image recognition model training method and device and computer readable storage medium
CN110929807B (en) * 2019-12-06 2021-04-06 腾讯科技(深圳)有限公司 Training method of image classification model, and image classification method and device
CN111832549B (en) * 2020-06-29 2024-04-23 深圳市优必选科技股份有限公司 Data labeling method and device

Also Published As

Publication number Publication date
CN112488222A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
CN112488222B (en) Crowdsourcing data labeling method, system, server and storage medium
US11605226B2 (en) Video data processing method and apparatus, and readable storage medium
CN110705405B (en) Target labeling method and device
CN110175334B (en) Text knowledge extraction system and method based on custom knowledge slot structure
CN111209431A (en) Video searching method, device, equipment and medium
CN112925905B (en) Method, device, electronic equipment and storage medium for extracting video subtitles
CN111126401A (en) License plate character recognition method based on context information
CN111222409A (en) Vehicle brand labeling method, device and system
CN112581446A (en) Method, device and equipment for detecting salient object of image and storage medium
CN111680669A (en) Test question segmentation method and system and readable storage medium
CN114064968A (en) News subtitle abstract generating method and system
CN114051154A (en) News video strip splitting method and system
CN113780207A (en) System and method for goat face recognition
CN110381367B (en) Video processing method, video processing equipment and computer readable storage medium
CN115035453A (en) Video title and tail identification method, device and equipment and readable storage medium
CN115599953A (en) Training method and retrieval method of video text retrieval model and related equipment
CN113194333B (en) Video editing method, device, equipment and computer readable storage medium
CN112989869B (en) Optimization method, device, equipment and storage medium of face quality detection model
CN113158824A (en) Underwater video fish identification method, system and storage medium
Das et al. Object Detection on Scene Images: A Novel Approach
CN116894192A (en) Large model training method, and related method, device, equipment, system and medium
CN112699689B (en) Audio segmentation method and device and electronic equipment
CN113903015B (en) Lane line identification method and device
CN110674342A (en) Method and device for inquiring target image
CN116052220B (en) Pedestrian re-identification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant