CN110222791B - Sample labeling information auditing method and device

Sample labeling information auditing method and device

Info

Publication number
CN110222791B
CN110222791B
Authority
CN
China
Prior art keywords
sample
identification
recognition
auditing
result
Prior art date
Legal status
Active
Application number
CN201910538177.XA
Other languages
Chinese (zh)
Other versions
CN110222791A (en)
Inventor
徐青松
李青
Current Assignee
Hangzhou Ruisheng Software Co Ltd
Original Assignee
Hangzhou Glority Software Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Glority Software Ltd filed Critical Hangzhou Glority Software Ltd
Priority to CN201910538177.XA priority Critical patent/CN110222791B/en
Publication of CN110222791A publication Critical patent/CN110222791A/en
Priority to PCT/CN2020/095978 priority patent/WO2020253636A1/en
Application granted granted Critical
Publication of CN110222791B publication Critical patent/CN110222791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and a device for auditing sample labeling information. The method comprises the following steps: acquiring labeled samples to be audited and forming a training sample set; dividing the training sample set into a preset number of sub-sample sets, training each sub-sample set separately, and establishing different first recognition models; acquiring a recognition sample set used for testing, recognizing each recognition sample in the recognition sample set with each of the established first recognition models to obtain each first recognition model's recognition result for that sample, and counting the number of occurrences of each recognition result; when there is a recognition result whose number of occurrences is not smaller than a preset threshold, determining the first recognition models corresponding to the recognition results whose numbers of occurrences are smaller than the preset threshold as target recognition models; and auditing the labeling information of the labeled samples in the sub-sample sets corresponding to the target recognition models. By applying the scheme provided by the invention, the labeling results of samples can be audited quickly.

Description

Sample labeling information auditing method and device
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and a device for auditing sample labeling information, an electronic device, and a computer-readable storage medium.
Background
In the field of artificial intelligence model training, samples need to be labeled: they may be labeled manually, or identified and labeled automatically by a pre-established neural network recognition model. To ensure the accuracy of model training, it is necessary to check whether the labeling information of the samples is accurate.
At present, the labeling information of all labeled samples is usually reviewed manually. However, because a sample set typically contains a large number of samples, reviewing the labeling information of every sample takes a great deal of time and labor.
Disclosure of Invention
The invention aims to provide a method and a device for auditing sample labeling information, electronic equipment and a computer readable storage medium, which are used for quickly auditing the labeling information of a sample. The specific technical scheme is as follows:
in a first aspect, the present invention provides a method for auditing sample labeling information, where the method includes:
acquiring a labeled sample to be examined and forming a training sample set; wherein, the labeling sample is labeled with labeling information in advance;
dividing the training sample set into a preset number of sub-sample sets, respectively training different sub-sample sets, and establishing different first recognition models; the first recognition model is a neural network-based model;
acquiring a recognition sample set used for testing; recognizing each recognition sample in the recognition sample set with each of the different established first recognition models to obtain each first recognition model's recognition result for that sample; counting the number of occurrences of each recognition result; and, when there is a recognition result whose number of occurrences is not smaller than a preset threshold, determining the first recognition models corresponding to the recognition results whose numbers of occurrences are smaller than the preset threshold as target recognition models;
and performing annotation information examination on the annotated samples in the sub-sample set corresponding to the target identification model.
Optionally, when there are a plurality of recognition results whose occurrence times are not less than the preset threshold, the method further includes:
determining the first recognition models corresponding to a target recognition result as target recognition models;
wherein the target recognition result is any of the plurality of recognition results other than the one with the largest number of occurrences.
Optionally, when the occurrence frequency of each recognition result is less than the preset threshold, the method further includes:
and auditing the identification sample to obtain an auditing result of the identification sample.
Optionally, after the audit is performed on the identification sample, the method further includes:
judging whether the audit result of the recognition sample appears among the recognition results produced for that sample by the different first recognition models;
if it does, determining each first recognition model whose recognition result differs from the audit result as a target recognition model;
if it does not, determining all of the first recognition models as target recognition models.
Optionally, the preset number is greater than or equal to 3.
Preferably, the preset number is greater than or equal to 5.
Optionally, the dividing the training sample set into a preset number of sub-sample sets includes:
and averagely dividing the training sample set into a preset number of sub-sample sets, wherein the difference between the number of samples in any two sub-sample sets is less than or equal to 1.
Optionally, the obtaining of the identification sample set used as the test includes:
and acquiring part of the labeled samples from the labeled samples needing to be examined to form an identification sample set used for testing.
Optionally, the examining and verifying labeled information of labeled samples in the sub-sample set corresponding to the target recognition model includes:
and sending the labeled samples in the sub-sample set corresponding to the target identification model to a verification client so that the verification client can perform labeled information verification on the received labeled samples.
Optionally, the verification client is a client that audits the received marked sample through a second recognition model established through pre-training, and the recognition accuracy of the second recognition model is higher than a certain threshold; or
The verification client is a client for carrying out manual examination on the received labeling sample.
In a second aspect, the present invention further provides an apparatus for auditing sample labeling information, where the apparatus includes:
the acquisition module is used for acquiring the marked samples needing to be audited and forming a training sample set; wherein, the labeling sample is labeled with labeling information in advance;
the training module is used for dividing the training sample set into a preset number of sub sample sets, respectively training different sub sample sets and establishing different first recognition models; the first recognition model is a neural network-based model;
the identification module is used for: acquiring a recognition sample set used for testing; recognizing each recognition sample in the recognition sample set with each of the different established first recognition models to obtain each first recognition model's recognition result for that sample; counting the number of occurrences of each recognition result; and, when there is a recognition result whose number of occurrences is not smaller than a preset threshold, determining the first recognition models corresponding to the recognition results whose numbers of occurrences are smaller than the preset threshold as target recognition models;
and the auditing module is used for auditing the labeling information of the labeling samples in the sub-sample set corresponding to the target identification model.
Optionally, the identification module is further configured to:
when there are a plurality of recognition results whose numbers of occurrences are not less than the preset threshold, determining the first recognition models corresponding to a target recognition result as target recognition models;
wherein the target recognition result is any of the plurality of recognition results other than the one with the largest number of occurrences.
Optionally, the identification module is further configured to:
and when the occurrence frequency of each identification result is smaller than the preset threshold value, auditing the identification sample to obtain an auditing result of the identification sample.
Optionally, the identification module is further configured to:
after the recognition sample has been audited, judging whether the audit result of the recognition sample appears among the recognition results produced for that sample by the different first recognition models; if it does, determining each first recognition model whose recognition result differs from the audit result as a target recognition model; if it does not, determining all of the first recognition models as target recognition models.
Optionally, the preset number is greater than or equal to 3.
Preferably, the preset number is greater than or equal to 5.
Optionally, the training module divides the training sample set into a preset number of sub-sample sets, specifically:
and averagely dividing the training sample set into a preset number of sub-sample sets, wherein the difference between the number of samples in any two sub-sample sets is less than or equal to 1.
Optionally, the identification module obtains an identification sample set used for the test, specifically:
and acquiring part of the labeled samples from the labeled samples needing to be examined to form an identification sample set used for testing.
Optionally, the auditing module performs annotation information auditing on the annotated samples in the sub-sample set corresponding to the target identification model, specifically:
and sending the labeled samples in the sub-sample set corresponding to the target identification model to a verification client so that the verification client can perform labeled information verification on the received labeled samples.
Optionally, the verification client is a client that audits the received marked sample through a second recognition model established through pre-training, and the recognition accuracy of the second recognition model is higher than a certain threshold; or
The verification client is a client for carrying out manual examination on the received labeling sample.
In a third aspect, the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the steps of the method for auditing sample labeling information according to the first aspect when executing the program stored in the memory.
In a fourth aspect, the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method for auditing the sample annotation information according to the first aspect.
Compared with the prior art, the invention first obtains the labeled samples to be audited and forms a training sample set, then divides the training sample set into a preset number of sub-sample sets, trains each sub-sample set separately, and establishes different first recognition models. A recognition sample set used for testing is then obtained; each recognition sample in the recognition sample set is recognized by each of the established first recognition models to obtain every first recognition model's recognition result for that sample, and the occurrences of each recognition result are counted. When a recognition result occurs no fewer than a preset threshold number of times, the first recognition models whose recognition results occur fewer than the threshold number of times are determined as target recognition models, and the labeling information of the labeled samples in the sub-sample sets corresponding to the determined target recognition models is then audited. Compared with manually auditing the labeling information of all samples in a sample set, as in the prior art, the invention enables quick auditing of the labeling information of the samples and reduces time and labor costs; moreover, because the training sample set is divided into multiple sub-sample sets and multiple first recognition models are trained, the scheme is particularly suitable for scenarios in which the training sample set contains a large number of labeled samples that need to be audited.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of an auditing method for sample annotation information according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an apparatus for auditing sample annotation information according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following describes the method, apparatus, electronic device, and computer-readable storage medium for auditing sample annotation information according to embodiments of the present invention in further detail with reference to the accompanying drawings. The advantages and features of the present invention will become more fully apparent from the appended claims and the following description. It should be noted that the drawings are in a greatly simplified form and are not drawn to precise scale; they are provided merely to aid in describing the embodiments of the present invention conveniently and clearly.
In order to solve the problems in the prior art, embodiments of the present invention provide an auditing method and apparatus for sample annotation information, an electronic device, and a computer-readable storage medium.
It should be noted that the method for auditing sample annotation information according to the embodiment of the present invention can be applied to an apparatus for auditing sample annotation information according to the embodiment of the present invention, and the apparatus for auditing sample annotation information can be configured on an electronic device. The electronic device may be a personal computer, a mobile terminal, and the like, and the mobile terminal may be a hardware device having various operating systems, such as a mobile phone and a tablet computer.
Fig. 1 is a schematic flowchart of an auditing method for sample annotation information according to an embodiment of the present invention. Referring to fig. 1, an auditing method for sample annotation information may include the following steps:
step S101, obtaining a labeled sample needing to be examined and forming a training sample set; and the labeling sample is labeled with labeling information in advance.
The labeled samples to be audited may be samples that were identified and labeled manually, or samples that were identified and labeled automatically by a recognition model established through pre-training; this embodiment places no limit on how the labels were produced.
The labeled samples may be pictures of various kinds of objects, such as test papers, animals and plants, scenic spots, vehicles, human faces or other parts of the human body, articles, bills, and the like. Taking a test paper as an example, the labeling process may be as follows: the region of each question on the test paper is identified with a region recognition model, the regions are segmented into region sample pictures, the text content of each region sample picture is recognized with a character recognition model, and the corresponding labeling is applied.
The type of the labeled sample is not limited in this embodiment, but the types of the labeled samples constituting the same training sample set must be the same, and the types of the labeling information of the labeled samples must also be the same. For example, each labeled sample forming the training sample set a is a picture containing characters, and the labeled information is the content of the characters on the picture. For another example, all the labeled samples forming the training sample set B are face images, and the labeled information is gender. For another example, all the labeled samples forming the training sample set C are face images, and the labeled information is age. In practical applications, for the training sample sets B and C, the samples in the two sample sets may be the same, but because the type of the labeled information is different, two different training sample sets are formed.
Step S102, dividing the training sample set into a preset number of sub-sample sets, respectively training different sub-sample sets, and establishing different first recognition models; the first recognition model is a neural network-based model.
In this embodiment, the labeled samples in the training sample set may be distributed as evenly as possible among the preset number of sub-sample sets: after an equal division, any remaining samples are assigned one at a time to successive sub-sample sets until every sample has been allocated.
Specifically, the training sample set is divided evenly into the preset number of sub-sample sets such that the difference between the numbers of samples in any two sub-sample sets is less than or equal to 1. For example, if the training sample set contains 1002 samples and the number of sub-sample sets is set to 10, then according to the above allocation principle 1000 samples are first divided equally among the 10 sub-sample sets, and the remaining 2 samples are assigned to 2 of the sub-sample sets, so that the numbers of samples in any two sub-sample sets differ by no more than 1.
This allocation keeps the number of samples in each sub-sample set approximately equal, so that when the sub-sample sets are used to train the first recognition models, the models' recognition accuracies do not differ merely because of differences in the number of training samples.
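For illustration only, the even allocation described above can be sketched in a few lines of Python; the function name and the use of a plain list of labeled samples are assumptions of this sketch, not part of the disclosed method.

```python
def split_evenly(samples, num_subsets):
    """Divide labeled samples into num_subsets sub-sample sets whose sizes
    differ by at most 1 (the allocation described for step S102)."""
    base, remainder = divmod(len(samples), num_subsets)
    subsets, start = [], 0
    for i in range(num_subsets):
        size = base + (1 if i < remainder else 0)  # the first `remainder` sets get one extra sample
        subsets.append(samples[start:start + size])
        start += size
    return subsets

# 1002 samples split into 10 sub-sample sets -> sizes 101, 101, 100, 100, ...
sizes = [len(s) for s in split_evenly(list(range(1002)), 10)]
assert max(sizes) - min(sizes) <= 1
```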
As will be appreciated by those skilled in the art, each first recognition model is trained on the samples in its sub-sample set together with the labeling information of each sample. If the sample types differ, or the sample types are the same but the types of labeling information differ, the first recognition models established by training will differ. For example, if the sample pictures contain text and the labeling information is the text content, the trained first recognition model is a character recognition model. If the samples are face images and the labeling information is gender, the trained first recognition model recognizes the gender of the person in a face image; if the samples are face images and the labeling information is age, the trained first recognition model recognizes the person's age. Since, per step S101, the labeled samples in the training sample set are of the same type and their labeling information is of the same type, the first recognition models established by training are all of the same type.
Each first recognition model established by training is a model based on a neural network, and further can be a deep convolutional neural network or other neural network models, such as R-CNN, Fast R-CNN, SPP-net, R-FCN, FPN, YOLO, SSD, DenseBox, RetinaNet, and RRC detection combined with RNN algorithm, Deformable CNN combined with DPM, and the like.
In general, the preset number may be set to 3 or more, and preferably, the number is set to 5 or more. The preset number can be determined according to the number of the labeled samples in the training sample set in practical application.
Step S103, acquiring a recognition sample set used for testing; recognizing each recognition sample in the recognition sample set with each of the different established first recognition models to obtain each first recognition model's recognition result for that sample; counting the number of occurrences of each recognition result; and, when there is a recognition result whose number of occurrences is not smaller than a preset threshold, determining the first recognition models corresponding to the recognition results whose numbers of occurrences are smaller than the preset threshold as target recognition models.
In this embodiment, some of the labeled samples that need to be audited may be taken to form the recognition sample set used for testing. For example, labeled samples may be randomly drawn from the training sample set at a ratio of 5% to 20%, and the ratio of the number of samples in the recognition sample set to the total number of samples in the training sample set may be adjusted according to the audit results.
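As a hedged illustration of how the recognition sample set could be drawn (the 10% default ratio is one value within the 5% to 20% range mentioned above, and the helper name is hypothetical):

```python
import random

def draw_recognition_set(training_samples, ratio=0.1, seed=None):
    """Randomly draw a recognition sample set from the labeled training samples.
    The ratio would typically lie between 5% and 20% and may later be adjusted
    according to the audit results."""
    rng = random.Random(seed)
    k = max(1, int(len(training_samples) * ratio))
    return rng.sample(training_samples, k)
```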
Each recognition sample is recognized by each of the different first recognition models established in step S102, yielding each first recognition model's recognition result for that sample. For example, suppose 10 first recognition models were established in step S102; their recognition results for three recognition samples X, Y and Z are shown in Table 1 below:
Table 1

First recognition model    Recognition sample X    Recognition sample Y    Recognition sample Z
Model 1                    A                       A                       A
Model 2                    B                       B                       A
Model 3                    C                       C                       A
Model 4                    A                       A                       B
Model 5                    A                       A                       B
Model 6                    A                       A                       C
Model 7                    A                       A                       C
Model 8                    A                       C                       C
Model 9                    A                       C                       D
Model 10                   C                       C                       D
Counting the occurrences of each recognition result for recognition sample X gives: result A appears 7 times, result B appears 1 time, and result C appears 2 times. If the preset threshold is set to 4, then because result A appears most often and its 7 occurrences exceed the threshold, result A can be taken as the correct recognition result for sample X. In other words, the first recognition models whose result is A can recognize sample X correctly, while the first recognition models whose result is B or C (models 2, 3 and 10) cannot. The labeling information of the labeled samples in the sub-sample sets used to train models 2, 3 and 10 may therefore be inaccurate, which would explain the lower recognition accuracy of these three models. Accordingly, in the processing of recognition sample X, models 2, 3 and 10 are determined as target recognition models.
Further, when there are several recognition results whose occurrence counts are not less than the preset threshold, the first recognition models corresponding to the target recognition results may be determined as target recognition models. A target recognition result is any recognition result, among those occurring no fewer than the preset threshold number of times, other than the one that occurs most often; in other words, a target recognition result does not have the largest occurrence count among the results that reach the threshold.
For example, among the recognition results of the 10 first recognition models for recognition sample Y in Table 1, result A appears 5 times, result B appears 1 time, and result C appears 4 times. With the preset threshold set to 4, the occurrence counts of both result A and result C reach the threshold, but result A occurs more often than result C, so result A is more likely to be the correct recognition result for sample Y, and the first recognition models whose result is C have lower recognition accuracy. The labeling information of the labeled samples in the sub-sample sets corresponding to the first recognition models whose result is C (models 3, 8, 9 and 10) may therefore be inaccurate. Accordingly, in the processing of recognition sample Y, not only model 2 but also models 3, 8, 9 and 10 may be determined as target recognition models.
In practical application, if after counting it is found that every recognition result occurs fewer than the preset threshold number of times, the recognition sample itself can be audited to obtain an audit result. It is then judged whether this audit result appears among the recognition results produced for the sample by the different first recognition models: if it does, each first recognition model whose recognition result differs from the audit result is determined as a target recognition model; if it does not, all of the first recognition models are determined as target recognition models.
For example, among the recognition results of the 10 first recognition models for recognition sample Z in Table 1, result A appears 3 times, result B appears 2 times, result C appears 3 times, and result D appears 2 times. With the preset threshold set to 4, every recognition result occurs fewer than the threshold number of times, which indicates that none of the first recognition models can be trusted to recognize sample Z correctly; sample Z therefore needs to be audited to obtain an audit result. If the audit result for sample Z is A, the first recognition models whose result is A are considered able to recognize sample Z correctly, while those whose result is B, C or D (models 4 to 10) are not, so models 4 to 10 are determined as target recognition models. If, on the other hand, the audit result for sample Z is E and E does not appear among the 10 models' recognition results, none of the 10 first recognition models recognized sample Z correctly, so all 10 are determined as target recognition models.
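The per-sample decision logic walked through above for samples X, Y and Z can be summarized in a minimal Python sketch; recognition results are treated as plain labels, and the audit_sample callback is a hypothetical stand-in for whatever manual or automatic review produces the audit result:

```python
from collections import Counter

def target_models_for_sample(results, threshold, audit_sample=None):
    """results: dict mapping model id -> recognition result for one recognition sample.
    Returns the ids of the first recognition models to treat as target recognition
    models, following the three cases illustrated with samples X, Y and Z."""
    counts = Counter(results.values())
    frequent = [r for r, c in counts.items() if c >= threshold]

    if frequent:
        # Cases X and Y: at least one result reaches the threshold. The result
        # with the largest count is taken as correct; every model whose result
        # differs is a target recognition model.
        best = max(frequent, key=lambda r: counts[r])
        return {m for m, r in results.items() if r != best}

    # Case Z: no result reaches the threshold, so the sample itself is audited.
    audit_result = audit_sample() if audit_sample else None
    if audit_result in counts:
        return {m for m, r in results.items() if r != audit_result}
    return set(results)  # audit result matches no model output: all models are targets

# Recognition sample X from Table 1, preset threshold 4: result A appears 7 times,
# so models 2, 3 and 10 (results B and C) are the target recognition models.
sample_x = {1: "A", 2: "B", 3: "C", 4: "A", 5: "A",
            6: "A", 7: "A", 8: "A", 9: "A", 10: "C"}
assert target_models_for_sample(sample_x, threshold=4) == {2, 3, 10}
```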
And step S104, performing annotation information audit on the annotated samples in the sub-sample set corresponding to the target identification model.
In this embodiment, the labeled samples in the sub-sample set corresponding to the target identification model determined in step S103 may be sent to the verification client, so that the verification client performs labeled information auditing on the received labeled samples.
In this embodiment, the training sample set is divided into a plurality of sub-sample sets and a plurality of first recognition models are trained; the first recognition models with low recognition accuracy are then identified as target recognition models, and only the labeled samples in the sub-sample sets corresponding to the target recognition models are examined. Because not all samples need to be examined, auditing efficiency is improved.
In one implementation, the verification client audits the received labeled samples through a second recognition model established by pre-training, where the recognition accuracy of the second recognition model is higher than a certain threshold. For example, a recognition accuracy above 99% ensures the accuracy of the verification client's audit of the labeling information; such a verification client can recognize and verify the labeled samples automatically, the second recognition model being of the same type as the first recognition models. Alternatively, the verification client may be a manual client, in which the received labeled samples are reviewed manually.
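A minimal sketch of this routing, assuming hypothetical second_model and manual_review callables and the 99% accuracy figure from the example above:

```python
def audit_flagged_samples(flagged_samples, second_model=None,
                          second_model_accuracy=0.0, manual_review=None):
    """Audit labeled samples from the sub-sample sets behind target recognition
    models: prefer the automatic verification client when its second recognition
    model is accurate enough, otherwise fall back to manual review."""
    use_model = second_model is not None and second_model_accuracy > 0.99
    audited = []
    for sample, label in flagged_samples:
        if use_model:
            ok = (second_model(sample) == label)   # automatic verification
        else:
            ok = manual_review(sample, label)      # manual verification
        audited.append((sample, label, ok))
    return audited
```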
To sum up, in this embodiment the labeled samples to be audited are first obtained to form a training sample set. The training sample set is divided into a preset number of sub-sample sets, each sub-sample set is trained separately, and different first recognition models are established. A recognition sample set used for testing is then acquired, each recognition sample is recognized by each of the established first recognition models to obtain every model's recognition result for that sample, and the occurrences of each recognition result are counted. When a recognition result occurs no fewer than a preset threshold number of times, the first recognition models whose recognition results occur fewer than the threshold number of times are determined as target recognition models, and the labeling information of the labeled samples in the sub-sample sets corresponding to the determined target recognition models is audited. Compared with manually auditing the labeling information of all samples in a sample set, as in the prior art, this embodiment enables quick auditing of the labeling information of the samples and reduces time and labor costs; moreover, because the training sample set is divided into multiple sub-sample sets and multiple first recognition models are trained, the scheme is particularly suitable for scenarios in which the training sample set contains a large number of labeled samples that need to be audited.
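Tying the steps together, the overall flow of Fig. 1 might be sketched as follows; this reuses the hypothetical helpers from the earlier sketches (split_evenly, draw_recognition_set, target_models_for_sample) and is an illustration of the flow under those assumptions, not a reference implementation:

```python
def audit_pipeline(labeled_samples, num_subsets, threshold,
                   train_model, audit_sample, review_subset):
    """labeled_samples: list of (sample, label) pairs awaiting audit.
    train_model, audit_sample and review_subset are caller-supplied callables."""
    # Steps S101-S102: split the training sample set and train one first
    # recognition model per sub-sample set.
    subsets = split_evenly(labeled_samples, num_subsets)
    models = {i: train_model(subset) for i, subset in enumerate(subsets)}

    # Step S103: recognize every test sample with every model and collect
    # the target recognition models.
    recognition_set = draw_recognition_set(labeled_samples, ratio=0.1)
    target_ids = set()
    for sample, _label in recognition_set:
        results = {i: model(sample) for i, model in models.items()}
        target_ids |= target_models_for_sample(
            results, threshold, audit_sample=lambda s=sample: audit_sample(s))

    # Step S104: audit only the labeled samples of the sub-sample sets that
    # correspond to the target recognition models.
    for i in sorted(target_ids):
        review_subset(subsets[i])
```

Here review_subset could be, for example, the audit routine sketched in the previous section.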
Corresponding to the embodiment of the method for auditing the sample annotation information, an embodiment of the present invention further provides an auditing apparatus for sample annotation information, and fig. 2 is a schematic structural diagram of an auditing apparatus for sample annotation information according to an embodiment of the present invention. Referring to fig. 2, an apparatus for auditing sample annotation information may include:
an obtaining module 201, configured to obtain labeled samples that need to be examined and form a training sample set; wherein, the labeling sample is labeled with labeling information in advance;
the training module 202 is configured to divide the training sample set into a preset number of sub-sample sets, train different sub-sample sets, and establish different first recognition models; the first recognition model is a neural network-based model;
the identification module 203 is configured to obtain an identification sample set used for testing, identify each identification sample in the identification sample set through different established first identification models, obtain an identification result of each first identification model for the identification sample, count the occurrence frequency of each identification result, and determine, as a target identification model, a first identification model corresponding to an identification result whose occurrence frequency is smaller than a preset threshold value when there is an identification result whose occurrence frequency is not smaller than the preset threshold value;
and the auditing module 204 is configured to perform annotation information auditing on the annotated samples in the sub-sample set corresponding to the target identification model.
Optionally, the identifying module 203 is further configured to:
when there are a plurality of recognition results whose numbers of occurrences are not less than the preset threshold, determining the first recognition models corresponding to a target recognition result as target recognition models;
wherein the target recognition result is any of the plurality of recognition results other than the one with the largest number of occurrences.
Optionally, the identifying module 203 is further configured to:
and when the occurrence frequency of each identification result is smaller than the preset threshold value, auditing the identification sample to obtain an auditing result of the identification sample.
Optionally, the identifying module 203 is further configured to:
after the recognition sample has been audited, judging whether the audit result of the recognition sample appears among the recognition results produced for that sample by the different first recognition models; if it does, determining each first recognition model whose recognition result differs from the audit result as a target recognition model; if it does not, determining all of the first recognition models as target recognition models.
Optionally, the preset number is greater than or equal to 3.
Preferably, the preset number is greater than or equal to 5.
Optionally, the training module divides the training sample set into a preset number of sub-sample sets, specifically:
and averagely dividing the training sample set into a preset number of sub-sample sets, wherein the difference between the number of samples in any two sub-sample sets is less than or equal to 1.
Optionally, the identification module 203 acquires an identification sample set used for the test, specifically:
and acquiring part of the labeled samples from the labeled samples needing to be examined to form an identification sample set used for testing.
Optionally, the auditing module 204 performs annotation information auditing on the annotated samples in the sub-sample set corresponding to the target recognition model, specifically:
and sending the labeled samples in the sub-sample set corresponding to the target identification model to a verification client so that the verification client can perform labeled information verification on the received labeled samples.
Optionally, the verification client is a client that audits the received marked sample through a second recognition model established through pre-training, and the recognition accuracy of the second recognition model is higher than a certain threshold; or
The verification client is a client for carrying out manual examination on the received labeling sample.
To sum up, in this embodiment the labeled samples to be audited are first obtained to form a training sample set. The training sample set is divided into a preset number of sub-sample sets, each sub-sample set is trained separately, and different first recognition models are established. A recognition sample set used for testing is then acquired, each recognition sample is recognized by each of the established first recognition models to obtain every model's recognition result for that sample, and the occurrences of each recognition result are counted. When a recognition result occurs no fewer than a preset threshold number of times, the first recognition models whose recognition results occur fewer than the threshold number of times are determined as target recognition models, and the labeling information of the labeled samples in the sub-sample sets corresponding to the determined target recognition models is audited. Compared with manually auditing the labeling information of all samples in a sample set, as in the prior art, this embodiment enables quick auditing of the labeling information of the samples and reduces time and labor costs; moreover, because the training sample set is divided into multiple sub-sample sets and multiple first recognition models are trained, the scheme is particularly suitable for scenarios in which the training sample set contains a large number of labeled samples that need to be audited.
An embodiment of the present invention further provides an electronic device, and fig. 3 is a schematic structural diagram of the electronic device according to the embodiment of the present invention. Referring to fig. 3, an electronic device includes a processor 301, a communication interface 302, a memory 303 and a communication bus 304, wherein the processor 301, the communication interface 302 and the memory 303 communicate with each other via the communication bus 304,
a memory 303 for storing a computer program;
the processor 301, when executing the program stored in the memory 303, implements the following steps:
acquiring a labeled sample to be examined and forming a training sample set; wherein, the labeling sample is labeled with labeling information in advance;
dividing the training sample set into a preset number of sub-sample sets, respectively training different sub-sample sets, and establishing different first recognition models; the first recognition model is a neural network-based model;
acquiring a recognition sample set used for testing; recognizing each recognition sample in the recognition sample set with each of the different established first recognition models to obtain each first recognition model's recognition result for that sample; counting the number of occurrences of each recognition result; and, when there is a recognition result whose number of occurrences is not smaller than a preset threshold, determining the first recognition models corresponding to the recognition results whose numbers of occurrences are smaller than the preset threshold as target recognition models;
and performing annotation information examination on the annotated samples in the sub-sample set corresponding to the target identification model.
For specific implementation and related explanation of each step of the method, reference may be made to the method embodiment shown in fig. 1, which is not described herein again.
Compared with the prior-art approach of manually auditing the labeling information of all samples in a sample set, this embodiment enables quick auditing of the labeling information of the samples and reduces time and labor costs. Moreover, because the training sample set is divided into multiple sub-sample sets and multiple first recognition models are trained, the scheme is particularly suitable for scenarios in which the training sample set contains a large number of labeled samples that need to be audited.
In addition, other implementation manners of the method for auditing the sample labeling information, which are realized by the processor 301 executing the program stored in the memory 303, are the same as the implementation manners mentioned in the foregoing method embodiment section, and are not described herein again.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps of the method for auditing the sample annotation information.
Compared with the prior-art approach of manually auditing the labeling information of all samples in a sample set, this embodiment enables quick auditing of the labeling information of the samples and reduces time and labor costs. Moreover, because the training sample set is divided into multiple sub-sample sets and multiple first recognition models are trained, the scheme is particularly suitable for scenarios in which the training sample set contains a large number of labeled samples that need to be audited.
It should be noted that, in the present specification, all the embodiments are described in a related manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the electronic device, and the computer-readable storage medium, since they are substantially similar to the embodiments of the method, the description is simple, and for the relevant points, reference may be made to the partial description of the embodiments of the method.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of describing the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention, and any variations and modifications made by those skilled in the art based on the above disclosure are within the scope of the appended claims.

Claims (20)

1. A method for auditing sample labeling information is characterized by comprising the following steps:
acquiring a labeled sample to be examined and forming a training sample set; wherein, the labeling sample is labeled with labeling information in advance;
dividing the training sample set into a preset number of sub-sample sets, respectively training different sub-sample sets, and establishing different first recognition models; the first recognition model is a neural network-based model;
acquiring a recognition sample set used for testing; recognizing each recognition sample in the recognition sample set with each of the different established first recognition models to obtain each first recognition model's recognition result for that sample; counting the number of occurrences of each recognition result; and, when there is a recognition result whose number of occurrences is not smaller than a preset threshold, determining the first recognition models corresponding to the recognition results whose numbers of occurrences are smaller than the preset threshold as target recognition models;
and performing annotation information examination on the annotated samples in the sub-sample set corresponding to the target identification model.
2. The method for auditing sample labeling information according to claim 1, wherein, when there are a plurality of recognition results whose numbers of occurrences are not less than the preset threshold, the method further comprises:
determining the first recognition models corresponding to a target recognition result as target recognition models;
wherein the target recognition result is any of the plurality of recognition results other than the one with the largest number of occurrences.
3. The method for auditing sample annotation information according to claim 1, when the number of occurrences of each recognition result is less than the preset threshold, the method further comprising:
and auditing the identification sample to obtain an auditing result of the identification sample.
4. The method for auditing of sample annotation information of claim 3, wherein after auditing the identified sample, the method further comprises:
judging whether the audit result of the recognition sample appears among the recognition results produced for that sample by the different first recognition models;
if it does, determining each first recognition model whose recognition result differs from the audit result as a target recognition model;
if it does not, determining all of the first recognition models as target recognition models.
5. An auditing method for sample annotation information according to claim 1, in which the predetermined number is 3 or more.
6. An auditing method for sample annotation information according to claim 5, in which the predetermined number is greater than or equal to 5.
7. The method for auditing of sample labeling information of claim 1, where dividing the training sample set into a preset number of sub-sample sets comprises:
and averagely dividing the training sample set into a preset number of sub-sample sets, wherein the difference between the number of samples in any two sub-sample sets is less than or equal to 1.
8. The method for auditing sample labeling information according to claim 1, wherein the obtaining of the identified sample set for use as a test includes:
and acquiring part of the labeled samples from the labeled samples needing to be examined to form an identification sample set used for testing.
9. The method for auditing of sample annotation information according to claim 1, wherein the auditing of annotation information for annotated samples in a set of subsamples corresponding to the target recognition model comprises:
and sending the labeled samples in the sub-sample set corresponding to the target identification model to a verification client so that the verification client can perform labeled information verification on the received labeled samples.
10. The method for auditing sample annotation information of claim 9, wherein the verification client is a client that audits a received annotated sample through a second recognition model established by pre-training, and the recognition accuracy of the second recognition model is higher than a certain threshold; or
The verification client is a client for carrying out manual examination on the received labeling sample.
11. An auditing device for sample labeling information, the device comprising:
the acquisition module is used for acquiring the marked samples needing to be audited and forming a training sample set; wherein, the labeling sample is labeled with labeling information in advance;
the training module is used for dividing the training sample set into a preset number of sub sample sets, respectively training different sub sample sets and establishing different first recognition models; the first recognition model is a neural network-based model;
the identification module is used for: acquiring a recognition sample set used for testing; recognizing each recognition sample in the recognition sample set with each of the different established first recognition models to obtain each first recognition model's recognition result for that sample; counting the number of occurrences of each recognition result; and, when there is a recognition result whose number of occurrences is not smaller than a preset threshold, determining the first recognition models corresponding to the recognition results whose numbers of occurrences are smaller than the preset threshold as target recognition models;
and the auditing module is used for auditing the labeling information of the labeling samples in the sub-sample set corresponding to the target identification model.
12. An auditing apparatus for sample annotation information according to claim 11, wherein the identification module is further configured to:
when there are a plurality of recognition results whose numbers of occurrences are not less than the preset threshold, determining the first recognition models corresponding to a target recognition result as target recognition models;
wherein the target recognition result is any of the plurality of recognition results other than the one with the largest number of occurrences.
13. An auditing apparatus for sample annotation information according to claim 11, wherein the identification module is further configured to:
and when the occurrence frequency of each identification result is smaller than the preset threshold value, auditing the identification sample to obtain an auditing result of the identification sample.
14. An auditing apparatus for sample annotation information according to claim 13, wherein the identification module is further configured to:
after the recognition sample has been audited, judging whether the audit result of the recognition sample appears among the recognition results produced for that sample by the different first recognition models; if it does, determining each first recognition model whose recognition result differs from the audit result as a target recognition model; if it does not, determining all of the first recognition models as target recognition models.
15. An auditing device for sample annotation information according to claim 11, in which the predetermined number is 3 or more.
16. The apparatus for auditing of sample labeling information according to claim 11, wherein the identification module obtains an identification sample set used for testing, specifically:
and acquiring part of the labeled samples from the labeled samples needing to be examined to form an identification sample set used for testing.
17. The apparatus for auditing of sample annotation information according to claim 11, wherein the auditing module performs annotation information auditing for annotated samples in the sub-sample set corresponding to the target identification model, specifically:
and sending the labeled samples in the sub-sample set corresponding to the target identification model to a verification client so that the verification client can perform labeled information verification on the received labeled samples.
18. The apparatus for auditing of sample annotation information of claim 17, wherein the verification client is a client that audits a received annotated sample through a second recognition model established by pre-training, and the recognition accuracy of the second recognition model is higher than a certain threshold; or
The verification client is a client for carrying out manual examination on the received labeling sample.
19. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;
the memory is used for storing a computer program;
the processor, when executing the program stored in the memory, implementing the method steps of any of claims 1-10.
20. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, carries out the method steps of any one of claims 1-10.
CN201910538177.XA 2019-06-20 2019-06-20 Sample labeling information auditing method and device Active CN110222791B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910538177.XA CN110222791B (en) 2019-06-20 2019-06-20 Sample labeling information auditing method and device
PCT/CN2020/095978 WO2020253636A1 (en) 2019-06-20 2020-06-12 Sample label information verification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910538177.XA CN110222791B (en) 2019-06-20 2019-06-20 Sample labeling information auditing method and device

Publications (2)

Publication Number Publication Date
CN110222791A (en) 2019-09-10
CN110222791B (en) 2020-12-04

Family

ID=67814089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910538177.XA Active CN110222791B (en) 2019-06-20 2019-06-20 Sample labeling information auditing method and device

Country Status (2)

Country Link
CN (1) CN110222791B (en)
WO (1) WO2020253636A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222791B (en) * 2019-06-20 2020-12-04 杭州睿琪软件有限公司 Sample labeling information auditing method and device
CN110705257B (en) * 2019-09-16 2021-06-25 腾讯科技(深圳)有限公司 Media resource identification method and device, storage medium and electronic device
CN110852376B (en) * 2019-11-11 2023-05-26 杭州睿琪软件有限公司 Method and system for identifying biological species
CN111160188A (en) * 2019-12-20 2020-05-15 中国建设银行股份有限公司 Financial bill identification method, device, equipment and storage medium
CN111259980B (en) * 2020-02-10 2023-10-03 北京小马慧行科技有限公司 Method and device for processing annotation data
CN112070224B (en) * 2020-08-26 2024-02-23 成都品果科技有限公司 Revision system and method of samples for neural network training
CN112328822B (en) * 2020-10-15 2024-04-02 深圳市优必选科技股份有限公司 Picture pre-marking method and device and terminal equipment
CN113034025B (en) * 2021-04-08 2023-12-01 成都国星宇航科技股份有限公司 Remote sensing image labeling system and method
CN113610047A (en) * 2021-08-24 2021-11-05 上海发网供应链管理有限公司 Object detection-based identification method and system for production line articles
CN113839953A (en) * 2021-09-27 2021-12-24 上海商汤科技开发有限公司 Labeling method and device, electronic equipment and storage medium
CN114189709A (en) * 2021-11-12 2022-03-15 北京天眼查科技有限公司 Method and device for auditing video, storage medium and electronic equipment
CN114240101A (en) * 2021-12-02 2022-03-25 支付宝(杭州)信息技术有限公司 Risk identification model verification method, device and equipment
CN114219501B (en) * 2022-02-22 2022-06-28 杭州衡泰技术股份有限公司 Sample labeling resource allocation method, device and application
CN114972846A (en) * 2022-04-29 2022-08-30 上海深至信息科技有限公司 Ultrasonic image annotation system
CN118211681A (en) * 2024-05-22 2024-06-18 上海斗象信息科技有限公司 Labeling sample judging method and device and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070260342A1 (en) * 2006-05-08 2007-11-08 Standard Aero Limited Method for inspection process development or improvement and parts inspection process
US9390112B1 (en) * 2013-11-22 2016-07-12 Groupon, Inc. Automated dynamic data quality assessment
US11379695B2 (en) * 2016-10-24 2022-07-05 International Business Machines Corporation Edge-based adaptive machine learning for object recognition
CN109446369B (en) * 2018-09-28 2021-10-08 武汉中海庭数据技术有限公司 Interaction method and system for semi-automatic image annotation
CN109284784A (en) * 2018-09-29 2019-01-29 北京数美时代科技有限公司 A kind of content auditing model training method and device for live scene video
CN110222791B (en) * 2019-06-20 2020-12-04 杭州睿琪软件有限公司 Sample labeling information auditing method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101359372A (en) * 2008-09-26 2009-02-04 腾讯科技(深圳)有限公司 Training method and device of classifier, and method apparatus for recognising sensitization picture
CN104751188A (en) * 2015-04-15 2015-07-01 爱威科技股份有限公司 Image processing method and system
CN108806668A (en) * 2018-06-08 2018-11-13 国家计算机网络与信息安全管理中心 A kind of audio and video various dimensions mark and model optimization method
CN109583468A (en) * 2018-10-12 2019-04-05 阿里巴巴集团控股有限公司 Training sample acquisition methods, sample predictions method and corresponding intrument
CN109784391A (en) * 2019-01-04 2019-05-21 杭州比智科技有限公司 Sample mask method and device based on multi-model

Also Published As

Publication number Publication date
CN110222791A (en) 2019-09-10
WO2020253636A1 (en) 2020-12-24

Similar Documents

Publication Publication Date Title
CN110222791B (en) Sample labeling information auditing method and device
CN110163300B (en) Image classification method and device, electronic equipment and storage medium
CN110222170B (en) Method, device, storage medium and computer equipment for identifying sensitive data
CN109800320B (en) Image processing method, device and computer readable storage medium
CN109145299B (en) Text similarity determination method, device, equipment and storage medium
CN110245087B (en) State checking method and device of manual client for sample auditing
CN110263853B (en) Method and device for checking state of manual client by using error sample
CN107491536B (en) Test question checking method, test question checking device and electronic equipment
CN109189895B (en) Question correcting method and device for oral calculation questions
US11721229B2 (en) Question correction method, device, electronic equipment and storage medium for oral calculation questions
CN112434556A (en) Pet nose print recognition method and device, computer equipment and storage medium
CN113688630A (en) Text content auditing method and device, computer equipment and storage medium
CN109284700B (en) Method, storage medium, device and system for detecting multiple faces in image
CN109697267B (en) CMS (content management system) identification method and device
CN113723467A (en) Sample collection method, device and equipment for defect detection
CN112434717B (en) Model training method and device
CN111767390A (en) Skill word evaluation method and device, electronic equipment and computer readable medium
CN115082659A (en) Image annotation method and device, electronic equipment and storage medium
CN114494863A (en) Animal cub counting method and device based on Blend Mask algorithm
CN111611781A (en) Data labeling method, question answering method, device and electronic equipment
CN115830598A (en) Tracing confirmation method, system, equipment and medium for standard equipment
CN114463345A (en) Multi-parameter mammary gland magnetic resonance image segmentation method based on dynamic self-adaptive network
CN116127450A (en) Model evaluation method and device
CN114065858A (en) Model training method and device, electronic equipment and storage medium
CN111435451B (en) Method, device, server and storage medium for determining picture category

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20220429
Address after: 310053 room d3189, North third floor, building 1, 368 Liuhe Road, Binjiang District, Hangzhou City, Zhejiang Province
Patentee after: Hangzhou Ruisheng Software Co.,Ltd.
Address before: Room B2019, 2nd floor, building 1 (North), 368 Liuhe Road, Binjiang District, Hangzhou City, Zhejiang Province, 310053
Patentee before: HANGZHOU GLORITY SOFTWARE Ltd.